{ "info": { "author": "Federico A. Galatolo", "author_email": "galatolo.federico@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "# torchreinforce\n\nA pythonic implementation of the REINFORCE algorithm that is actually fun to use\n\n## Installation\nYou can install it with pip as you would for any other python package\n```\npip install torchreinforce\n```\n\n## Quickstart\n\nIn order to use the REINFORCE algorithm with your model you only need to do two things:\n* Use the ``ReinforceModule`` class as your base class\n* Decorate your ``forward`` function with ``@ReinforceModule.forward``\n\nThat's it!\n\n```python\nclass Model(ReinforceModule):\n def __init__(self, **kwargs):\n super(Model, self).__init__(**kwargs)\n self.net = torch.nn.Sequential(\n torch.nn.Linear(20, 128),\n torch.nn.ReLU(),\n torch.nn.Linear(128, 2),\n torch.nn.Softmax(dim=-1),\n )\n\n @ReinforceModule.forward\n def forward(self, x):\n return self.net(x)\n```\n\nYour model will now output ``ReinforceOutput`` objects.\n\nThis objects have two important functions\n\n* ``get()``\n* ``reward(value)``\n\nYou can use ``output.get()`` to get an actual sample of the overlaying distribution and ``output.reward(value)`` to set a reward for the specific output.\n\nBeing ``net`` your model you have to do something like that\n\n```python\naction = net(observation)\nobservation, reward, done, info = env.step(action.get())\naction.reward(reward)\n```\n\n## Wait, did you just said distribution?\n\nYes! As the REINFORCE algorithm states the outputs of your model will be used as parameters for a probability distribution function.\n\nActually you can use whatever probability distribution you want, the ``ReinforceModule`` constructor accepts indeed the following parameters:\n\n* ``gamma`` the *gamma* parameter of the REINFORCE algorithm (default: ``Categorical``)\n* ``distribution`` every ``ReinforceDistribution`` or ``pytorch.distributions`` distribution (default: 0.99)\n\nlike that\n\n```python\nnet = Model(distribution=torch.distributions.Beta, gamma=0.99)\n```\n\nKeep in mind that the outputs of your **decorated** ``forward(x)`` outputs will be used as the parameters for the ``distribution``. If your ``distribution`` needs more than one parameters just return a list.\n\nI've added the possibility to distribution to have a **deterministic** behavior in **testing** and I've implemented it only for the ``Categorical`` distribution, if you want to implement your own deterministic logic check the file ``distributions/categorical.py`` it is pretty straightforward\n\nIf you want to use the ``torch.distributions.Beta`` distribution for example you will need to do something like\n\n```python\nclass Model(ReinforceModule):\n def __init__(self, **kwargs):\n super(Model, self).__init__(**kwargs)\n ...\n\n @ReinforceModule.forward\n def forward(self, x):\n return [self.net1(x), self.net2(x)] # the Beta distribution accepts two parameters\n\nnet = Model(distribution=torch.distributions.Beta, gamma=0.99)\n\naction = net(inp)\nenv.step(action.get())\n```\n\n## Nice! What about training?\n\nYou can compute the REINFORCE loss by calling the ``loss()`` function of ``ReinforceModule`` and than treat it as you would do with any other pytorch loss function\n\n```python\nnet = ...\noptmizer = ...\n\nwhile training:\n net.reset()\n for steps:\n ....\n\n loss = net.loss(normalize=True)\n\n optimizer.zero_grad()\n loss.backward()\n optmizer.step()\n```\n\nYou **have to** call the ``reset()`` function of ``ReinforceModule`` **before** the beginning of each episode. You can also pass the argument ``normalize`` to ``loss()`` if you want to normalize the rewards\n\n## Putting all together\n\nA complete example looks like this:\n\n```python\nclass Model(ReinforceModule):\n def __init__(self, **kwargs):\n super(Model, self).__init__(**kwargs)\n self.net = torch.nn.Sequential(\n torch.nn.Linear(4, 128),\n torch.nn.ReLU(),\n torch.nn.Linear(128, 2),\n torch.nn.Softmax(dim=-1),\n )\n\n @ReinforceModule.forward\n def forward(self, x):\n return self.net(x)\n\n\nenv = gym.make('CartPole-v0')\nnet = Model()\noptimizer = torch.optim.Adam(net.parameters(), lr=0.001)\n\nfor i in range(EPISODES):\n done = False\n net.reset()\n observation = env.reset()\n while not done:\n action = net(torch.tensor(observation, dtype=torch.float32))\n\n observation, reward, done, info = env.step(action.get())\n action.reward(reward)\n\n loss = net.loss(normalize=False)\n\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n```\n\nYou can find a running example in the ``examples/`` folder.\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/galatolofederico/torchreinforce", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "torchreinforce", "package_url": "https://pypi.org/project/torchreinforce/", "platform": "", "project_url": "https://pypi.org/project/torchreinforce/", "project_urls": { "Homepage": "https://github.com/galatolofederico/torchreinforce" }, "release_url": "https://pypi.org/project/torchreinforce/0.1.0/", "requires_dist": [ "torch", "numpy" ], "requires_python": "", "summary": "A pythonic implementation of the REINFORCE algorithm that is actually fun to use", "version": "0.1.0" }, "last_serial": 4713345, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "b18372214156e1546ba1485301275401", "sha256": "d7023facdea8f79409c5e58526bec9e182540af3f392b83257f258b8766f31f5" }, "downloads": -1, "filename": "torchreinforce-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "b18372214156e1546ba1485301275401", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 12782, "upload_time": "2019-01-18T19:05:30", "url": "https://files.pythonhosted.org/packages/fe/a6/04fb485d82a7ba41711190a0da8fb246fd0f7812d09ce58e5f6c15daa86b/torchreinforce-0.1.0-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "d021cfa11e21ff3ea825f44717ef9d92", "sha256": "69b205c21b0044f82991d550f950ed1e99ee79aef7f981c18753815823fe4c83" }, "downloads": -1, "filename": "torchreinforce-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d021cfa11e21ff3ea825f44717ef9d92", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18964, "upload_time": "2019-01-18T19:05:28", "url": "https://files.pythonhosted.org/packages/47/82/5d3c2eb2dad9f2a5cb65b74ba29b23ae21322832c8c8c167dab05c8c7128/torchreinforce-0.1.0-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b18372214156e1546ba1485301275401", "sha256": "d7023facdea8f79409c5e58526bec9e182540af3f392b83257f258b8766f31f5" }, "downloads": -1, "filename": "torchreinforce-0.1.0-py3.6.egg", "has_sig": false, "md5_digest": "b18372214156e1546ba1485301275401", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 12782, "upload_time": "2019-01-18T19:05:30", "url": "https://files.pythonhosted.org/packages/fe/a6/04fb485d82a7ba41711190a0da8fb246fd0f7812d09ce58e5f6c15daa86b/torchreinforce-0.1.0-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "d021cfa11e21ff3ea825f44717ef9d92", "sha256": "69b205c21b0044f82991d550f950ed1e99ee79aef7f981c18753815823fe4c83" }, "downloads": -1, "filename": "torchreinforce-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d021cfa11e21ff3ea825f44717ef9d92", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18964, "upload_time": "2019-01-18T19:05:28", "url": "https://files.pythonhosted.org/packages/47/82/5d3c2eb2dad9f2a5cb65b74ba29b23ae21322832c8c8c167dab05c8c7128/torchreinforce-0.1.0-py3-none-any.whl" } ] }