{
    "info": {
        "author": "Federico A. Galatolo",
        "author_email": "galatolo.federico@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 4 - Beta",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3.6",
            "Topic :: Scientific/Engineering :: Artificial Intelligence"
        ],
        "description": "# torchreinforce\n\nA pythonic implementation of the REINFORCE algorithm that is actually fun to use\n\n## Installation\nYou can install it with pip as you would any other Python package:\n```\npip install torchreinforce\n```\n\n## Quickstart\n\nIn order to use the REINFORCE algorithm with your model you only need to do two things:\n* Use the ``ReinforceModule`` class as your base class\n* Decorate your ``forward`` function with ``@ReinforceModule.forward``\n\nThat's it!\n\n```python\nimport torch\nfrom torchreinforce import ReinforceModule\n\nclass Model(ReinforceModule):\n    def __init__(self, **kwargs):\n        super(Model, self).__init__(**kwargs)\n        self.net = torch.nn.Sequential(\n            torch.nn.Linear(20, 128),\n            torch.nn.ReLU(),\n            torch.nn.Linear(128, 2),\n            torch.nn.Softmax(dim=-1),\n        )\n\n    @ReinforceModule.forward\n    def forward(self, x):\n        return self.net(x)\n```\n\nYour model will now output ``ReinforceOutput`` objects.\n\nThese objects have two important methods:\n\n* ``get()``\n* ``reward(value)``\n\nUse ``output.get()`` to draw an actual sample from the underlying distribution and ``output.reward(value)`` to assign a reward to that specific output.\n\nIf ``net`` is your model, each environment step looks something like this:\n\n```python\naction = net(observation)\nobservation, reward, done, info = env.step(action.get())\naction.reward(reward)\n```\n\n## Wait, did you just say distribution?\n\nYes! As the REINFORCE algorithm states, the outputs of your model are used as the parameters of a probability distribution.\n\nIn fact, you can use whatever probability distribution you want: the ``ReinforceModule`` constructor accepts the following parameters:\n\n* ``gamma`` the *gamma* (discount factor) parameter of the REINFORCE algorithm (default: ``0.99``)\n* ``distribution`` any ``ReinforceDistribution`` or ``torch.distributions`` distribution (default: ``Categorical``)\n\nlike this:\n\n```python\nnet = Model(distribution=torch.distributions.Beta, gamma=0.99)\n```\n\nKeep in mind that the outputs of your **decorated** ``forward(x)`` will be used as the parameters of the ``distribution``. If your ``distribution`` needs more than one parameter, just return a list.\n\nI've added the possibility for distributions to behave **deterministically** during **testing**, but I've implemented it only for the ``Categorical`` distribution. If you want to implement your own deterministic logic, check the file ``distributions/categorical.py``; it is pretty straightforward.\n\nFor example, to use the ``torch.distributions.Beta`` distribution you will need to do something like:\n\n```python\nclass Model(ReinforceModule):\n    def __init__(self, **kwargs):\n        super(Model, self).__init__(**kwargs)\n        ...\n\n    @ReinforceModule.forward\n    def forward(self, x):\n        # the Beta distribution takes two positive parameters (concentration1, concentration0)\n        return [self.net1(x), self.net2(x)]\n\nnet = Model(distribution=torch.distributions.Beta, gamma=0.99)\n\naction = net(inp)\nenv.step(action.get())\n```\n\n## Nice! What about training?\n\nYou can compute the REINFORCE loss by calling the ``loss()`` function of ``ReinforceModule`` and then treat it as you would any other PyTorch loss:\n\n```python\nnet = ...\noptimizer = ...\n\nfor episode in range(EPISODES):\n    net.reset()\n    for step in range(STEPS):\n        ...  # sample actions with net(...) and record rewards with action.reward(...)\n\n    loss = net.loss(normalize=True)\n\n    optimizer.zero_grad()\n    loss.backward()\n    optimizer.step()\n```\n\nYou **have to** call the ``reset()`` function of ``ReinforceModule`` **before** the beginning of each episode. You can also pass the argument ``normalize`` to ``loss()`` if you want to normalize the rewards.\n\n## Putting it all together\n\nA complete example looks like this:\n\n```python\nimport gym\nimport torch\nfrom torchreinforce import ReinforceModule\n\nEPISODES = 1000  # number of training episodes (illustrative value)\n\nclass Model(ReinforceModule):\n    def __init__(self, **kwargs):\n        super(Model, self).__init__(**kwargs)\n        self.net = torch.nn.Sequential(\n            torch.nn.Linear(4, 128),\n            torch.nn.ReLU(),\n            torch.nn.Linear(128, 2),\n            torch.nn.Softmax(dim=-1),\n        )\n\n    @ReinforceModule.forward\n    def forward(self, x):\n        return self.net(x)\n\n\nenv = gym.make('CartPole-v0')\nnet = Model()\noptimizer = torch.optim.Adam(net.parameters(), lr=0.001)\n\nfor i in range(EPISODES):\n    done = False\n    net.reset()  # must be called before each episode\n    observation = env.reset()\n    while not done:\n        action = net(torch.tensor(observation, dtype=torch.float32))\n\n        observation, reward, done, info = env.step(action.get())\n        action.reward(reward)\n\n    loss = net.loss(normalize=False)\n\n    optimizer.zero_grad()\n    loss.backward()\n    optimizer.step()\n```\n\nYou can find a running example in the ``examples/`` folder.\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/galatolofederico/torchreinforce",
        "keywords": "",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "torchreinforce",
        "package_url": "https://pypi.org/project/torchreinforce/",
        "platform": "",
        "project_url": "https://pypi.org/project/torchreinforce/",
        "project_urls": {
            "Homepage": "https://github.com/galatolofederico/torchreinforce"
        },
        "release_url": "https://pypi.org/project/torchreinforce/0.1.0/",
        "requires_dist": [
            "torch",
            "numpy"
        ],
        "requires_python": "",
        "summary": "A pythonic implementation of the REINFORCE algorithm that is actually fun to use",
        "version": "0.1.0"
    },
    "last_serial": 4713345,
    "releases": {
        "0.1.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b18372214156e1546ba1485301275401",
                    "sha256": "d7023facdea8f79409c5e58526bec9e182540af3f392b83257f258b8766f31f5"
                },
                "downloads": -1,
                "filename": "torchreinforce-0.1.0-py3.6.egg",
                "has_sig": false,
                "md5_digest": "b18372214156e1546ba1485301275401",
                "packagetype": "bdist_egg",
                "python_version": "3.6",
                "requires_python": null,
                "size": 12782,
                "upload_time": "2019-01-18T19:05:30",
                "url": "https://files.pythonhosted.org/packages/fe/a6/04fb485d82a7ba41711190a0da8fb246fd0f7812d09ce58e5f6c15daa86b/torchreinforce-0.1.0-py3.6.egg"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "d021cfa11e21ff3ea825f44717ef9d92",
                    "sha256": "69b205c21b0044f82991d550f950ed1e99ee79aef7f981c18753815823fe4c83"
                },
                "downloads": -1,
                "filename": "torchreinforce-0.1.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "d021cfa11e21ff3ea825f44717ef9d92",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 18964,
                "upload_time": "2019-01-18T19:05:28",
                "url": "https://files.pythonhosted.org/packages/47/82/5d3c2eb2dad9f2a5cb65b74ba29b23ae21322832c8c8c167dab05c8c7128/torchreinforce-0.1.0-py3-none-any.whl"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "b18372214156e1546ba1485301275401",
                "sha256": "d7023facdea8f79409c5e58526bec9e182540af3f392b83257f258b8766f31f5"
            },
            "downloads": -1,
            "filename": "torchreinforce-0.1.0-py3.6.egg",
            "has_sig": false,
            "md5_digest": "b18372214156e1546ba1485301275401",
            "packagetype": "bdist_egg",
            "python_version": "3.6",
            "requires_python": null,
            "size": 12782,
            "upload_time": "2019-01-18T19:05:30",
            "url": "https://files.pythonhosted.org/packages/fe/a6/04fb485d82a7ba41711190a0da8fb246fd0f7812d09ce58e5f6c15daa86b/torchreinforce-0.1.0-py3.6.egg"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "d021cfa11e21ff3ea825f44717ef9d92",
                "sha256": "69b205c21b0044f82991d550f950ed1e99ee79aef7f981c18753815823fe4c83"
            },
            "downloads": -1,
            "filename": "torchreinforce-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d021cfa11e21ff3ea825f44717ef9d92",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 18964,
            "upload_time": "2019-01-18T19:05:28",
            "url": "https://files.pythonhosted.org/packages/47/82/5d3c2eb2dad9f2a5cb65b74ba29b23ae21322832c8c8c167dab05c8c7128/torchreinforce-0.1.0-py3-none-any.whl"
        }
    ]
}