{ "info": { "author": "Josiah Laivins", "author_email": "jokellum@northstate.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Fast Reinforcement Learning\nThis repo is not affiliated with Jeremy Howard or his course which can be found here: [here](https://www.fast.ai/about/)\nWe will be using components from the Fastai library however for building and training our reinforcement learning (RL) \nagents.\n\nAs a note, here is a run down of existing RL frameworks:\n- [Intel Coach](https://github.com/NervanaSystems/coach) \n- [Tensor Force](https://github.com/tensorforce/tensorforce)\n- [OpenAI Baselines](https://github.com/openai/baselines)\n- [Tensorflow Agents](https://github.com/tensorflow/agents)\n- [KerasRL](https://github.com/keras-rl/keras-rl)\n\nHowever there are also frameworks in PyTorch most notably Facebook's Horizon:\n- [Horizon](https://github.com/facebookresearch/Horizon)\n- [DeepRL](https://github.com/ShangtongZhang/DeepRL)\n\nMy motivation is that existing frameworks commonly use tensorflow, which nothing against tensorflow, but I have \naccomplished more in shorter periods of time using PyTorch. \n\nFastai for computer vision and tabular learning has been amazing. One would wish that this would be the same for RL. \nThe purpose of this repo is to have a framework that is as easy as possible to start, but also designed for testing\nnew agents. \n\n# Table of Contents\n1. [Installation](#installation)\n2. [Beta TODO](#beta-todo)\n3. [Code](#code)\n5. [Versioning](#versioning)\n6. [Contributing](#contributing)\n7. [Style](#style)\n\n\n## Installation\nVery soon I would like to add some form of scripting to install some complicated dependencies. We have 2 steps:\n\n**1.a FastAI**\n[Install Fastai](https://github.com/fastai/fastai/blob/master/README.md#installation)\nor if you are Anaconda (which is a good idea I would use Anaconda) you can do: \\\n`conda install -c pytorch -c fastai fastai`\n\n\n**1.b Optional / Extra Envs**\nOpenAI all gyms: \\\n`pip install gym[all]`\n\nMazes: \\\n`git clone https://github.com/MattChanTK/gym-maze.git` \\\n`cd gym-maze` \\\n`python setup.py install`\n\n\n**2 Actual Repo** \\\n`git clone https://github.com/josiahls/fast-reinforcement-learning.git` \\\n`cd fast-reinforcement-learning` \\\n`python setup.py install`\n\n## Beta TODO\nAt the moment these are the things I personally urgently need, and then the nice things that will make this repo\nsomething akin to valuable. These are listed in kind of the order I am planning on executing them.\n\n**Critical**\n- [X] MDPDataBunch: Finished to the point of being useful. Please reference: `tests/test_Envs`\nExample:\n```python\nfrom fast_rl.core.Envs import Envs\nfrom fast_rl.core.MarkovDecisionProcess import MDPDataBunch\n\n# At present will try to load OpenAI, box2d, pybullet, atari, maze.\n# note \"get_all_latest_envs\" has a key inclusion and exclusion so if you don't have some of these envs installed, \n# you can avoid this here.\nfor env in Envs.get_all_latest_envs():\n max_steps = 50 # Limit the number of per episode iterations for now.\n print(f'Testing {env}')\n mdp_databunch = MDPDataBunch.from_env(env, max_steps=max_steps, num_workers=0)\n if mdp_databunch is None:\n print(f'Env {env} is probably Mujoco... Add imports if you want and try on your own. Don\\'t like '\n f'proprietary engines like this. If you have any issues, feel free to make a PR!')\n else:\n epochs = 1 # N episodes to run\n for epoch in range(epochs):\n for state in mdp_databunch.train_dl:\n # Instead of random action, you would have your agent here\n mdp_databunch.train_ds.actions = mdp_databunch.train_ds.get_random_action()\n\n for state in mdp_databunch.valid_dl:\n # Instead of random action, you would have your agent here and have exploration to 0\n mdp_databunch.valid_ds.actions = mdp_databunch.valid_ds.get_random_action()\n```\n- [X] DQN Agent: Reference `tests/test_Learner/test_basic_dqn_model_maze`. This test is\nkind of a hell-scape. You will notice I plan to use Learner callbacks for a fit function. Also note, the gym_maze envs\nwill be important for at least discrete testing because you can heatmap the maze with the model's rewards. \nDQN Agent basic learning / optimization is done. It is undoubtedly unstable / buggy. Please note the next step.\n\nOne of the biggest issues with basic DQNs is the fact that Q values are often always moving. The actual basic DQN should\nbe a fixed targeting DQN, however lets us move to some debugging tools so we are more effective.\n\nTestable code:\n```python\nfrom fast_rl.agents.DQN import DQN\nfrom fast_rl.core.Learner import AgentLearner\nfrom fast_rl.core.MarkovDecisionProcess import MDPDataBunch\n\ndata = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')\nmodel = DQN(data)\nlearn = AgentLearner(data, model)\n\nepochs = 450\n\ncallbacks = learn.model.callbacks # type: Collection[LearnerCallback]\n[c.on_train_begin(max_episodes=epochs) for c in callbacks]\nfor epoch in range(epochs):\n [c.on_epoch_begin(episode=epoch) for c in callbacks]\n learn.model.train()\n for element in learn.data.train_dl:\n learn.data.train_ds.actions = learn.predict(element)\n\n [c.on_step_end(learn=learn) for c in callbacks]\n [c.on_epoch_end() for c in callbacks]\n[c.on_train_end() for c in callbacks]\n``` \nResult:\n\n| ![](res/pre_interpretation_maze_dqn.gif) |\n|:---:|\n| *Fig 1: We are now able to train an agent using some Fastai API* |\n\n\nI believe that the agent explodes after the first episode. Not to worry! We will make a RL interpreter to see whats \ngoing on!\n\n- [X] AgentInterpretation: First method will be heatmapping the image / state space of the \nenvironment with the expected rewards for super important debugging. In the code above, we are testing with a maze for a\ngood reason. Heatmapping rewards over a maze is pretty easy as opposed to other environments.\n\nUsage example:\n```python\nfrom fast_rl.agents.DQN import DQN\nfrom fast_rl.core.Interpreter import AgentInterpretationv1\nfrom fast_rl.core.Learner import AgentLearner\nfrom fast_rl.core.MarkovDecisionProcess import MDPDataBunch\n\ndata = MDPDataBunch.from_env('maze-random-5x5-v0', render='human')\nmodel = DQN(data)\nlearn = AgentLearner(data, model)\n\nepochs = 10\n\ncallbacks = learn.model.callbacks # type: Collection[LearnerCallback]\n[c.on_train_begin(max_episodes=epochs) for c in callbacks]\nfor epoch in range(epochs):\n [c.on_epoch_begin(episode=epoch) for c in callbacks]\n learn.model.train()\n for element in learn.data.train_dl:\n learn.data.train_ds.actions = learn.predict(element)\n [c.on_step_end(learn=learn) for c in callbacks]\n [c.on_epoch_end() for c in callbacks]\n\n # For now we are going to avoid executing callbacks here.\n learn.model.eval()\n for element in learn.data.valid_dl:\n learn.data.valid_ds.actions = learn.predict(element)\n\n if epoch % 1 == 0:\n interp = AgentInterpretationv1(learn)\n interp.plot_heatmapped_episode(epoch)\n[c.on_train_end() for c in callbacks]\n```\n\n| ![](res/heatmap_1.png) |\n|:---:|\n| *Fig 2: Cumulative rewards calculated over states during episode 0.* |\n| ![](res/heatmap_2.png) |\n| *Fig 3: After 1-3 episodes the rewards die out meaning we still need to debug and improve our agent.* |\n\n\nIf we change:\n```python\ninterp = AgentInterpretationv1(learn)\ninterp.plot_heatmapped_episode(epoch)\n```\nto:\n```python\ninterp = AgentInterpretationv1(learn)\ninterp.plot_episode(epoch)\n```\nWe can get the following plots for specific episodes:\n\n| ![](res/reward_plot_1.png) |\n|:----:|\n| *Fig 4: Rewards estimated by the agent during episode 0.* |\n| ![](res/reward_plot_2.png) |\n| *Fig 5: Rewards later estimated by the agent during episode 1.* |\n\nAs determined by our AgentInterpretation object, we need to either debug or improve our agent. \nWe will do this is parallel with creating our Learner fit function. \n\n- [ ] **Working on** Learner Basic: After DQN and adding DDQN, Fixed targeting, DDDQN, we need to convert this (most likely) messy test\ninto a suitable object. Will be similar to the basic learner.\n- [ ] DDPG Agent: We need to have at least one agent able to perform continuous environment execution. As a note, we \ncould give discrete agents the ability to operate in a continuous domain via binning. \n- [ ] Learner Refactor: DDPG will probably screw up everything lol. We will need to rethink the learner / maybe try to\neliminate some custom methods for native Fastai library methods. \n\n**Additional**\n\n- [ ] Single Global fit function like Fastai's. Better yet, actually just use their universal fit function.\n\n\n## Code \nSome of the key take aways is Fastai's use of callbacks. Not only do callbacks allow for logging, but in fact adding a\ncallback to a generic fit function can change its behavior drastically. My goal is to have a library that is as easy\nas possible to run on a server or on one's own computer. I am also interested in this being easy to extend. \n\nI have a few assumptions that the code / support algorithms I believe should adhere to:\n- Environments should be pickle-able, and serializable. They should be able to shut down and start up multiple times\nduring run time.\n- Agents should not need more information than images or state values for an environment per step. This means that \nenvironments should not be expected to allow output of contact points, sub-goals, or STRIPS style logical outputs. \n\nRational:\n- Shutdown / Startup: Some environments (pybullet) have the issue of shutting down and starting different environments.\nLuckily, I have a fork of pybullet, so these modifications will be forced. \n- Pickling: Being able to encapsulate an environment as a `.pkl` can be important for saving it and all the information\nit generated.\n- Serializable: If we want to do parallel processing, environments need to be serializable to transport them between \nthose processes.\n\nSome extra assumptions:\n- Environments can easier be goal-less, or have a single goal in which OpenAI defines as `Env` and `GoalEnv`. \n\nThese assumptions are necessary for us to implement other envs from other repos. We do not want to be tied to just\nOpenAI gyms. \n\n## Versioning\nAt present the repo is in alpha stages being. I plan to move this from alpha to a pseudo beta / working versions. \nRegardless of version, we will follow Python style versioning\n\n_Alpha Versions_: #.#.# e.g. 0.1.0.\n\n## Git + Workflow\n\n\n## Style\nFastai does not follow closely with [google python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#3164-guidelines-derived-from-guidos-recommendations),\nhowever in this repo we will use this guide. \nSome exceptions however (typically found in Fastai):\n- \"PEP 8 Multiple statements per line violation\" is allowed in the case of if statements as long as they are still \nwithin the column limit.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/josiahls/fast-reinforcement-learning", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "fast-rl", "package_url": "https://pypi.org/project/fast-rl/", "platform": "", "project_url": "https://pypi.org/project/fast-rl/", "project_urls": { "Homepage": "https://github.com/josiahls/fast-reinforcement-learning" }, "release_url": "https://pypi.org/project/fast-rl/0.1.1/", "requires_dist": [ "numpy", "tqdm", "pillow", "pandas", "fastai", "gym[atari,box2d]", "jupyter", "namedlist", "pytest-asyncio", "pytest" ], "requires_python": "", "summary": "Fastai for computer vision and tabular learning has been amazing. One would wish that this would be the same for RL. The purpose of this repo is to have a framework that is as easy as possible to start, but also designed for testing new agents.", "version": "0.1.1" }, "last_serial": 5685442, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "22413f79074b48b5969ffac6f1910543", "sha256": "ab92a3b77402c4704dc9b8610db16c59f65a65088939c573b36f94b3e6d13d14" }, "downloads": -1, "filename": "fast_rl-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "22413f79074b48b5969ffac6f1910543", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9767, "upload_time": "2019-08-16T03:10:09", "url": "https://files.pythonhosted.org/packages/ad/47/13e413ca2d4af206ca146d987a7af6a68f0e0c5a7ab77e8e1ec824d65b11/fast_rl-0.1.0-py3-none-any.whl" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "88854ed3a52d2456b0c736046f58b8b5", "sha256": "ce7d49cda2e2b42eb9823bb7d9030b2b7cd25009ac1f93542d24128a963915c2" }, "downloads": -1, "filename": "fast_rl-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "88854ed3a52d2456b0c736046f58b8b5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9767, "upload_time": "2019-08-16T03:22:11", "url": "https://files.pythonhosted.org/packages/82/8e/89c8dc30c5bc5f48f7545df029bb595915db0420b95440b6a640e629ed2c/fast_rl-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "559da6c1a649cc08de6e9988cf9d4f04", "sha256": "9ac037ff184bce0d9a62b0098285eb5b40a3c9e1326ef8cfa4936fa0f958323a" }, "downloads": -1, "filename": "fast_rl-0.1.1.tar.gz", "has_sig": false, "md5_digest": "559da6c1a649cc08de6e9988cf9d4f04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6420, "upload_time": "2019-08-16T03:22:13", "url": "https://files.pythonhosted.org/packages/16/18/6b09960201f862006e4767f2a66a797c92aa716978e6fd949a9e37d698bb/fast_rl-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "88854ed3a52d2456b0c736046f58b8b5", "sha256": "ce7d49cda2e2b42eb9823bb7d9030b2b7cd25009ac1f93542d24128a963915c2" }, "downloads": -1, "filename": "fast_rl-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "88854ed3a52d2456b0c736046f58b8b5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9767, "upload_time": "2019-08-16T03:22:11", "url": "https://files.pythonhosted.org/packages/82/8e/89c8dc30c5bc5f48f7545df029bb595915db0420b95440b6a640e629ed2c/fast_rl-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "559da6c1a649cc08de6e9988cf9d4f04", "sha256": "9ac037ff184bce0d9a62b0098285eb5b40a3c9e1326ef8cfa4936fa0f958323a" }, "downloads": -1, "filename": "fast_rl-0.1.1.tar.gz", "has_sig": false, "md5_digest": "559da6c1a649cc08de6e9988cf9d4f04", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6420, "upload_time": "2019-08-16T03:22:13", "url": "https://files.pythonhosted.org/packages/16/18/6b09960201f862006e4767f2a66a797c92aa716978e6fd949a9e37d698bb/fast_rl-0.1.1.tar.gz" } ] }