{ "info": { "author": "Baruch Tabanpour", "author_email": "baruch@tabanpour.info", "bugtrack_url": null, "classifiers": [], "description": "|Build Status|\n\nyarlp\n-----\n\n**Yet Another Reinforcement Learning Package**\n\nImplementations of ```CEM`` `__,\n```REINFORCE`` `__,\n```TRPO`` `__,\n```DDQN`` `__,\n```A2C`` `__ with reproducible benchmarks.\nExperiments are templated using ``jsonschema`` and are compared to\npublished results. This is meant to be a starting point for working\nimplementations of classic RL algorithms. Unfortunately even\nimplementations from OpenAI baselines are `not always\nreproducible `__.\n\nA working Dockerfile with ``yarlp`` installed can be run with:\n\n- ``docker build -t \"yarlpd\" .``\n- ``docker run -it yarlpd bash``\n\nTo run a benchmark, simply:\n\n``python yarlp/experiment/experiment.py --help``\n\nIf you want to run things manually, look in ``examples`` or look at\nthis:\n\n.. code:: python\n\n from yarlp.agent.trpo_agent import TRPOAgent\n from yarlp.utils.env_utils import NormalizedGymEnv\n\n env = NormalizedGymEnv('MountainCarContinuous-v0')\n agent = TRPOAgent(env, seed=123)\n agent.train(max_timesteps=1000000)\n\nBenchmarks\n----------\n\nWe benchmark against published results and Openai\n```baselines`` `__ where available\nusing\n```yarlp/experiment/experiment.py`` `__.\nBenchmark scripts for Openai ``baselines`` were made ad-hoc, such as\n`this\none `__.\n\nAtari10M\n~~~~~~~~\n\n+---------------+--------------+-------------------+\n| |BeamRider| | |Breakout| | |Pong| |\n+---------------+--------------+-------------------+\n| |QBert| | |Seaquest| | |SpaceInvaders| |\n+---------------+--------------+-------------------+\n\nDDQN with dueling networks and prioritized replay\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n``python yarlp/experiment/experiment.py run_atari10m_ddqn_benchmark``\n\nI trained 6 Atari environments for 10M time-steps (**40M frames**),\nusing 1 random seed, since I only have 1 GPU and 
limited time on this\nEarth. I used DDQN with dueling networks, but no prioritized replay\n(although it's implemented). I compare the final mean 100 episode raw\nscores for yarlp (with exploration of 0.01) with results from `Hasselt\net al, 2015 `__ and `Wang et al,\n2016 `__ which train for **200M\nframes** and evaluate on 100 episodes (exploration of 0.05).\n\nI don't compare to OpenAI baselines because the OpenAI DDQN\nimplementation is **not** currently able to reproduce published results\nas of 2018-01-20. See `this github\nissue `__, although I\nfound `these benchmark\nplots `__\nto be pretty helpful.\n\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| env           | yarlp DUEL 40M Frames | Hasselt et al DDQN 200M Frames | Wang et al DUEL 200M Frames |\n+===============+=======================+================================+=============================+\n| BeamRider     | 8705                  | 7654                           | 12164                       |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| Breakout      | 423.5                 | 375                            | 345                         |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| Pong          | 20.73                 | 21                             | 21                          |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| QBert         | 5410.75               | 14875                          | 19220.3                     |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| Seaquest      | 5300.5                | 7995                           | 50245.2                     |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n| SpaceInvaders | 1978.2                | 3154.6                         | 6427.3                      |\n+---------------+-----------------------+--------------------------------+-----------------------------+\n\n+---------------------------+-------------------------------+----------------------+-----------------------+\n| |BeamRiderNoFrameskip-v4| | |BreakoutNoFrameskip-v4|      | |PongNoFrameskip-v4| | |QbertNoFrameskip-v4| |\n+---------------------------+-------------------------------+----------------------+-----------------------+\n| |SeaquestNoFrameskip-v4|  | |SpaceInvadersNoFrameskip-v4| |                      |                       |\n+---------------------------+-------------------------------+----------------------+-----------------------+\n\nA2C\n^^^\n\n``python yarlp/experiment/experiment.py run_atari10m_a2c_benchmark``\n\nA2C on 10M time-steps (**40M frames**) with 1 random seed. 
Results\nare compared to learning curves from `Mnih et al,\n2016 `__ extracted at 10M\ntime-steps from Figure 3. You are invited to run for multiple seeds and\nthe full 200M frames for a better comparison.\n\n+-----------------+-----------------+---------------------------------+\n| env             | yarlp A2C 40M   | Mnih et al A3C 40M 16-threads   |\n+=================+=================+=================================+\n| BeamRider       | 3150            | ~3000                           |\n+-----------------+-----------------+---------------------------------+\n| Breakout        | 418             | ~150                            |\n+-----------------+-----------------+---------------------------------+\n| Pong            | 20              | ~20                             |\n+-----------------+-----------------+---------------------------------+\n| QBert           | 3644            | ~1000                           |\n+-----------------+-----------------+---------------------------------+\n| SpaceInvaders   | 805             | ~600                            |\n+-----------------+-----------------+---------------------------------+\n\n+---------------------------+-------------------------------+----------------------+-----------------------+\n| |BeamRiderNoFrameskip-v4| | |BreakoutNoFrameskip-v4|      | |PongNoFrameskip-v4| | |QbertNoFrameskip-v4| |\n+---------------------------+-------------------------------+----------------------+-----------------------+\n| |SeaquestNoFrameskip-v4|  | |SpaceInvadersNoFrameskip-v4| |                      |                       |\n+---------------------------+-------------------------------+----------------------+-----------------------+\n\nHere are some `more\nplots `__\nfrom OpenAI to compare against.\n\nMujoco1M\n~~~~~~~~\n\nTRPO\n^^^^\n\n``python yarlp/experiment/experiment.py run_mujoco1m_benchmark``\n\nWe average over 5 random seeds instead of 3 for both ``baselines`` and\n``yarlp``. 
More seeds probably wouldn't hurt here. We report 95%\nconfidence intervals.\n\n+-------------------------------+--------------------+-------------------------+----------------+\n| |Hopper-v1|                   | |HalfCheetah-v1|   | |Reacher-v1|            | |Swimmer-v1|   |\n+-------------------------------+--------------------+-------------------------+----------------+\n| |InvertedDoublePendulum-v1|   | |Walker2d-v1|      | |InvertedPendulum-v1|   |                |\n+-------------------------------+--------------------+-------------------------+----------------+\n\nCLI scripts\n-----------\n\nCLI convenience scripts will be installed with the package:\n\n- Run a benchmark:\n\n  - ``python yarlp/experiment/experiment.py --help``\n\n- Plot ``yarlp`` compared to OpenAI ``baselines`` benchmarks:\n\n  - ``compare_benchmark ``\n\n- Experiments:\n\n  - Experiments can be defined using JSON, validated with\n    ``jsonschema``. See `here `__ for sample\n    experiment configs. You can do a grid search by specifying multiple\n    parameters; the runs execute in parallel.\n  - Example:\n    ``run_yarlp_experiment --spec-file experiment_configs/trpo_experiment_mult_params.json``\n\n- Experiment plots:\n\n  - ``make_plots ``\n\n.. |Build Status| image:: https://travis-ci.org/btaba/yarlp.svg?branch=master\n   :target: https://travis-ci.org/btaba/yarlp\n.. |BeamRider| image:: /assets/atari10m/ddqn/beamrider.gif\n.. |Breakout| image:: /assets/atari10m/ddqn/breakout.gif\n.. |Pong| image:: /assets/atari10m/ddqn/pong.gif\n.. |QBert| image:: /assets/atari10m/ddqn/qbert.gif\n.. |Seaquest| image:: /assets/atari10m/ddqn/seaquest.gif\n.. |SpaceInvaders| image:: /assets/atari10m/ddqn/spaceinvaders.gif\n.. |BeamRiderNoFrameskip-v4| image:: /assets/atari10m/ddqn/BeamRiderNoFrameskip-v4.png\n.. |BreakoutNoFrameskip-v4| image:: /assets/atari10m/ddqn/BreakoutNoFrameskip-v4.png\n.. |PongNoFrameskip-v4| image:: /assets/atari10m/ddqn/PongNoFrameskip-v4.png\n.. |QbertNoFrameskip-v4| image:: /assets/atari10m/ddqn/QbertNoFrameskip-v4.png\n.. 
|SeaquestNoFrameskip-v4| image:: /assets/atari10m/ddqn/SeaquestNoFrameskip-v4.png\n.. |SpaceInvadersNoFrameskip-v4| image:: /assets/atari10m/ddqn/SpaceInvadersNoFrameskip-v4.png\n.. |BeamRiderNoFrameskip-v4| image:: /assets/atari10m/a2c/BeamRiderNoFrameskip-v4.png\n.. |BreakoutNoFrameskip-v4| image:: /assets/atari10m/a2c/BreakoutNoFrameskip-v4.png\n.. |PongNoFrameskip-v4| image:: /assets/atari10m/a2c/PongNoFrameskip-v4.png\n.. |QbertNoFrameskip-v4| image:: /assets/atari10m/a2c/QbertNoFrameskip-v4.png\n.. |SeaquestNoFrameskip-v4| image:: /assets/atari10m/a2c/SeaquestNoFrameskip-v4.png\n.. |SpaceInvadersNoFrameskip-v4| image:: /assets/atari10m/a2c/SpaceInvadersNoFrameskip-v4.png\n.. |Hopper-v1| image:: /assets/mujoco1m/trpo/Hopper-v1.png\n.. |HalfCheetah-v1| image:: /assets/mujoco1m/trpo/HalfCheetah-v1.png\n.. |Reacher-v1| image:: /assets/mujoco1m/trpo/Reacher-v1.png\n.. |Swimmer-v1| image:: /assets/mujoco1m/trpo/Swimmer-v1.png\n.. |InvertedDoublePendulum-v1| image:: /assets/mujoco1m/trpo/InvertedDoublePendulum-v1.png\n.. |Walker2d-v1| image:: /assets/mujoco1m/trpo/Walker2d-v1.png\n.. 
|InvertedPendulum-v1| image:: /assets/mujoco1m/trpo/InvertedPendulum-v1.png", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/btaba/yarlp", "keywords": "reinforcement learning,deep reinforcement learning,experiment,benchmark", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "yarlp", "package_url": "https://pypi.org/project/yarlp/", "platform": "", "project_url": "https://pypi.org/project/yarlp/", "project_urls": { "Homepage": "https://github.com/btaba/yarlp" }, "release_url": "https://pypi.org/project/yarlp/0.1.0/", "requires_dist": null, "requires_python": "", "summary": "yarlp", "version": "0.1.0" }, "last_serial": 3724773, "releases": { "0.0.6": [ { "comment_text": "", "digests": { "md5": "0481f118b1d9b7a58ba0c13155027041", "sha256": "38cdc12ab2ccee10a821677b4c971ad5ed00273c834444f3e3e2222f757c9f21" }, "downloads": -1, "filename": "yarlp-0.0.6-py3.5.egg", "has_sig": false, "md5_digest": "0481f118b1d9b7a58ba0c13155027041", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 539011, "upload_time": "2018-04-01T19:00:13", "url": "https://files.pythonhosted.org/packages/77/79/a2958aceadfa5c64ef31790ee7664ee95c5ba85700460217ef4211cd93e0/yarlp-0.0.6-py3.5.egg" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "939a1d1f753c7d250a278ddf73a60a71", "sha256": "7a49f7123ee7a26c9f7d9a6252f6656efdfd4772d06ff9ff2bec942242c6651d" }, "downloads": -1, "filename": "yarlp-0.0.7-py3.5.egg", "has_sig": false, "md5_digest": "939a1d1f753c7d250a278ddf73a60a71", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 539028, "upload_time": "2018-04-01T19:10:23", "url": "https://files.pythonhosted.org/packages/92/2b/f6baa7b64e2911b9dd505712f6b9df7d518ea4edfc81d2eddf6b1fdc579a/yarlp-0.0.7-py3.5.egg" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": 
"a3441eda70b714b0ab2b6a5bbd77e678", "sha256": "fd148a209edd6d5b14924e4be9008a3c6ba01d5fbd0dac5c1faf54d7a9f6e4ba" }, "downloads": -1, "filename": "yarlp-0.0.8-py3.5.egg", "has_sig": false, "md5_digest": "a3441eda70b714b0ab2b6a5bbd77e678", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 539092, "upload_time": "2018-04-01T19:14:23", "url": "https://files.pythonhosted.org/packages/2d/da/948fc0020fce1bd3d3aabb19dba92b95d11cf2d23d2919512b80969c75df/yarlp-0.0.8-py3.5.egg" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "f4efceb685a9d8040fa07303e718767c", "sha256": "7a1f8ab721f1557fd7ccbec26edae362a6041d2127903ba3459d165365f7c801" }, "downloads": -1, "filename": "yarlp-0.0.9-py3.5.egg", "has_sig": false, "md5_digest": "f4efceb685a9d8040fa07303e718767c", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 541339, "upload_time": "2018-04-01T19:15:54", "url": "https://files.pythonhosted.org/packages/df/2c/0476e63f9fbee1ca0083379b66de9350220fd4ab6c951a9fdaee5d9a432d/yarlp-0.0.9-py3.5.egg" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "3ffeb2c64a48c3bebe39192b5e196c2c", "sha256": "3f6d9940d908b98e52f0408d483e2619b02e4813846b2ab716ff331ed8205af8" }, "downloads": -1, "filename": "yarlp-0.1.0-py3.5.egg", "has_sig": false, "md5_digest": "3ffeb2c64a48c3bebe39192b5e196c2c", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 541887, "upload_time": "2018-04-01T19:20:32", "url": "https://files.pythonhosted.org/packages/40/da/1bbb9e67d724f597f8fbc61d53c939287a70eb035fdd4164545897bde35b/yarlp-0.1.0-py3.5.egg" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3ffeb2c64a48c3bebe39192b5e196c2c", "sha256": "3f6d9940d908b98e52f0408d483e2619b02e4813846b2ab716ff331ed8205af8" }, "downloads": -1, "filename": "yarlp-0.1.0-py3.5.egg", "has_sig": false, "md5_digest": "3ffeb2c64a48c3bebe39192b5e196c2c", "packagetype": "bdist_egg", "python_version": "3.5", 
"requires_python": null, "size": 541887, "upload_time": "2018-04-01T19:20:32", "url": "https://files.pythonhosted.org/packages/40/da/1bbb9e67d724f597f8fbc61d53c939287a70eb035fdd4164545897bde35b/yarlp-0.1.0-py3.5.egg" } ] }