{ "info": { "author": "Eric Steinberger", "author_email": "ericsteinberger.est@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# PokerRL\n\nFramework for Multi-Agent Deep Reinforcement Learning in Poker games.\n\n\n## Background\nResearch on solving imperfect information games has largely revolved around methods that traverse the full game-tree\nuntil very recently (see\n[[0]](http://martin.zinkevich.org/publications/regretpoker.pdf),\n[[1]](http://poker.cs.ualberta.ca/publications/2015-ijcai-cfrplus.pdf),\n[[2]](https://papers.nips.cc/paper/3713-monte-carlo-sampling-for-regret-minimization-in-extensive-games.pdf),\nfor examples).\nNew algorithms such as Neural Fictitious Self-Play (NFSP)\n[[3]](http://discovery.ucl.ac.uk/1549658/1/Heinrich_phd_FINAL.pdf),\nRegret Policy Gradients (RPG) [[4]](https://arxiv.org/pdf/1810.09026.pdf),\nDeep Counterfactual Regret Minimization (Deep CFR) [[5]](https://arxiv.org/pdf/1811.00164.pdf),\nand Single Deep CFR [[8]](https://arxiv.org/pdf/1901.07621.pdf)\nhave recently combined Deep (Reinforcement) Learning with conventional methods like CFR and Fictitious-Play to learn\napproximate Nash equilibria while only ever visiting a fraction of the game's states.\n\n\n## PokerRL Framework\n\n### Components of a PokerRL Algorithm\n![FrameWork](img/PokerRL_AgentArchitecture.png)\nYour algorithm consists of workers (green) that interact with each other. Arguments for a training run are \npassed through an instance of a _TrainingProfile_ (`.../rl/base_cls/TrainingProfileBase`).\nCommon metrics like best-response or head-to-head performance can be measured periodically by separate\nworkers (red). Your trained agent is wrapped in an EvalAgent (`.../rl/base_cls/EvalAgentBase`).\nYour EvalAgent can battle other AIs in an AgentTournament (`.../game/AgentTournament`)\nor play against humans in an InteractiveGame (`.../game/InteractiveGame`). All local workers (just classes)\ncan be wrapped with ~4 lines of code to work as an independent distributed worker. \n\nSome parts of PokerRL work only for 2-player games since they don't make sense in other settings. However,\nthe game engine itself and the agent modules are general to N>1 players.\n\n### Evaluation of Algorithms\nWe provide four metrics to evaluate algorithms:\n\n- **Best Response (BR)** - Computes the exact exploitability (for small games).\n- **Local Best Response (LBR) [[7]](https://arxiv.org/pdf/1612.07547.pdf)** approximates a lower bound of BR.\n- **RL Best Response (RL-BR)** approximates BR by training DDQN [[9]](https://arxiv.org/pdf/1511.06581.pdf)\nagainst the AI.\n- **Head-To-Head (H2H)** - Let's two modes of an Agent play against each other.\n\nOur current implementation of Best Response is only meant to be run in small games, but LBR and QBR are optimized\nfor distributed computing in (very) large games. As a baseline comparison in small games, there are (unoptimized)\nimplementations of\nvanilla CFR [[10]](http://martin.zinkevich.org/publications/regretpoker.pdf),\nCFR+ [[11]](https://arxiv.org/abs/1407.5042)\nand Linear CFR [[12]](https://arxiv.org/pdf/1809.04040.pdf)\nthat can be run just like a Deep RL agent and will plot their exploitability to TensorBoard.\n\n### Performance & Scalability\nWhilst past algorithms had performance concerns mostly related to computations on the game-tree, these sampling \nbased approaches have most of their overhead in querying neural networks. PokerRL provides an RL environment and a \nframework ontop of which algorithms based on Deep Learning can be built and run to solve poker games. PokerRL provides\na wrapper for ray [[6]](https://github.com/ray-project/ray) to allow the same code to run locally, on many cores, or\neven on a cluster of CPU and potentially GPU workers.\n\n\n\n\n\n\n## Installation\nThese instructions will guide you through getting PokerRL up and running on your local machine and\nexplain how to seamlessly deploy the same code you developed and tested locally onto an AWS cluster.\n\n### Prerequisites\nThis codebase is OS agnostic for local runs but supports only Linux for distributed runs due to limitations of\n [ray](https://github.com/ray-project/ray).\n\n### Installation on your Local Machine\nFirst, please install Anaconda/Miniconda and Docker. Then run the following commands (insert details where needed):\n```\nconda create -n CHOOSE_A_NAME python=3.6 -y\nsource activate THE_NAME_YOU_CHOSE\npip install requests\nconda install pytorch=0.4.1 -c pytorch\n```\nand then\n```\npip install PokerRL\n```\nNote: For distributed runs you would need Linux and also `pip install PokerRL[distributed]`. This is not required for\nlocal-only usage.\n\nThis framework uses [PyCrayon](https://github.com/torrvision/crayon), a language-agnostic wrapper around\nTensorboard. Please follow the instructions on their GitHub page to set it up. After you have installed PyCrayon, you\ncan run and start the log server via\n```\ndocker run -d -p 8888:8888 -p 8889:8889 --name crayon alband/crayon\ndocker start crayon\n```\nNow try to access Tensorboard in your browser at `localhost:8888`.\n\n#### Running some Tests\nRun this command in the directory containing PokerRL to check whether all unittests pass. \n```\npython -m unittest discover PokerRL/test\n``` \nA more fun way to test whether your installation was successful, is running `examples/interactive_user_v_user.py` to\nplay poker against yourself and `examples/run_cfrp_example.py` to train a CFR+ agent in a small poker game.\n\n## Cloud & Cluster\nPokerRL provides an interface that allows the exact same code to run locally and on a cluster by utilizing\n[ray](https://github.com/ray-project/ray). PokerRL supports two modes:\n1. _Distributed_: Run many worker processes on a single machine with many cores\n2. _Cluster_: Run many workers on many machines\n\nYou can enable/disable distributed and cluster mode by switching a boolean in the TrainingProfile.\nThis section assumes you developed your algorithm using a pip-installed version of PokerRL.\n\nExamples of algorithms compatible with distributed PokerRL are this\n[implementation of Neural Fictitious Self-Play](https://github.com/TinkeringCode/Neural-Fictitous-Self-Play) [8].\nand this\n[implementation of Single Deep CFR](https://github.com/TinkeringCode/Single-Deep-CFR) [3]\n\n### Local or Distributed Mode on an AWS instance\n1. Fire up any AWS instance that suits your needs over the management console. This tutorial assumes your base AMI is\n\"Amazon Linux 2 AMI (HVM), SSD Volume Type\". Note: It is important that you add the following allowance to your security group to be able to view logs:\n ```\n Custom TCP Rule | TCP | 8888 | Your_IP | Tensorboard\n ```\n1. Run the following commands on the instance:\n ```\n sudo yum update -y\n sudo yum install git gcc g++ polkit -y\n sudo amazon-linux-extras install docker -y\n sudo service docker start\n sudo docker pull alband/crayon \n wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\n bash Miniconda3-latest-Linux-x86_64.sh -b -p /home/ec2-user/miniconda\n export PATH=/home/ec2-user/miniconda/bin:$PATH\n conda create -n PokerAI python=3.6 -y\n source activate PokerAI\n pip install requests\n conda install pytorch=0.4.1 -c pytorch -y\n pip install PokerRL[distributed]\n ```\n1. Grant your instance access to your codebase so that it can `git pull` later on.\n\n1. Create an AMI (i.e. Image of your instance) to be able to skip the past steps in the future.\n\n1. Every time you fire up a new instance with your AMI, execute\n ```\n sudo service docker start\n sudo docker inspect -f {{.State.Running}} crayon || sudo docker run -d -p 8888:8888 -p 8889:8889 --name crayon alband/crayon\n sudo docker start crayon\n\n screen\n export OMP_NUM_THREADS=1\n export PATH=/home/ec2-user/miniconda/bin:$PATH\n source activate PokerAI\n source deactivate\n source activate PokerAI\n ```\n You have to set `OMP_NUM_THREADS=1` because of a bug in PyTorch 0.4.1 that ignores core/process limits. This is\n fixed in PyTorch 1.0, but 1.0 is actually slower for the recurrent networks than 0.4.1 in many cases.\n\n1. The usual syntax to start any algorithm run should be something like\n ```\n cd PATH/TO/PROJECT\n git pull\n python YOUR_SCRIPT.py\n ```\n\n1. In your browser (locally), please go to `AWS_INSTANCE_PUBLIC_IP:8888` to view logs and results.\n\n### Deploy on a cluster\nThe step from distributed to cluster only requires changes as documented by [ray](https://github.com/ray-project/ray).\nOnce you have your cluster specification file (`.yaml`) and your AWS account is set up, just enable the `cluster`\noption in your TrainingProfile and start your cluster via ray over the command line.\n\n## Notes\nAn optional debugging tool that can plot the full game-tree with an agent's strategy in tiny games.\nThe code for that (authored by Sebastian De Ro) can be downloaded from\n[here](https://drive.google.com/file/d/1Oo4OyKZuO46GGnTQgWTqvX9z-BW_iShJ/view?usp=sharing).\nTo install it, just drag the PokerViz folder directly onto your `C:/` drive (Windows) or in your `home` directory\n(Linux). PokerRL will then detect that it is installed and export visualizations when you run Best Response on small\ngames. To view the trees, go into the `data` directory and rename the tree you want to view to `data.js` and then open\n`index.html`.\n\n\nNote that the Python code imports small bits of functionality exported from C++ to .dll and .so files, for Win and Linux\nrespectively. Only the binaries are included with this repository.\n\n## Citing\nIf you use PokerRL in your research, you can cite it as follows:\n```\n@misc{steinberger2019pokerrl,\n author = {Eric Steinberger},\n title = {PokerRL},\n year = {2019},\n publisher = {GitHub},\n journal = {GitHub repository},\n howpublished = {\\url{https://github.com/TinkeringCode/PokerRL}},\n}\n```\n\n\n\n\n\n## Authors\n* **Eric Steinberger**\n\n\n\n\n\n## License\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n\n\n\n\n## Acknowledgments\nI want to thank Alexander Mandt for getting ray to run on our local cluster of 64 workers and HTL Spengergasse for\nproviding it. Sebastian De Ro developed a game tree visualisation\n[tool](https://github.com/sebastiandero/js-visualize-best-response-tree) that we integrated into PokerRL and\ncontributed to our batched poker hand evaluator written in C++.\n\n## References\n[0] Zinkevich, Martin, et al. \"Regret minimization in games with incomplete information.\" Advances in neural information\nprocessing systems. 2008.\n\n[1] Tammelin, Oskari, et al. \"Solving Heads-Up Limit Texas Hold'em.\" IJCAI.dc 2015.\n\n[2] Lanctot, Marc, et al. \"Monte Carlo sampling for regret minimization in extensive games.\" Advances in neural\ninformation processing systems. 2009.\n\n[3] Heinrich, Johannes, and David Silver. \"Deep reinforcement learning from self-play in imperfect-information games.\"\narXiv preprint arXiv:1603.01121 (2016).\n\n[4] Srinivasan, Sriram, et al. \"Actor-critic policy optimization in partially observable multiagent environments.\"\nAdvances in Neural Information Processing Systems. 2018.\n\n[5] Brown, Noam, et al. \"Deep Counterfactual Regret Minimization.\" arXiv preprint arXiv:1811.00164 (2018).\n\n[6] https://github.com/ray-project/ray\n\n[7] Lisy, Viliam, and Michael Bowling. \"Equilibrium Approximation Quality of Current No-Limit Poker Bots.\" arXiv\npreprint arXiv:1612.07547 (2016).\n\n[8] Steinberger, Eric. \"Single Deep Counterfactual Regret Minimization.\" arXiv preprint arXiv:1901.07621 (2019).\n\n[9] Wang, Ziyu, et al. \"Dueling network architectures for deep reinforcement learning.\"\narXiv preprint arXiv:1511.06581 (2015).\n\n[10] Zinkevich, Martin, et al. \"Regret minimization in games with incomplete information.\" Advances in neural\ninformation processing systems. 2008.\n\n[11] Tammelin, Oskari. \"Solving large imperfect information games using CFR+.\" arXiv preprint arXiv:1407.5042 (2014).\n\n[12] Brown, Noam, and Tuomas Sandholm. \"Solving Imperfect-Information Games via Discounted Regret Minimization.\"\narXiv preprint arXiv:1809.04040 (2018).\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/TinkeringCode/PokerRL", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "PokerRL", "package_url": "https://pypi.org/project/PokerRL/", "platform": "", "project_url": "https://pypi.org/project/PokerRL/", "project_urls": { "Homepage": "https://github.com/TinkeringCode/PokerRL" }, "release_url": "https://pypi.org/project/PokerRL/0.0.3/", "requires_dist": [ "requests", "gym (==0.10.9)", "pycrayon (==0.5)", "psutil", "ray (==0.6.1); extra == 'distributed'", "redis (==2.10.6); extra == 'distributed'", "boto3; extra == 'distributed'" ], "requires_python": "", "summary": "A framework for Reinforcement Learning in Poker.", "version": "0.0.3" }, "last_serial": 5164748, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "2b39c5d33bf437a032d69cef00b67415", "sha256": "cc312fb8326c37bbc834cb09eef20edb5ba3dbe66dbf8d8c67d49a78fae8c2fa" }, "downloads": -1, "filename": "PokerRL-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "2b39c5d33bf437a032d69cef00b67415", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 176429, "upload_time": "2019-02-06T15:37:23", "url": "https://files.pythonhosted.org/packages/b5/73/a6e7d5b590b6f5178b032d1af98fbabab50a686c9bfd4f55afbae7a461ad/PokerRL-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2bf7da78e17e94bc8933c6ac026079b9", "sha256": "c7d89e04a76b1c657447a4ba963f81390d2a6a52d781ddeb525b43741f613243" }, "downloads": -1, "filename": "PokerRL-0.0.1.tar.gz", "has_sig": false, "md5_digest": "2bf7da78e17e94bc8933c6ac026079b9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 135064, "upload_time": "2019-02-06T15:37:25", "url": "https://files.pythonhosted.org/packages/7f/d9/3022f752c8c911f029d1b78e717f2f94577c8690e91e34eda2e753629edf/PokerRL-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "ba6c917e3c59bc55b1afa8cbfecf1b02", "sha256": "de055600523fc8278862702085953d32c9a557721f4c1a295eba722191cf1987" }, "downloads": -1, "filename": "PokerRL-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ba6c917e3c59bc55b1afa8cbfecf1b02", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 177274, "upload_time": "2019-03-14T03:39:54", "url": "https://files.pythonhosted.org/packages/5d/8b/e05407b7b2693f6faee7698d9e0908bafdeb82ca26fc9020772404b9eb77/PokerRL-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ab4aaaa80a8d433ca0b18ed8682ed64f", "sha256": "d0982393d6ad1d41b7be28d42756bec3201cad93e9932b4462d13c3df08f1203" }, "downloads": -1, "filename": "PokerRL-0.0.2.tar.gz", "has_sig": false, "md5_digest": "ab4aaaa80a8d433ca0b18ed8682ed64f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 135295, "upload_time": "2019-03-14T03:39:55", "url": "https://files.pythonhosted.org/packages/96/e4/ed98f16e0b512292f0bc8ea39312bb223b5a078ad736e31b8a5d2ac43bb6/PokerRL-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "6a93ccf8e9f4586827b95ed55093bedf", "sha256": "7ba5be31c4d36cf476f8d3bbec873b4213460258a4c89459965488a68aff6dba" }, "downloads": -1, "filename": "PokerRL-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "6a93ccf8e9f4586827b95ed55093bedf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 177326, "upload_time": "2019-04-19T13:16:52", "url": "https://files.pythonhosted.org/packages/93/44/a3d21b818b19731c720fbed9782b4696d2117a19d85f03398c9e65d65d19/PokerRL-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7812ec87dc324c07bc6236eddae88cd6", "sha256": "456af3ab254ee0c46ee0338f03e81a76dc7094a390cbc30433ab57ff82ed3f1d" }, "downloads": -1, "filename": "PokerRL-0.0.3.tar.gz", "has_sig": false, "md5_digest": "7812ec87dc324c07bc6236eddae88cd6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 135368, "upload_time": "2019-04-19T13:16:54", "url": "https://files.pythonhosted.org/packages/4d/3b/3f6f5f088133649f216709be7a3ee5f0355c34e567202839bc3f70626cc3/PokerRL-0.0.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6a93ccf8e9f4586827b95ed55093bedf", "sha256": "7ba5be31c4d36cf476f8d3bbec873b4213460258a4c89459965488a68aff6dba" }, "downloads": -1, "filename": "PokerRL-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "6a93ccf8e9f4586827b95ed55093bedf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 177326, "upload_time": "2019-04-19T13:16:52", "url": "https://files.pythonhosted.org/packages/93/44/a3d21b818b19731c720fbed9782b4696d2117a19d85f03398c9e65d65d19/PokerRL-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7812ec87dc324c07bc6236eddae88cd6", "sha256": "456af3ab254ee0c46ee0338f03e81a76dc7094a390cbc30433ab57ff82ed3f1d" }, "downloads": -1, "filename": "PokerRL-0.0.3.tar.gz", "has_sig": false, "md5_digest": "7812ec87dc324c07bc6236eddae88cd6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 135368, "upload_time": "2019-04-19T13:16:54", "url": "https://files.pythonhosted.org/packages/4d/3b/3f6f5f088133649f216709be7a3ee5f0355c34e567202839bc3f70626cc3/PokerRL-0.0.3.tar.gz" } ] }