{ "info": { "author": "Rabbit Noname", "author_email": "faithoptimistic@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Topic :: Software Development :: Build Tools" ], "description": "# Designed to simplify the reinforcement learning.\n# It can do the following:\n- simplify the construction of neural networks \n- automatically construct the shadow neural network with NO human efforts.\n- simplify the building of the gradient functions\n- offer a simple interface to different game environments, e.g., openAI gym, Flappybird.\n\n\n# Quick Start \n```\npython DDPG/DDPG.py \n```\nIn this example, I implemented the DDPG [1] algorithm and run it against an openAI gym game. \n```\npython DQN/DQN.py\n```\nIn this example, I implemented the DQN [2] algorithm and run it against the Flappybird game.\n\n# Technical Details\nBefore using the library:\n```\nclass Actor:\n # variables into a bag\n # ema wrapper\n # gradient wrapper\n def __init__(self, session, state_size, action_size):\n hidden1Size = 200\n hidden2Size = 200\n\n self.session = session\n\n self.W1 = self.weight_variable(shape = [state_size, hidden1Size])\n self.B1 = self.bias_variable(shape=[hidden1Size])\n self.W2 = self.weight_variable(shape=[hidden1Size, hidden2Size])\n self.B2 = self.bias_variable(shape=[hidden2Size])\n self.W3 = self.weight_variable(shape=[hidden2Size, action_size])\n self.B3 = self.bias_variable(shape=[action_size])\n\n self.InputStates = tf.placeholder(\"float\", shape=[None, state_size])\n self.H1 = tf.nn.relu(tf.matmul(self.InputStates, self.W1) + self.B1)\n self.H2 = tf.nn.relu(tf.matmul(self.H1, self.W2) + self.B2)\n self.out = tf.matmul(self.H2, self.W3) + self.B3\n\n\n self.Qgradient = tf.placeholder(tf.float32, [None, action_size])\n\n self.gradient = tf.gradients(self.out, [self.W1, self.B1, self.W2, self.B2, self.W3, self.B3], -self.Qgradient)\n zipped = zip(self.gradient, [self.W1, self.B1, self.W2, self.B2, self.W3, self.B3])\n self.apply = tf.train.AdamOptimizer(1e-6).apply_gradients(zipped)\n\n ema = tf.train.ExponentialMovingAverage(0.999)\n self.target_update = ema.apply([self.W1, self.B1, self.W2, self.B2, self.W3, self.B3])\n self.target_net = [ema.average(var) for var in [self.W1, self.B1, self.W2, self.B2, self.W3, self.B3]]\n self.target_InputStates = tf.placeholder(\"float\", shape=[None, state_size])\n self.target_H1 = tf.nn.relu(tf.matmul(self.target_InputStates, self.target_net[0]) + self.target_net[1])\n self.target_H2 = tf.nn.relu(tf.matmul(self.target_H1, self.target_net[2]) + self.target_net[3])\n self.target_predict = tf.matmul(self.target_H2, self.target_net[4]) + self.target_net[5]\n \n```\nAfter using the library:\n```\nclass Actor(NN):\n def __init__(self, session, hasShadowNet, state_size, action_size, hidden_state_size):\n NN.__init__(self, session, hasShadowNet)\n # nothing special:\n inputLayer = self.buildInputLayer(\"inputStates\", shape=[None, state_size])\n h1 = self.buildLinearReluWire(inputLayer, [state_size, hidden_state_size])\n #for i in range(numOfHiddenLayers-1): # repeat (numOfHiddenLayers-1) times\n h1 = self.buildLinearReluWire(h1, [hidden_state_size, hidden_state_size])\n out = self.buildLinearWire(h1, [hidden_state_size, action_size])\n self.setOutLayer(out)\n```\n## simplify the construction of neural networks\nThe idea is that we automatically define the weight and bias variables for you based on the shape of the weight (shape of bias is determined by the last dimension of weight). \n\n## automatically construct the shadow neural network with NO human efforts.\nFor better stability, we often need to construct a shadown neural network by applying ExponentialMovingAverage to (the weight/bias variables of) the original neural network. However, construction of the shadow network is not easy: (1) you need to use the tf.train.ExponentialMovingAverage APIs to calculate the smoothed weight/bias variables, (2) you need to reconstruct, with these variables, a network that has the same structure as the original network. Our library lifts this burden by simply removing it. All you need to do is to set the parameter hasShadowNet of the NN.__init__ as True. We will automatically handle all the rest for you.\n\n## simplify the building of the gradient functions\nWe automatically detect the weight/bias variables and calculate the gradient of the output over them.\nAll you need to do is specify the neural network with our APIs. Remember to specify the output layer:\n```\n self.setOutLayer(out)\n```\n\n## offer a simple interface to different game environments, e.g., openAI gym, Flappybird.\n```\nclass GameEngine:\n def __init__(self):\n print(\"game engine initialized\")\n def initialState(self):\n raise NotImplementedError( \"Should have implemented this, return R, S, T\" )\n def step(self, action, state=None):\n raise NotImplementedError( \"Should have implemented this\" )\nclass OpenAIGameEngine(GameEngine):\n ...\nclass FlappyBirdGameEngine(GameEngine):\n ... \n```\n## for details about the algorithm & results, please refer to the Blog.\n\n\n# References\n- 1 CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING\n- 2 Playing Atari with Deep Reinforcement Learning\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/rabbitnoname/rlsimple/archive/master.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/rabbitnoname/rlsimple", "keywords": "reinforcement learning,DDPG,DQN,Actor Critic,simple library", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "rlsimple", "package_url": "https://pypi.org/project/rlsimple/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/rlsimple/", "project_urls": { "Download": "https://github.com/rabbitnoname/rlsimple/archive/master.zip", "Homepage": "https://github.com/rabbitnoname/rlsimple" }, "release_url": "https://pypi.org/project/rlsimple/1.1/", "requires_dist": null, "requires_python": null, "summary": "reinforcement learning simple library", "version": "1.1" }, "last_serial": 2802781, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "27650e1d947a3783ab87cb873faca271", "sha256": "01c455d6ad640ef6818e633d5705b8ac3def32988ac522801e058d71e6a5040e" }, "downloads": -1, "filename": "rlsimple-0.2.tar.gz", "has_sig": false, "md5_digest": "27650e1d947a3783ab87cb873faca271", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5834472, "upload_time": "2017-04-10T03:50:38", "url": "https://files.pythonhosted.org/packages/80/15/b58026ed32adb63057721c448530313451085a7c70d04ea407767c825252/rlsimple-0.2.tar.gz" } ], "0.3": [ { "comment_text": "", "digests": { "md5": "e45def0750e79cb916849fac1fb416c8", "sha256": "d8b6a19dfbf59812881ad27f5a44fcc2ba7896bd1265500842e309f56031d58f" }, "downloads": -1, "filename": "rlsimple-0.3.tar.gz", "has_sig": false, "md5_digest": "e45def0750e79cb916849fac1fb416c8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5834449, "upload_time": "2017-04-10T04:08:49", "url": "https://files.pythonhosted.org/packages/ad/3f/5700bd5cefe6066abc4a34ac4948b5d7727db679cb24cac8a034a5f1e9eb/rlsimple-0.3.tar.gz" } ], "0.4": [ { "comment_text": "", "digests": { "md5": "18048ec8518c2b4a416e5cb8a8386823", "sha256": "1d215f07b076e065b7d757f73ccc3ce16fafffcacca5d64c50b5ab3b8ad54b65" }, "downloads": -1, "filename": "rlsimple-0.4.tar.gz", "has_sig": false, "md5_digest": "18048ec8518c2b4a416e5cb8a8386823", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5837511, "upload_time": "2017-04-11T01:38:06", "url": "https://files.pythonhosted.org/packages/0a/37/7fd78b3a5673f86b5c55052fac8d8878c13875ccc401aab3acb4d524d0cd/rlsimple-0.4.tar.gz" } ], "0.5": [ { "comment_text": "", "digests": { "md5": "04df9551be18d9aed2fed6f172e25bca", "sha256": "0f933d7768e9fbf366624d5bfdffd262a505b2a3734a4082ce69acd04839cecb" }, "downloads": -1, "filename": "rlsimple-0.5.tar.gz", "has_sig": false, "md5_digest": "04df9551be18d9aed2fed6f172e25bca", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5837504, "upload_time": "2017-04-13T18:15:28", "url": "https://files.pythonhosted.org/packages/d5/51/c7550270ff3a2ce441514964e23bdd8e226a1d85dd773f3e3f704d140f86/rlsimple-0.5.tar.gz" } ], "0.6": [ { "comment_text": "", "digests": { "md5": "d182cbf0e40c4acc75b3470ea7cbc081", "sha256": "df60a18cc0ff1f95ed3206ecc5a8edc17f3f0bf1adb5b0f3ef214e44de8dc2cd" }, "downloads": -1, "filename": "rlsimple-0.6.tar.gz", "has_sig": false, "md5_digest": "d182cbf0e40c4acc75b3470ea7cbc081", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5840521, "upload_time": "2017-04-13T19:47:47", "url": "https://files.pythonhosted.org/packages/4a/b0/3f5e739128a2ee2003200896834f601af8cb2079212a3d982505863a04f0/rlsimple-0.6.tar.gz" } ], "0.7": [ { "comment_text": "", "digests": { "md5": "a256731637a4f671ffd11557842b7bf4", "sha256": "0679814bfdb76db62cd908a3d6c46cb00f1fbe5e607244c656535d7793ba5e68" }, "downloads": -1, "filename": "rlsimple-0.7.tar.gz", "has_sig": false, "md5_digest": "a256731637a4f671ffd11557842b7bf4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5840133, "upload_time": "2017-04-13T19:59:08", "url": "https://files.pythonhosted.org/packages/4e/ef/f993c6149ef19b7ea8355b8997027f6f3436b4173a04bced33bfda0dbbdc/rlsimple-0.7.tar.gz" } ], "0.8": [ { "comment_text": "", "digests": { "md5": "d10c9e1c6a61f02044121231b93753a6", "sha256": "f39a793a3d337d21e3caca31fdbf54cbfa14d0ca6df9d3333a5844d5c6992977" }, "downloads": -1, "filename": "rlsimple-0.8.tar.gz", "has_sig": false, "md5_digest": "d10c9e1c6a61f02044121231b93753a6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5838364, "upload_time": "2017-04-13T20:16:49", "url": "https://files.pythonhosted.org/packages/74/59/d169a4f50da59723847b03919976bee776c229db6fb95d536c7a26598b62/rlsimple-0.8.tar.gz" } ], "1.0": [ { "comment_text": "", "digests": { "md5": "063bec624b01379456c1f42fa3579ae6", "sha256": "87667a2a647e06a6ccae819778a3108deed399495da93851e32f1b74832b5e52" }, "downloads": -1, "filename": "rlsimple-1.0.tar.gz", "has_sig": false, "md5_digest": "063bec624b01379456c1f42fa3579ae6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11671364, "upload_time": "2017-04-14T00:28:01", "url": "https://files.pythonhosted.org/packages/f1/53/2f4d255357e687eb9e2dd76aeeefc6a1d4280dc1894483ead083fda0e990/rlsimple-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "3a296d324215fb31027ce98e0146a3ec", "sha256": "1a64ac3e44c4a27b49b5e20d3357efbe9dbb0ff88d71f646e88eaa2071e68594" }, "downloads": -1, "filename": "rlsimple-1.1.tar.gz", "has_sig": false, "md5_digest": "3a296d324215fb31027ce98e0146a3ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5838488, "upload_time": "2017-04-14T00:42:20", "url": "https://files.pythonhosted.org/packages/55/58/4b353e2176ba19709d73579b1ef3a572e15c9972a3324e2f8f74e12ad204/rlsimple-1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3a296d324215fb31027ce98e0146a3ec", "sha256": "1a64ac3e44c4a27b49b5e20d3357efbe9dbb0ff88d71f646e88eaa2071e68594" }, "downloads": -1, "filename": "rlsimple-1.1.tar.gz", "has_sig": false, "md5_digest": "3a296d324215fb31027ce98e0146a3ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5838488, "upload_time": "2017-04-14T00:42:20", "url": "https://files.pythonhosted.org/packages/55/58/4b353e2176ba19709d73579b1ef3a572e15c9972a3324e2f8f74e12ad204/rlsimple-1.1.tar.gz" } ] }