{
    "info": {
        "author": "chimera0",
        "author_email": "ai-brain-lab@accel-brain.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 5 - Production/Stable",
            "Framework :: Robot Framework",
            "Intended Audience :: Information Technology",
            "Intended Audience :: Science/Research",
            "License :: OSI Approved :: GNU General Public License v2 (GPLv2)",
            "Programming Language :: Python :: 3",
            "Topic :: Scientific/Engineering :: Artificial Intelligence"
        ],
        "description": "# Reinforcement Learning Library: pyqlearning\n\n`pyqlearning` is Python library to implement Reinforcement Learning and Deep Reinforcement Learning, especially for Q-Learning, Deep Q-Network, and Multi-agent Deep Q-Network which can be optimized by Annealing models such as Simulated Annealing, Adaptive Simulated Annealing, and Quantum Monte Carlo Method.\n\nThis library makes it possible to design the information search algorithm such as the Game AI, web crawlers, or Robotics. But this library provides components for designers, not for end-users of state-of-the-art black boxes. Briefly speaking the philosophy of this library, *give user hype-driven blackboxes and you feed him for a day; show him how to design algorithms and you feed him for a lifetime.* So algorithm is power.\n\n<div align=\"center\">\n    <table style=\"border: none;\">\n        <tr>\n            <td width=\"45%\" align=\"center\">\n            <p><a href=\"https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb\" target=\"_blank\"><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/DQN_single_agent_goal_compressed-loop.gif\" /></a></p>\n            <p>Deep Reinforcement Learning (Deep Q-Network: DQN) to solve Maze.</p>\n            </td>\n            <td width=\"45%\" align=\"center\">\n            <p><a href=\"https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb\" target=\"_blank\"><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/DQN_multi_agent_demo_goal_enemy_2-compressed.gif\" /></a></p>\n            <p>Multi-agent Deep Reinforcement Learning to solve the pursuit-evasion game.</p>\n            </td>\n        </tr>\n    </table>\n</div>\n\n## Installation\n\nInstall using pip:\n\n```sh\npip install pyqlearning\n```\n\n### Source code\n\nThe source code is currently 
hosted on GitHub.\n\n- [accel-brain-code/Reinforcement-Learning](https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning)\n\n### Python Package Index (PyPI)\n\nInstallers for the latest released version are available at the Python Package Index.\n\n- [pyqlearning : Python Package Index](https://pypi.python.org/pypi/pyqlearning/)\n\n### Dependencies\n\n- numpy: v1.13.3 or higher.\n- pandas: v0.22.0 or higher.\n\n#### Option\n\n- [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern): v1.4.3 or higher.\n    * Only if you want to implement *Deep* Reinforcement Learning.\n\n## Documentation\n\nFull documentation is available at [https://code.accel-brain.com/Reinforcement-Learning/](https://code.accel-brain.com/Reinforcement-Learning/). This document contains information on functional reusability, functional scalability and functional extensibility.\n\n## Description\n\n`pyqlearning` is a Python library for implementing Reinforcement Learning and Deep Reinforcement Learning, especially Q-Learning, Deep Q-Network, and Multi-agent Deep Q-Network, which can be optimized by Annealing models such as Simulated Annealing, Adaptive Simulated Annealing, and the Quantum Monte Carlo Method.\n\nThis library provides components for designers, not for end-users of state-of-the-art black boxes. Reinforcement learning algorithms are highly variable because single-agent or multi-agent behavior must be designed according to the problem setup. Designers of algorithms and architectures are required to design for each situation as it arises. Commonization and commoditization for end users who want easy-to-use tools are not easy. Nonetheless, commonality/variability analysis and object-oriented analysis are not impossible. 
I am convinced that a designer who can *practice* *abstraction* of concepts by *drawing a distinction* between concepts related to his/her *own concrete problem settings* can distinguish the commonality and variability of various Reinforcement Learning algorithms.\n\n### The commonality/variability of Epsilon Greedy Q-Learning and Boltzmann Q-Learning\n\nAccording to the Reinforcement Learning problem settings, Q-Learning is a kind of **Temporal Difference learning (TD Learning)** that can be considered a hybrid of the **Monte Carlo** method and the **Dynamic Programming** method. Like the Monte Carlo method, a TD Learning algorithm can learn from experience without a model of the environment. And, like the Dynamic Programming method, this learning algorithm is a functional extension of the bootstrap method.\n\nIn this library, Q-Learning is divided into **Epsilon Greedy Q-Learning** and **Boltzmann Q-Learning**. These algorithms are functionally equivalent, but their structures should be conceptually distinguished.\n\nThe Epsilon Greedy Q-Learning algorithm is a typical off-policy algorithm. In this paradigm, *stochastic* searching and *deterministic* searching can coexist through the hyperparameter <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/epsilon.gif\" />, which is the probability that the agent searches greedily. 
Greedy searching is *deterministic* in the sense that the policy of the agent follows the selection that maximizes the Q-Value.\n\nThe Boltzmann Q-Learning algorithm is based on the Boltzmann action selection mechanism, where the probability\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/x_i.gif\" /> of selecting the action <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.gif\" /> is given by\n\n<!-- $$x_i(t) = \\frac{e^{\\frac{Q_i(t)}{T}}}{\\sum_{k} e^{\\frac{Q_k(t)}{T}}} \\ \\  (i = 1, 2, ..., n)$$ -->\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/boltzmann_action_selection.gif\" /></div>\n\nwhere the temperature <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_gt_0.gif\" /> controls the exploration/exploitation tradeoff. For <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_to_0.gif\" /> the agent always acts greedily and chooses the strategy corresponding to the maximum Q-Value, which is pure *deterministic* exploitation, whereas for <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t_to_infty.gif\" /> the agent\u2019s strategy is completely random, which is pure *stochastic* exploration.\n\n### Commonality/variability of Q-learning models\n\nConsidering the many variable parts and functional extensions in the Q-learning paradigm from the perspective of *commonality/variability analysis* in order to practice object-oriented design, this library provides an abstract class that defines the skeleton of a Q-Learning algorithm in an operation, deferring some steps in concrete variant algorithms such as Epsilon Greedy Q-Learning and Boltzmann Q-Learning to client subclasses. 
The abstract class in this library lets subclasses redefine certain steps of a Q-Learning algorithm without changing the algorithm's structure.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/class_diagram_pyqleanring_QLearning.png\" />\n\nTypical concepts such as `State`, `Action`, `Reward`, and `Q-Value` in Q-learning models should be referred to as viewpoints for distinguishing between *commonality* and *variability*. Among the functions related to these concepts, the class `QLearning` is responsible for the more *common* attributes and behaviors. On the other hand, in relation to *your* concrete problem settings, the more *variable* elements have to be implemented by subclasses such as `YourGreedyQLearning` or `YourBoltzmannQLearning`.\n\nFor a more detailed specification of this template method, refer to the API documentation: [pyqlearning.q_learning module](https://code.accel-brain.com/Reinforcement-Learning/pyqlearning.html#module-pyqlearning.q_learning). For samples of implemented code, see [demo/](https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning/demo).\n\n### Structural extension: Deep Reinforcement Learning\n\nReinforcement learning theory presents several issues from the perspective of deep learning theory (Mnih, V., et al. 2013). Firstly, deep learning applications have required large amounts of hand-labelled training data. Reinforcement learning algorithms, on the other hand, must be able to learn from a scalar reward signal that is frequently sparse, noisy and delayed.\n\nThe difference between the two theories is not only the type of data but also the timing at which it is observed. 
The delay between taking actions and receiving rewards, which can be thousands of timesteps long, seems particularly daunting when compared to the direct association between inputs and targets found in supervised learning.\n\nAnother issue is that deep learning algorithms assume the data samples to be independent, while in reinforcement learning one typically encounters sequences of highly correlated states. Furthermore, in reinforcement learning, the data distribution changes as the algorithm learns new behaviours, presenting aspects of *recursive learning*, which can be problematic for deep learning methods that assume a fixed underlying distribution.\n\n#### Generalisation, or function approximation\n\nThis library considers a problem setting in which an agent interacts with an environment <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/mathcal_E.png\" />, in a sequence of actions, observations and rewards. At each time-step the agent selects an action $a_t$ from the set of possible actions, <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/A_1_K.png\" />. The state/action-value function is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_s_a.png\" />.\n\nThe goal of the agent is to interact with the environment <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/mathcal_E.png\" /> by selecting actions in a way that maximises future rewards. 
We can make the standard assumption that future rewards are discounted by a factor of $\\gamma$ per time-step, and define the future discounted return at time <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/t.png\" /> as \n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/r_t_sum_t_t_T_gamma.png\" />, \n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Tt.png\" /> is the time-step at which the agent will reach the goal. This library defines the optimal state/action-value function <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_ast_s_a.png\" /> as the maximum expected return achievable by following any strategy, after seeing some state <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/s.png\" /> and then taking some action <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/a.png\" />, \n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_ast_s_a_max_pi_E.png\" />, \n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/pi.png\" /> is a policy mapping sequences to actions (or distributions over actions). \n\nThe optimal state/action-value function obeys an important identity known as the Bellman equation. 
This is based on the following *intuition*: if the optimal value <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_ast_s_d_a_d.png\" /> of the sequence <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/s_d.png\" /> at the next time-step was known for all possible actions <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/a_d.png\" />, then the optimal strategy is to select the action <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/a_d.png\" /> maximising the expected value of \n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/r_gamma_Q_ast_s_d_a_d.png\" />, \n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_ast_s_d_a_d_mathbb_E_s_d_sim_mathcal_E.png\" />.\n\nThe basic idea behind many reinforcement learning algorithms is to estimate the state/action-value function, by using the Bellman equation as an iterative update,\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_i_1_s_a_mathbb_E_r_gamma_max_a_d.png\" />.\n\nSuch *value iteration algorithms* converge to the optimal state/action-value function, <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_i_rightarrow_Q_ast.png\" /> as <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i_rightarrow_infty.png\" />. \n\nBut increasing the complexity of states/actions is equivalent to increasing the number of combinations of states/actions. If the value function is continuous and granularities of states/actions are extremely fine, the combinatorial explosion will be encountered. 
In other words, this basic approach is totally impractical, because the state/action-value function is estimated separately for each sequence, without any **generalisation**. Instead, it is common to use a **function approximator** to estimate the state/action-value function,\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/Q_s_a_theta_approx_Q_ast_s_a.png\" />\n\nSo a reduction of complexity is required.\n\n### Deep Q-Network\n\nIn this problem setting, the role of a neural network or deep learning is that of a function approximator with weights <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/theta.png\" />, referred to as a Q-Network. A Q-Network can be trained by minimising a sequence of loss functions <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/L_i_theta_i.png\" /> that changes at each iteration <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.png\" />,\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/L_i_theta_i_mathbb_E_s_a_sim_rho_cdot.png\" />\n\nwhere\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/y_i_mathbb_E_s_d_sim_mathcal_E_r_gamma_max_a_d.png\" />\n\nis the target for iteration <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.png\" /> and <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/rho_cdot.png\" /> is a so-called behaviour distribution. This is a probability distribution over states and actions. The parameters from the previous iteration <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/theta_i_1.png\" /> are held fixed when optimising the loss function <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/L_i_theta_i.png\" />. 
Differentiating the loss function with respect to the weights, we arrive at the following gradient,\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/nabla_theta_i_L_i_theta_i_mathbb_E_s_a_sim_rho_cdot.png\" />\n\n### Functional equivalent: LSTM\n\nA CNN is not the only model that can be functionally reused as a function approximator. In the above problem setting of generalisation and combinatorial explosion, for instance, Long Short-Term Memory (LSTM) networks, which is-a special Recurrent Neural Network (RNN) structure, and CNNs are functionally equivalent as function approximators. In the same problem setting, functional equivalents can be functionally replaced. Considering that the feature space of the rewards has a time-series nature, LSTM will be more useful.\n\n#### Structure of LSTM\n\nOriginally, Long Short-Term Memory (LSTM) networks, as a special RNN structure, have proven stable and powerful for modeling long-range dependencies. The key point of the structural expansion is its memory cell <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/c_t.png\" /> which essentially acts as an accumulator of the state information. Every time observed data points are given as new information <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/g_t.png\" /> and input to LSTM's input gate, this information will be accumulated in the cell if the input gate <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/i_t.png\" /> is activated. The past state of the cell <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/c_t-1.png\" /> could be forgotten in this process if LSTM's forget gate <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/f_t.png\" /> is on. 
Whether the latest cell output <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/c_t.png\" /> will be propagated to the final state <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/h_t.png\" /> is further controlled by the output gate <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/o_t.png\" />.\n\nOmitting the so-called peephole connections, it is possible to combine the activations in LSTM gates into the affine transformation below.\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_affine.png\" /></div>\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/W_lstm.png\" /> is a weight matrix which connects observed data points and hidden units in LSTM gates, and <img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/u.png\" /> is a weight matrix which connects hidden units as a remembered memory in LSTM gates. 
Furthermore, the activation functions are as follows:\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_given.png\" /></div>\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_input_gate.png\" /></div>\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_forget_gate.png\" /></div>\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_output_gate.png\" /></div>\n\nand the activations of the memory cell and hidden units are calculated as follows:\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_memory_cell.png\" /></div>\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Deep-Learning-by-means-of-Design-Pattern/img/latex/lstm_hidden_activity.png\" /></div>\n\n### Commonality/variability of Deep Q-learning models\n\nAlso considering the many variable parts and functional extensions in the Deep Q-learning paradigm from the perspective of *commonality/variability analysis* in order to practice object-oriented design, this library provides an abstract class that defines the skeleton of a Deep Q-Learning algorithm in an operation, deferring some steps in concrete variant algorithms such as Deep Q-Network to client subclasses. The abstract class in this library lets subclasses redefine certain steps of a Deep Q-Learning algorithm without changing the algorithm's structure.\n\nThis library also provides the interface to implement many variable function approximators, which defines a family of algorithms to solve generalisation problems, encapsulates each one, and makes them interchangeable. The Strategy pattern lets algorithms such as CNNs and LSTMs vary independently from the clients that use them. 
Capture the abstraction in an interface, bury implementation details in derived classes.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/pyqlearning_DeepQLearning_class_diagram-v2.png\">\n\nThe viewpoints for distinguishing between *commonality* and *variability* should relate not only to typical concepts such as `State`, `Action`, `Reward`, and `Q-Value` in Q-learning models but also to concepts of function approximators based on Deep Learning theory. Among the functions related to these concepts, the classes `DeepQLearning` and `DeepQNetwork` are responsible for the more *common* attributes and behaviors. On the other hand, in relation to *your* concrete problem settings, the more *variable* elements have to be implemented by subclasses such as `YourDeepQNetwork`. `DeepQLearning` also has a `FunctionApproximator` to learn and infer the `Q-Value`, but the concrete object used as a `FunctionApproximator` is more *variable*. The designers have to decide which Deep Learning algorithm to use.\n\nFor a more detailed specification of this template method, refer to the API documentation: [pyqlearning.deep_q_learning module](https://code.accel-brain.com/Reinforcement-Learning/pyqlearning.html#pyqlearning.deep_q_learning.DeepQLearning) and [pyqlearning.function_approximator module](https://code.accel-brain.com/Reinforcement-Learning/pyqlearning.html#pyqlearning.function_approximator.FunctionApproximator). For samples of implemented code, see [demo/](https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning/demo) and the following tutorial.\n\n### Sample subclass\n\n[templates/your_deep_q_network.py](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/templates/your_deep_q_network.py) is a template subclass which is-a `DeepQNetwork`. The following code is an example of implementation.\n\nFirst of all, we should practice model selection in relation to *your* problem settings. 
If the general Deep Q-Network can be expected to function as a solution to *your* problem, CNNs can be selected as a function approximator model.\n\nLet's build the model as follows, while confirming the specifications of [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern).\n\n```python\n# First convolution layer.\nfrom pydbm.cnn.layerablecnn.convolution_layer import ConvolutionLayer as ConvolutionLayer1\n# Second convolution layer.\nfrom pydbm.cnn.layerablecnn.convolution_layer import ConvolutionLayer as ConvolutionLayer2\n# Computation graph for first convolution layer.\nfrom pydbm.synapse.cnn_graph import CNNGraph as ConvGraph1\n# Computation graph for second convolution layer.\nfrom pydbm.synapse.cnn_graph import CNNGraph as ConvGraph2\n# Tanh Function as activation function.\nfrom pydbm.activation.tanh_function import TanhFunction\n# Mean squared error is-a `pydbm.loss.interface.computable_loss.ComputableLoss`.\nfrom pydbm.loss.mean_squared_error import MeanSquaredError\n# Adam optimizer which is-a `pydbm.optimization.opt_params.OptParams`.\nfrom pydbm.optimization.optparams.adam import Adam\n\n# `batch_size`, `channel`, `kernel_size`, `scale`, `stride`, and `pad` are\n# hyperparameters that must be defined beforehand for your problem.\n\n# First convolution layer.\nconv1 = ConvolutionLayer1(\n    # Computation graph for first convolution layer.\n    ConvGraph1(\n        # Tanh function as activation function.\n        activation_function=TanhFunction(),\n        # The number of `filter`.\n        filter_num=batch_size,\n        # The number of channel.\n        channel=channel,\n        # The size of kernel.\n        kernel_size=kernel_size,\n        # The filter scale.\n        scale=scale,\n        # The number of stride.\n        stride=stride,\n        # The number of zero-padding.\n        pad=pad\n    )\n)\n\n# Second convolution layer.\nconv2 = ConvolutionLayer2(\n    # Computation graph for second convolution layer.\n    ConvGraph2(\n        # Tanh function as activation function.\n        activation_function=TanhFunction(),\n        # The 
number of `filter`.\n        filter_num=batch_size,\n        # The number of channel.\n        channel=batch_size,\n        # The size of kernel.\n        kernel_size=kernel_size,\n        # The filter scale.\n        scale=scale,\n        # The number of stride.\n        stride=stride,\n        # The number of zero-padding.\n        pad=pad\n    )\n)\n\n# Stack.\nlayerable_cnn_list = [\n    conv1,\n    conv2\n]\n\n# is-a `pydbm.loss.interface.computable_loss.ComputableLoss`.\ncomputable_loss = MeanSquaredError()\n# is-a `pydbm.optimization.opt_params.OptParams`.\nopt_params = Adam()\n```\n\nNext, we construct a function approximator by delegating the above model. Because of the model selection, we should import `CNNFA` which is-a `FunctionApproximator`.\n\n```python\n# CNN as a Function Approximator.\nfrom pyqlearning.functionapproximator.cnn_fa import CNNFA\n\n# CNN as a function approximator.\nfunction_approximator = CNNFA(\n    # Batch size.\n    batch_size=batch_size,\n    # Stacked CNNs.\n    layerable_cnn_list=layerable_cnn_list,\n    # Learning rate.\n    learning_rate=learning_rate,\n    # is-a `pydbm.loss.interface.computable_loss.ComputableLoss`.\n    computable_loss=computable_loss,\n    # is-a `pydbm.optimization.opt_params.OptParams`.\n    opt_params=opt_params\n)\n```\n\nThen we delegate the function approximator to the class `YourDeepQNetwork` and instantiate it.\n\n```python\n# Deep Q-Network to solve your problem.\nfrom _path.to.your_deep_q_network import YourDeepQNetwork\n# Instantiate your class.\nyour_deep_q_network = YourDeepQNetwork(function_approximator=function_approximator)\n```\n\nFinally, we perform learning and inference based on the functionality of the implemented concrete class.\n\n```python\n# Learning.\nyour_deep_q_network.learn(state_arr, limit=1000)\n# Inferencing.\nresult_arr = your_deep_q_network.inference(state_arr, limit=1000)\n```\n\n#### Storing pre-learned parameters and doing transfer learning\n\n`function_approximator` has a 
`model` attribute which holds the selected model. `model.cnn` has a method `save_pre_learned_params` to store pre-learned parameters.\n\n```python\n# Delegated model.\nmodel = your_deep_q_network.function_approximator.model\n# Save pre-learned parameters.\nmodel.cnn.save_pre_learned_params(\n    # Path of dir. If `None`, the file is saved in the current directory.\n    dir_path=\"/var/tmp/\",\n    # The naming rule of files. If `None`, this value is `cnn`.\n    file_name=\"demo_cnn\"\n)\n```\n\n`function_approximator.model` has different fields depending on the model selection. For a more detailed specification of this function for pre-learned parameters and of the transfer learning function, refer to the API documentation: [pyqlearning.function_approximator module](https://code.accel-brain.com/Reinforcement-Learning/pyqlearning.html#pyqlearning.function_approximator.FunctionApproximator) and [pydbm's documentation](https://code.accel-brain.com/Deep-Learning-by-means-of-Design-Pattern/README.html).\n\n## Tutorial: Maze Solving and the pursuit-evasion game by Deep Q-Network (Jupyter notebook)\n\n[demo/search_maze_by_deep_q_network.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb) is a Jupyter notebook which demonstrates a maze solving algorithm based on Deep Q-Network, rigidly coupled with Deep Convolutional Neural Networks (Deep CNNs). The function of Deep Learning is **generalisation** and a CNN is-a **function approximator**. 
In this notebook, several functional equivalents such as CNN and LSTM can be compared from a functional point of view.\n\n<div align=\"center\">\n    <p><a href=\"https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb\" target=\"_blank\"><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/DQN_single_agent_goal_compressed-loop.gif\" /></a></p>\n    <p>Deep Reinforcement Learning to solve the Maze.</p>\n</div>\n\n* Black squares represent a wall.\n* Light gray squares represent passages.\n* A dark gray square represents a start point.\n* A white square represents a goal point.\n\n### The pursuit-evasion game\n\nExpanding the search problem of the maze makes it possible to describe the pursuit-evasion game, a family of problems in mathematics and computer science in which one group attempts to track down members of another group in an environment.\n\nThis problem can be re-described as a multi-agent control problem, which involves decomposing the global system state into an image-like representation with information encoded in separate channels. This reformulation allows us to use convolutional neural networks to efficiently extract important features from the image-like state.\n\nEgorov, M. (2016) and Gupta, J. K. et al. (2017) proposed a new algorithm which uses the image-like state representation of the multi-agent system as an input, and outputs the estimated Q-values for the agent in question. They described a number of implementation contributions that make training efficient and allow agents to learn directly from the behavior of other agents in the system.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/multi_agent_q_learning_and_channels_big.png\" />\n<p><cite><a href=\"https://pdfs.semanticscholar.org/dd98/9d94613f439c05725bad958929357e365084.pdf\" target=\"_blank\">Egorov, M. (2016). 
Multi-agent deep reinforcement learning., p4.</a></cite></p>\n\nAn important aspect of this data modeling is that by expressing each state of the multi-agent as channels, it is possible to enclose states of all the agents as **a target of convolution operation all at once**. By the affine transformation executed by the neural network, combinations of an enormous number of states of multi-agent can be computed in principle with an allowable range of memory.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/multi_agent_q_learning_and_cnn_model_big.png\" />\n<p><cite><a href=\"https://pdfs.semanticscholar.org/dd98/9d94613f439c05725bad958929357e365084.pdf\" target=\"_blank\">Egorov, M. (2016). Multi-agent deep reinforcement learning., p4.</a></cite></p>\n\n[demo/search_maze_by_deep_q_network.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb) also prototypes Multi Agent Deep Q-Network to solve the pursuit-evasion game based on the image-like state representation of the multi-agent.\n\n<div align=\"center\">\n    <table style=\"border: none;\">\n        <tr>\n            <td width=\"45%\" align=\"center\">\n            <p><a href=\"https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb\" target=\"_blank\"><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/DQN_multi_agent_demo_crash_enemy_2-compressed.gif\" /></a></p>\n            <p>Multi-agent Deep Reinforcement Learning to solve the pursuit-evasion game. 
The player is caught by enemies.</p>\n            </td>\n            <td width=\"45%\" align=\"center\">\n            <p><a href=\"https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/search_maze_by_deep_q_network.ipynb\" target=\"_blank\"><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/DQN_multi_agent_demo_goal_enemy_2-compressed.gif\" /></a></p>\n            <p>Multi-agent Deep Reinforcement Learning to solve the pursuit-evasion game. The player reaches the goal.</p>\n            </td>\n        </tr>\n    </table>\n</div>\n\n* Black squares represent walls.\n* Light gray squares represent passages.\n* A dark gray square represents a start point.\n* Moving dark gray squares represent enemies.\n* A white square represents a goal point.\n\n## Tutorial: Complexity of Hyperparameters, or how can hyperparameters be decided?\n\nThere are many hyperparameters that we have to set before the actual searching and learning process begins. Each parameter should be decided in relation to Deep/Reinforcement Learning theory, and each choice can cause side effects in model training. Because of this complexity, so-called hyperparameter tuning becomes a burden for data scientists and R&D engineers, both from a theoretical point of view and at the implementation level.\n\n### Combinatorial optimization problem and Simulated Annealing.\n\nThis issue can be considered a **combinatorial optimization problem**, that is, an optimization problem in which an optimal solution has to be identified from a finite set of solutions. The solutions are normally discrete or can be converted into discrete form. This is an important topic in operations research, with applications in software engineering, artificial intelligence (AI), and machine learning. 
For instance, the travelling salesman problem is one of the most popular combinatorial optimization problems.\n\nIn this problem setting, this library provides an Annealing Model to search for the optimal combination of hyperparameters. For instance, **Simulated Annealing** is a probabilistic, single-solution-based search method inspired by the annealing process in metallurgy. Annealing is a physical process of tempering certain alloys of metal, glass, or crystal by heating the material above its melting point, holding its temperature, and then cooling it very slowly until it solidifies into a perfect crystalline structure. The simulation of this process is known as simulated annealing.\n\n### Functional comparison.\n\n[demo/annealing_hand_written_digits.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/annealing_hand_written_digits.ipynb) is a Jupyter notebook which demonstrates a very simple classification problem: recognizing hand-written digits. The aim is to assign each input vector to one of a finite number of discrete categories, learning from already labeled data how to predict the class of unlabeled data. In the use case of the hand-written digits dataset, the task is to predict, given an image, which digit it represents.\n\nThere are many structural extensions and functional equivalents of **Simulated Annealing**. For instance, **Adaptive Simulated Annealing**, also known as very fast simulated reannealing, is a very efficient version of simulated annealing. And **Quantum Monte Carlo**, which is generally known as a stochastic method to solve the Schr\u00f6dinger equation, is one of the earliest approaches to simulating **Quantum Annealing** on a classical computer. In summary, one of the functions of this algorithm is to solve the ground state search problem, which is known to be logically equivalent to the combinatorial optimization problem. 
This Jupyter notebook then demonstrates a functional comparison in the same problem setting.\n\n## Demonstration: Epsilon Greedy Q-Learning and Simulated Annealing.\n\nImport python modules.\n\n```python\nfrom pyqlearning.annealingmodel.costfunctionable.greedy_q_learning_cost import GreedyQLearningCost\nfrom pyqlearning.annealingmodel.simulated_annealing import SimulatedAnnealing\n# See demo/demo_maze_greedy_q_learning.py\nfrom demo.demo_maze_greedy_q_learning import MazeGreedyQLearning\n```\n\nThe class `GreedyQLearningCost` implements the interface `CostFunctionable` so that it can be called by `AnnealingModel`. This cost function is defined by\n\n<div><img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/q_cost.gif\"></div>\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/n_search.gif\"> is the number of searches (learning steps) and L is the upper limit of <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/n_search.gif\">. 
\n\nAs in the Monte Carlo method, let us draw random samples from a normal (Gaussian) or uniform distribution.\n\n```python\nimport numpy as np\n\n# Epsilon-Greedy rate in Epsilon-Greedy-Q-Learning.\ngreedy_rate_arr = np.random.normal(loc=0.5, scale=0.1, size=100)\n# Alpha value in Q-Learning.\nalpha_value_arr = np.random.normal(loc=0.5, scale=0.1, size=100)\n# Gamma value in Q-Learning.\ngamma_value_arr = np.random.normal(loc=0.5, scale=0.1, size=100)\n# Limit of the number of Learning(searching).\nlimit_arr = np.random.normal(loc=10, scale=1, size=100)\n\n# One row per candidate combination, one column per hyperparameter.\nvar_arr = np.c_[greedy_rate_arr, alpha_value_arr, gamma_value_arr, limit_arr]\n```\n\nInstantiate and initialize `MazeGreedyQLearning` which is-a `GreedyQLearning`.\n\n```python\n# Instantiation.\ngreedy_q_learning = MazeGreedyQLearning()\n# `hoge=fuga` is a placeholder: pass your own keyword arguments here.\ngreedy_q_learning.initialize(hoge=fuga)\n```\n\nInstantiate `GreedyQLearningCost`, which implements the interface `CostFunctionable` so that it can be called by `AnnealingModel`.\n\n```python\ninit_state_key = (\"Some\", \"data\")\ncost_functionable = GreedyQLearningCost(\n    greedy_q_learning,\n    init_state_key=init_state_key\n)\n```\n\nInstantiate `SimulatedAnnealing` which is-a `AnnealingModel`.\n\n```python\nannealing_model = SimulatedAnnealing(\n    # is-a `CostFunctionable`.\n    cost_functionable=cost_functionable,\n    # The number of annealing cycles.\n    cycles_num=5,\n    # The number of search trials per cycle.\n    trials_per_cycle=3\n)\n```\n\nFit the `var_arr` to `annealing_model`.\n\n```python\nannealing_model.var_arr = var_arr\n```\n\nStart annealing.\n\n```python\nannealing_model.annealing()\n```\n\nTo extract the result of the search, call the property `predicted_log_list`, which is a list of tuples: `(Cost, Delta energy, Mean of delta energy, probability in Boltzmann distribution, accept flag)`. Also refer to the property `x`, which is an `np.ndarray` holding the combinations of hyperparameters. 
The optimal combination can be extracted as follows.\n\n```python\n# Extract list: [(Cost, Delta energy, Mean of delta energy, probability, accept)]\npredicted_log_arr = annealing_model.predicted_log_arr\n\n# [greedy rate, Alpha value, Gamma value, Limit of the number of searching.]\nmin_e_v_arr = annealing_model.var_arr[np.argmin(predicted_log_arr[:, 2])]\n```\n\n### Contingency of definitions\n\nThe above definition of the cost function is only one possible option: from the point of view of modal logic, it is not necessary but contingent. You should question the necessity of this definition and re-define it when designing your implementation of the interface `CostFunctionable`, in relation to *your* problem settings.\n\n## Demonstration: Epsilon Greedy Q-Learning and Adaptive Simulated Annealing.\n\nThere are various kinds of Simulated Annealing, such as Boltzmann Annealing, Adaptive Simulated Annealing (ASA), and Quantum Simulated Annealing. On the premise of the combinatorial optimization problem, these annealing methods can be considered functionally equivalent. Separating the *Commonality/Variability* of these methods keeps the responsibilities of the objects clear, as the class diagram below indicates.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/class_diagram_annealing_model.png\" />\n\n### Code sample.\n\n`AdaptiveSimulatedAnnealing` is-a subclass of `SimulatedAnnealing`. 
The *variability* is aggregated in the method `AdaptiveSimulatedAnnealing.adaptive_set()`, which must be called before executing `AdaptiveSimulatedAnnealing.annealing()`.\n\n```python\nfrom pyqlearning.annealingmodel.simulatedannealing.adaptive_simulated_annealing import AdaptiveSimulatedAnnealing\n\nannealing_model = AdaptiveSimulatedAnnealing(\n    cost_functionable=cost_functionable,\n    cycles_num=33,\n    trials_per_cycle=3,\n    accepted_sol_num=0.0,\n    init_prob=0.7,\n    final_prob=0.001,\n    start_pos=0,\n    move_range=3\n)\n\n# Variability part.\nannealing_model.adaptive_set(\n    # How often this model reanneals, in cycles.\n    reannealing_per=50,\n    # Thermostat.\n    thermostat=0.,\n    # The minimum temperature.\n    t_min=0.001,\n    # The default temperature.\n    t_default=1.0\n)\n# `params_arr` is assumed to be a 2-D `np.ndarray` of candidate\n# hyperparameters, built in the same way as `var_arr` above.\nannealing_model.var_arr = params_arr\nannealing_model.annealing()\n```\n\nTo extract the result of the search, call the same properties as in the case of `SimulatedAnnealing`. If you want to know how to visualize the searching process, see my Jupyter notebook: [demo/annealing_hand_written_digits.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/annealing_hand_written_digits.ipynb).\n\n## Demonstration: Epsilon Greedy Q-Learning and Quantum Monte Carlo.\n\nGenerally, Quantum Monte Carlo is a stochastic method to solve the Schr\u00f6dinger equation. This algorithm is one of the earliest approaches to simulating Quantum Annealing on a classical computer. 
In summary, one of the functions of this algorithm is to solve the ground state search problem, which is known to be logically equivalent to the combinatorial optimization problem.\n\nAccording to the theory of spin glasses, the ground state search problem can be described as the minimization of the energy determined by the Hamiltonian <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/h_0.png\" /> as follows.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/hamiltonian_in_ising_model.png\" />\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/pauli_z_i.png\" /> refers to the Pauli spin matrix for the spin-half particle at lattice point <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.gif\" />. In spin glasses, a random value is assigned to <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/j_i_j.png\" />. The number of combinations is enormous. If this value is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/n.png\" />, the number of trials required is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/2_n.png\" />. This computational complexity makes it intractable to solve the ground state search problem by exhaustive search. For this reason, in the theory of spin glasses, the standard Hamiltonian is re-described in expanded form.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/hamiltonian_in_t_ising_model.png\" />\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/pauli_x_i.png\" /> also refers to the Pauli spin matrix and <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/gamma.png\" /> is the so-called annealing coefficient, a hyperparameter that takes a very high value. 
The Ising model that follows this Hamiltonian is known as the Transverse Ising model.\n\nIn relation to this system, the thermal equilibrium value of a physical quantity <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/q.png?1\" /> is as follows.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/langle_q_rangle.png\" />\n\nIf <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/h.png\" /> is a diagonal matrix, then <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/e_beta_h.png\" /> is also a diagonal matrix. If a diagonal element of <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/h.png\" /> is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/e_i.png\" />, each diagonal element is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/e_beta_h_ij_e_i.png\" />. However, if <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/h.png\" /> has off-diagonal elements, it is known that <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/e_beta_h_ij_e_i_neq.png\" />, since for any exponent <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/i.gif\" /> we must exponentiate the matrix as follows.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/e_matrix_infty.png\" />\n\nTherefore, a path integration based on the Trotter-Suzuki decomposition has been introduced in the Quantum Monte Carlo Method. 
This path integration makes it possible to obtain the partition function <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z.png\" />.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z_in_t_ising_model.png\" />\n\nwhere, if <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/m.png\" /> is large enough, the relational expression below holds.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/exp_left_frac_1_m_beta_h_right.png\" />\n\nThen the partition function <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z.png\" /> can be re-described as follows.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z_in_t_ising_model_re_described.png\" />\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/mid_sigma_k_rangle.png\" /> is <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/l.png\" /> topological products (product spaces). 
Because <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/h_0.png\" /> is a diagonal matrix, <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/tilde_sigma_j_z_mid_sigma.png\" />.\n\nTherefore,\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/langle_sigma_k_mid.png\" />\n\nThe partition function <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z.png\" /> can be re-described as follows.\n\n<img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/z_in_t_ising_model_re_described_last.png\" />\n\nwhere <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/m.png\" /> is the Trotter number.\n\nThis relational expression indicates that the quantum-mechanical Hamiltonian in the <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/d.png\" /> dimensional Transverse Ising model is functionally equivalent to the classical Hamiltonian in the <img src=\"https://storage.googleapis.com/accel-brain-code/Reinforcement-Learning/img/latex/d_1.png\" /> dimensional Ising model, which means that the state of the quantum-mechanical system can be approximated by the state of a classical system.\n\n### Code sample.\n\n```python\nfrom pyqlearning.annealingmodel.quantum_monte_carlo import QuantumMonteCarlo\nfrom pyqlearning.annealingmodel.distancecomputable.cost_as_distance import CostAsDistance\n\n# User defined function which is-a `CostFunctionable`.\ncost_functionable = YourCostFunctions()\n\n# Compute cost as distance for `QuantumMonteCarlo`.\ndistance_computable = CostAsDistance(params_arr, cost_functionable)\n\n# Init.\nannealing_model = QuantumMonteCarlo(\n    distance_computable=distance_computable,\n\n    # The number of annealing cycles.\n    cycles_num=100,\n\n    # Inverse temperature (Beta).\n    
inverse_temperature_beta=0.1,\n\n    # Gamma (the so-called annealing coefficient).\n    gammma=1.0,\n\n    # Attenuation rate for simulated time.\n    fractional_reduction=0.99,\n\n    # The Trotter dimension.\n    trotter_dimention=10,\n\n    # The number of Monte Carlo steps.\n    mc_step=100,\n\n    # The number of parameters which can be optimized.\n    point_num=100,\n\n    # Default `np.ndarray` of 2-D spin glass in Ising model.\n    spin_arr=None,\n\n    # Tolerance for the optimization.\n    # When the \u0394E is not improving by at least `tolerance_diff_e`\n    # for two consecutive iterations, annealing will stop.\n    tolerance_diff_e=0.01\n)\n\n# Execute annealing.\nannealing_model.annealing()\n```\n\nTo extract the result of the search, call the same properties as in the case of `SimulatedAnnealing`. If you want to know how to visualize the searching process, see my Jupyter notebook: [demo/annealing_hand_written_digits.ipynb](https://github.com/chimera0/accel-brain-code/blob/master/Reinforcement-Learning/demo/annealing_hand_written_digits.ipynb).\n\n## References\n\n### Q-Learning models.\n\n- Agrawal, S., & Goyal, N. (2011). Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797.\n- Bubeck, S., & Cesa-Bianchi, N. (2012). Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721.\n- Chapelle, O., & Li, L. (2011). An empirical evaluation of thompson sampling. In Advances in neural information processing systems (pp. 2249-2257).\n- Du, K. L., & Swamy, M. N. S. (2016). Search and optimization by metaheuristics (p. 434). New York City: Springer.\n- Kaufmann, E., Cappe, O., & Garivier, A. (2012). On Bayesian upper confidence bounds for bandit problems. In International Conference on Artificial Intelligence and Statistics (pp. 592-600).\n- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). 
Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.\n- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.\n- Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).\n- Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine learning, 8(3-4), 279-292.\n- White, J. (2012). Bandit algorithms for website optimization. O\u2019Reilly Media, Inc.\n\n### Deep Q-Network models.\n\n- Cho, K., Van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.\n- <a href=\"https://pdfs.semanticscholar.org/dd98/9d94613f439c05725bad958929357e365084.pdf\" target=\"_blank\">Egorov, M. (2016). Multi-agent deep reinforcement learning.</a>\n- Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017, May). Cooperative multi-agent control using deep reinforcement learning. In International Conference on Autonomous Agents and Multiagent Systems (pp. 66-83). Springer, Cham.\n- Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.\n- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.\n- Sainath, T. N., Vinyals, O., Senior, A., & Sak, H. (2015, April). Convolutional, long short-term memory, fully connected deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on (pp. 4580-4584). IEEE.\n- Xingjian, S. H. I., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., & Woo, W. C. (2015). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. 
In Advances in neural information processing systems (pp. 802-810).\n- Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.\n\n### Annealing models.\n\n- Bektas, T. (2006). The multiple traveling salesman problem: an overview of formulations and solution procedures. Omega, 34(3), 209-219.\n- Bertsimas, D., & Tsitsiklis, J. (1993). Simulated annealing. Statistical science, 8(1), 10-15.\n- Das, A., & Chakrabarti, B. K. (Eds.). (2005). Quantum annealing and related optimization methods (Vol. 679). Springer Science & Business Media.\n- Du, K. L., & Swamy, M. N. S. (2016). Search and optimization by metaheuristics. New York City: Springer.\n- Edwards, S. F., & Anderson, P. W. (1975). Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5), 965.\n- Facchi, P., & Pascazio, S. (2008). Quantum Zeno dynamics: mathematical and physical aspects. Journal of Physics A: Mathematical and Theoretical, 41(49), 493001.\n- Heim, B., R\u00f8nnow, T. F., Isakov, S. V., & Troyer, M. (2015). Quantum versus classical annealing of Ising spin glasses. Science, 348(6231), 215-217.\n- Heisenberg, W. (1925). \u00dcber quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen. Z. Phys. 33, pp. 879-893.\n- Heisenberg, W. (1927). \u00dcber den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Zeitschrift f\u00fcr Physik, 43, 172-198.\n- Heisenberg, W. (1984). The development of quantum mechanics. In Scientific Review Papers, Talks, and Books -Wissenschaftliche \u00dcbersichtsartikel, Vortr\u00e4ge und B\u00fccher (pp. 226-237). Springer Berlin Heidelberg.\n- Hilgevoord, Jan and Uffink, Jos, \"The Uncertainty Principle\", The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/win2016/entries/qt-uncertainty/>.\n- Jarzynski, C. (1997). Nonequilibrium equality for free energy differences. 
Physical Review Letters, 78(14), 2690.\n- Messiah, A. (1966). Quantum mechanics. 2 (1966). North-Holland Publishing Company.\n- Mezard, M., & Montanari, A. (2009). Information, physics, and computation. Oxford University Press.\n- Nallusamy, R., Duraiswamy, K., Dhanalaksmi, R., & Parthiban, P. (2009). Optimization of non-linear multiple traveling salesman problem using k-means clustering, shrink wrap algorithm and meta-heuristics. International Journal of Nonlinear Science, 8(4), 480-487.\n- Schr\u00f6dinger, E. (1926). Quantisierung als eigenwertproblem. Annalen der physik, 385(13), S.437-490.\n- Somma, R. D., Batista, C. D., & Ortiz, G. (2007). Quantum approach to classical statistical mechanics. Physical review letters, 99(3), 030603.\n- \u9234\u6728\u6b63. (2008). \u300c\u7d44\u307f\u5408\u308f\u305b\u6700\u9069\u5316\u554f\u984c\u3068\u91cf\u5b50\u30a2\u30cb\u30fc\u30ea\u30f3\u30b0: \u91cf\u5b50\u65ad\u71b1\u767a\u5c55\u306e\u7406\u8ad6\u3068\u6027\u80fd\u8a55\u4fa1\u300d.,\u300e\u7269\u6027\u7814\u7a76\u300f, 90(4): pp598-676. 
\u53c2\u7167\u7b87\u6240\u306fpp619-624.\n- \u897f\u68ee\u79c0\u7a14\u3001\u5927\u95a2\u771f\u4e4b(2018) \u300e\u91cf\u5b50\u30a2\u30cb\u30fc\u30ea\u30f3\u30b0\u306e\u57fa\u790e\u300f\u9808\u85e4 \u5f70\u4e09\u3001\u5ca1 \u771f \u76e3\u4fee\u3001\u5171\u7acb\u51fa\u7248\u3001\u53c2\u7167\u7b87\u6240\u306fpp9-46.\n\n### More detail demos\n\n- [Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\uff1a\u30ad\u30e1\u30e9\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u4ed5\u69d8](https://media.accel-brain.com/_chimera-network-is-web-crawling-ai/) (Japanese)\n    - 20001 bots are running as 20001 web-crawlers and 20001 web-scrapers.\n- [\u30ed\u30dc\u30a2\u30c9\u30d0\u30a4\u30b6\u30fc\u578b\u4eba\u5de5\u77e5\u80fd\uff1a\u30ad\u30e1\u30e9\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u4ed5\u69d8](https://media.accel-brain.com/_chimera-network-is-robo-adviser/) (Japanese)\n   - The 20001 bots can also simulate the portfolio optimization of securities such as stocks and circulation currency such as cryptocurrencies.\n\n### Related PoC\n\n- [\u91cf\u5b50\u529b\u5b66\u3001\u7d71\u8a08\u529b\u5b66\u3001\u71b1\u529b\u5b66\u306b\u304a\u3051\u308b\u5929\u624d\u7269\u7406\u5b66\u8005\u305f\u3061\u306e\u795e\u5b66\u7684\u306a\u5f62\u8c61\u306b\u3064\u3044\u3066](https://accel-brain.com/das-theologische-bild-genialer-physiker-in-der-quantenmechanik-und-der-statistischen-mechanik-und-thermodynamik/) (Japanese)\n    - [\u71b1\u529b\u5b66\u306e\u524d\u53f2\u3001\u30de\u30af\u30b9\u30a6\u30a7\u30eb\uff1d\u30dc\u30eb\u30c4\u30de\u30f3\u5206\u5e03\u306b\u304a\u3051\u308b\u30a8\u30f3\u30c8\u30ed\u30d4\u30fc\u306e\u6b74\u53f2\u7684\u610f\u5473\u8ad6](https://accel-brain.com/das-theologische-bild-genialer-physiker-in-der-quantenmechanik-und-der-statistischen-mechanik-und-thermodynamik/historische-semantik-der-entropie-in-der-maxwell-boltzmann-verteilung/)\n    - 
[\u30e1\u30c7\u30a3\u30a2\u3068\u3057\u3066\u306e\u7d71\u8a08\u529b\u5b66\u3068\u5f62\u5f0f\u3068\u3057\u3066\u306e\u30a2\u30f3\u30b5\u30f3\u30d6\u30eb\u3001\u305d\u306e\u30ae\u30d6\u30b9\u7684\u985e\u63a8](https://accel-brain.com/das-theologische-bild-genialer-physiker-in-der-quantenmechanik-und-der-statistischen-mechanik-und-thermodynamik/statistische-mechanik-als-medium-und-ensemble-als-form/)\n    - [\u300c\u30de\u30af\u30b9\u30a6\u30a7\u30eb\u306e\u60aa\u9b54\u300d\u3001\u529b\u5b66\u306e\u57fa\u790e\u6cd5\u5247\u3068\u3057\u3066\u306e\u795e](https://accel-brain.com/das-theologische-bild-genialer-physiker-in-der-quantenmechanik-und-der-statistischen-mechanik-und-thermodynamik/maxwell-damon/)\n- [Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\u306b\u3088\u308b\u30d1\u30e9\u30c9\u30c3\u30af\u30b9\u63a2\u7d22\u66b4\u9732\u6a5f\u80fd\u306e\u793e\u4f1a\u9032\u5316\u8ad6](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/) (Japanese)\n    - [World-Wide Web\u306e\u793e\u4f1a\u69cb\u9020\u3068Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\u306e\u610f\u5473\u8ad6](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/sozialstruktur-des-world-wide-web-und-semantik-der-kunstlichen-intelligenz-des-web-crawlers/)\n    - [\u610f\u5473\u8ad6\u306e\u610f\u5473\u8ad6\u3001\u89b3\u5bdf\u306e\u89b3\u5bdf](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/semantik-der-semantik-und-beobachtung-der-beobachtung/)\n- 
[\u6df1\u5c64\u5f37\u5316\u5b66\u7fd2\u306e\u30d9\u30a4\u30ba\u4e3b\u7fa9\u7684\u306a\u60c5\u5831\u63a2\u7d22\u306b\u99c6\u52d5\u3055\u308c\u305f\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u306e\u610f\u5473\u8ad6](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/) (Japanese)\n    - [\u30d0\u30f3\u30c7\u30a3\u30c3\u30c8\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u6a5f\u80fd\u7684\u62e1\u5f35\u3068\u3057\u3066\u306e\u5f37\u5316\u5b66\u7fd2\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/verstarkungslernalgorithmus-als-funktionale-erweiterung-des-banditenalgorithmus/)\n    - [\u6df1\u5c64\u5f37\u5316\u5b66\u7fd2\u306e\u7d71\u8a08\u7684\u6a5f\u68b0\u5b66\u7fd2\u3001\u5f37\u5316\u5b66\u7fd2\u306e\u95a2\u6570\u8fd1\u4f3c\u5668\u3068\u3057\u3066\u306e\u6df1\u5c64\u5b66\u7fd2](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/deep-learning-als-funktionsapproximator-fur-verstarktes-lernen/)\n- [\u30cf\u30c3\u30ab\u30fc\u502b\u7406\u306b\u6e96\u62e0\u3057\u305f\u4eba\u5de5\u77e5\u80fd\u306e\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u8a2d\u8a08](https://accel-brain.com/architectural-design-of-artificial-intelligence-conforming-to-hacker-ethics/) (Japanese)\n    - [\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u4e2d\u5fc3\u8a2d\u8a08\u306e\u793e\u4f1a\u69cb\u9020\u3068\u30a2\u30fc\u30ad\u30c6\u30af\u30c1\u30e3\u306e\u610f\u5473\u8ad6](https://accel-brain.com/architectural-design-of-artificial-intelligence-conforming-to-hacker-ethics/sozialstruktur-des-architekturzentrum-designs-und-architektur-der-semantik/)\n- 
[\u300c\u4eba\u5de5\u306e\u7406\u60f3\u300d\u3092\u80cc\u666f\u3068\u3057\u305f\u300c\u4e07\u7269\u7167\u5fdc\u300d\u306e\u30c7\u30fc\u30bf\u30e2\u30c7\u30ea\u30f3\u30b0](https://accel-brain.com/data-modeling-von-korrespondenz-in-artificial-paradise/) (Japanese)\n    - [\u30ae\u30e3\u30f3\u30d6\u30e9\u30fc\u306e\u6a5f\u80fd\u7684\u7b49\u4fa1\u7269\u3068\u3057\u3066\u306e\u5f37\u5316\u5b66\u7fd2\u30a8\u30fc\u30b8\u30a7\u30f3\u30c8\u3001\u6295\u8cc7\u306b\u304a\u3051\u308b\u51b7\u9759\u6c88\u7740\u306a\u7cbe\u795e\u306e\u73fe\u5728\u6027](https://accel-brain.com/data-modeling-von-korrespondenz-in-artificial-paradise/agent-in-reignforcement-lernen-als-funktionelle-aquivalente-von-spielern/)\n\n## Author\n\n- chimera0(RUM)\n\n## Author URI\n\n- http://accel-brain.com/\n\n## License\n\n- GNU General Public License v2.0",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning",
        "keywords": "Q-Learning Deep Q-Network DQN Reinforcement Learning Boltzmann Multi-agent LSTM CNN Convolution",
        "license": "GPL2",
        "maintainer": "",
        "maintainer_email": "",
        "name": "pyqlearning",
        "package_url": "https://pypi.org/project/pyqlearning/",
        "platform": "",
        "project_url": "https://pypi.org/project/pyqlearning/",
        "project_urls": {
            "Homepage": "https://github.com/chimera0/accel-brain-code/tree/master/Reinforcement-Learning"
        },
        "release_url": "https://pypi.org/project/pyqlearning/1.2.4/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "pyqlearning is Python library to implement Reinforcement Learning and Deep Reinforcement Learning, especially for Q-Learning, Deep Q-Network, and Multi-agent Deep Q-Network which can be optimized by Annealing models such as Simulated Annealing, Adaptive Simulated Annealing, and Quantum Monte Carlo Method.",
        "version": "1.2.4"
    },
    "last_serial": 5437086,
    "releases": {
        "1.0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "2bed1e3b026defe7fd201fd1d87536fa",
                    "sha256": "a166974bc4d71d4bd6c23e004c9108a401c40a78f46c47cb2529ec7049f1660f"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "2bed1e3b026defe7fd201fd1d87536fa",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 8703,
                "upload_time": "2017-11-22T05:57:56",
                "url": "https://files.pythonhosted.org/packages/ac/50/ba8eb1a8b58ede21c60ac384511c679b4243cf176ddfac4673d9f1d04699/pyqlearning-1.0.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "f20d76179cf458fc78dc7dc67396be07",
                    "sha256": "6897ac2055b40d3f7a397f134259efe21cae86514e7d796cbe604595d33bdf9f"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "f20d76179cf458fc78dc7dc67396be07",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 9075,
                "upload_time": "2017-11-22T05:57:57",
                "url": "https://files.pythonhosted.org/packages/ef/bd/9ae2568bb3530376b3f4f602fedb4a71ab6cf2a43f752281a6c4172acd3d/pyqlearning-1.0.1.tar.gz"
            }
        ],
        "1.0.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "8f128ed356693a0efac97ee20a0ee745",
                    "sha256": "da267ec843128dc708f028e388a6a8860f7118cc9517cecfe80b5f5595d8fc1a"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.2-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "8f128ed356693a0efac97ee20a0ee745",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 8701,
                "upload_time": "2017-11-23T07:13:46",
                "url": "https://files.pythonhosted.org/packages/8f/f9/77ea68cd92bbfe1dba4fea840b9307d09d725d8838460b19b5f71c4e4c00/pyqlearning-1.0.2-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "7daaa232968342b39fc8f0fdaf669c83",
                    "sha256": "8e08af1551ad04d570b38c42922ada637171fd29b4f0789884cc114dde2d2ead"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.2.tar.gz",
                "has_sig": false,
                "md5_digest": "7daaa232968342b39fc8f0fdaf669c83",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 9634,
                "upload_time": "2017-11-23T07:13:48",
                "url": "https://files.pythonhosted.org/packages/6a/76/7c04b935498197b82acd56ac27a54967d4f0529a41dd0b3184036f7722f9/pyqlearning-1.0.2.tar.gz"
            }
        ],
        "1.0.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "ec004631dbe7029a8f8bced2b6438c49",
                    "sha256": "f2e7353cdc6ff080a38e1e17140a71a3a0d4a8e88b4cf7dca8ac12027b5af451"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.3.linux-x86_64.tar.gz",
                "has_sig": false,
                "md5_digest": "ec004631dbe7029a8f8bced2b6438c49",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 12038,
                "upload_time": "2018-04-18T14:20:07",
                "url": "https://files.pythonhosted.org/packages/7f/61/e76ad4934fff4cdf178b4599bbd31950015611cde8fb9a6fa8cd5137080f/pyqlearning-1.0.3.linux-x86_64.tar.gz"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "6d96e1e8aced2ee75a2465377d903eb5",
                    "sha256": "7122461ba44f37141dbbc3c18cb093a83019296f1a07aaff4444ade60b258767"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.3-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "6d96e1e8aced2ee75a2465377d903eb5",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 8895,
                "upload_time": "2018-04-18T14:20:05",
                "url": "https://files.pythonhosted.org/packages/82/4e/1f18079e6638e1e3ec4bc40abffc213cc0640b499f9dd990fc1ef33f6312/pyqlearning-1.0.3-py3-none-any.whl"
            }
        ],
        "1.0.6": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a96cfe4cde219492786972aa018fd954",
                    "sha256": "d81e504c994beb3d776d694ce7061f1952c3a34546e38943c1f08f32ee1ca1eb"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.6-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "a96cfe4cde219492786972aa018fd954",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 16597,
                "upload_time": "2018-05-06T02:17:57",
                "url": "https://files.pythonhosted.org/packages/10/e4/b6c8c36ac0dd9dab968dd82a7fa5633ccfcb4bc44f1cb3e9bf6a3568d02b/pyqlearning-1.0.6-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "44d28706d5951bfa5a4d3278d4678abe",
                    "sha256": "8bb380d15a6b716eaa74add30d3bf489c7324181dc7f612a933dd940ff4a1ed2"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.6.tar.gz",
                "has_sig": false,
                "md5_digest": "44d28706d5951bfa5a4d3278d4678abe",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 16386,
                "upload_time": "2018-05-06T02:17:58",
                "url": "https://files.pythonhosted.org/packages/d5/89/5a0aa42c3c5b22a03f94c1ac5a5782df5c0b0862a9e7d2ce5ad982ee7f03/pyqlearning-1.0.6.tar.gz"
            }
        ],
        "1.0.7": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "ec128883816f582ef2bf18736d7ad8ba",
                    "sha256": "5d404777d8ec47f5d5656074576003190706cf847175a507a8a9cfdf6eb59c5d"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.7-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "ec128883816f582ef2bf18736d7ad8ba",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 16632,
                "upload_time": "2018-05-06T02:29:31",
                "url": "https://files.pythonhosted.org/packages/ba/46/a4d720631dc77cee91766814da365fff70bf7d03c10365db73ddbfcea1ed/pyqlearning-1.0.7-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "625c17555fb044159d95aa157a3a0258",
                    "sha256": "24df9968d04ced63bc802629c7aa995cc0fa9e74cd3aeb80cb6c79ca5601c1d7"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.7.tar.gz",
                "has_sig": false,
                "md5_digest": "625c17555fb044159d95aa157a3a0258",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 16405,
                "upload_time": "2018-05-06T02:29:32",
                "url": "https://files.pythonhosted.org/packages/83/5d/3af0c684f838a964503b856849d5f503664e02f74f8a40daf2dcde5544f1/pyqlearning-1.0.7.tar.gz"
            }
        ],
        "1.0.8": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "3f64a70fe033b2deb1c749f283984259",
                    "sha256": "9b5b68baa574ccc3fb0534ea6facf41c9ad0c0c35644d671e2c0d50bc9c332f6"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.8-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "3f64a70fe033b2deb1c749f283984259",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 19621,
                "upload_time": "2018-05-13T07:44:19",
                "url": "https://files.pythonhosted.org/packages/66/c9/8b53586ada39ab1c266b77a5c4ab8c7120b614c566f59a73faa18fafc742/pyqlearning-1.0.8-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "345979d4e4c7df75e2c805d937bcd72a",
                    "sha256": "be6ba491322269f214d1fe1b6c40df6eb3df3c43d82ba088a8704b7ab27ce8f6"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.8.tar.gz",
                "has_sig": false,
                "md5_digest": "345979d4e4c7df75e2c805d937bcd72a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 15230,
                "upload_time": "2018-05-13T07:44:22",
                "url": "https://files.pythonhosted.org/packages/58/3f/e9b6f9be4547e265a7efab7300a06b27c37df906738bd522c0fee480ee63/pyqlearning-1.0.8.tar.gz"
            }
        ],
        "1.0.9": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "6a3c2c89587445441a8690b562531a79",
                    "sha256": "de23e4c02ead41a01df17f93fe07988d589de296bb6fb70456a4053228c863f6"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.9-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "6a3c2c89587445441a8690b562531a79",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 19596,
                "upload_time": "2018-05-27T07:31:55",
                "url": "https://files.pythonhosted.org/packages/02/f8/9006316c036e9875ba05c85a7c851b3fa8e6442783361c15e2e69684b489/pyqlearning-1.0.9-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "938337540bb6b0dc559e0695de85c3f4",
                    "sha256": "1b7fc08943d5ea7f7666fb98ffebdd99d042e95529115ba30880a962d0c9e24b"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.0.9.tar.gz",
                "has_sig": false,
                "md5_digest": "938337540bb6b0dc559e0695de85c3f4",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 15208,
                "upload_time": "2018-05-27T07:31:57",
                "url": "https://files.pythonhosted.org/packages/3e/74/44ad30e0f39a35cfca07882c1f1d97722b1c6188d660417ad1b416d5599f/pyqlearning-1.0.9.tar.gz"
            }
        ],
        "1.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "c9ccfc4bc26bed5e36853e0f4f288082",
                    "sha256": "3b36aafbfee9a45d1be4cf3a59ac38b579c90364799e2a0e914b024aa8b95c08"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "c9ccfc4bc26bed5e36853e0f4f288082",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 19316,
                "upload_time": "2018-05-30T13:55:37",
                "url": "https://files.pythonhosted.org/packages/3d/49/f06fb5be30dc405002ade67760f89d43af928e1f2dbbcc8851c30da353ae/pyqlearning-1.1.1-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "161da378ba5634b8a81f25b7127546b4",
                    "sha256": "4779c615b575850d72b2d3227679cc566731446ac64c8665f059eeb99da5474b"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "161da378ba5634b8a81f25b7127546b4",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 28023,
                "upload_time": "2018-05-30T13:55:38",
                "url": "https://files.pythonhosted.org/packages/3a/94/d03a1e203b88ba0f61c790925c0d0f49a82dd83af979a2609f2fb2fd6427/pyqlearning-1.1.1.tar.gz"
            }
        ],
        "1.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a0087c2c97f9cacf0433803881919993",
                    "sha256": "5d10fd0588281b4de1c8bb84b51eac472117d98b70a412d2e74a26082b0008eb"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.2-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "a0087c2c97f9cacf0433803881919993",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 21139,
                "upload_time": "2018-06-09T04:24:22",
                "url": "https://files.pythonhosted.org/packages/4e/20/acf2068872d9dafb433587be0fed39bac0a18e6c01256a940fde72b8110a/pyqlearning-1.1.2-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "d46c0f0fba4a5ac06b3275d6f8580cf8",
                    "sha256": "f86aae37eefce498abecbb107fb3d2b37ac1c8ef1d39c1e90097e496bea13cfd"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "d46c0f0fba4a5ac06b3275d6f8580cf8",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 31505,
                "upload_time": "2018-06-09T04:24:23",
                "url": "https://files.pythonhosted.org/packages/41/ea/2fbc49e99b6030955934b09b571a99665ea2b21ef0d05031ff33ac2b6613/pyqlearning-1.1.2.tar.gz"
            }
        ],
        "1.1.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "efc174dd0a7e3acb710cac69466d950b",
                    "sha256": "5d00756593cd1a87c08e9e4d048f3c23c6beeb6a660ce77f093f5d326b1cfe8e"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.3-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "efc174dd0a7e3acb710cac69466d950b",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 21644,
                "upload_time": "2018-06-10T04:26:21",
                "url": "https://files.pythonhosted.org/packages/13/23/a5661446271520f1fe2eda6131aa345ea371a3cd567753c012ae81b827d7/pyqlearning-1.1.3-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "0e241d3839510a4e099e4dbe99b50326",
                    "sha256": "3ee1d464ed7f91efce2e7eb843a9da7bb113fdb23d21e805c73ca28589ea10a3"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.3.tar.gz",
                "has_sig": false,
                "md5_digest": "0e241d3839510a4e099e4dbe99b50326",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 31910,
                "upload_time": "2018-06-10T04:26:22",
                "url": "https://files.pythonhosted.org/packages/00/ba/a2d42860c67b6fd30585f27e6ded6fe8918047a29520205c5c6a48923211/pyqlearning-1.1.3.tar.gz"
            }
        ],
        "1.1.4": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "8648f44c7e7f132db0483a94fd0cd024",
                    "sha256": "5a205f63e503c688201803af381d1eb4c6c9438b1ac505687018a7e9db146a9a"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.4-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "8648f44c7e7f132db0483a94fd0cd024",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 28970,
                "upload_time": "2018-06-16T10:44:43",
                "url": "https://files.pythonhosted.org/packages/a0/cb/00fc9671e0e0d193ad789018c6771620fe4e90faaecdd9c57af843386d1d/pyqlearning-1.1.4-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "abe4e024f891d229b420917e816a819e",
                    "sha256": "32e8a6923c6e1974953709930084c6147c50310551dc72fbd4135569f27eb440"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.4.tar.gz",
                "has_sig": false,
                "md5_digest": "abe4e024f891d229b420917e816a819e",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 45022,
                "upload_time": "2018-06-16T10:44:44",
                "url": "https://files.pythonhosted.org/packages/89/45/a40088ab4fe827aa23a765aafbd25460e2bb0fa9f41b5700417a314cf6a2/pyqlearning-1.1.4.tar.gz"
            }
        ],
        "1.1.5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "2bf21da0f9d2d41bb71bd3cc5cad0bf4",
                    "sha256": "0f30844c4f11465f82c992c7db5d4a8ffda5cd63b750d524205183b4c9e15646"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.5-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "2bf21da0f9d2d41bb71bd3cc5cad0bf4",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 32076,
                "upload_time": "2018-07-01T00:59:31",
                "url": "https://files.pythonhosted.org/packages/da/90/e61fe46e55a7182680deaab6260aca1b69a6d0b024c2b31f99ffc189a367/pyqlearning-1.1.5-py3-none-any.whl"
            }
        ],
        "1.1.6": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "7db3d5d546cf4066a3ed3a10cc2a056c",
                    "sha256": "74d14d8df1f9b0d7a0ac9ade807c6e85bfef1eb49321dd92806cd0f9ae507fef"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.6-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "7db3d5d546cf4066a3ed3a10cc2a056c",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 32288,
                "upload_time": "2018-09-16T08:35:57",
                "url": "https://files.pythonhosted.org/packages/fa/46/137df6b510dd63081052930c2ab9d51e68273812b811e00f8a90407eccde/pyqlearning-1.1.6-py3-none-any.whl"
            }
        ],
        "1.1.7": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "63d59e5baf6d9703ae72bda57701c23b",
                    "sha256": "ce7ca1ccdbf2f696a65982070cd3e2dace51433e0dafecb70ce47f7d7cac6be7"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.7-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "63d59e5baf6d9703ae72bda57701c23b",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 40929,
                "upload_time": "2018-11-27T02:17:05",
                "url": "https://files.pythonhosted.org/packages/fc/df/1f787ac3cf9c113df80452278fd6e8896cb883a9d72b5191e58ddc246205/pyqlearning-1.1.7-py3-none-any.whl"
            }
        ],
        "1.1.8": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "a1e1c74034341e69f553542a68cc240e",
                    "sha256": "fcede9c41f87ce6d66be07ce3eff12d1754f7a93b5ad2b14fa949311281062fb"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.8-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "a1e1c74034341e69f553542a68cc240e",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 58210,
                "upload_time": "2018-12-13T13:53:43",
                "url": "https://files.pythonhosted.org/packages/67/a4/f74588a482e0061175ee087a8bde5f6a48be1f796c58d0e7cb192f331800/pyqlearning-1.1.8-py3-none-any.whl"
            }
        ],
        "1.1.9": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "2aa4835cb3b2f26e822b2d23806c0f92",
                    "sha256": "142580d4e6fb1c610910c15eb928d48c2feaa0a67fa4a6f49156f23cbd696bc3"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.1.9-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "2aa4835cb3b2f26e822b2d23806c0f92",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 58531,
                "upload_time": "2019-01-25T16:32:16",
                "url": "https://files.pythonhosted.org/packages/eb/70/59c22b132bc4626cf1f8d46b2d79b59f49380454c0dc1683c9a959718a0e/pyqlearning-1.1.9-py3-none-any.whl"
            }
        ],
        "1.2.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "6a3d1149334fae3a8575f96159c1749e",
                    "sha256": "76020f0ad6d0aeea2f8a4c8b1dbac6db145c1f0e33ca143b5b447d17589dd7c0"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.2.0-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "6a3d1149334fae3a8575f96159c1749e",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 61520,
                "upload_time": "2019-01-27T13:57:25",
                "url": "https://files.pythonhosted.org/packages/e0/a0/ba4814388937cb48848773377d18cbcde99bb581e0641f780e1f9f891557/pyqlearning-1.2.0-py3-none-any.whl"
            }
        ],
        "1.2.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "d1135661cb21e5731fd52f15adb25e92",
                    "sha256": "894e5296f933ebf87f7f2013fea994bfab20effc1ba8eddba4e246fa6ba73092"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.2.1-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "d1135661cb21e5731fd52f15adb25e92",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 61414,
                "upload_time": "2019-02-17T07:03:05",
                "url": "https://files.pythonhosted.org/packages/a1/b4/33670727dc3038b5115e306b8e9be419fe122a41a1331372b33b5a5e7ef5/pyqlearning-1.2.1-py3-none-any.whl"
            }
        ],
        "1.2.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "d9cd08368aac9262384bc150aee4bc84",
                    "sha256": "0e8a6a9bf75971315285a7e6be440ec7b6afef0d5912a339fc117f1300e04d14"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.2.2-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "d9cd08368aac9262384bc150aee4bc84",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 61714,
                "upload_time": "2019-03-23T14:57:12",
                "url": "https://files.pythonhosted.org/packages/d6/db/9ad1cb39ade1e8277b02e52f33721048a7a957d3fac3216c4f6240af73e0/pyqlearning-1.2.2-py3-none-any.whl"
            }
        ],
        "1.2.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "f1d2033df521ca29b5c5f881ff0576b0",
                    "sha256": "e6f9787d7c152f9ab70ce3900f624364711ad524e1b2f65e854ccce7fc76a68c"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.2.3-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "f1d2033df521ca29b5c5f881ff0576b0",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 52051,
                "upload_time": "2019-04-07T03:15:09",
                "url": "https://files.pythonhosted.org/packages/94/ce/c54efbc4ad803864de1540e84cfb59f693b77924a9c886d923ba84da0aea/pyqlearning-1.2.3-py3-none-any.whl"
            }
        ],
        "1.2.4": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "f296d1f839ac4403ad04eaceb1a1cd16",
                    "sha256": "ddcd5be22bc0f97111bb906e2f11cff4fe61418c63c624b132d5828d4bdf03c6"
                },
                "downloads": -1,
                "filename": "pyqlearning-1.2.4.tar.gz",
                "has_sig": false,
                "md5_digest": "f296d1f839ac4403ad04eaceb1a1cd16",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 67908,
                "upload_time": "2019-06-23T11:30:33",
                "url": "https://files.pythonhosted.org/packages/0d/3f/08285559fcad87153535d38d65efab9b3f4a2a4cbaaeca67083f0206aa93/pyqlearning-1.2.4.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "f296d1f839ac4403ad04eaceb1a1cd16",
                "sha256": "ddcd5be22bc0f97111bb906e2f11cff4fe61418c63c624b132d5828d4bdf03c6"
            },
            "downloads": -1,
            "filename": "pyqlearning-1.2.4.tar.gz",
            "has_sig": false,
            "md5_digest": "f296d1f839ac4403ad04eaceb1a1cd16",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 67908,
            "upload_time": "2019-06-23T11:30:33",
            "url": "https://files.pythonhosted.org/packages/0d/3f/08285559fcad87153535d38d65efab9b3f4a2a4cbaaeca67083f0206aa93/pyqlearning-1.2.4.tar.gz"
        }
    ]
}