{ "info": { "author": "Maxim Podkolzine", "author_email": "maxim.podkolzine@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "============================================\nHyper-parameters Tuning for Machine Learning\n============================================\n\n- `Overview <#overview>`__\n - `About <#about>`__\n - `Installation <#installation>`__\n - `How to use <#how-to-use>`__\n- `Features <#features>`__\n - `Straight-forward specification <#straight-forward-specification>`__\n - `Exploration-exploitation trade-off <#exploration-exploitation-trade-off>`__\n - `Learning Curve Estimation <#learning-curve-estimation>`__\n- `Bayesian Optimization <#bayesian-optimization>`__\n\n--------\nOverview\n--------\n\nAbout\n=====\n\n*Hyper-Engine* is a toolbox for `model selection and hyper-parameters tuning `__.\nIt aims to provide most state-of-the-art techniques via an intuitive API with minimal dependencies.\n*Hyper-Engine* is **not a framework**, which means it doesn't impose any structure or design on the main code,\nthus making integration local and non-intrusive.\n\nInstallation\n============\n\n.. code-block:: shell\n\n pip install git+https://github.com/maxim5/hyper-engine.git@master\n\nDependencies:\n\n- Six, NumPy, SciPy\n- TensorFlow (optional)\n- PyPlot (optional, only for development)\n\nCompatibility:\n\n.. image:: https://travis-ci.org/maxim5/hyper-engine.svg?branch=master\n :target: https://travis-ci.org/maxim5/hyper-engine\n\n- Python 2.7, 3.5, 3.6\n\nLicense:\n\n- `Apache 2.0 `__\n\n*Hyper-Engine* is designed to be ML-platform agnostic, but currently provides only a simple `TensorFlow `__ binding.\n\nHow to use\n==========\n\nAdapting your code to *Hyper-Engine* usually boils down to migrating hard-coded hyper-parameters to a dictionary (or an object)\nand giving names to particular tensors.\n\n**Before:**\n\n.. 
code-block:: python\n\n    def my_model():\n        x = tf.placeholder(...)\n        y = tf.placeholder(...)\n        ...\n        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n        ...\n\n**After:**\n\n.. code-block:: python\n\n    def my_model(params):\n        x = tf.placeholder(..., name='input')\n        y = tf.placeholder(..., name='label')\n        ...\n        optimizer = tf.train.GradientDescentOptimizer(learning_rate=params['learning_rate'])\n        ...\n\n    # The model can now be run with any set of hyper-parameters\n\n\nThe rest of the integration code is isolated and can be placed in the ``main`` script.\nSee the examples of hyper-parameter tuning in the `examples `__ package.\n\n--------\nFeatures\n--------\n\nStraight-forward specification\n==============================\n\nA crucial part of hyper-parameter tuning is the definition of a *domain*\nover which the engine is going to optimize the model. Some variables are continuous (e.g., the learning rate),\nsome variables are integer values in a certain range (e.g., the number of hidden units), some variables are categorical\nand represent architecture knobs (e.g., the choice of non-linearity).\n\nYou can define all these variables and their ranges in a ``numpy``-like fashion:\n\n.. 
code-block:: python\n\n    hyper_params_spec = {\n        'optimizer': {\n            'learning_rate': 10**spec.uniform(-3, -1),  # makes the continuous range [0.001, 0.1]\n            'epsilon': 1e-8,  # constants work too\n        },\n        'conv': {\n            'filters': [[3, 3, spec.choice(range(32, 48))],  # an integer in [32, 47]\n                        [3, 3, spec.choice(range(64, 96))],  # an integer in [64, 95]\n                        [3, 3, spec.choice(range(128, 192))]],  # an integer in [128, 191]\n            'activation': spec.choice(['relu', 'prelu', 'elu']),  # a categorical range: 1 of 3 activations\n            'down_sample': {\n                'size': [2, 2],\n                'pooling': spec.choice(['max_pool', 'avg_pool'])  # a categorical range: 1 of 2 pooling methods\n            },\n            'residual': spec.random_bool(),  # either True or False\n            'dropout': spec.uniform(0.75, 1.0),  # a uniform continuous range\n        },\n    }\n\nNote that ``10**spec.uniform(-3, -1)`` is not the same *distribution* as ``spec.uniform(0.001, 0.1)``\n(though they both define the same *range* of values).\nIn the first case, the whole logarithmic spectrum ``(-3, -1)`` is equally probable, while in\nthe second case, values below ``0.01`` make up only about 9% of the samples, so small values around ``0.001`` are much less likely than the values around the mean ``0.0505``.\nSpecifying ``spec.uniform(0.001, 0.1)`` as the domain range for the learning rate will therefore likely skew the results\ntowards higher learning rates. 
This underlines the importance of random variable transformations and arithmetic operations.\n\nExploration-exploitation trade-off\n==================================\n\nMachine learning model selection is expensive.\nEach model evaluation requires full training from scratch and may take anywhere from minutes to days,\ndepending on the problem complexity and available computational resources.\n*Hyper-Engine* provides an algorithm that explores the space of parameters efficiently, focuses on the most promising areas,\nand thus converges to the maximum as fast as possible.\n\n**Example 1**: the true function is 1-dimensional, ``f(x) = x * sin(x)`` (black curve) on the [-10, 10] interval.\nRed dots represent the trials, the red curve is the `Gaussian Process `__ mean,\nand the blue curves are the mean plus or minus one standard deviation.\nThe optimizer randomly chose the negative mode as more promising.\n\n.. image:: /.images/figure_1.png\n :width: 80%\n :alt: 1D Bayesian Optimization\n :align: center\n\n**Example 2**: the 2-dimensional function ``f(x, y) = (x + y) / ((x - 1) ** 2 - sin(y) + 2)`` (black surface) on the [0,9]x[0,9] square.\nRed dots represent the trials; the Gaussian Process mean and standard deviations are not shown for simplicity.\nNote that to achieve the maximum both variables must be picked accurately.\n\n.. image:: /.images/figure_2-1.png\n :width: 100%\n :alt: 2D Bayesian Optimization\n :align: center\n\n.. image:: /.images/figure_2-2.png\n :width: 100%\n :alt: 2D Bayesian Optimization\n :align: center\n\nThe code for these and other examples is `here `__.\n\nLearning Curve Estimation\n=========================\n\n*Hyper-Engine* can monitor the model performance during training and stop early if the model is learning too slowly.\nThis is done via *learning curve prediction*. 
Note that this technique is compatible with Bayesian Optimization, since\nit estimates the model accuracy after full training, and this value can be safely used to update the Gaussian Process parameters.\n\nExample code:\n\n.. code-block:: python\n\n    curve_params = {\n        'burn_in': 30,  # burn-in period: 30 models\n        'min_input_size': 5,  # start predicting after 5 epochs\n        'value_limit': 0.80,  # stop if the estimate is less than 80% with high probability\n    }\n    curve_predictor = LinearCurvePredictor(**curve_params)\n\nCurrently there is only one implementation of the predictor, ``LinearCurvePredictor``,\nwhich is very efficient, but requires a relatively large burn-in period to predict model accuracy reliably.\n\nNote that learning curves can be reused between different models, which works quite well for the burn-in,\nso it's recommended to serialize and load curve data via the ``io_save_dir`` and ``io_load_dir`` parameters.\n\nSee also the following paper:\n`Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks\nby Extrapolation of Learning Curves `__\n\n---------------------\nBayesian Optimization\n---------------------\n\nImplements the following `methods `__:\n\n- Probability of Improvement (See H. J. Kushner. A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise. J. Basic Engineering, 86:97\u2013106, 1964.)\n- Expected Improvement (See J. Mockus, V. Tiesis, and A. Zilinskas. Toward Global Optimization, volume 2, chapter The Application of Bayesian Methods for Seeking the Extremum, pages 117\u2013128. Elsevier, 1978.)\n- `Upper Confidence Bound `__\n- `Mixed / Portfolio strategy `__\n- Naive random search.\n\nThe PI method prefers exploitation to exploration, while UCB does the opposite. 
One of the best strategies we've seen is a mixed one:\nstart with high probability of UCB and gradually decrease it, increasing PI probability.\n\nDefault kernel function used is `RBF kernel `__, but it is extensible.", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/maxim5/hyper-engine", "keywords": "machine learning,hyper-parameters,model selection,bayesian optimization", "license": "Apache 2.0", "maintainer": "", "maintainer_email": "", "name": "hyperengine", "package_url": "https://pypi.org/project/hyperengine/", "platform": "", "project_url": "https://pypi.org/project/hyperengine/", "project_urls": { "Homepage": "https://github.com/maxim5/hyper-engine" }, "release_url": "https://pypi.org/project/hyperengine/0.1.1/", "requires_dist": null, "requires_python": "", "summary": "Python library for Bayesian hyper-parameters optimization", "version": "0.1.1" }, "last_serial": 3569838, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "05c3de48ece3ffc00c21ac639bdbdfe9", "sha256": "1230857839fcaa94f8781966e2f10b1eb549a30977c7483a2933f315570d141e" }, "downloads": -1, "filename": "hyperengine-0.1.1.tar.gz", "has_sig": false, "md5_digest": "05c3de48ece3ffc00c21ac639bdbdfe9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31237, "upload_time": "2018-02-10T12:56:52", "url": "https://files.pythonhosted.org/packages/d7/de/cc05d99e18ddb74012bf5d5ec8f7932fd5d667a5373c576d26dfad6f598a/hyperengine-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "05c3de48ece3ffc00c21ac639bdbdfe9", "sha256": "1230857839fcaa94f8781966e2f10b1eb549a30977c7483a2933f315570d141e" }, "downloads": -1, "filename": "hyperengine-0.1.1.tar.gz", "has_sig": false, "md5_digest": "05c3de48ece3ffc00c21ac639bdbdfe9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31237, 
"upload_time": "2018-02-10T12:56:52", "url": "https://files.pythonhosted.org/packages/d7/de/cc05d99e18ddb74012bf5d5ec8f7932fd5d667a5373c576d26dfad6f598a/hyperengine-0.1.1.tar.gz" } ] }