{ "info": { "author": "Olivier Grisel", "author_email": "olivier.grisel@ensta.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: Python", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "# Pyrallel - Parallel Data Analytics in Python\n\n**Overview**: experimental project to investigate distributed computation\npatterns for machine learning and other semi-interactive data analytics\ntasks.\n\n**Scope**:\n\n- focus on small to medium datasets that fit in memory on a small\n (10+ nodes) to medium cluster (100+ nodes).\n\n- focus on small to medium data (with data locality when possible).\n\n- focus on CPU-bound tasks (e.g. training Random Forests) while trying to\n limit disk / network access to a minimum.\n\n- do not focus on HA / Fault Tolerance (yet).\n\n- do not try to invent a new set of high level programming abstractions\n (yet): use a low level programming model (IPython.parallel) to finely\n control the cluster elements and messages transferred and help identify\n the practical underlying constraints in a distributed machine\n learning setting.\n\n\n**Disclaimer**: the public API of this library will probably not be\nstable soon, as the current goal of this project is to experiment.\n\n\n## Dependencies\n\nThe usual suspects: Python 2.7, NumPy, SciPy.\n\nFetch the development version (master branch) from:\n\n- https://github.com/ipython/ipython\n\n- https://github.com/scikit-learn/scikit-learn\n\nStarCluster `develop` branch and its `IPCluster` plugin are also required\nto easily start up a bunch of nodes with IPython.parallel setup.\n\n## Patterns currently under investigation\n\n- Asynchronous & randomized hyper-parameters search (a.k.a. 
Randomized Grid\n Search) for machine learning models\n\n- Share numerical arrays efficiently over the nodes and make them\n available to concurrently running Python processes without making\n copies in memory using memory-mapped files.\n\n- Distributed Random Forests fitting.\n\n- Ensembling heterogeneous library models.\n\n- Parallel implementation of online averaged models using an MPI AllReduce, for\n instance using MiniBatchKMeans on partitioned data.\n\n\nSee the content of the `examples/` folder for more details.\n\n\n## License\n\nSimplified BSD.\n\n\n## History\n\nThis project started at the [PyCon 2012 PyData\nsprint](http://wiki.ipython.org/PyCon12Sprint)\nas a set of proof of concept [IPython.parallel\nscripts](https://github.com/ogrisel/pycon-pydata-sprint).\n", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/pydata/pyrallel", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "pyrallel", "package_url": "https://pypi.org/project/pyrallel/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/pyrallel/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/pydata/pyrallel" }, "release_url": "https://pypi.org/project/pyrallel/0.2.1/", "requires_dist": null, "requires_python": null, "summary": "Experimental tools for parallel machine learning", "version": "0.2.1" }, "last_serial": 775398, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "9cd094b4e2d7eed5511933a23fc43212", "sha256": "f586c0d66b56038c32edd112e25f6d98e16c3df766abbdef0f044501e7ac8836" }, "downloads": -1, "filename": "pyrallel-0.2.tar.gz", "has_sig": false, "md5_digest": "9cd094b4e2d7eed5511933a23fc43212", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7841, "upload_time": "2013-06-13T14:46:34", "url": 
"https://files.pythonhosted.org/packages/34/7c/23761a09302544e3275cebc8d2930248e43efbe1f125fd8219807ed2691c/pyrallel-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "ffe1bed1bff178ec2f91e4155d7fe930", "sha256": "b57cfdf7dfc14628d7c3b738e0e23d8750a5ffc1ac2aae83d2216cd1549b6b30" }, "downloads": -1, "filename": "pyrallel-0.2.1.tar.gz", "has_sig": false, "md5_digest": "ffe1bed1bff178ec2f91e4155d7fe930", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7853, "upload_time": "2013-06-20T14:47:34", "url": "https://files.pythonhosted.org/packages/ca/b8/42036676c89dcc92c90e74f8cb5ccc60bc2da860bef3115068806ef639ef/pyrallel-0.2.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ffe1bed1bff178ec2f91e4155d7fe930", "sha256": "b57cfdf7dfc14628d7c3b738e0e23d8750a5ffc1ac2aae83d2216cd1549b6b30" }, "downloads": -1, "filename": "pyrallel-0.2.1.tar.gz", "has_sig": false, "md5_digest": "ffe1bed1bff178ec2f91e4155d7fe930", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7853, "upload_time": "2013-06-20T14:47:34", "url": "https://files.pythonhosted.org/packages/ca/b8/42036676c89dcc92c90e74f8cb5ccc60bc2da860bef3115068806ef639ef/pyrallel-0.2.1.tar.gz" } ] }