{ "info": { "author": "Joseph Bradley", "author_email": "joseph@databricks.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Topic :: Scientific/Engineering" ], "description": "Scikit-learn integration package for Apache Spark\n=================================================\n\nThis package contains some tools to integrate the `Spark computing framework `_\nwith the popular `scikit-learn machine library `_. Among other things, it can:\n\n- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the\n `multicore implementation `_ included by default in ``scikit-learn``\n- convert Spark's Dataframes seamlessly into numpy ``ndarray`` or sparse matrices\n- (experimental) distribute Scipy's sparse matrices as a dataset of sparse vectors\n\nIt focuses on problems that have a small amount of data and that can be run in parallel.\nFor small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn),\nusing Spark. For datasets that do not fit in memory, we recommend using the `distributed implementation in\n`Spark MLlib `_.\n\nThis package distributes simple tasks like grid-search cross-validation.\nIt does not distribute individual learning algorithms (unlike Spark MLlib).\n\nInstallation\n------------\n\nThis package is available on PYPI:\n\n::\n\n\tpip install spark-sklearn\n\nThis project is also available as `Spark package `_.\n\nThe developer version has the following requirements:\n\n- scikit-learn 0.18 or 0.19. Later versions may work, but tests currently are incompatible with 0.20.\n- Spark >= 2.1.1. Spark may be downloaded from the `Spark website `_.\n In order to use this package, you need to use the pyspark interpreter or another Spark-compliant python\n interpreter. See the `Spark guide `_\n for more details.\n- `nose `_ (testing dependency only)\n- pandas, if using the pandas integration or testing. pandas==0.18 has been tested.\n\nIf you want to use a developer version, you just need to make sure the ``python/`` subdirectory is in the\n``PYTHONPATH`` when launching the pyspark interpreter:\n\n::\n\n\tPYTHONPATH=$PYTHONPATH:./python:$SPARK_HOME/bin/pyspark\n\nYou can directly run tests:\n\n::\n\n cd python && ./run-tests.sh\n\nThis requires the environment variable ``SPARK_HOME`` to point to your local copy of Spark.\n\nExample\n-------\n\nHere is a simple example that runs a grid search with Spark. See the `Installation <#installation>`_ section\non how to install the package.\n\n.. code:: python\n\n from sklearn import svm, datasets\n from spark_sklearn import GridSearchCV\n iris = datasets.load_iris()\n parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}\n svr = svm.SVC(gamma='auto')\n clf = GridSearchCV(sc, svr, parameters)\n clf.fit(iris.data, iris.target)\n\nThis classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.\n\nDocumentation\n-------------\n\n`API documentation `_ is currently hosted on Github pages. To\nbuild the docs yourself, see the instructions in ``docs/``.\n\n.. image:: https://travis-ci.org/databricks/spark-sklearn.svg?branch=master\n :target: https://travis-ci.org/databricks/spark-sklearn", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/databricks/spark-sklearn", "keywords": "spark,scikit-learn,distributed computing,machine learning", "license": "Apache 2.0", "maintainer": "Tim Hunter", "maintainer_email": "timhunter@databricks.com", "name": "spark-sklearn", "package_url": "https://pypi.org/project/spark-sklearn/", "platform": "", "project_url": "https://pypi.org/project/spark-sklearn/", "project_urls": { "Homepage": "https://github.com/databricks/spark-sklearn" }, "release_url": "https://pypi.org/project/spark-sklearn/0.3.0/", "requires_dist": null, "requires_python": "", "summary": "Integration tools for running scikit-learn on Spark", "version": "0.3.0" }, "last_serial": 4761271, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "8f4eb6d13d3504df14e83aaed2c83392", "sha256": "dc5f24ca62cb215e0f5f28fd80b8432c077d70ffe87fa66a319fb863f0ff367a" }, "downloads": -1, "filename": "spark-sklearn-0.1.0.tar.gz", "has_sig": false, "md5_digest": "8f4eb6d13d3504df14e83aaed2c83392", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12705, "upload_time": "2016-01-11T18:35:34", "url": "https://files.pythonhosted.org/packages/af/94/b5084d829f24d9216829331f27ebc4421eb70237845a4c2ef6576af42434/spark-sklearn-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "7871167cb45cf82a75494131c0e56668", "sha256": "abdcc0db8aa1c6d5c26049e8fdcae2b43d6aba7da2964d42f29b3d065d2a4a01" }, "downloads": -1, "filename": "spark-sklearn-0.1.1.tar.gz", "has_sig": false, "md5_digest": "7871167cb45cf82a75494131c0e56668", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12769, "upload_time": "2016-01-11T21:51:35", "url": "https://files.pythonhosted.org/packages/1a/d3/61b007aee744a95f30909d2516ce3648e547df8966d4d419e81c1749be34/spark-sklearn-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "6cad560411f5cf10b229be32cb80bfe0", "sha256": "e1760614889b04721e934be3607796dd78a25db7ab61065b9454629532aa7dc0" }, "downloads": -1, "filename": "spark-sklearn-0.1.2.tar.gz", "has_sig": false, "md5_digest": "6cad560411f5cf10b229be32cb80bfe0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12825, "upload_time": "2016-03-17T20:15:00", "url": "https://files.pythonhosted.org/packages/f9/59/76791ab8a6a79d1361cc525e5e957086ea9ee83a5c0d2f06dc217152b172/spark-sklearn-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "d4a18f2e499157cb835f01854b0fa23c", "sha256": "07d6f7b9d401269a776e4faf9393b531fcf3fcee5a2c05baf9009928ad8517fd" }, "downloads": -1, "filename": "spark-sklearn-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d4a18f2e499157cb835f01854b0fa23c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23489, "upload_time": "2016-08-16T16:08:07", "url": "https://files.pythonhosted.org/packages/e8/1c/8d1094ec833bdadc8841a481245536880d3902f47da0273589d60c91dde6/spark-sklearn-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "5edb520f062de622691c66c752db3a58", "sha256": "4a13589f4c8f18e6a6dfddf39639cfb1d48dcb88e8b78b18351bbee4a014311e" }, "downloads": -1, "filename": "spark-sklearn-0.2.1.tar.gz", "has_sig": false, "md5_digest": "5edb520f062de622691c66c752db3a58", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25412, "upload_time": "2017-09-11T20:40:14", "url": "https://files.pythonhosted.org/packages/b1/b1/4eb9d52d0d414aa0e9e1df83c622bd6d1bf03b6a39e1cec191aa72851063/spark-sklearn-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "75d869ce02baa3f878864be4d153f363", "sha256": "745320929b690116ddd244a9d6908c73a2368bdf788c17b2ae886a6d8553ce04" }, "downloads": -1, "filename": "spark-sklearn-0.2.2.tar.gz", "has_sig": false, "md5_digest": "75d869ce02baa3f878864be4d153f363", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25509, "upload_time": "2017-09-20T18:22:46", "url": "https://files.pythonhosted.org/packages/8a/6c/6739612b734be78f0a864298d1ad5d7746e8a9d21adb523d58fa73d076ad/spark-sklearn-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "3bdce4131bad01edfbc862ba52152710", "sha256": "dc3d9d6436fe74e20b57a12411aaed4e304403ecc26e8188cc1d31dbe0c1a33d" }, "downloads": -1, "filename": "spark-sklearn-0.2.3.tar.gz", "has_sig": false, "md5_digest": "3bdce4131bad01edfbc862ba52152710", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25561, "upload_time": "2017-09-29T21:18:11", "url": "https://files.pythonhosted.org/packages/4b/c5/b05fd936ba1656261113f63a05579166931b1c538995922a6399878a907c/spark-sklearn-0.2.3.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "4460d6c8402a5b46d361c442c2e47f19", "sha256": "d78d4f08a3849b243232ef78b63b4babfdb04ec529f996f4699923f40cfce827" }, "downloads": -1, "filename": "spark-sklearn-0.3.0.tar.gz", "has_sig": false, "md5_digest": "4460d6c8402a5b46d361c442c2e47f19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28245, "upload_time": "2019-01-30T20:22:00", "url": "https://files.pythonhosted.org/packages/b0/3f/34b8dec7d2cfcfe0ba99d637b4f2d306c1ca0b404107c07c829e085f6b38/spark-sklearn-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4460d6c8402a5b46d361c442c2e47f19", "sha256": "d78d4f08a3849b243232ef78b63b4babfdb04ec529f996f4699923f40cfce827" }, "downloads": -1, "filename": "spark-sklearn-0.3.0.tar.gz", "has_sig": false, "md5_digest": "4460d6c8402a5b46d361c442c2e47f19", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28245, "upload_time": "2019-01-30T20:22:00", "url": "https://files.pythonhosted.org/packages/b0/3f/34b8dec7d2cfcfe0ba99d637b4f2d306c1ca0b404107c07c829e085f6b38/spark-sklearn-0.3.0.tar.gz" } ] }