{
"info": {
"author": "Joseph Bradley",
"author_email": "joseph@databricks.com",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Natural Language :: English",
"Operating System :: OS Independent",
"Programming Language :: Python",
"Programming Language :: Python :: 2.6",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.2",
"Topic :: Scientific/Engineering"
],
"description": "Scikit-learn integration package for Apache Spark\n=================================================\n\nThis package contains some tools to integrate the `Spark computing framework `_\nwith the popular `scikit-learn machine library `_. Among other things, it can:\n\n- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the\n `multicore implementation `_ included by default in ``scikit-learn``\n- convert Spark's Dataframes seamlessly into numpy ``ndarray`` or sparse matrices\n- (experimental) distribute Scipy's sparse matrices as a dataset of sparse vectors\n\nIt focuses on problems that have a small amount of data and that can be run in parallel.\nFor small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn),\nusing Spark. For datasets that do not fit in memory, we recommend using the `distributed implementation in\n`Spark MLlib `_.\n\nThis package distributes simple tasks like grid-search cross-validation.\nIt does not distribute individual learning algorithms (unlike Spark MLlib).\n\nInstallation\n------------\n\nThis package is available on PYPI:\n\n::\n\n\tpip install spark-sklearn\n\nThis project is also available as `Spark package `_.\n\nThe developer version has the following requirements:\n\n- scikit-learn 0.18 or 0.19. Later versions may work, but tests currently are incompatible with 0.20.\n- Spark >= 2.1.1. Spark may be downloaded from the `Spark website `_.\n In order to use this package, you need to use the pyspark interpreter or another Spark-compliant python\n interpreter. See the `Spark guide `_\n for more details.\n- `nose `_ (testing dependency only)\n- pandas, if using the pandas integration or testing. pandas==0.18 has been tested.\n\nIf you want to use a developer version, you just need to make sure the ``python/`` subdirectory is in the\n``PYTHONPATH`` when launching the pyspark interpreter:\n\n::\n\n\tPYTHONPATH=$PYTHONPATH:./python:$SPARK_HOME/bin/pyspark\n\nYou can directly run tests:\n\n::\n\n cd python && ./run-tests.sh\n\nThis requires the environment variable ``SPARK_HOME`` to point to your local copy of Spark.\n\nExample\n-------\n\nHere is a simple example that runs a grid search with Spark. See the `Installation <#installation>`_ section\non how to install the package.\n\n.. code:: python\n\n from sklearn import svm, datasets\n from spark_sklearn import GridSearchCV\n iris = datasets.load_iris()\n parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}\n svr = svm.SVC(gamma='auto')\n clf = GridSearchCV(sc, svr, parameters)\n clf.fit(iris.data, iris.target)\n\nThis classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.\n\nDocumentation\n-------------\n\n`API documentation `_ is currently hosted on Github pages. To\nbuild the docs yourself, see the instructions in ``docs/``.\n\n.. image:: https://travis-ci.org/databricks/spark-sklearn.svg?branch=master\n :target: https://travis-ci.org/databricks/spark-sklearn",
"description_content_type": "",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/databricks/spark-sklearn",
"keywords": "spark,scikit-learn,distributed computing,machine learning",
"license": "Apache 2.0",
"maintainer": "Tim Hunter",
"maintainer_email": "timhunter@databricks.com",
"name": "spark-sklearn",
"package_url": "https://pypi.org/project/spark-sklearn/",
"platform": "",
"project_url": "https://pypi.org/project/spark-sklearn/",
"project_urls": {
"Homepage": "https://github.com/databricks/spark-sklearn"
},
"release_url": "https://pypi.org/project/spark-sklearn/0.3.0/",
"requires_dist": null,
"requires_python": "",
"summary": "Integration tools for running scikit-learn on Spark",
"version": "0.3.0"
},
"last_serial": 4761271,
"releases": {
"0.1.0": [
{
"comment_text": "",
"digests": {
"md5": "8f4eb6d13d3504df14e83aaed2c83392",
"sha256": "dc5f24ca62cb215e0f5f28fd80b8432c077d70ffe87fa66a319fb863f0ff367a"
},
"downloads": -1,
"filename": "spark-sklearn-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "8f4eb6d13d3504df14e83aaed2c83392",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12705,
"upload_time": "2016-01-11T18:35:34",
"url": "https://files.pythonhosted.org/packages/af/94/b5084d829f24d9216829331f27ebc4421eb70237845a4c2ef6576af42434/spark-sklearn-0.1.0.tar.gz"
}
],
"0.1.1": [
{
"comment_text": "",
"digests": {
"md5": "7871167cb45cf82a75494131c0e56668",
"sha256": "abdcc0db8aa1c6d5c26049e8fdcae2b43d6aba7da2964d42f29b3d065d2a4a01"
},
"downloads": -1,
"filename": "spark-sklearn-0.1.1.tar.gz",
"has_sig": false,
"md5_digest": "7871167cb45cf82a75494131c0e56668",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12769,
"upload_time": "2016-01-11T21:51:35",
"url": "https://files.pythonhosted.org/packages/1a/d3/61b007aee744a95f30909d2516ce3648e547df8966d4d419e81c1749be34/spark-sklearn-0.1.1.tar.gz"
}
],
"0.1.2": [
{
"comment_text": "",
"digests": {
"md5": "6cad560411f5cf10b229be32cb80bfe0",
"sha256": "e1760614889b04721e934be3607796dd78a25db7ab61065b9454629532aa7dc0"
},
"downloads": -1,
"filename": "spark-sklearn-0.1.2.tar.gz",
"has_sig": false,
"md5_digest": "6cad560411f5cf10b229be32cb80bfe0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 12825,
"upload_time": "2016-03-17T20:15:00",
"url": "https://files.pythonhosted.org/packages/f9/59/76791ab8a6a79d1361cc525e5e957086ea9ee83a5c0d2f06dc217152b172/spark-sklearn-0.1.2.tar.gz"
}
],
"0.2.0": [
{
"comment_text": "",
"digests": {
"md5": "d4a18f2e499157cb835f01854b0fa23c",
"sha256": "07d6f7b9d401269a776e4faf9393b531fcf3fcee5a2c05baf9009928ad8517fd"
},
"downloads": -1,
"filename": "spark-sklearn-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "d4a18f2e499157cb835f01854b0fa23c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 23489,
"upload_time": "2016-08-16T16:08:07",
"url": "https://files.pythonhosted.org/packages/e8/1c/8d1094ec833bdadc8841a481245536880d3902f47da0273589d60c91dde6/spark-sklearn-0.2.0.tar.gz"
}
],
"0.2.1": [
{
"comment_text": "",
"digests": {
"md5": "5edb520f062de622691c66c752db3a58",
"sha256": "4a13589f4c8f18e6a6dfddf39639cfb1d48dcb88e8b78b18351bbee4a014311e"
},
"downloads": -1,
"filename": "spark-sklearn-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "5edb520f062de622691c66c752db3a58",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25412,
"upload_time": "2017-09-11T20:40:14",
"url": "https://files.pythonhosted.org/packages/b1/b1/4eb9d52d0d414aa0e9e1df83c622bd6d1bf03b6a39e1cec191aa72851063/spark-sklearn-0.2.1.tar.gz"
}
],
"0.2.2": [
{
"comment_text": "",
"digests": {
"md5": "75d869ce02baa3f878864be4d153f363",
"sha256": "745320929b690116ddd244a9d6908c73a2368bdf788c17b2ae886a6d8553ce04"
},
"downloads": -1,
"filename": "spark-sklearn-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "75d869ce02baa3f878864be4d153f363",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25509,
"upload_time": "2017-09-20T18:22:46",
"url": "https://files.pythonhosted.org/packages/8a/6c/6739612b734be78f0a864298d1ad5d7746e8a9d21adb523d58fa73d076ad/spark-sklearn-0.2.2.tar.gz"
}
],
"0.2.3": [
{
"comment_text": "",
"digests": {
"md5": "3bdce4131bad01edfbc862ba52152710",
"sha256": "dc3d9d6436fe74e20b57a12411aaed4e304403ecc26e8188cc1d31dbe0c1a33d"
},
"downloads": -1,
"filename": "spark-sklearn-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "3bdce4131bad01edfbc862ba52152710",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 25561,
"upload_time": "2017-09-29T21:18:11",
"url": "https://files.pythonhosted.org/packages/4b/c5/b05fd936ba1656261113f63a05579166931b1c538995922a6399878a907c/spark-sklearn-0.2.3.tar.gz"
}
],
"0.3.0": [
{
"comment_text": "",
"digests": {
"md5": "4460d6c8402a5b46d361c442c2e47f19",
"sha256": "d78d4f08a3849b243232ef78b63b4babfdb04ec529f996f4699923f40cfce827"
},
"downloads": -1,
"filename": "spark-sklearn-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "4460d6c8402a5b46d361c442c2e47f19",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 28245,
"upload_time": "2019-01-30T20:22:00",
"url": "https://files.pythonhosted.org/packages/b0/3f/34b8dec7d2cfcfe0ba99d637b4f2d306c1ca0b404107c07c829e085f6b38/spark-sklearn-0.3.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "4460d6c8402a5b46d361c442c2e47f19",
"sha256": "d78d4f08a3849b243232ef78b63b4babfdb04ec529f996f4699923f40cfce827"
},
"downloads": -1,
"filename": "spark-sklearn-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "4460d6c8402a5b46d361c442c2e47f19",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 28245,
"upload_time": "2019-01-30T20:22:00",
"url": "https://files.pythonhosted.org/packages/b0/3f/34b8dec7d2cfcfe0ba99d637b4f2d306c1ca0b404107c07c829e085f6b38/spark-sklearn-0.3.0.tar.gz"
}
]
}