{ "info": { "author": "Wesley Tansey", "author_email": "wes.tansey@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering" ], "description": "A Condor-powered K-means implementation\n---------------------------------------\n
\n\n\nThis package lets you run K-means on a very large dataset of vectors. You can even stream the vectors from disk instead of loading them into memory, as long as you can store two lists of doubles with one entry per vector (one list for the cluster assignment IDs and one for the distance from each vector to its cluster centroid).\n\n## Installation\n\nInstallation is available via `pip`:\n\n```\npip install condor-kmeans\n```\n\n## Usage\n\nThe package assumes you have a CSV file of the vectors you want to cluster, with one vector per row. Once the package is installed, you can simply run the `kmeans` command:\n\n```\nkmeans path/to/mydata.csv path/to/save/centroids.csv path/to/save/assignments.csv --num_clusters 30 --plusplus --stream --condor --condor_workers 100 --condor_username myusername\n```\n\nThe above command runs k-means with 30 clusters on the vectors stored in `mydata.csv`, using Condor with no more than 100 jobs at a time. It saves the resulting cluster centroids to `centroids.csv` and the resulting vector-to-cluster assignments to `assignments.csv`. The `--plusplus` flag enables k-means++ initialization, and `--stream` streams `mydata.csv` from disk instead of loading it all into memory.\n\nThe current directory is used as the working directory, and a subdirectory named `condor` is created inside it. All temporary worker files are deleted after each batch of jobs finishes successfully, though the directory structure is kept (feel free to `rm -rf condor` afterward if you wish).
If one of the workers fails, the master will throw an exception and alert you to the job that failed and where to find its output files; the temporary files will not be deleted if a worker fails.", "description_content_type": null, "docs_url": null, "download_url": null, "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tansey/condor-kmeans", "keywords": "statistics machinelearning clustering kmeans condor", "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "condor-kmeans", "package_url": "https://pypi.org/project/condor-kmeans/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/condor-kmeans/", "project_urls": { "Homepage": "https://github.com/tansey/condor-kmeans" }, "release_url": "https://pypi.org/project/condor-kmeans/0.9/", "requires_dist": null, "requires_python": null, "summary": "A package for running k-means on a Condor cluster", "version": "0.9" }, "last_serial": 2400135, "releases": { "0.9": [ { "comment_text": "", "digests": { "md5": "c9f59c4c2cd97a1a6637c4af876f296a", "sha256": "34559e6c71a9dd62d4ac7f9c5b3bd6843a858e3bf4beed86e7eba5a8a064006d" }, "downloads": -1, "filename": "condor_kmeans-0.9-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c9f59c4c2cd97a1a6637c4af876f296a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13595, "upload_time": "2016-10-14T20:29:01", "url": "https://files.pythonhosted.org/packages/4f/a7/76fcfc342b07a18379b66e861a1351b696ab3dac56182c4e23eb4911e6a2/condor_kmeans-0.9-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6cbfe96559ebd74f20eb674618096c89", "sha256": "bf18704528ebea016bc544b832409cb5942f30906786e4043eeb1c06a9a0458d" }, "downloads": -1, "filename": "condor-kmeans-0.9.tar.gz", "has_sig": false, "md5_digest": "6cbfe96559ebd74f20eb674618096c89", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10769, "upload_time": 
"2016-10-14T20:29:04", "url": "https://files.pythonhosted.org/packages/94/05/723b443921a15055c305c94916f35f73ddc4e455ba888585f5363c60dd76/condor-kmeans-0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c9f59c4c2cd97a1a6637c4af876f296a", "sha256": "34559e6c71a9dd62d4ac7f9c5b3bd6843a858e3bf4beed86e7eba5a8a064006d" }, "downloads": -1, "filename": "condor_kmeans-0.9-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c9f59c4c2cd97a1a6637c4af876f296a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13595, "upload_time": "2016-10-14T20:29:01", "url": "https://files.pythonhosted.org/packages/4f/a7/76fcfc342b07a18379b66e861a1351b696ab3dac56182c4e23eb4911e6a2/condor_kmeans-0.9-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "6cbfe96559ebd74f20eb674618096c89", "sha256": "bf18704528ebea016bc544b832409cb5942f30906786e4043eeb1c06a9a0458d" }, "downloads": -1, "filename": "condor-kmeans-0.9.tar.gz", "has_sig": false, "md5_digest": "6cbfe96559ebd74f20eb674618096c89", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10769, "upload_time": "2016-10-14T20:29:04", "url": "https://files.pythonhosted.org/packages/94/05/723b443921a15055c305c94916f35f73ddc4e455ba888585f5363c60dd76/condor-kmeans-0.9.tar.gz" } ] }