{ "info": { "author": "Jack Maney", "author_email": "jackmaney@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "K-means++ in Pandas\n===================\n\nAn implementation of the [k-means++ clustering algorithm](http://en.wikipedia.org/wiki/K-means%2B%2B) using [Pandas](http://pandas.pydata.org/).\n\nIMPORTANT NOTE\n--------------\n\n**This package should not be used in production.** The implementation of k-means++ contained therein is much slower than [that of scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html). Use that instead.\n\nThe only reason why I wrote any of this is to teach myself Pandas.\n\nPrerequisites\n-------------\n\n* Python 2.7 or lower; this is not Python 3 compatible (yet).\n* [Pandas](http://pandas.pydata.org/) (obviously).\n* [NumPy](http://numpy.org)\n\nInstallation\n------------\n\nIf you have [pip](http://www.pip-installer.org/en/latest/installing.html), then just do\n\n\tpip install k-means-plus-plus\n\nOtherwise,\n\n* Clone the repository:\n\n\t\tgit clone https://github.com/jackmaney/k-means-plus-plus-pandas.git\n\n* Enter the newly-created folder containing the repo\n\n\t\tcd k-means-plus-plus-pandas\n\n* And run the installation manually:\n\n\t\tpython setup.py install\n\n\n\nUsage\n-----\n\nHere are the constructor arguments:\n\n* `data_frame`: A Pandas data frame representing the data that you wish to cluster. Rows represent observations, and columns represent variables.\n\n* `k`: The number of clusters that you want.\n\n* `columns=None`: A list of column names upon which you wish to cluster your data. If this argument isn't provided, then all of the columns are selected. **Note:** Columns upon which you want to cluster must be numeric and have no `numpy.nan` values.\n\n* `max_iterations=None`: The maximum number of times that you wish to iterate k-means. 
If no value is provided, then the iterations continue until stability is reached (i.e., the cluster assignments don't change between one iteration and the next).\n\n* `appended_column_name=None`: If this value is set with a string, then a column will be appended to your data with the given name that contains the cluster assignments (which are integers from 0 to `k-1`). If this argument is not set, then you still have access to the clusters via the `clusters` attribute.\n\nOnce you've constructed a `KMeansPlusPlus` object, then just call the `cluster` method, and everything else should happen automagically. Take a look at the `examples` folder.\n\nTODO:\n----\n\n* Add on features that take iterations of k-means++ clusters and compare them via, e.g., concordance matrices, Jaccard indices, etc.\n\n* Given a data frame, implement the so-called [Elbow Method](http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set#The_Elbow_Method) to take a stab at an optimal value for `k`.\n\n* ~~Make this into a proper Python module that can be installed via pip.~~\n\n* Python 3 compatibility (probably via six).", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jackmaney/k-means-plus-plus-pandas", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "k-means-plus-plus", "package_url": "https://pypi.org/project/k-means-plus-plus/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/k-means-plus-plus/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/jackmaney/k-means-plus-plus-pandas" }, "release_url": "https://pypi.org/project/k-means-plus-plus/0.1.0/", "requires_dist": null, "requires_python": null, "summary": "K-Means++ Clustering for Pandas DataFrames", "version": "0.1.0" }, "last_serial": 1448003, "releases": { "0.0.0": [], "0.0.1": [], "0.0.2": [], "0.0.3": 
[], "0.0.4": [ { "comment_text": "", "digests": { "md5": "dd12eebc14d440ea0941042c5fe87ba2", "sha256": "261fe4dcb8f461363961ca9437b7904ee765e9766b80a72e63405cdb6b5b6522" }, "downloads": -1, "filename": "k-means-plus-plus-0.0.4.tar.gz", "has_sig": false, "md5_digest": "dd12eebc14d440ea0941042c5fe87ba2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4779, "upload_time": "2014-01-24T02:49:02", "url": "https://files.pythonhosted.org/packages/30/71/22fe12f0165b4b2051ae7cb734a1f3f9789ed81c30f3455756a747f8900e/k-means-plus-plus-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "9cba36849615ffbf6d88b580fad18e39", "sha256": "29fbd4d89db661aadfcd13b12e8d16f275ff0ce23c35148c33ff9204aa4dd19a" }, "downloads": -1, "filename": "k-means-plus-plus-0.0.5.tar.gz", "has_sig": false, "md5_digest": "9cba36849615ffbf6d88b580fad18e39", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4781, "upload_time": "2014-01-24T02:51:30", "url": "https://files.pythonhosted.org/packages/05/83/a625d459ea2e612308368ae4db99170659d4f42029676dce8a60c88f2099/k-means-plus-plus-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "9c185ba13689f2b7f1185d3a2647f74c", "sha256": "c09e70b5efc2615d7675f4af4da837d3054e4fef56c1ac223f4484c9359910f2" }, "downloads": -1, "filename": "k-means-plus-plus-0.0.6.tar.gz", "has_sig": false, "md5_digest": "9c185ba13689f2b7f1185d3a2647f74c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4382, "upload_time": "2014-01-24T03:11:18", "url": "https://files.pythonhosted.org/packages/4a/37/574ed379b63f71f25f42ad509627bd0e02361e80c5bd15d1dcd8f101cf7c/k-means-plus-plus-0.0.6.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "b78f055e3d134a1935ac32e3eca2f631", "sha256": "fe73020b4bc3701d96387584ec8aa7204acdc0e221a3793cbac0d8abed8fcde7" }, "downloads": -1, "filename": "k-means-plus-plus-0.1.0.tar.gz", "has_sig": false, 
"md5_digest": "b78f055e3d134a1935ac32e3eca2f631", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4984, "upload_time": "2015-03-04T17:19:20", "url": "https://files.pythonhosted.org/packages/b6/63/c9029b1a8b8915e76e49c65f363bcc812f55b5b9e282e38b3832c9c46fd8/k-means-plus-plus-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b78f055e3d134a1935ac32e3eca2f631", "sha256": "fe73020b4bc3701d96387584ec8aa7204acdc0e221a3793cbac0d8abed8fcde7" }, "downloads": -1, "filename": "k-means-plus-plus-0.1.0.tar.gz", "has_sig": false, "md5_digest": "b78f055e3d134a1935ac32e3eca2f631", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4984, "upload_time": "2015-03-04T17:19:20", "url": "https://files.pythonhosted.org/packages/b6/63/c9029b1a8b8915e76e49c65f363bcc812f55b5b9e282e38b3832c9c46fd8/k-means-plus-plus-0.1.0.tar.gz" } ] }