{ "info": { "author": "Gregory Giecold", "author_email": "ggiecold@jimmy.harvard.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: MacOS :: MacOS X", "Operating System :: POSIX", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Mathematics", "Topic :: Scientific/Engineering :: Visualization" ], "description": "Concurrent\\_AP\r\n==============\r\n\r\nOverview\r\n--------\r\n\r\nA scalable and concurrent programming implementation of Affinity\r\nPropagation clustering.\r\n\r\nAffinity Propagation is a clustering algorithm based on passing messages\r\nbetween data-points.\r\n\r\nStoring and updating matrices of 'affinities', 'responsibilities' and\r\n'similarities' between samples can be memory-intensive. We address this\r\nissue through the use of an HDF5 data structure, allowing Affinity\r\nPropagation clustering of arbitrarily large data-sets, where other Python\r\nimplementations would raise a MemoryError on most machines.\r\n\r\nWe also significantly speed up the computations by splitting them up\r\nacross subprocesses, thereby taking full advantage of the resources of\r\nmulti-core processors and bypassing the Global Interpreter Lock of the\r\nstandard Python interpreter, CPython.\r\n\r\nInstallation and Requirements\r\n-----------------------------\r\n\r\nConcurrent\\_AP requires Python 2.7 along with the following packages and\r\na few modules from the Python Standard Library:\r\n\r\n- NumPy >= 1.9\r\n\r\n- psutil\r\n\r\n- PyTables\r\n\r\n- scikit-learn\r\n\r\n- setuptools\r\n\r\nIt is suggested that you check that the required dependencies are installed, although the ``pip`` command below should do this automatically for you. 
The most convenient way to install Concurrent_AP is from the official Python Package Index (PyPI), as follows:\r\n\r\n- open a terminal window;\r\n\r\n- type in the command ``pip install Concurrent_AP``.\r\n\r\nThe code has been tested on Fedora, OS X and Ubuntu and should\r\nwork fine on any other member of the Unix-like family of operating systems.\r\n\r\nUsage and Command Line Options\r\n------------------------------\r\n\r\nSee the docstrings associated with each function of the Concurrent\\_AP\r\nmodule for more information and an understanding of how different tasks\r\nare organized and shared among subprocesses.\r\n\r\nUsage: ``Concurrent_AP [options] file_name``, where ``file_name``\r\ndenotes the path where the data to be processed by Affinity Propagation\r\nclustering is held. The data must consist of tab-separated rows of samples, each column corresponding to a particular feature.\r\n\r\n- ``-c`` or ``--convergence``: specify the number of iterations without\r\n change in the number of clusters that signals convergence (defaults\r\n to 15);\r\n\r\n- ``-d`` or ``--damping``: the damping parameter of Affinity\r\n Propagation (defaults to 0.5);\r\n\r\n- ``-f`` or ``--file``: option to specify the file name or file handle\r\n of the hierarchical data format where the matrices involved in\r\n Affinity Propagation clustering will be stored (defaults to a\r\n temporary file);\r\n\r\n- ``-i`` or ``--iterations``: maximum number of message-passing\r\n iterations (defaults to 200);\r\n\r\n- ``-m`` or ``--multiprocessing``: the number of processes to use;\r\n\r\n- ``-p`` or ``--preference``: the preference parameter of Affinity\r\n Propagation (if not specified, will be determined as the median of\r\n the distribution of pairwise L2 Euclidean distances between samples);\r\n\r\n- ``-s`` or ``--similarities``: specify whether a similarity matrix has\r\n been pre-computed and stored in the HDF5 data structure accessible at\r\n the location specified through the 
command line option ``-f`` or\r\n ``--file`` (see above);\r\n\r\n- ``-v`` or ``--verbose``: whether to be verbose.\r\n\r\nDemo of Concurrent\\_AP\r\n----------------------\r\n\r\nThe following few lines illustrate the use of Concurrent\\_AP on the\r\n'Iris data-set' from the UCI Machine Learning Repository. While the\r\nnumber of samples is far too small here for the benefits of the present\r\nmulti-tasking implementation and the use of an HDF5 data structure to\r\ncome fully into play, this data-set has the advantage of being\r\nwell-known and lends itself to a quick comparison with scikit-learn's version\r\nof Affinity Propagation clustering.\r\n\r\n- In a Python interpreter console, enter the following few lines, whose\r\n purpose is to create a file containing the Iris data-set that will\r\n later be subjected to Affinity Propagation clustering via\r\n Concurrent\\_AP:\r\n\r\n::\r\n\r\n >>> import numpy as np\r\n >>> from sklearn import datasets\r\n\r\n >>> iris = datasets.load_iris()\r\n >>> data = iris.data\r\n >>> with open('./iris_data.txt', 'w') as f:\r\n ...     np.savetxt(f, data, fmt = '%.4f', delimiter = '\\t')\r\n\r\n- Open a terminal window.\r\n- Type in ``Concurrent_AP --preference 5.47 --v iris_data.txt`` or\r\n simply ``Concurrent_AP iris_data.txt``.\r\n\r\nThe latter will automatically compute a preference parameter from the\r\ndata-set.\r\n\r\nWhen the rounds of message-passing among data-points have completed, a\r\nfolder containing a file of cluster labels and a file of cluster center\r\nindices, both in tab-separated format, is created in your current working\r\ndirectory.\r\n\r\nReference\r\n---------\r\n\r\nBrendan J. Frey and Delbert Dueck. \"Clustering by Passing Messages\r\nbetween Data Points\", Science Feb. 
2007", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/GGiecold/Concurrent_AP", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/GGiecold/Concurrent_AP", "keywords": "parallel multiprocessing machine-learning concurrent clustering", "license": "MIT License", "maintainer": "", "maintainer_email": "", "name": "Concurrent_AP", "package_url": "https://pypi.org/project/Concurrent_AP/", "platform": "Any", "project_url": "https://pypi.org/project/Concurrent_AP/", "project_urls": { "Download": "https://github.com/GGiecold/Concurrent_AP", "Homepage": "https://github.com/GGiecold/Concurrent_AP" }, "release_url": "https://pypi.org/project/Concurrent_AP/1.4/", "requires_dist": null, "requires_python": null, "summary": "Scalable and parallel programming implementation of Affinity Propagation clustering", "version": "1.4" }, "last_serial": 2001344, "releases": { "1.2": [ { "comment_text": "", "digests": { "md5": "9e7abc7693e2bb5899d7b96760733aec", "sha256": "52ee3b36c7cd8fd38d74ce08a48e3d1426bbb1a4d50671721bc37b0f8364ee90" }, "downloads": -1, "filename": "Concurrent_AP-1.2.tar.gz", "has_sig": false, "md5_digest": "9e7abc7693e2bb5899d7b96760733aec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14033, "upload_time": "2015-11-23T04:27:04", "url": "https://files.pythonhosted.org/packages/de/f8/89932f731710eda22a55168ad409514c539776a33ce4faef75d25b9170d5/Concurrent_AP-1.2.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "6b8f424bfcc093f02e5d5fe8415de237", "sha256": "aee3a6e5f35a6d18492e4c33543b83ceeafb4d3ad913df70803dcae46a3e9468" }, "downloads": -1, "filename": "Concurrent_AP-1.3.tar.gz", "has_sig": false, "md5_digest": "6b8f424bfcc093f02e5d5fe8415de237", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14098, "upload_time": "2015-12-07T16:32:37", "url": 
"https://files.pythonhosted.org/packages/57/e6/14a30f14dda029cbc8fac6c590fca56179d450ef86d6429666a8ce7c82c8/Concurrent_AP-1.3.tar.gz" } ], "1.4": [ { "comment_text": "", "digests": { "md5": "2bb16a3e49dca04f2613977fcefe2104", "sha256": "d6f4b5e82be60d7d4fa9f677e4f823731c5baf5b2487b583f7e329132bb70a2a" }, "downloads": -1, "filename": "Concurrent_AP-1.4.tar.gz", "has_sig": false, "md5_digest": "2bb16a3e49dca04f2613977fcefe2104", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14269, "upload_time": "2016-01-15T18:44:36", "url": "https://files.pythonhosted.org/packages/46/ae/4120492cfa5569ce13bc9fa2457e444fc877c54ed6178c7f7d8499cea252/Concurrent_AP-1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2bb16a3e49dca04f2613977fcefe2104", "sha256": "d6f4b5e82be60d7d4fa9f677e4f823731c5baf5b2487b583f7e329132bb70a2a" }, "downloads": -1, "filename": "Concurrent_AP-1.4.tar.gz", "has_sig": false, "md5_digest": "2bb16a3e49dca04f2613977fcefe2104", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14269, "upload_time": "2016-01-15T18:44:36", "url": "https://files.pythonhosted.org/packages/46/ae/4120492cfa5569ce13bc9fa2457e444fc877c54ed6178c7f7d8499cea252/Concurrent_AP-1.4.tar.gz" } ] }