{ "info": { "author": "Daniel Gribel", "author_email": "dgribel@inf.puc-rio.br", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python" ], "description": "# HG-means\n\nSource code of HG-means clustering, an efficient hybrid genetic algorithm proposed for the minimum sum-of-squares clustering (MSSC) problem. This population-based metaheuristic uses K-means as a local search in combination with crossover, mutation, and diversification operators.\n\nBecause the HG-means algorithm uses K-means, we included the fundamental source files of Greg Hamerly's fast K-means implementation (to whom we are grateful for making the source code available) in this repository, in the `/hamerly` folder. The original files and complete source code of Greg Hamerly's K-means can be found at: https://github.com/ghamerly/fast-kmeans.\n\nFor the exact crossover, HG-means uses the Dlib implementation (https://github.com/davisking/dlib) for solving an assignment problem. Dlib files are included in the `/dlib-master` folder.\n\nHG-means clustering is available as C++ code, as well as a Python package.\n\n## Related Article\n\n*HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering*. D. Gribel and T. Vidal, 2019. Pattern Recognition, https://doi.org/10.1016/j.patcog.2018.12.022\n\n## Installation and Run\n\n### C++\n\nTo run the algorithm in C++, go to the `/hgmeans` folder and run the following commands:\n\n`> make`\n\n`> ./hgmeans 'dataset_path' pi_min n2 [nb_clusters]`\n\n### Example\n\n`> make`\n\n`> ./hgmeans 'data/iris.txt' 10 5000 2 5 10`\n\nThis command runs HG-means clustering on the \"iris\" dataset, with a population of 10 solutions, a maximum of 5000 iterations, and 2, 5 and 10 clusters.\n\n**Important:** You can provide a ground-truth file with the cluster labels. 
In this case, make sure that a file with the same name as the dataset and a '.label' extension is placed in the same folder as the dataset. If this file is provided, HG-means clustering will compute clustering performance metrics. See the **Data format** section for the expected format of dataset and labels files.\n\n### Parameters of the algorithm\n\n`dataset_path`: The path to the dataset.\n\n`pi_min` (default = 10): Population size. Determines the size of the population in the genetic algorithm.\n\n`n2` (default = 5000): Maximum number of iterations. Sets the maximum number of iterations the algorithm performs.\n\n`[nb_clusters]`: The list of numbers of clusters. You can pass multiple values, separated by a single space.\n\n### Python\n\nHG-means is also available as a Python package.\n\nIf you use Windows, please install the C++ Build Tools, which can be downloaded here: https://go.microsoft.com/fwlink/?LinkId=691126\n\nTo install HG-means, run the following installation command:\n\n`> python -m pip install hgmeans`\n\nThat is it! Now, open your Python interpreter, import the package and create an instance of HG-means. To execute it, call the function `Go()` with the corresponding parameters. See the example below:\n\n`>>> import hgmeans`\n\n`>>> my_demo = hgmeans.PyHGMeans()`\n\n`>>> my_demo.Go('data/iris.txt', 10, 5000, [2,5,10])`\n\nThis script runs the HG-means algorithm on the \"iris\" dataset, with a population of 10 solutions, a maximum of 5000 iterations, and 2, 5 and 10 clusters. Here the numbers of clusters are passed as a list, so the values are separated by commas.\n\n## Data format\n\n**Dataset files.** The first line of a dataset file gives the number of data points (n) and the dimensionality of the data (d), separated by a single space. The remaining lines contain the coordinates of the data points. Each line contains the values of the d features of a sample, where x_ij corresponds to the j-th feature of the i-th sample of the data. 
Feature values are separated by a single space, as shown in the scheme below:\n\n| n | d | | | |\n|------|------|------|-----|------|\n| x_11 | x_12 | x_13 | ... | x_1d |\n| x_21 | x_22 | x_23 | ... | x_2d |\n| .... | .... | .... | ... | .... |\n| x_n1 | x_n2 | x_n3 | ... | x_nd |\n\nSome datasets are provided in the `/data` folder of the HG-means repository.\n\n**Labels files.** A labels file lists the ground-truth cluster of each sample of the dataset, one label per line, where y_i corresponds to the label of the i-th sample:\n\ny_1\n\ny_2\n\n...\n\ny_n\n\n**Important**: Labels files must have the '.label' extension. Some labels files are provided in the `/data` folder of the HG-means repository.\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/danielgribel/hg-means", "keywords": "clustering optimization", "license": "", "maintainer": "", "maintainer_email": "", "name": "hgmeans", "package_url": "https://pypi.org/project/hgmeans/", "platform": "", "project_url": "https://pypi.org/project/hgmeans/", "project_urls": { "Homepage": "https://github.com/danielgribel/hg-means" }, "release_url": "https://pypi.org/project/hgmeans/2.0/", "requires_dist": null, "requires_python": "", "summary": "HG-means clustering for minimum sum-of-squares formulation", "version": "2.0" }, "last_serial": 4915805, "releases": { "2.0": [ { "comment_text": "", "digests": { "md5": "6509e35134f967a234f96a3ad10dcb32", "sha256": "bec8daf5b770e85a09455c1ac02a8528b26535407f26bec449c309eb3b7477a6" }, "downloads": -1, "filename": "hgmeans-2.0.tar.gz", "has_sig": false, "md5_digest": "6509e35134f967a234f96a3ad10dcb32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1725248, "upload_time": "2019-03-01T20:51:00", "url": "https://files.pythonhosted.org/packages/e5/0a/994791b37e84011333078363027a0577bf29d1eae9be019c538cd21c0447/hgmeans-2.0.tar.gz" } ] 
}, "urls": [ { "comment_text": "", "digests": { "md5": "6509e35134f967a234f96a3ad10dcb32", "sha256": "bec8daf5b770e85a09455c1ac02a8528b26535407f26bec449c309eb3b7477a6" }, "downloads": -1, "filename": "hgmeans-2.0.tar.gz", "has_sig": false, "md5_digest": "6509e35134f967a234f96a3ad10dcb32", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1725248, "upload_time": "2019-03-01T20:51:00", "url": "https://files.pythonhosted.org/packages/e5/0a/994791b37e84011333078363027a0577bf29d1eae9be019c538cd21c0447/hgmeans-2.0.tar.gz" } ] }