{ "info": { "author": "Kensuke Mitsuzawa", "author_email": "kensuke.mit@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "flexible-clustering-tree\n========================\n\n--------------\n\nWhat\u2019s this?\n============\n\nIn the context of **clustering** task, ``flexible-clustering-tree``\nprovides you **easy** and **controllable** clustering framework.\n\n|image0|\n\nBackground\n==========\n\nLet\u2019s suppose, you have huge data. You\u2019d like to observe data as easy as\npossible.\n\nHierarchical clustering is a way to see clustering tree. However,\nhierarchical clustering tends to fall into local optimization.\n\nSo, you need other clustering method. But at the same time, you wanna\nobserve your data with tree structure style.\n\nhere, ``flexible-clustering-tree`` could give you simple way from data\ninto tree viewer(d3 based)\n\nYou could set any kinds of clustering algorithm such as Kmeans, DBSCAN,\nSpectral-Clustering.\n\nMulti feature and Multi clustering\n----------------------------------\n\nDuring making a tree, you might want use various kind of clustering\nalgorithm. For example, you use Kmeans for the 1st later of a tree, and\nDBSCAN for the 2nd layer of a tree.\n\nAnd you might also use various kind of feature type depending on a layer\nof a tree. For example, in the context of document clustering, \u201ctitle\u201d\nof news for the 1st layer, and \u201cwhole text\u201d for the 2nd layer.\n\nThe example below, this is a clustering tree of 20-news data set.\n\n- 1st layer(red highlight) is done with HDBSCAN clustering, and feature\n is dense vector of ``Subject`` text, which is converted by word2vec\n model.\n- 2nd layer(blue highlight) is done with Kmeans, and feature is sparse\n vector of whole text(BOW).\n\nYou could design your clustering tree as you want!\n\n|image1|\n\nBoth are possible ``flexible-clustering-tree``!\n\nContribution\n============\n\n- Easy interface(scikit-learn way) from data(feature matrix) into a\n tree viewer\n- Possible to make various clustering algorithms ensemble\n- Possible to set various feature types\n\nHow to use?\n===========\n\n.. code:: python\n\n from flexible_clustering_tree import FeatureMatrixObject, MultiFeatureMatrixObject\n from flexible_clustering_tree import ClusteringOperator, MultiClusteringOperator\n from flexible_clustering_tree import FlexibleClustering\n import numpy\n import codecs\n\n # set feature matrix\n # an input of 1st layer is list of string\n input_string = ['d'] * 10 + ['e'] * 10 + ['c'] * 10 + ['a'] * 10 + ['b'] * 10 + ['f'] * 50\n f_obj_1st = FeatureMatrixObject(0, feature_strings=input_string)\n # an input of 2nd layer is the dense matrix (100, 300)\n f_obj_2nd = FeatureMatrixObject(1, matrix_object=numpy.random.rand(100, 300))\n # an input of 3rd layer is the dense matrix (100, 50)\n f_obj_3rd = FeatureMatrixObject(2, matrix_object=numpy.random.rand(100, 50))\n dict_index2label = {i: \"label-{}\".format(i) for i in range(0, 100)}\n multi_feature_matrix = MultiFeatureMatrixObject(\n [f_obj_1st, f_obj_2nd, f_obj_3rd],\n dict_index2label=dict_index2label\n )\n\n # set clustering operation\n from sklearn.cluster.k_means_ import KMeans\n from hdbscan.hdbscan_ import HDBSCAN\n from flexible_clustering_tree import StringAggregation\n c_operation_1st = ClusteringOperator(0, -1, StringAggregation())\n c_operation_2nd = ClusteringOperator(1, 10, KMeans(10))\n c_operation_3rd = ClusteringOperator(2, -1, HDBSCAN())\n multi_clustering = MultiClusteringOperator([c_operation_1st, c_operation_2nd])\n\n # run flexible clustering\n clustering_runner = FlexibleClustering(max_depth=5)\n index2cluster_no = clustering_runner.fit_transform(multi_feature_matrix, multi_clustering)\n html = clustering_runner.clustering_tree.to_html()\n\n # output to html\n with codecs.open(\"out.html\", \"w\", \"utf-8\") as f:\n f.write(html)\n\n # you're also able to generate tables via Pandas.\n import pandas\n table_objects = clustering_runner.clustering_tree.to_objects()\n print(pandas.DataFrame(table_objects['cluster_information']))\n print(pandas.DataFrame(table_objects['leaf_information']))\n\nThe output of pandas table is below.\n\nThe relation-table of clusters is in ``cluster_information``.\n\n::\n\n cluster_id parent_id depth_level clustering_method\n 0 0 -1 1 StringAggregation\n 1 1 -1 1 StringAggregation\n 2 2 -1 1 StringAggregation\n 3 3 -1 1 StringAggregation\n 4 4 -1 1 StringAggregation\n 5 5 -1 1 StringAggregation\n 6 6 5 2 KMeans\n 7 7 5 2 KMeans\n 8 8 5 2 KMeans\n 9 9 5 2 KMeans\n 10 10 5 2 KMeans\n 11 11 5 2 KMeans\n 12 12 5 2 KMeans\n 13 13 5 2 KMeans\n 14 14 5 2 KMeans\n 15 15 5 2 KMeans\n\nThe relation-table of leaf nodes is in ``leaf_information``.\n\n::\n\n leaf_id cluster_id label args\n 0 0 0 label-0 None\n 1 1 0 label-1 None\n 2 2 0 label-2 None\n 3 3 0 label-3 None\n 4 4 0 label-4 None\n .. ... ... ... ...\n 95 95 14 label-95 None\n 96 96 8 label-96 None\n 97 97 13 label-97 None\n 98 98 10 label-98 None\n 99 99 12 label-99 None\n [100 rows x 4 columns]\n\nYou could see examples at ``/examples``.\n\nsetup\n=====\n\n.. code:: bash\n\n pip install flexible_clustering_tree\n\nor close this repository\n\n.. code:: bash\n\n python setup.py install\n\nFor Developers\n==============\n\nEnvironment\n-----------\n\n- Python >= 3.x\n\nDev/Test environment by Docker\n------------------------------\n\nYou build dev/test environment with Docker container. Here is simple\nprocedure,\n\n1. build docker image\n2. start docker container\n3. run test in the container\n\n.. code:: bash\n\n $ cd tests\n $ docker-compose build\n $ docker-compose up\n $ docker run --name test-container -v `pwd`:/codes/flexible-clustering-tree/ -dt tests_dev_env\n $ docker exec -it test-container python /codes/flexible-clustering-tree/setup.py test\n\nIf you\u2019re using pycharm professional edition, you could call a\ndocker-compose file as Python interpreter.\n\n.. |image0| image:: https://user-images.githubusercontent.com/1772712/47308081-9980cd00-d66b-11e8-98c0-a275db021cd7.gif\n.. |image1| image:: https://user-images.githubusercontent.com/1772712/47308468-abaf3b00-d66c-11e8-9a08-26facc39e80e.png\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Kensuke-Mitsuzawa/flexible_clustering_tree", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "flexible-clustering-tree", "package_url": "https://pypi.org/project/flexible-clustering-tree/", "platform": "", "project_url": "https://pypi.org/project/flexible-clustering-tree/", "project_urls": { "Homepage": "https://github.com/Kensuke-Mitsuzawa/flexible_clustering_tree" }, "release_url": "https://pypi.org/project/flexible-clustering-tree/0.21/", "requires_dist": [ "sqlitedict", "scipy", "numpy", "scikit-learn", "hdbscan", "pyexcelerate", "jinja2", "pypandoc" ], "requires_python": "", "summary": "easy interface for ensemble clustering", "version": "0.21", "yanked": false, "yanked_reason": null }, "last_serial": 6049804, "releases": { "0.11": [ { "comment_text": "", "digests": { "md5": "649e5529159c0b6d69f7fd076daac09a", "sha256": "4e16d03018c6a30378e5076d2a2230ef6939a1379fa8b1bcbbb6af5da8c93a81" }, "downloads": -1, "filename": "flexible_clustering_tree-0.11.tar.gz", "has_sig": false, "md5_digest": "649e5529159c0b6d69f7fd076daac09a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 53694, "upload_time": "2018-10-22T18:45:34", "upload_time_iso_8601": "2018-10-22T18:45:34.507127Z", "url": "https://files.pythonhosted.org/packages/d5/70/8b131aac7cb219260cdde4a17fee2a7bf7afdb682b4b31275606b72e8942/flexible_clustering_tree-0.11.tar.gz", "yanked": false, "yanked_reason": null } ], "0.12": [ { "comment_text": "", "digests": { "md5": "17f6accafe577a330c00979d9e1f758a", "sha256": "b20d19d20dc93b894a411baba52033d16d94c6c80bd5695fe5d74a49a46f3b1a" }, "downloads": -1, "filename": "flexible_clustering_tree-0.12-py3-none-any.whl", "has_sig": false, "md5_digest": "17f6accafe577a330c00979d9e1f758a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 22183, "upload_time": "2019-05-09T14:31:30", "upload_time_iso_8601": "2019-05-09T14:31:30.349812Z", "url": "https://files.pythonhosted.org/packages/0d/1d/184ec73c1115e36264fe597456e9dbbc9c65f169b7bf2aa85d94a4e0892c/flexible_clustering_tree-0.12-py3-none-any.whl", "yanked": false, "yanked_reason": null } ], "0.13": [ { "comment_text": "", "digests": { "md5": "0f11ed9c3a1e503e098906e6276d5c90", "sha256": "23672dfcd4e3800bd6756d8dc40ca7b7eecfbd734240b5079996b4c6538536e8" }, "downloads": -1, "filename": "flexible_clustering_tree-0.13-py3-none-any.whl", "has_sig": false, "md5_digest": "0f11ed9c3a1e503e098906e6276d5c90", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42719, "upload_time": "2019-10-22T17:07:36", "upload_time_iso_8601": "2019-10-22T17:07:36.853505Z", "url": "https://files.pythonhosted.org/packages/23/7b/60491153949d6ef289c01e2f22981b3706a2b12265d93a2e7834ab1c2c19/flexible_clustering_tree-0.13-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d2374af5e16a2cb0c50ba79625678017", "sha256": "0f725976d60428723c99777f578f9abe90bb042a512dba491cdd35359e806c9c" }, "downloads": -1, "filename": "flexible_clustering_tree-0.13.tar.gz", "has_sig": false, "md5_digest": "d2374af5e16a2cb0c50ba79625678017", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44668, "upload_time": "2019-10-22T17:07:38", "upload_time_iso_8601": "2019-10-22T17:07:38.357693Z", "url": "https://files.pythonhosted.org/packages/cb/db/e6a5f3a84bf1fa26e21c595026517e7f8f61610785d9bcc35124f3925792/flexible_clustering_tree-0.13.tar.gz", "yanked": false, "yanked_reason": null } ], "0.14": [ { "comment_text": "", "digests": { "md5": "6b048f608e5c8b9f79a21c84f734471e", "sha256": "04068d33f94d04770c5e4129c6dc134d36b21c0390b8bb52e4c9a9970f80bf5c" }, "downloads": -1, "filename": "flexible_clustering_tree-0.14-py3-none-any.whl", "has_sig": false, "md5_digest": "6b048f608e5c8b9f79a21c84f734471e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 42780, "upload_time": "2019-10-27T17:17:08", "upload_time_iso_8601": "2019-10-27T17:17:08.438938Z", "url": "https://files.pythonhosted.org/packages/7c/2d/5cdde3cf1cd6dd5205427d1bab2a10a28cc3167e02c7fe7b4815319155af/flexible_clustering_tree-0.14-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "04d70302da3c705c211aa0e50f77c31a", "sha256": "26866475c5143229cdbe7b37ba523abae5937e143c95ac7dcb2ac58b47a14b77" }, "downloads": -1, "filename": "flexible_clustering_tree-0.14.tar.gz", "has_sig": false, "md5_digest": "04d70302da3c705c211aa0e50f77c31a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44714, "upload_time": "2019-10-27T17:17:09", "upload_time_iso_8601": "2019-10-27T17:17:09.986466Z", "url": "https://files.pythonhosted.org/packages/9f/4d/e7011087671d226b457858933d916fef33f018782da40dfa515de7f0a6b6/flexible_clustering_tree-0.14.tar.gz", "yanked": false, "yanked_reason": null } ], "0.15": [ { "comment_text": "", "digests": { "md5": "5090a704e53994fcfb2f9fab477badb8", "sha256": "08e9db1a611547d666ec27f36e18eec383ef41e413f51ce143c9bf83beb7fc1e" }, "downloads": -1, "filename": "flexible_clustering_tree-0.15-py3-none-any.whl", "has_sig": false, "md5_digest": "5090a704e53994fcfb2f9fab477badb8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 43634, "upload_time": "2019-10-27T18:29:20", "upload_time_iso_8601": "2019-10-27T18:29:20.679129Z", "url": "https://files.pythonhosted.org/packages/04/4b/b52297be6f87f72313e017d338d5cdbe3836c70a7871f8a8bc8f19b2600f/flexible_clustering_tree-0.15-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "7a7a79c0245e5dc0b5813ab4180afae3", "sha256": "490341b264844360849f2a03849d17cd83d02e46aa53b73a13392857d31dcfe0" }, "downloads": -1, "filename": "flexible_clustering_tree-0.15.tar.gz", "has_sig": false, "md5_digest": "7a7a79c0245e5dc0b5813ab4180afae3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 45127, "upload_time": "2019-10-27T18:29:22", "upload_time_iso_8601": "2019-10-27T18:29:22.299580Z", "url": "https://files.pythonhosted.org/packages/8c/aa/6927f2f429b79c3132532937950116d3bb5f2c6f7ac25ddc94f9a4d1d586/flexible_clustering_tree-0.15.tar.gz", "yanked": false, "yanked_reason": null } ], "0.16": [ { "comment_text": "", "digests": { "md5": "29aa83ef9b19ad7572fff52cb4e38c5f", "sha256": "16765a4ef7d8812e9af9dfe93a2cb34180810b312082823cc2666f9f43888051" }, "downloads": -1, "filename": "flexible_clustering_tree-0.16-py3-none-any.whl", "has_sig": false, "md5_digest": "29aa83ef9b19ad7572fff52cb4e38c5f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 46853, "upload_time": "2019-10-28T10:55:02", "upload_time_iso_8601": "2019-10-28T10:55:02.499941Z", "url": "https://files.pythonhosted.org/packages/3a/00/8c05056ad873be369e576cb6dfa01b6981f70e3f5555f2171d4d4b0eb8b2/flexible_clustering_tree-0.16-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "3666f3051e91fe90f25b58a697e29a59", "sha256": "a91636bbc32acc9dd6993af7b3b63d273482cc048f473af9b26d84373d50ad60" }, "downloads": -1, "filename": "flexible_clustering_tree-0.16.tar.gz", "has_sig": false, "md5_digest": "3666f3051e91fe90f25b58a697e29a59", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 48591, "upload_time": "2019-10-28T10:55:04", "upload_time_iso_8601": "2019-10-28T10:55:04.573623Z", "url": "https://files.pythonhosted.org/packages/35/d3/875298c6d2a2ebcf8b333f1db2b730cf24e7fdfc89eca9da078af0d87ddb/flexible_clustering_tree-0.16.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2": [ { "comment_text": "", "digests": { "md5": "093c07f2ea8132100f2b95d55d8bbf72", "sha256": "433dbf56471e5265edf563e6014375eee40df88cb4d6573a5f68b35a1d33dea3" }, "downloads": -1, "filename": "flexible_clustering_tree-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "093c07f2ea8132100f2b95d55d8bbf72", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 49242, "upload_time": "2019-10-29T23:26:43", "upload_time_iso_8601": "2019-10-29T23:26:43.561654Z", "url": "https://files.pythonhosted.org/packages/40/95/25e1346633897a420acfee31220bef5a95d5fc976d7cb3aa089f775d3b95/flexible_clustering_tree-0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "444dc0a452c24bf3be890d41a7fab7ee", "sha256": "dd17157fd4c08ade8d5876e9627272b53832f2dbf65f88044ea0b07f2da1cecc" }, "downloads": -1, "filename": "flexible_clustering_tree-0.2.tar.gz", "has_sig": false, "md5_digest": "444dc0a452c24bf3be890d41a7fab7ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 51076, "upload_time": "2019-10-29T23:26:46", "upload_time_iso_8601": "2019-10-29T23:26:46.230925Z", "url": "https://files.pythonhosted.org/packages/f7/32/3deceb4f964e764352722e945974fafa7c73d7b7421b2f19280542edb9bc/flexible_clustering_tree-0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.21": [ { "comment_text": "", "digests": { "md5": "721c3e023f0915229465cc89e42832f8", "sha256": "b9d320b32dc9c90f9d35fc4c8d60f21a027b77f59abc257891511eda7a73cd14" }, "downloads": -1, "filename": "flexible_clustering_tree-0.21-py3-none-any.whl", "has_sig": false, "md5_digest": "721c3e023f0915229465cc89e42832f8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 49254, "upload_time": "2019-10-29T23:30:19", "upload_time_iso_8601": "2019-10-29T23:30:19.264697Z", "url": "https://files.pythonhosted.org/packages/a7/43/3672c1b458afaec869351c5a1e32a0d095ee5585bb4c2c9d2cd50ca9ae95/flexible_clustering_tree-0.21-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "15a1e39856bc4f08603b0eff532ee815", "sha256": "bf15002fe0edd6d3b1a580e536c4cf356a63e0167eba8b133e55d449d3cc9550" }, "downloads": -1, "filename": "flexible_clustering_tree-0.21.tar.gz", "has_sig": false, "md5_digest": "15a1e39856bc4f08603b0eff532ee815", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 51074, "upload_time": "2019-10-29T23:30:21", "upload_time_iso_8601": "2019-10-29T23:30:21.219566Z", "url": "https://files.pythonhosted.org/packages/1a/82/bdf9309d0cd4f8a0c6d356feb76889faba81ad36429552451c64e6824c74/flexible_clustering_tree-0.21.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "721c3e023f0915229465cc89e42832f8", "sha256": "b9d320b32dc9c90f9d35fc4c8d60f21a027b77f59abc257891511eda7a73cd14" }, "downloads": -1, "filename": "flexible_clustering_tree-0.21-py3-none-any.whl", "has_sig": false, "md5_digest": "721c3e023f0915229465cc89e42832f8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 49254, "upload_time": "2019-10-29T23:30:19", "upload_time_iso_8601": "2019-10-29T23:30:19.264697Z", "url": "https://files.pythonhosted.org/packages/a7/43/3672c1b458afaec869351c5a1e32a0d095ee5585bb4c2c9d2cd50ca9ae95/flexible_clustering_tree-0.21-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "15a1e39856bc4f08603b0eff532ee815", "sha256": "bf15002fe0edd6d3b1a580e536c4cf356a63e0167eba8b133e55d449d3cc9550" }, "downloads": -1, "filename": "flexible_clustering_tree-0.21.tar.gz", "has_sig": false, "md5_digest": "15a1e39856bc4f08603b0eff532ee815", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 51074, "upload_time": "2019-10-29T23:30:21", "upload_time_iso_8601": "2019-10-29T23:30:21.219566Z", "url": "https://files.pythonhosted.org/packages/1a/82/bdf9309d0cd4f8a0c6d356feb76889faba81ad36429552451c64e6824c74/flexible_clustering_tree-0.21.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }