{ "info": { "author": "Navee", "author_email": "", "bugtrack_url": null, "classifiers": [ "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# URL Clustering package\n\nThis is the repository of a Python package that clusters URLs.\n\n# Basic usage\n\n## Clustering URLs\n\nClustering a list of URLs takes two steps.\n\nFirst, import the url_clustering package.\n\n```python\nimport url_clustering\n...\n```\n\nThen, assuming that you have a list of URLs, use the `mdl_clustering` function to cluster them.\n\n```python\n...\n\nurls = [\"https://www.example.com/path1/path2\", \"https://www.example.com/path1/path3\", ...]\n\nclustering = url_clustering.mdl_clustering(urls, c, alpha, kmax, delimiters=None)\n```\n\nThe `c`, `alpha` and `kmax` parameters define the clustering rules. The `delimiters` parameter defines the characters used to split the URL paths.\n\n## Get a flat clustering\n\nThe `clustering` object returned by the `mdl_clustering` function has a tree structure. If you want a flat clustering (a list of clusters), use the `flatten_clustering` function.\n\n```python\n...\n\nflat_clustering = url_clustering.flatten_clustering(clustering)\n...\n```\n\n## Use the clustering to clusterize new URLs\n\nOnce the clustering has been built, you can extend it with new URLs. This does not create new clusters; it tries to put each new URL in the best existing cluster. To achieve this, use the `clusterize` function.\n\n```python\n...\nnew_clustering = clusterize(\"https://www.example.com/path4/path5\", clustering)\n...\n```\n\nBe careful: this function returns a new clustering, but it also modifies the initial clustering.\n\n# Advanced usage\n\n## Use custom UrlDescriptor objects\n\nIt is possible to use custom UrlDescriptor objects during the initial clustering and when clusterizing new URLs. 
To do so, your custom descriptor class must inherit from `UrlDescriptor` and define at least three attributes: `url`, `tokens` and `tokens_set`. If you want your clustering to be exportable to JSON, it must also define the method `to_json`. Here is an example class:\n\n```python\nfrom url_clustering import UrlDescriptor\n\nclass CustomUrlDescriptor(UrlDescriptor):\n    def __init__(self, url):\n        super().__init__(url)\n        self.new_attribute = \"value\"\n\n    def to_json(self):\n        # Extend the parent's JSON representation with the custom attribute.\n        super_json = super().to_json()\n        super_json['new_attribute'] = self.new_attribute\n        return super_json\n```\n\n## Export to JSON\n\nIt is possible to export your clustering to a JSON file and import it back. To do so, you must use the `ClusteringEncoder` class when exporting and the `ClusteringDecoder` class when importing. Here is an example.\n\n```python\nimport json\nfrom url_clustering import ClusteringEncoder, ClusteringDecoder\n\n...\n\nwith open(\"output.json\", \"w\") as f:\n    json.dump(clustering, f, indent=4, cls=ClusteringEncoder)\n\nwith open(\"output.json\", \"r\") as f:\n    clustering = json.load(f, cls=ClusteringDecoder)\n```\n\nNote that the export also works with flat clusterings.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/TheNavee/url-clustering", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "url-clustering", "package_url": "https://pypi.org/project/url-clustering/", "platform": "", "project_url": "https://pypi.org/project/url-clustering/", "project_urls": { "Homepage": "https://github.com/TheNavee/url-clustering" }, "release_url": "https://pypi.org/project/url-clustering/1.2.0/", "requires_dist": null, "requires_python": "", "summary": "", "version": "1.2.0" }, "last_serial": 5970487, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "8d9dac8b5f9249497a48bce30edd8b94", "sha256": 
"79730fc4388432f79444d2a255701bc15708ad08a735f2dfe81e767ca182d446" }, "downloads": -1, "filename": "url_clustering-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8d9dac8b5f9249497a48bce30edd8b94", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5970, "upload_time": "2019-06-21T14:50:54", "url": "https://files.pythonhosted.org/packages/90/38/61f88a70aff2b44d6519169c89097536fbd4dce7e9bdb4615761a9f5bd4a/url_clustering-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5e063a5065b7c2645df1b602de9d8b2d", "sha256": "cbcbedd79a0a81a406d7d7731bcb364fc8e381f3cbf625b73d0bf001cdc412af" }, "downloads": -1, "filename": "url_clustering-1.0.0.tar.gz", "has_sig": false, "md5_digest": "5e063a5065b7c2645df1b602de9d8b2d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5708, "upload_time": "2019-06-21T14:50:56", "url": "https://files.pythonhosted.org/packages/bb/1a/132562871b039f3954c4c9b925aa8f02fd6701262253c9d3eaeb1a4398f6/url_clustering-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": "64f10de22429ebdb5cf34df613db62d2", "sha256": "83eb7187ca334a25c383a7fd6237d1319ee93d8bc3de7549ba2cb591203b2391" }, "downloads": -1, "filename": "url_clustering-1.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "64f10de22429ebdb5cf34df613db62d2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6361, "upload_time": "2019-10-02T10:20:59", "url": "https://files.pythonhosted.org/packages/fc/0c/7698e178bab05152d2284d1561e2211f8a60be775bced7db18ee05f6e265/url_clustering-1.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9dd7271d19a19f6bf55f40bead576ac8", "sha256": "13e741f2bf905991c576fc6d09e98404b8c2b2d711daf04ad5dc6069489e6056" }, "downloads": -1, "filename": "url_clustering-1.1.0.tar.gz", "has_sig": false, "md5_digest": "9dd7271d19a19f6bf55f40bead576ac8", "packagetype": "sdist", "python_version": "source", 
"requires_python": null, "size": 6071, "upload_time": "2019-10-02T10:20:03", "url": "https://files.pythonhosted.org/packages/a1/7d/fb3c31bfc55c8fff431a01f05dcb22f128f5bde3699d132b4a7d5917e368/url_clustering-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "8f98ea18b7b84b3f74b2e3193ddaa345", "sha256": "db959e1c979282d930d9485a0f276bd719475a9638619a90edffeb8f2eb131eb" }, "downloads": -1, "filename": "url_clustering-1.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "8f98ea18b7b84b3f74b2e3193ddaa345", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6358, "upload_time": "2019-10-11T11:16:43", "url": "https://files.pythonhosted.org/packages/19/5a/5aa744c8847566e446bd16d21240adb07da23b669ca64722663219d29142/url_clustering-1.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "33353b06fc3f12b1f735e69e3241c698", "sha256": "26d7fc62c3f462554697bebd3186a661186e64ac8d87c95b8353afe20d672c32" }, "downloads": -1, "filename": "url_clustering-1.1.1.tar.gz", "has_sig": false, "md5_digest": "33353b06fc3f12b1f735e69e3241c698", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6044, "upload_time": "2019-10-11T11:16:45", "url": "https://files.pythonhosted.org/packages/af/45/05e0daf0125b4a690da49fc20e7ae09b6ab21045577b61a735e1554dcd85/url_clustering-1.1.1.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "9910dd8910a8ec2d96c7381c44c27ae9", "sha256": "1894b052e3f3e155c3c2d87e0413eb7f5d5cf188214904ddcb2c474ab5822e83" }, "downloads": -1, "filename": "url_clustering-1.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9910dd8910a8ec2d96c7381c44c27ae9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6505, "upload_time": "2019-10-14T09:01:12", "url": "https://files.pythonhosted.org/packages/a4/87/ffa95222f1ebf41baf41da7c1a4d4e3985c4ba2b0d4d22c35f47aa29d116/url_clustering-1.2.0-py3-none-any.whl" }, { "comment_text": 
"", "digests": { "md5": "2f574b47809c5c1bce71dab7b3f2d7a2", "sha256": "ae8bb927e21eaf834844b288fc6dd87efaa551499fa3cfe3a73ef62001068c93" }, "downloads": -1, "filename": "url_clustering-1.2.0.tar.gz", "has_sig": false, "md5_digest": "2f574b47809c5c1bce71dab7b3f2d7a2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6160, "upload_time": "2019-10-14T09:01:14", "url": "https://files.pythonhosted.org/packages/6c/78/d084be34e218cf07af880ec7de8501acb15d4b07175ac3a323247b7939ae/url_clustering-1.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9910dd8910a8ec2d96c7381c44c27ae9", "sha256": "1894b052e3f3e155c3c2d87e0413eb7f5d5cf188214904ddcb2c474ab5822e83" }, "downloads": -1, "filename": "url_clustering-1.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9910dd8910a8ec2d96c7381c44c27ae9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6505, "upload_time": "2019-10-14T09:01:12", "url": "https://files.pythonhosted.org/packages/a4/87/ffa95222f1ebf41baf41da7c1a4d4e3985c4ba2b0d4d22c35f47aa29d116/url_clustering-1.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2f574b47809c5c1bce71dab7b3f2d7a2", "sha256": "ae8bb927e21eaf834844b288fc6dd87efaa551499fa3cfe3a73ef62001068c93" }, "downloads": -1, "filename": "url_clustering-1.2.0.tar.gz", "has_sig": false, "md5_digest": "2f574b47809c5c1bce71dab7b3f2d7a2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6160, "upload_time": "2019-10-14T09:01:14", "url": "https://files.pythonhosted.org/packages/6c/78/d084be34e218cf07af880ec7de8501acb15d4b07175ac3a323247b7939ae/url_clustering-1.2.0.tar.gz" } ] }