{ "info": { "author": "Amit Moscovich", "author_email": "moscovich@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# Graph KNN Python module\n\nGiven an undirected graph and a set of terminal (or seed) vertices T, this python package finds, for every vertex, its K nearest neighbors from the set T.\n\n\n# Usage\n\nThe main functions are **graphknn.algorithm1(W, mask, k)** and **graphknn.algorithm2(W, mask, k)**.\nBoth algorithms have the same interface with slightly different implementations.\n**Input**:\n* **W:** n x n matrix of edge weights, of type scipy.sparse.csr_matrix.\n* **mask:** boolean array of length n indicating which vertices belong to the terminal set T.\n* **k:** how many nearest neighbors to find for each vertex.\n\n**Output:**\n* **knn:** this is an array of size n such that knn[i] is a list of up to k pairs of (distance, terminal_vertex_index). Note that knn[i] is not sorted.\n\n\nAlgorithm 1 is simpler whereas Algorithm 2 has tighter runtime guarantees. We have seen cases where algorithm 1 is faster than algorithm 2 and vice versa, so try both on your data and choose the faster one.\n\n\n## Example\n\n```\nimport numpy as np\nimport scipy.sparse\nimport graphknn\n\ndef build_sparse_undirected_nonnegative_csr_matrix(n):\n W = np.random.random((n,n))\n W = W + W.transpose()\n W[W < 1.5] = np.inf\n return scipy.sparse.csr_matrix(W)\n\n\ndef test_graphknn():\n N = 10\n p = 0.5 \n k = 3\n\n W = build_sparse_undirected_nonnegative_csr_matrix(N)\n mask = np.random.random(N) < p\n\n print('Graph edges:')\n print(W,'\\n')\n\n print('Terminal indices:')\n print(mask.nonzero()[0], '\\n')\n\n result = graphknn.algorithm1(W, mask, k)\n\n print('K nearest terminal indices of all vertices:')\n for i in range(len(result)):\n print('result[{0}]:\\n{1}'.format(i, sorted(result[i])))\n\ntest_graphknn()\n```\n\n# Details\n\nA simple solution to the problem of finding the k nearest terminal vertices is\nto run Dijkstra's algorithm from each of the terminal vertices, forming a |T| by |V| matrix. Then for each vertex i we examine the i-th column of the matrix and pick the k nearest cells (this can be done efficiently using Hoare's selection algorithm). The runtime of this method is O(|T||V|log|V| + |E|).\nHowever, this approach is wasteful, since it spends a lot of time finding irrelevant shortest paths from terminals to vertices that are very far from them.\n\nThis module implements a faster approach that can be described as performing |T| Dijkstra runs in parallel combined with an early stopping rule that prevents unnecessary traversals. This stopping rule simply stops exploring vertices once we have found shortest paths from k different terminals.\n\nFor more details, including a proof of correctness and runtime bounds, see Section 4 and Appendix B of our paper:\n\n[Amit Moscovich](https://mosco.github.io), [Ariel Jaffe](https://arieljaffe.wixsite.com/homepage), [Boaz Nadler](http://www.weizmann.ac.il/math/Nadler/home)\n[**Minimax-optimal semi-supervised regression on unknown manifolds**](https://arxiv.org/abs/1611.02221),\nAISTATS (2017).\n\nPlease cite our paper if using this code for your research.\n\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mosco/graphknn", "keywords": "graph knn dijkstra shortest path", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "graphknn", "package_url": "https://pypi.org/project/graphknn/", "platform": "", "project_url": "https://pypi.org/project/graphknn/", "project_urls": { "Homepage": "https://github.com/mosco/graphknn" }, "release_url": "https://pypi.org/project/graphknn/1.0.0b1/", "requires_dist": [ "HeapDict", "SciPy" ], "requires_python": "", "summary": "For every vertex in a graph, this code efficiently finds its K nearest vertices from a particular subset of \"special\" vertices.", "version": "1.0.0b1" }, "last_serial": 3689994, "releases": { "1.0.0b1": [ { "comment_text": "", "digests": { "md5": "3ba2f68f4b3fd4923bc606e1b83e277a", "sha256": "e8ae62ea9e3cb6059eb74f6d8b4b53a950bdec358f5e03b8a612f311472d03c8" }, "downloads": -1, "filename": "graphknn-1.0.0b1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3ba2f68f4b3fd4923bc606e1b83e277a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6870, "upload_time": "2018-03-21T03:22:54", "url": "https://files.pythonhosted.org/packages/61/ca/1b4f08b709fcea1fc22a1f395507c84aaaebe9f80b9eb29251487f1ae818/graphknn-1.0.0b1-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3ba2f68f4b3fd4923bc606e1b83e277a", "sha256": "e8ae62ea9e3cb6059eb74f6d8b4b53a950bdec358f5e03b8a612f311472d03c8" }, "downloads": -1, "filename": "graphknn-1.0.0b1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3ba2f68f4b3fd4923bc606e1b83e277a", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 6870, "upload_time": "2018-03-21T03:22:54", "url": "https://files.pythonhosted.org/packages/61/ca/1b4f08b709fcea1fc22a1f395507c84aaaebe9f80b9eb29251487f1ae818/graphknn-1.0.0b1-py2.py3-none-any.whl" } ] }