{ "info": { "author": "David N. Mashburn", "author_email": "david.n.mashburn@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "mtm_stats\nHighly efficient set statistics about many-to-many relationships\n\n\nThis module computes statistics (counts) in the common case where two sets\n(A and B) have a many-to-many relationship. Specifically, it computes:\n\n* *connection count* (or just *count*): the number of connections a member of A has to B\n* *intersection count*: the number of connections to B that two members of A share in common\n\nThis module aims to compute these for every possible combination of\nmembers in A as fast as possible and without approximation.\n\nThe input for this module is a list of A/B pairs, i.e.:\n[(a1, b1), (a1, b2), (a2, b2), ...]\n\nIf N_A is the length of A, the connection count will have size N_A and\nthe intersection count will have size N_A * (N_A - 1) / 2.\n\nAny other set property can be easily derived from these.\n\n* union_count(A1, A2) = count(A1) + count(A2) - intersection_count(A1, A2)\n* single_sided_difference_count(A1, A2) = count(A1) - intersection(A1, A2)\n* symmetric_difference_count(A1, A2) = union_count(A1, A2) - intersection_count(A1, A2)\n\nThe union is actually computed in the main \"mtm_stats\" function in \"get_dicts_from_array_outputs\"\n\nThis approach will be the most effective when the connections between A and B are sparse.\n\nIn addition, there is one optional approximation that can be used to enhance performance:\na cutoff filter for small intersections (default is 0).\nIf cutoff > 0, any result with intersection count <= cutoff will be approximated as 0.\nThe union count is then assumed to be count(A1) + count(A2).\n\n### Implementation details:\n\nThis module uses a combination of techniques to achieve this goal:\n* Core implemented in C and Cython\n* Bit packing (members of B are each assigned one bit)\n* A sparse block array compression scheme that ignores large blocks of zeros in the bit-packed sets\n* Efficiently applied bit operations (&, |, popcount)\n* Only storing nonzero intersections using a 2-index sparse array: [(A1, A2, intersection_count), ...]\n* The optional cutoff described above\n\nThese techniques not only give huge time savings over a brute-force\nimplementation, but also cut down on memory usage\n(which could easily crash a machine with a sets as small as 100K elements).\n\n### Alternative techniques\n\nThis technique will work well until the size of N_A^2 becomes unmanagely large.\nIn a case like that, you should try a probabalistic algorithm like HyperLogLog, Count Min Sketch or MinHash:\n* http://bravenewgeek.com/tag/count-min-sketch/\n* https://redislabs.com/blog/count-min-sketch-the-art-and-science-of-estimating-stuff#.V-6Mctz3aaM\n* https://tech.shareaholic.com/2012/12/03/the-count-min-sketch-how-to-count-over-large-keyspaces-when-about-right-is-good-enough/\n\nSome sample packages to do these kinds of operations are algebird and datasketch:\n* https://github.com/twitter/algebird\n* https://github.com/ekzhu/datasketch\n\n### Installation instructions:\n\nUnder Debian systems, all you need to do is the following (or similar):\n\n* sudo apt-get install -y build-essential cmake g++ python python-dev python-setuptools python-numpy\n* sudo easy_install pip\n* sudo pip install numpy Cython\n* sudo pip install mtm_stats\n\n### Special instructions for installation on Mac OS X\n\nThis packages uses OpenMP to automatically utilize all available processing cores.\nFor this reason, you will need to get gcc (and not clang which lacks OpenMP support).\n\n##### Install Homebrew from https://brew.sh/\n\n/usr/bin/ruby -e \"$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)\"\n\n##### Install gcc and cmake\n\nbrew install gcc cmake\n\n##### Install numpy and Cython\n\npip install numpy Cython\n\n##### Set compiler commands and include path environment variables (versions may need to be updated)\n\nexport CC=/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6\nexport CXX=/usr/local/Cellar/gcc/6.3.0_1/bin/g++-6\nexport C_INCLUDE_PATH=/Users/developers/.virtualenvs/learning_intelligence_data/lib/python2.7/site-packages/numpy/core/include\n\n##### Install the library\n\npip install mtm_stats\n\nPython 3 is supported, provided the same dependencies are met.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "LICENSE.txt", "maintainer": "", "maintainer_email": "", "name": "mtm_stats", "package_url": "https://pypi.org/project/mtm_stats/", "platform": "", "project_url": "https://pypi.org/project/mtm_stats/", "project_urls": null, "release_url": "https://pypi.org/project/mtm_stats/0.4.4.3/", "requires_dist": null, "requires_python": "", "summary": "Highly efficient set statistics about many-to-many relationships", "version": "0.4.4.3" }, "last_serial": 4395198, "releases": { "0.3.1.1": [ { "comment_text": "", "digests": { "md5": "0425aadd76b08f994aca49487f18db55", "sha256": "4e45c62e481c637a271a83e38f25b448002dd5cd06bd9b5dfc210b37b71ffc1d" }, "downloads": -1, "filename": "mtm_stats-0.3.1.1.tar.gz", "has_sig": false, "md5_digest": "0425aadd76b08f994aca49487f18db55", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 42222, "upload_time": "2016-11-01T21:17:54", "url": "https://files.pythonhosted.org/packages/61/5b/4dc8d43610143d5872f3fcdbfa4123faa076bea3bc40423f1cbd8dbd65f2/mtm_stats-0.3.1.1.tar.gz" } ], "0.3.1.2": [ { "comment_text": "", "digests": { "md5": "91bd575403323e3be203174929703e1a", "sha256": "460262f6740d433ce3ee35b5158b9d8f7de8e5e4f98d1f0bebdca77d565a209f" }, "downloads": -1, "filename": "mtm_stats-0.3.1.2.tar.gz", "has_sig": false, "md5_digest": "91bd575403323e3be203174929703e1a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 44339, "upload_time": "2016-12-09T20:08:40", "url": "https://files.pythonhosted.org/packages/c0/c8/c0d5dfe7ae2bd5f86f77abb6f32c7f34c3bafc465d3da1b6d5ed2dfdafde/mtm_stats-0.3.1.2.tar.gz" } ], "0.3.1.3": [ { "comment_text": "", "digests": { "md5": "b747a02b6b120538e72eab7d6a06898e", "sha256": "832363d4cef44fea34b6ba1dfc51eb95168166abec4f0b74146d082e9bce13f1" }, "downloads": -1, "filename": "mtm_stats-0.3.1.3.tar.gz", "has_sig": false, "md5_digest": "b747a02b6b120538e72eab7d6a06898e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 58803, "upload_time": "2016-12-13T19:40:29", "url": "https://files.pythonhosted.org/packages/70/10/7b009066a0ceb050d59e5f76eb39828df4e91ee4e183e877314c73b10c1a/mtm_stats-0.3.1.3.tar.gz" } ], "0.3.2.0": [ { "comment_text": "", "digests": { "md5": "ad85195be73e6dc784a408b8e3d148ac", "sha256": "c4b7769598a3a9071470f40d846ae82849095bd78006078a31e4b1b3d5011cd4" }, "downloads": -1, "filename": "mtm_stats-0.3.2.0.tar.gz", "has_sig": false, "md5_digest": "ad85195be73e6dc784a408b8e3d148ac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 60762, "upload_time": "2017-01-15T20:03:44", "url": "https://files.pythonhosted.org/packages/e2/90/ca0201e0dd295abc7e6776a55e73da180289c37f352b8aff506121323f56/mtm_stats-0.3.2.0.tar.gz" } ], "0.3.3.0": [ { "comment_text": "", "digests": { "md5": "176039d1a1bc0fc6e20c81ef42fdd9cb", "sha256": "4fd0bee37612a4da5b8e54d94fc81f6fa57023543f2702440c569e082bfbd2b8" }, "downloads": -1, "filename": "mtm_stats-0.3.3.0.tar.gz", "has_sig": false, "md5_digest": "176039d1a1bc0fc6e20c81ef42fdd9cb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69805, "upload_time": "2017-01-15T20:04:28", "url": "https://files.pythonhosted.org/packages/e0/4f/6302620765d0c539fed3874f181479465bbfbb15edc5a7849f0c1921de13/mtm_stats-0.3.3.0.tar.gz" } ], "0.3.4.0": [ { "comment_text": "", "digests": { "md5": "4cac85fd37779d5092c1117a31a8f557", "sha256": "5b03f87f10f15d8e0fee27b15faf9ff3ec7297438c34d47c15e0a9f0aebbd2d4" }, "downloads": -1, "filename": "mtm_stats-0.3.4.0.tar.gz", "has_sig": false, "md5_digest": "4cac85fd37779d5092c1117a31a8f557", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 70034, "upload_time": "2017-02-07T17:26:57", "url": "https://files.pythonhosted.org/packages/98/3f/eeb096523300e6b527f31dc26aa99111b962c7ad9a1eb1308e196473dd20/mtm_stats-0.3.4.0.tar.gz" } ], "0.3.4.1": [ { "comment_text": "", "digests": { "md5": "f4f495c117e75c9d5e1da70ba18017bf", "sha256": "207895a077fb89689604efa34e932e99121354467699158997537a68bf05b350" }, "downloads": -1, "filename": "mtm_stats-0.3.4.1.tar.gz", "has_sig": false, "md5_digest": "f4f495c117e75c9d5e1da70ba18017bf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 70126, "upload_time": "2017-02-07T19:00:28", "url": "https://files.pythonhosted.org/packages/bd/a9/94e49ed353216839f67a0b67abacf6feb9cb17e2ca996aa1bd2076cfeb8e/mtm_stats-0.3.4.1.tar.gz" } ], "0.4.0.0": [ { "comment_text": "", "digests": { "md5": "e72d9772f025fbceeec63c75b329ffe7", "sha256": "16f608f5abdfc7943b2083c262b30162c9d8aa244c14f2b5cd17aa9da6e58b27" }, "downloads": -1, "filename": "mtm_stats-0.4.0.0.tar.gz", "has_sig": false, "md5_digest": "e72d9772f025fbceeec63c75b329ffe7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 75252, "upload_time": "2017-02-08T18:36:08", "url": "https://files.pythonhosted.org/packages/2c/78/97fcfdd7636fbcf513ed7262c2f096e32722fc72f8b18d476d855070c3d1/mtm_stats-0.4.0.0.tar.gz" } ], "0.4.1.0": [ { "comment_text": "", "digests": { "md5": "b29a78f493f2b6d53705f86e73e970c7", "sha256": "98084cdf1f1ae2b980666020d2e1ebfc4bb5f11fe6c58cc12f5148ec92d878e4" }, "downloads": -1, "filename": "mtm_stats-0.4.1.0.tar.gz", "has_sig": false, "md5_digest": "b29a78f493f2b6d53705f86e73e970c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 71258, "upload_time": "2017-02-13T23:25:48", "url": "https://files.pythonhosted.org/packages/01/6c/21d8699a77d90604ea710877beda548a557e7f046454ae2ce5b77fc14d32/mtm_stats-0.4.1.0.tar.gz" } ], "0.4.2.1": [ { "comment_text": "", "digests": { "md5": "dfd8c14a0e061a3e68ac074decf72ff0", "sha256": "e2fa63ae4ca16e81a19391ab9a6653c2e8654c20878811830cf3d27e53ab3520" }, "downloads": -1, "filename": "mtm_stats-0.4.2.1.tar.gz", "has_sig": false, "md5_digest": "dfd8c14a0e061a3e68ac074decf72ff0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 71906, "upload_time": "2017-02-21T17:18:32", "url": "https://files.pythonhosted.org/packages/ff/8b/b48773b0e61bdd496c5b27e5d91f55c7dd9214da74d70b6085ef6d17b7ff/mtm_stats-0.4.2.1.tar.gz" } ], "0.4.3.0": [ { "comment_text": "", "digests": { "md5": "300d274c1411fea380df9838562d9a5f", "sha256": "c8e34c7c11439665c5a222b3b5248c17b6c017868aa2e1ef9b12281f691f1405" }, "downloads": -1, "filename": "mtm_stats-0.4.3.0.tar.gz", "has_sig": false, "md5_digest": "300d274c1411fea380df9838562d9a5f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 71605, "upload_time": "2017-02-23T17:14:23", "url": "https://files.pythonhosted.org/packages/ea/f3/18171493bf5c8b112627b86222adcc79e1ac40b731c7e26af9b2e20bf6f8/mtm_stats-0.4.3.0.tar.gz" } ], "0.4.4.0": [ { "comment_text": "", "digests": { "md5": "ce258819e51212ae58d0fd1de27f6629", "sha256": "1f730d8474284b7887ecba5a6a5014609e3ef41f7fbea61d709f0cc4947836c6" }, "downloads": -1, "filename": "mtm_stats-0.4.4.0.tar.gz", "has_sig": false, "md5_digest": "ce258819e51212ae58d0fd1de27f6629", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 71620, "upload_time": "2017-04-27T16:43:25", "url": "https://files.pythonhosted.org/packages/2c/2c/b2338b984a36abd3769feffdb840f28ae648dabc4ef3981834d113ab6fce/mtm_stats-0.4.4.0.tar.gz" } ], "0.4.4.1": [ { "comment_text": "", "digests": { "md5": "b61f674e47ee9dc908e690fa431eac0b", "sha256": "40d6365087883a9f4ba7564d00bb1b47f36d38777050dfe88d226393768a5ca5" }, "downloads": -1, "filename": "mtm_stats-0.4.4.1.tar.gz", "has_sig": false, "md5_digest": "b61f674e47ee9dc908e690fa431eac0b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 72241, "upload_time": "2017-04-28T05:10:29", "url": "https://files.pythonhosted.org/packages/9e/40/0f32d6b1397493e459d04ad45ef92ab2629fd8ed6d09580aaab24c3102eb/mtm_stats-0.4.4.1.tar.gz" } ], "0.4.4.2": [ { "comment_text": "", "digests": { "md5": "8f32fac03694ac2fc4d670ba0829eced", "sha256": "008bba1e128451232a4177838d03c732b04de332b543640b71230d7f171a567d" }, "downloads": -1, "filename": "mtm_stats-0.4.4.2.tar.gz", "has_sig": false, "md5_digest": "8f32fac03694ac2fc4d670ba0829eced", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 79988, "upload_time": "2018-10-16T19:29:46", "url": "https://files.pythonhosted.org/packages/5b/f5/c8f64b98d41ee10b33af1a7362ca9ff360193984393c7e6e909b28501426/mtm_stats-0.4.4.2.tar.gz" } ], "0.4.4.3": [ { "comment_text": "", "digests": { "md5": "8f8e2035c84b72967095dbba4a7894f5", "sha256": "05ff18c6f1307d22ad718419215b1f8dd011bb23e6e08e646a0a730934394b82" }, "downloads": -1, "filename": "mtm_stats-0.4.4.3.tar.gz", "has_sig": false, "md5_digest": "8f8e2035c84b72967095dbba4a7894f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 80011, "upload_time": "2018-10-19T18:16:21", "url": "https://files.pythonhosted.org/packages/e7/cd/f51b7c3218960f08a5d8dd38b01f7d4c80a2efbcfb59df9ef00231815dc2/mtm_stats-0.4.4.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8f8e2035c84b72967095dbba4a7894f5", "sha256": "05ff18c6f1307d22ad718419215b1f8dd011bb23e6e08e646a0a730934394b82" }, "downloads": -1, "filename": "mtm_stats-0.4.4.3.tar.gz", "has_sig": false, "md5_digest": "8f8e2035c84b72967095dbba4a7894f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 80011, "upload_time": "2018-10-19T18:16:21", "url": "https://files.pythonhosted.org/packages/e7/cd/f51b7c3218960f08a5d8dd38b01f7d4c80a2efbcfb59df9ef00231815dc2/mtm_stats-0.4.4.3.tar.gz" } ] }