{ "info": { "author": "Andrew Grigorev", "author_email": "andrew@ei-grad.ru", "bugtrack_url": null, "classifiers": [], "description": "bincount\n========\n\nNo-copy parallelized bincount returning dict.\n\nMotivation\n----------\n\nAs of Nov 2018, ``np.bincount`` is unusable with large memmaps:\n\n.. code-block::\n\n >>> import numpy as np\n >>> np.bincount(np.memmap('some-5gb-file.txt', mode='r'))\n Traceback (most recent call last):\n File \"\", line 1, in \n MemoryError\n\nThe most effective pure-python solution for ``wc -l`` is a bit slow:\n\n.. code-block::\n\n In [6]: %%time\n ...: sum(1 for i in open('some-5gb-file.txt', mode='rb'))\n ...:\n CPU times: user 3.5 s, sys: 878 ms, total: 4.38 s\n Wall time: 4.38 s\n Out[6]: 58941384\n\nIt is 3x times slower than ``wc -l``:\n\n.. code-block::\n\n In [1]: %%time\n ...: !wc -l some-5gb-file.txt\n ...:\n 58941384 some-5gb-file.txt\n CPU times: user 1.48 ms, sys: 3.48 ms, total: 4.96 ms\n Wall time: 1.24 s\n\nWhile it should be faster on modern multicore SMP systems:\n\n.. code-block::\n\n In [1]: import numpy as np\n\n In [2]: from bincount import bincount\n\n In [3]: %%time\n ...: bincount(np.memmap('some-5gb-file.txt', mode='r'))[10]\n ...:\n CPU times: user 6.83 s, sys: 354 ms, total: 7.19 s\n Wall time: 705 ms\n Out[4]: 58941384\n\nInstall\n-------\n\nPrequirements: C-compiler with OpenMP support.\n\nInstall with pip:\n\n.. code-block::\n\n pip install bincount\n\nUsage\n-----\n\nThere is a ``bincount`` (a parallel version) and a ``bincount_single`` (which don't\nparallelize the calculation) functions, both returning the dict containing the\nnumber of occurrences of each byte value in the passed bytes-like object:\n\n.. code-block::\n\n >>> from bincount import bincount\n >>> bincount(open('a-tiny-file.txt', 'rb').read())\n {59: 2, 65: 5, 66: 1, 67: 3, 68: 2, 69: 3, 73: 4, 76: 7, 84: 3, 86: 1, 95: 4}", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ei-grad/bincount", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "bincount", "package_url": "https://pypi.org/project/bincount/", "platform": "", "project_url": "https://pypi.org/project/bincount/", "project_urls": { "Homepage": "https://github.com/ei-grad/bincount" }, "release_url": "https://pypi.org/project/bincount/0.0.5/", "requires_dist": null, "requires_python": "", "summary": "No-copy parallelized bincount returning dict", "version": "0.0.5" }, "last_serial": 4503605, "releases": { "0.0.5": [ { "comment_text": "", "digests": { "md5": "b8bd6bebbaed35b2ce94171873eac35a", "sha256": "a898625ce9d90430d54283572ca179319720a1d0364a0998b9924ac9c65be350" }, "downloads": -1, "filename": "bincount-0.0.5.tar.gz", "has_sig": false, "md5_digest": "b8bd6bebbaed35b2ce94171873eac35a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 105288, "upload_time": "2018-11-19T16:20:13", "url": "https://files.pythonhosted.org/packages/10/c4/61c04d56271fce78d6c6744f04b23bb124fafac0954ab3fd475732983834/bincount-0.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b8bd6bebbaed35b2ce94171873eac35a", "sha256": "a898625ce9d90430d54283572ca179319720a1d0364a0998b9924ac9c65be350" }, "downloads": -1, "filename": "bincount-0.0.5.tar.gz", "has_sig": false, "md5_digest": "b8bd6bebbaed35b2ce94171873eac35a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 105288, "upload_time": "2018-11-19T16:20:13", "url": "https://files.pythonhosted.org/packages/10/c4/61c04d56271fce78d6c6744f04b23bb124fafac0954ab3fd475732983834/bincount-0.0.5.tar.gz" } ] }