{ "info": { "author": "Trung Ngo.T.", "author_email": "trung@imito.ai", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Software Development" ], "description": "# bigarray\nFast and scalable numpy array using Memory-mapped I/O\n\nStable build:\n> pip install bigarray\n\nNightly build from github:\n> pip install git+https://github.com/trungnt13/bigarray@master\n\n---\n## The three principles\n\n* **Transparency**: everything is `numpy.array`, metadata and support for extra features (e.g. multiprocessing, indexing, etc) are subtly implemented in the _background_.\n* **Pragmatism**: fast but easy, simplified A.P.I for common use cases\n* **Focus**: _\"Do one thing and do it well\"_\n\n## The benchmarks\n\nAbout **535 times faster** than HDF5 data (using `h5py`) and **223 times faster** than normal `numpy.array`\n[The detail benchmark code](https://github.com/trungnt13/bigarray/blob/master/benchmarks/mmap_vs_hdf5.py)\n```\nArray size: 1220.70 (MB)\n\nCreate HDF5 in: 0.0005580571014434099 s\nCreate Memmap in: 0.000615391880273819 s\n\nNumpy save in: 0.5713834380730987 s\nWriting data to HDF5 : 0.5530977640300989 s\nWriting data to Memmap: 0.7038380969315767 s\n\nNumpy saved size: 1220.70 (MB)\nHDF5 saved size: 1220.71 (MB)\nMmap saved size: 1220.70 (MB)\n\nLoad Numpy array: 0.3723734531085938 s\nLoad HDF5 data : 0.00041177100501954556 s\nLoad Memmap data: 0.00017150305211544037 s\n\nTest correctness of stored data\nNumpy : True\nHDF5 : True\nMemmap: True\n\nIterate Numpy data : 0.00020254682749509811 s\nIterate HDF5 data : 0.8945782391820103 s\nIterate Memmap data : 0.0014937107916921377 s\nIterate Memmap (2nd) : 0.0011746759992092848 s\n\nNumpy total time (open+iter): 0.3725759999360889 s\nH5py total time (open+iter): 0.8949900101870298 s\n**Mmap total time (open+iter): 0.001665213843807578 s**\n```\n\n## Example\n\n```python\nfrom multiprocessing import Pool\n\nimport numpy as np\n\nfrom bigarray import PointerArray, PointerArrayWriter\n\nn = 80 * 10 # total number of samples\njobs = [(i, i + 10) for i in range(0, n // 10, 10)]\npath = '/tmp/array'\n\n# ====== Multiprocessing writing ====== #\nwriter = PointerArrayWriter(path, shape=(n,), dtype='int32', remove_exist=True)\n\n\ndef fn_write(job):\n start, end = job\n # it is crucial to write at different position for different process\n writer.write(\n {\"name%i\" % i: np.arange(i * 10, i * 10 + 10) for i in range(start, end)},\n start_position=start * 10)\n\n\n# using 2 processes to generate and write data\nwith Pool(2) as p:\n p.map(fn_write, jobs)\nwriter.flush()\nwriter.close()\n\n# ====== Multiprocessing reading ====== #\nx = PointerArray(path)\nprint(x['name0'])\nprint(x['name66'])\nprint(x['name78'])\n\n# normal indexing\nfor name, (s, e) in x.indices.items():\n data = x[s:e]\n# fast indexing\nfor name in x.indices:\n data = x[name]\n\n\n# multiprocess indexing\ndef fn_read(job):\n start, end = job\n total = 0\n for i in range(start, end):\n total += np.sum(x['name%d' % i])\n return total\n\n# use multiprocessing to calculate the sum of all arrays\nwith Pool(2) as p:\n total_sum = sum(p.map(fn_read, jobs))\nprint(np.sum(x), total_sum)\n```\n\nOutput:\n```\n[0 1 2 3 4 5 6 7 8 9]\n[660 661 662 663 664 665 666 667 668 669]\n[780 781 782 783 784 785 786 787 788 789]\n319600 319600\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/trungnt13/bigarray", "keywords": "bigarray", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "bigarray", "package_url": "https://pypi.org/project/bigarray/", "platform": "", "project_url": "https://pypi.org/project/bigarray/", "project_urls": { "Homepage": "https://github.com/trungnt13/bigarray" }, "release_url": "https://pypi.org/project/bigarray/0.2.2/", "requires_dist": [ "numpy (>=1.15.0)", "six (>=1.12.0)" ], "requires_python": "", "summary": "Fast and scalable array for machine learning and artificial intelligence", "version": "0.2.2" }, "last_serial": 5982705, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "5add37088293fb06a7d9581a38f74159", "sha256": "0559d537188f6e978b13e7e81be7d819694b7f579e0a0225ddfb33a9af9c49f0" }, "downloads": -1, "filename": "bigarray-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5add37088293fb06a7d9581a38f74159", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11194, "upload_time": "2019-10-08T12:19:58", "url": "https://files.pythonhosted.org/packages/01/96/1cc9bc081cfe58a4c8400f9d71262034a99e812b16814fb14b536ec5d8dd/bigarray-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4be2fe610f304bac6db993ffac2c1d99", "sha256": "f2de0d28d079b422800363ccf14350ffc04a190d52d22915a510634a7f04b16e" }, "downloads": -1, "filename": "bigarray-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4be2fe610f304bac6db993ffac2c1d99", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6301, "upload_time": "2019-10-08T12:20:01", "url": "https://files.pythonhosted.org/packages/f5/9b/9832e3cfd4f0e2adbb8c74c89b3a9abbcd2ce875add7e1a5c4d710421621/bigarray-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "ff65cb5e55aaed32390c2dff23420732", "sha256": "cba67d5f5140b8230b19d2c3600e78963542ee51aa088255d18103a2d875ef6f" }, "downloads": -1, "filename": "bigarray-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "ff65cb5e55aaed32390c2dff23420732", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15195, "upload_time": "2019-10-08T21:24:28", "url": "https://files.pythonhosted.org/packages/5d/1c/443ee3debebe703b542dc96434c93caf6d6ac75d8b8d25ac42f91fd19ae1/bigarray-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "89165c3b908c6877b4000379f6828890", "sha256": "fdd0b106a498e14cf43e39c4a3e3a63c9860117d72bb45719207c1c464f3653b" }, "downloads": -1, "filename": "bigarray-0.2.0.tar.gz", "has_sig": false, "md5_digest": "89165c3b908c6877b4000379f6828890", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8794, "upload_time": "2019-10-08T21:24:30", "url": "https://files.pythonhosted.org/packages/88/f8/53929ef02f606a2ad6c4be2db26894fa7b48ba039f2ce2acc498810a929c/bigarray-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "709f5db5d8c9a51e63bdf9d1a1c5f972", "sha256": "8ad49bb232bc85a80dd592a448cfe07c7b883371e07601a54e1390f118694413" }, "downloads": -1, "filename": "bigarray-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "709f5db5d8c9a51e63bdf9d1a1c5f972", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17070, "upload_time": "2019-10-09T12:38:08", "url": "https://files.pythonhosted.org/packages/78/09/decbebee351a27043476bbcd7e1a29351497da76463bf322281dab8bc2fe/bigarray-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e8f57fbdeb2109fa4d64c73c9d52afb4", "sha256": "94208271b7c65176e8ee040bef6e01fecdea017b9c16160aa062841771e8599c" }, "downloads": -1, "filename": "bigarray-0.2.1.tar.gz", "has_sig": false, "md5_digest": "e8f57fbdeb2109fa4d64c73c9d52afb4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11339, "upload_time": "2019-10-09T12:38:10", "url": "https://files.pythonhosted.org/packages/3b/62/623af8c156741fefc86d9348172a76cc549c601d803ee411231f313b57cf/bigarray-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "ad3a054ada1dd6fa4988210d3a59b77c", "sha256": "4b864ba98dcbb5695ca791c14c025e55c3bae5423171c3a673d6dff9b5daba7c" }, "downloads": -1, "filename": "bigarray-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ad3a054ada1dd6fa4988210d3a59b77c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17269, "upload_time": "2019-10-16T11:37:52", "url": "https://files.pythonhosted.org/packages/89/37/b743fc2e7c18f81a8870b27d622b13b70a07d211b96a2bcd8de461aff59e/bigarray-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "82a9cbf196f95a30d7505e4775f5c8fe", "sha256": "0fd84a057549d54fb6bcc0662aef9981694178c757acf3f58f1558e8a2f7961f" }, "downloads": -1, "filename": "bigarray-0.2.2.tar.gz", "has_sig": false, "md5_digest": "82a9cbf196f95a30d7505e4775f5c8fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11630, "upload_time": "2019-10-16T11:37:53", "url": "https://files.pythonhosted.org/packages/c8/5b/5238f40f8bf1068a94cc2a62be843bb887d778e5c5729065e0cc039dea4e/bigarray-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ad3a054ada1dd6fa4988210d3a59b77c", "sha256": "4b864ba98dcbb5695ca791c14c025e55c3bae5423171c3a673d6dff9b5daba7c" }, "downloads": -1, "filename": "bigarray-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "ad3a054ada1dd6fa4988210d3a59b77c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17269, "upload_time": "2019-10-16T11:37:52", "url": "https://files.pythonhosted.org/packages/89/37/b743fc2e7c18f81a8870b27d622b13b70a07d211b96a2bcd8de461aff59e/bigarray-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "82a9cbf196f95a30d7505e4775f5c8fe", "sha256": "0fd84a057549d54fb6bcc0662aef9981694178c757acf3f58f1558e8a2f7961f" }, "downloads": -1, "filename": "bigarray-0.2.2.tar.gz", "has_sig": false, "md5_digest": "82a9cbf196f95a30d7505e4775f5c8fe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11630, "upload_time": "2019-10-16T11:37:53", "url": "https://files.pythonhosted.org/packages/c8/5b/5238f40f8bf1068a94cc2a62be843bb887d778e5c5729065e0cc039dea4e/bigarray-0.2.2.tar.gz" } ] }