{ "info": { "author": "Santi Villalba", "author_email": "sdvillal@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved", "Operating System :: Unix", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "jagged\n======\n\nEfficient storage of same-type, uneven-size arrays\n--------------------------------------------------\n\n|Pypi Version| \n\n*Jagged* is an ongoing amateur project exploring the storage panorama\nfor datasets containing (large amounts of) arrays with the same type\nand number of columns, but varying number of rows. Examples of such\ndatasets for which *jagged* has been used are collections of multivariate\ntimeseries (short animal behaviour snippets) and collections of molecules\n(represented as varying length strings).\n\nJagged aims to help analyze data on the laptop and on the cluster, in batch\nor interactively, by providing a very lightweight store. Jagged provides fast\nretrieval of array subsets for many-GB datasets containing millions of rows.\n\nBy-design constraints\n---------------------\n\nFocus is on fast retrieval of arbitrary batch queries.\n\nJagged stores are append only.\n\nThere are no transactions, replication or distribution.\nEverything is files on your local or network disks.\n\nNo significant effort has yet been put into optimization\n(although some backends work quite smoothly).\n\nAt the moment, everything is simple algorithms implemented in pure python.\n\nInstallation\n------------\n\nIt should suffice to use pip::\n\n    pip install jagged\n\nJagged stores build on top of several high quality python libraries: numpy, blosc,\nbloscpack, bcolz and joblib. 
It also needs whatami and python-future.\nTesting relies on pytest (at the moment you need to install all dependencies to test;\nthis will change soon).\n\n\nShowcase\n--------\n\nUsing jagged is simple. There are different implementations that provide\ntwo basic methods: *append* adds a new array to the store, *get* retrieves\ncollections of arrays identified by their insertion order in the store.\n\n.. code:: python\n\n    import os.path as op\n    import numpy as np\n    from jagged.mmap_backend import JaggedByMemmap\n\n    # A Jagged instance is all you need\n    jagged = JaggedByMemmap(op.expanduser('~/jagged-example/mmap'))\n    # Any other jagged backend could be used here instead\n\n    # Generate a random dataset\n    rng = np.random.RandomState(0)\n    max_length = 2000\n    num_arrays = 100\n    originals = [rng.randn(rng.randint(0, max_length), 50)\n                 for _ in range(num_arrays)]\n\n    # Add these to the store (context is usually optional but recommended)\n    # list() forces the appends to run inside the context (map is lazy in python 3)\n    with jagged:\n        indices = list(map(jagged.append, originals))\n\n    # What do we have in store?\n    print('Number of arrays: %d, number of rows: %d' % (jagged.narrays, jagged.nrows))\n    print('Jagged shape=%r, dtype=%r, order=%r' %\n          (jagged.shape, jagged.dtype, jagged.order))\n\n    # Check roundtrip\n    roundtripped = jagged.get(indices)\n    print('The store has %d arrays' % jagged.narrays)\n\n    # Jagged stores identify themselves (using whatami)\n    print(jagged.what().id())\n\n    # Jagged stores can be iterated in chunks\n    # See iter\n\n    # Jagged stores can be populated from other jagged stores\n\n    # Some jagged stores can retrieve arbitrary rows as fast\n    # as arbitrary arrays.\n\n\nBackends\n--------\n\nAlthough rapidly changing, *jagged* already provides the following storage backends\nthat can be considered as working and stable. 
Other backends are planned.\n\n+-------------------+------+-------+--------+------+-----+------+------+\n| Backend           | comp | chunk | column | mmap | lin | lazy | cont |\n+===================+======+=======+========+======+=====+======+======+\n| JaggedByBlosc     | X    |       |        | X    |     |      |      |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByCarray    | X    | X     |        |      | X   |      | X    |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByH5Py      | X    | X     |        |      | X   | X    | X    |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByJoblib    | X    | X     |        |      |     |      |      |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByMemMap    |      |       |        | X    | X   | X    | X    |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByNPY       |      |       |        |      |     |      |      |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByBloscpack | X    |       |        |      |     |      |      |\n+-------------------+------+-------+--------+------+-----+------+------+\n| JaggedByPickle    | X    | X     |        |      |     |      |      |\n+-------------------+------+-------+--------+------+-----+------+------+\n\n\n- comp: can be compressed\n- chunk: can be chunked\n- column: stores columns of the array contiguously (can be easily implemented by using a store per column)\n- mmap: can open a memmap to the data\n- lin: can retrieve any row without the need to retrieve the whole\n  array that contains it\n- lazy: the arrays are not fetched immediately; this can also mean that they can be managed\n  as virtual memory by the OS (JaggedByMemMap only)\n- cont: retrieved arrays can be forced to lie in contiguous memory segments\n\nBenchmarks\n----------\n\nWhich backend and parameters work best depends on whether your data is compressible and on the\nsizes of the arrays. We have a good idea of what works best for our data and are working on\nproviding a benchmarking framework. Find here a preview_.\n\n\n.. 
|Pypi Version| image:: https://badge.fury.io/py/jagged.svg\n :target: http://badge.fury.io/py/jagged\n.. |Build Status| image:: https://travis-ci.org/sdvillal/jagged.svg?branch=master\n :target: https://travis-ci.org/sdvillal/jagged\n.. |Coverage Status| image:: http://codecov.io/github/sdvillal/jagged/coverage.svg?branch=master\n :target: http://codecov.io/github/sdvillal/jagged?branch=master\n.. |Scrutinizer Status| image:: https://scrutinizer-ci.com/g/sdvillal/jagged/badges/quality-score.png?b=master\n :target: https://scrutinizer-ci.com/g/sdvillal/jagged/?branch=master\n.. _preview: https://github.com/sdvillal/strawlab-examples/tree/master/strawlab_examples/benchmarks", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sdvillal/jagged", "keywords": null, "license": "BSD 3 clause", "maintainer": null, "maintainer_email": null, "name": "jagged", "package_url": "https://pypi.org/project/jagged/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/jagged/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/sdvillal/jagged" }, "release_url": "https://pypi.org/project/jagged/0.1.0/", "requires_dist": null, "requires_python": null, "summary": "Simple tricks for efficient loading or merging collections of unevenly sized elements", "version": "0.1.0" }, "last_serial": 1699158, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "09aea20f86a7689bad446117edb19778", "sha256": "cbaae035d3f221953064131042b4ab3bbc7a9bbda752f9150dc37ee749986ba7" }, "downloads": -1, "filename": "jagged-0.1.0.tar.gz", "has_sig": false, "md5_digest": "09aea20f86a7689bad446117edb19778", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23718, "upload_time": "2015-08-29T06:20:23", "url": 
"https://files.pythonhosted.org/packages/4a/df/9eb13cee1df8a2694f5a6a5997c447242c64edf7f95017c7466d69162104/jagged-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "09aea20f86a7689bad446117edb19778", "sha256": "cbaae035d3f221953064131042b4ab3bbc7a9bbda752f9150dc37ee749986ba7" }, "downloads": -1, "filename": "jagged-0.1.0.tar.gz", "has_sig": false, "md5_digest": "09aea20f86a7689bad446117edb19778", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23718, "upload_time": "2015-08-29T06:20:23", "url": "https://files.pythonhosted.org/packages/4a/df/9eb13cee1df8a2694f5a6a5997c447242c64edf7f95017c7466d69162104/jagged-0.1.0.tar.gz" } ] }