{ "info": { "author": "Kiyoshi Wesley Masui", "author_email": "kiyo@physics.ubc.ca", "bugtrack_url": null, "classifiers": [], "description": "==========\nBitshuffle\n==========\n\nFilter for improving compression of typed binary data.\n\nBitshuffle is an algorithm that rearranges typed, binary data for improving\ncompression, as well as a python/C package that implements this algorithm\nwithin the Numpy framework.\n\nThe library can be used alongside HDF5 to compress and decompress datasets and\nis integrated through the `dynamically loaded filters`_ framework. Bitshuffle\nis HDF5 filter number ``32008``.\n\nAlgorithmically, Bitshuffle is closely related to HDF5's `Shuffle filter`_\nexcept it operates at the bit level instead of the byte level. Arranging a\ntyped data array into a matrix with the elements as the rows and the bits\nwithin the elements as the columns, Bitshuffle \"transposes\" the matrix,\nsuch that all the least-significant-bits are in a row, etc. This transpose\nis performed within blocks of data roughly 8kB long [1]_.\n\nThis does not in itself compress data, only rearranges it for more efficient\ncompression. To perform the actual compression you will need a compression\nlibrary. Bitshuffle has been designed to be well matched to Marc Lehmann's\nLZF_ as well as LZ4_. Note that because Bitshuffle modifies the data at the bit\nlevel, sophisticated entropy reducing compression libraries such as GZIP and\nBZIP are unlikely to achieve significantly better compression than simpler and\nfaster duplicate-string-elimination algorithms such as LZF and LZ4. Bitshuffle\nthus includes routines (and HDF5 filter options) to apply LZ4 compression to\neach block after shuffling [2]_.\n\nThe Bitshuffle algorithm relies on neighbouring elements of a dataset being\nhighly correlated to improve data compression. Any correlations that span at\nleast 24 elements of the dataset may be exploited to improve compression.\n\nBitshuffle was designed with performance in mind. 
On most machines the\ntime required for Bitshuffle+LZ4 is insignificant compared to the time required\nto read or write the compressed data to disk. Because it is able to exploit the\nSSE and AVX instruction sets present on modern Intel and AMD processors, on\nthese machines compression is only marginally slower than an out-of-cache\nmemory copy. On modern x86 processors you can expect Bitshuffle to have a\nthroughput of roughly 1 byte per clock cycle, and on the Haswell generation of\nIntel processors (2013) and later, you can expect up to 2 bytes per clock\ncycle. In addition, Bitshuffle is parallelized using OpenMP.\n\nAs a bonus, Bitshuffle ships with a dynamically loaded version of\n`h5py`'s LZF compression filter, such that the filter can be transparently\nused outside of python and in command line utilities such as ``h5dump``.\n\n.. [1] Chosen to fit comfortably within L1 cache as well as to be well matched\n to the window of the LZF compression library.\n\n.. [2] Compared to applying Bitshuffle to the full dataset and then applying LZ4\n compression, this has the tremendous advantage that the block is\n already in the L1 cache.\n\n.. _`dynamically loaded filters`: http://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf\n\n.. _`Shuffle filter`: http://www.hdfgroup.org/HDF5/doc_resource/H5Shuffle_Perf.pdf\n\n.. _LZF: http://oldhome.schmorp.de/marc/liblzf.html\n\n.. 
_LZ4: https://code.google.com/p/lz4/\n\n\nApplications\n------------\n\nBitshuffle might be right for your application if:\n\n- You need to compress typed binary data.\n- Your data is arranged such that adjacent elements over the fastest varying\n index of your dataset are similar (highly correlated).\n- A special case of the previous point is if you are only exercising a subset\n of the bits in your data-type, as is often true of integer data.\n- You need both high compression ratios and high performance.\n\n\nComparing Bitshuffle to other compression algorithms and HDF5 filters:\n\n- Bitshuffle is less general than many other compression algorithms.\n To achieve good compression ratios, consecutive elements of your data must\n be highly correlated.\n- For the right datasets, Bitshuffle is one of the few compression\n algorithms that promises both high throughput and high compression ratios.\n- Bitshuffle should have roughly the same throughput as Shuffle, but\n may obtain higher compression ratios.\n- The MAFISC_ filter actually includes something similar to Bitshuffle as one of\n its prefilters. However, MAFISC's emphasis is on obtaining high compression\n ratios at all costs, sacrificing throughput.\n\n.. _MAFISC: http://wr.informatik.uni-hamburg.de/research/projects/icomex/mafisc\n\n\nInstallation for Python\n-----------------------\n\nInstallation requires python 2.7+ or 3.3+, HDF5 1.8.4 or later, HDF5 for python\n(h5py), Numpy and Cython. Bitshuffle must be linked against the same version of\nHDF5 as h5py, which in practice means h5py must be built from source_ rather\nthan pre-built wheels [3]_. 
Using the dynamically loaded HDF5 filter requires\nHDF5 1.8.11 or later.\n\nTo install::\n\n python setup.py install [--h5plugin [--h5plugin-dir=spam]]\n\nTo get finer control of installation options, including whether to compile\nwith OpenMP multi-threading, copy the ``setup.cfg.example`` to ``setup.cfg``\nand edit the values therein.\n\nIf using the dynamically loaded HDF5 filter (which gives you access to the\nBitshuffle and LZF filters outside of python), set the environment variable\n``HDF5_PLUGIN_PATH`` to the value of ``--h5plugin-dir`` or use HDF5's default\nsearch location of ``/usr/local/hdf5/lib/plugin``.\n\nIf you get an error about missing source files when building the extensions,\ntry upgrading setuptools. There is a bug where setuptools prior to 0.7\ndoesn't work properly with Cython in some cases.\n\n.. _source: http://docs.h5py.org/en/latest/build.html#source-installation\n\n.. [3] Typically you will be able to install Bitshuffle, but there will be\n errors when creating and reading datasets.\n\n\nUsage from Python\n-----------------\n\nThe `bitshuffle` module contains routines for shuffling and unshuffling\nNumpy arrays.\n\nIf installed with the dynamically loaded filter plugins, Bitshuffle can be used\nin conjunction with HDF5 both inside and outside of python, in the same way as\nany other filter, simply by specifying the filter number ``32008``. Otherwise\nthe filter will be available only within python and only after importing\n`bitshuffle.h5`. Reading Bitshuffle encoded datasets will be transparent.\nThe filter can be added to new datasets either through the `h5py` low level\ninterface or through the convenience functions provided in\n`bitshuffle.h5`. See the docstrings and unit tests for examples. 
For `h5py`\nversion 2.5.0 and later Bitshuffle can be added to new datasets through the\nhigh level interface, as in the example below.\n\n\nExample h5py\n------------\n::\n\n import h5py\n import numpy\n import bitshuffle.h5\n\n print(h5py.__version__) # >= '2.5.0'\n\n f = h5py.File(filename, \"w\")\n\n # block_size = 0 lets Bitshuffle choose its value\n block_size = 0\n\n dataset = f.create_dataset(\n \"data\",\n (100, 100, 100),\n compression=bitshuffle.h5.H5FILTER,\n compression_opts=(block_size, bitshuffle.h5.H5_COMPRESS_LZ4),\n dtype='float32',\n )\n\n # create some random data\n array = numpy.random.rand(100, 100, 100)\n array = array.astype('float32')\n\n dataset[:] = array\n\n f.close()\n\n\nUsage from C\n------------\n\nIf you wish to use Bitshuffle in your C program and would prefer not to use the\nHDF5 dynamically loaded filter, the C library in the ``src/`` directory is\nself-contained and complete.\n\n\nUsage from Java\n---------------\n\nBitshuffle can also be used from Java: the routines for shuffling and unshuffling\nhave been ported to `snappy-java`_. To use the routines, you need to add the following\ndependency to your pom.xml::\n\n    <dependency>\n      <groupId>org.xerial.snappy</groupId>\n      <artifactId>snappy-java</artifactId>\n      <version>1.1.3-M1</version>\n    </dependency>\n\nFirst, import org.xerial.snappy.BitShuffle in your Java code::\n\n import org.xerial.snappy.BitShuffle;\n\nThen use the routines like this::\n\n int[] data = new int[] {1, 3, 34, 43, 34};\n byte[] shuffledData = BitShuffle.bitShuffle(data);\n int[] result = BitShuffle.bitUnShuffleIntArray(shuffledData);\n\n.. 
_`snappy-java`: https://github.com/xerial/snappy-java\n\n\nAnaconda\n--------\n\nThe conda package can be built via::\n\n conda build conda-recipe\n\n\nFor Best Results\n----------------\n\nHere are a few tips to help you get the most out of Bitshuffle:\n\n- For multi-dimensional datasets, order your data such that the fastest varying\n dimension is the one over which your data is most correlated (has\n values that change the least), or fake this using chunks.\n- To achieve the highest throughput, use a data type that is 64 *bytes* or\n smaller. If you have a very large compound data type, consider adding a\n dimension to your datasets instead.\n- To make full use of the SSE2 instruction set, use a data type whose size\n is a multiple of 2 bytes. For the AVX2 instruction set, use a data type whose\n size is a multiple of 4 bytes.\n\n\nCiting Bitshuffle\n-----------------\n\nBitshuffle was initially described in\nhttp://dx.doi.org/10.1016/j.ascom.2015.07.002, pre-print available at\nhttp://arxiv.org/abs/1503.00638.", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/kiyo-masui/bitshuffle/tarball/0.3.5", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kiyo-masui/bitshuffle", "keywords": "compression,hdf5,numpy", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "bitshuffle", "package_url": "https://pypi.org/project/bitshuffle/", "platform": "", "project_url": "https://pypi.org/project/bitshuffle/", "project_urls": { "Download": "https://github.com/kiyo-masui/bitshuffle/tarball/0.3.5", "Homepage": "https://github.com/kiyo-masui/bitshuffle" }, "release_url": "https://pypi.org/project/bitshuffle/0.3.5/", "requires_dist": null, "requires_python": "", "summary": "Bitshuffle filter for improving typed data compression.", "version": "0.3.5" }, "last_serial": 4453427, "releases": { "0.2.3": [ { "comment_text": "", "digests": { "md5": "00b548101355e0f8521ab2a8e949b1aa", 
"sha256": "609fe3fb61d5c9713f36463de975d68cfef299867113fddd3f808e710ae78e2e" }, "downloads": -1, "filename": "bitshuffle-0.2.3.tar.gz", "has_sig": false, "md5_digest": "00b548101355e0f8521ab2a8e949b1aa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 344576, "upload_time": "2016-07-05T22:33:20", "url": "https://files.pythonhosted.org/packages/e3/0f/c0d79aadaf90e5934631a8e77680ddb0a6df78eff049bbc4b1c69fd2d5a3/bitshuffle-0.2.3.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": "29fe4c8c8d698041bfa32273b503d2b4", "sha256": "88dc0289edf6724e7495d63ce381a82ef88548508aecaafa905e422d52221205" }, "downloads": -1, "filename": "bitshuffle-0.2.4.tar.gz", "has_sig": false, "md5_digest": "29fe4c8c8d698041bfa32273b503d2b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 214113, "upload_time": "2016-07-08T17:14:54", "url": "https://files.pythonhosted.org/packages/cc/a2/091fed01554c4a4f7d36e97f9ee08f6f24022415857fd008622e6fa4d2ea/bitshuffle-0.2.4.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "63c4e204240059dceeb196f7a1a59568", "sha256": "eef48c619460cfca40c7a01ba12434a589f1397a759ecaf848bae681f5961364" }, "downloads": -1, "filename": "bitshuffle-0.3.0.tar.gz", "has_sig": false, "md5_digest": "63c4e204240059dceeb196f7a1a59568", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 357708, "upload_time": "2017-02-07T03:53:06", "url": "https://files.pythonhosted.org/packages/d8/ff/8ee9dfd5a8f3b5060c609f024bf0fb19646119190f5e014007520dfb698e/bitshuffle-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "e44ad3b3074f6beff15822d2a2f0047b", "sha256": "14c522e94e4afb261bb5e0fafdedc58273ac837caf85bb16acc5cd58c0274d5f" }, "downloads": -1, "filename": "bitshuffle-0.3.1.tar.gz", "has_sig": false, "md5_digest": "e44ad3b3074f6beff15822d2a2f0047b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
357695, "upload_time": "2017-02-07T04:11:32", "url": "https://files.pythonhosted.org/packages/43/c9/a870d8a42675d038cfc778ff3aa5cd0be9247f5b3ea5c6221fc0954da143/bitshuffle-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "db0bec0a401d58840764f42cb5bc57df", "sha256": "db4c452dbb99682785ae10364f1adf60f39f8cbe393958fc217bcbae77966c07" }, "downloads": -1, "filename": "bitshuffle-0.3.2.tar.gz", "has_sig": false, "md5_digest": "db0bec0a401d58840764f42cb5bc57df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 357741, "upload_time": "2017-02-10T03:55:21", "url": "https://files.pythonhosted.org/packages/48/1d/5096138e4919d88aed49e1ad4cadee23c8545afc23cd1fd34b59ee124a2e/bitshuffle-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "bd8b12896fb03cfe7347ac352c7e4bde", "sha256": "5d2db08e5684f023f068f0348d2bbd4c4beeb0a8f955da2bfa38b15b4d8f808b" }, "downloads": -1, "filename": "bitshuffle-0.3.3.tar.gz", "has_sig": false, "md5_digest": "bd8b12896fb03cfe7347ac352c7e4bde", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 232862, "upload_time": "2017-11-08T23:51:04", "url": "https://files.pythonhosted.org/packages/13/73/717b17a1c8aad3574b2d496c83c52ed4021d36b66be23091c25435bb0645/bitshuffle-0.3.3.tar.gz" } ], "0.3.4": [ { "comment_text": "", "digests": { "md5": "ccb883448f3b9f841974f1745d1fb5f2", "sha256": "bbb4564749d7591634c6c1a1f4ee3a49cab95f869fa94c01af5bc317c0e2aa79" }, "downloads": -1, "filename": "bitshuffle-0.3.4.tar.gz", "has_sig": false, "md5_digest": "ccb883448f3b9f841974f1745d1fb5f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 232972, "upload_time": "2017-11-09T16:52:13", "url": "https://files.pythonhosted.org/packages/89/3d/7a158976ef9481ba9d25c9edf1499a9db0d301931b2f7e710a6dfb83b54d/bitshuffle-0.3.4.tar.gz" } ], "0.3.5": [ { "comment_text": "", "digests": { "md5": "57c98a2c0cb8b60205cc764d52b100f4", "sha256": 
"5e3224b34c026cbc07687d53e4470a21f6a955ee815509056ba42b3f83e943a0" }, "downloads": -1, "filename": "bitshuffle-0.3.5.tar.gz", "has_sig": false, "md5_digest": "57c98a2c0cb8b60205cc764d52b100f4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 237193, "upload_time": "2018-11-05T15:51:45", "url": "https://files.pythonhosted.org/packages/af/d9/af5212bf0844c7c05c1c6493d95e16c1c18b7f2ac5f619c9c400a99aa401/bitshuffle-0.3.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "57c98a2c0cb8b60205cc764d52b100f4", "sha256": "5e3224b34c026cbc07687d53e4470a21f6a955ee815509056ba42b3f83e943a0" }, "downloads": -1, "filename": "bitshuffle-0.3.5.tar.gz", "has_sig": false, "md5_digest": "57c98a2c0cb8b60205cc764d52b100f4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 237193, "upload_time": "2018-11-05T15:51:45", "url": "https://files.pythonhosted.org/packages/af/d9/af5212bf0844c7c05c1c6493d95e16c1c18b7f2ac5f619c9c400a99aa401/bitshuffle-0.3.5.tar.gz" } ] }