{ "info": { "author": "Cory Giles", "author_email": "mail@corygil.es", "bugtrack_url": null, "classifiers": [], "description": "====\nh5df\n====\n\nPython library and CLI for storing numeric data frames in HDF5.\n\nRationale\n=========\n\nPandas has utilities for storing data frames in HDF5, but it uses\nPyTables under the hood, which means it is limited to frames with a\nrelatively low number of columns (low 1000s).\n\nThis library is intended for storing and querying arbitrarily large\nnumeric matrices which have row and column names. It has a CLI\nwhich can export/import to/from delimited text, or it can be used\nfrom within Python with tight integration with Pandas.\n\nThis library stores only *numeric* matrices, so it cannot handle\ndata frames with mixed types (e.g., some strings and some numbers).\n\nInstallation\n============\n\nFrom PyPI:\n\n.. code-block:: bash\n\n pip install h5df\n\nLatest version:\n\n.. code-block:: bash\n\n git clone https://github.com/gilesc/h5df.git\n cd h5df\n python setup.py install --user\n\nThis installs the CLI script \"h5df\", and a Python module with the\nsame name.\n\nUsage\n=====\n\n.. code-block:: bash\n\n $ cat in.txt\n\n A B C\n X 1 2 3\n Y 4 5 5\n\n $ h5df load foo.h5 /my/path < in.txt\n $ h5df dump foo.h5 /my/path\n\n A B C\n ...\n\nTo select an individual row or column, use \"h5py row|column\":\n\n.. code-block:: bash\n\n $ h5df row foo.h5 X\n \n\nCLI flags\n=========\n\nUse ``h5df --help`` for a full listing of options, but a few useful ones:\n\n- ``h5df load -v`` : will output progress as a matrix is loaded (every 100 rows)\n- ``h5df -p N`` will output values with decimal precision N\n\nAPI\n===\n\nThe two main classes are ``h5df.Store`` and ``h5df.Frame``, representing a HDF5\nfile and individual data frame, respectively. Here is some example usage:\n\n.. code-block:: python\n\n >> from h5df import Store\n >> import pandas as pd\n >> import numpy as np\n >> np.random.seed(0)\n\n # Create a Store object; the default mode is read-only. \n # See http://docs.h5py.org/en/latest/high/file.html for available modes\n >> store = Store(\"test.h5df\", mode=\"a\")\n >> index = [\"A\",\"B\",\"C\"]\n >> columns = [\"V\",\"W\",\"X\",\"Y\",\"Z\"]\n >> mkdf = lambda: pd.DataFrame(np.random.random((3,5)), index=index, columns=columns)\n >> store.put(\"/frames/1\", mkdf())\n >> store.put(\"/frames/2\", mkdf())\n\n # Iterate through HDF5 paths corresponding to Frame objects\n >> for key in store: print(key)\n\n >> df1 = store[\"/frames/1\"]\n\n # Various selection options\n\n # returns pandas.Series\n >> df1.column(\"W\") \n >> df1.row(\"A\")\n\n # returns a pandas.DataFrame\n >> df1.rows([\"A\",\"C\"]) \n >> df1.cols([\"W\",\"Y\"])\n\n # Returns the whole Frame as a pandas.DataFrame\n >> df1.to_frame()\n\nThe full list of methods supported by ``h5df.Frame`` is:\n\n- ``Frame.row(key)`` and ``Frame.col(key)`` - return a ``pandas.Series``\n corresponding to the row/column\n\n- ``Frame.rows(keys)`` and ``Frame.cols(keys)`` - given a list of row/column\n index names, return an in-memory ``pandas.DataFrame`` corresponding to the\n subset of the overall ``Frame`` containing the desired rows or columns\n\n- ``Frame.shape`` - returns a tuple of (# rows, # columns)\n\n- ``Frame.to_frame()`` - return the entire ``Frame`` as an in-memory\n ``pandas.DataFrame``. Make sure you have enough memory!\n\n- ``Frame.add(key, data)`` - add a new row to the matrix with the given unique key. Due to the way of\n\nStorage format\n==============\n\nEach ``h5df.Frame`` is stored as an HDF5 Group containing 3 Datasets: ``index``\nand ``columns`` (both are 1D arrays of 8-byte integers or UTF-8 encoded binary\nstrings), and ``data`` (a 2D double array). \n\nThe Group also contains a few HDF5 attributes:\n- ``h5df.index_type`` and ``h5df.columns_type`` : a string, either \"str\" or \"int\", \n marking the data type of each of the corresponding indices\n- ``h5df.is_frame`` : a boolean, always set to true, which indicates that\n this Group contains valid ``Frame`` data\n\nBecause of this design, it is possible to store a ``Frame`` \"inside\" the Group\ncontaining another ``Frame``, but is not recommended in case of future format\nchanges (and because it is confusing).\n\nPerformance notes\n=================\n\nData is indexed row-major. Thus row-based queries will be much faster.\nGenerally you should pre-transpose your matrix before putting it into the\n``Store`` to ensure that the most frequently queried axis will be on the rows.\n\nThe ``h5df.Store()`` constructor takes a keyword argument, \"driver\". The full\ndescription of available drivers is at\nhttp://docs.h5py.org/en/latest/high/file.html . For Linux systems, the default\nstdio-based driver is \"sec2\", whereas \"core\" will memory-map the whole HDF5\nfile. If your system supports it and the file is frequently used (and therefore\nwill be in your OS page cache), \"core\" may be faster, especially for reads.\n\nLimitations\n===========\n\nCurrently there is no way to select rows by numeric index location (i.e., the\nequivalent to ``pandas.DataFrame.iloc``).\n\nEncoding and decoding indices (from unicode to binary) is a little slow,\nmeaning that quick queries are slower than they could be.\n\nIterating through the frames in a HDF5 file, ``Store.__iter__`` is quite\ninefficient if the file contains large numbers of frames.\n\nFor ``Frame.dump()``, output formatting is not vectorized (slower than\nnecessary).\n\nString indices are stored as ``np.dtype(\"|S100\")`` encoded as ``\"utf-8\"``.\nThis has several practical consequences: \n\n1. index and column names are currently limited to 100 UTF-8 characters\n2. UTF-8 encoding is hardcoded and other encodings are not supported \n (thus, characters from other encodings that will fail \n ``str.encode(\"utf-8\")`` will cause an error.\n\nThere are plans to fix these limitations in future versions.\n\nPotential gotchas\n=================\n\nWhen matrices are renamed or deleted using ``Store.rename()`` or\n``Store.delete()``, existing ``Frame`` objects based on this data will not be\nnotified of this change and their behavior is undefined. Most likely an attempt\nto use dangling ``Frame`` objects will result in an error, but may return\nerroneous results for some methods.\n\nIt is up to the user to avoid this situation. When renaming a ``Frame``, the\nuser should subsequently get a new ``Frame`` object from the destination path\nusing ``Store.__getitem__`` if they plan to continue to use the data.\n\nLicense\n=======\n\nAGPLv3+\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/gilesc/h5df/tarball/0.1.5", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/gilesc/h5df", "keywords": null, "license": "AGPLv3+", "maintainer": null, "maintainer_email": null, "name": "h5df", "package_url": "https://pypi.org/project/h5df/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/h5df/", "project_urls": { "Download": "https://github.com/gilesc/h5df/tarball/0.1.5", "Homepage": "https://github.com/gilesc/h5df" }, "release_url": "https://pypi.org/project/h5df/0.1.5/", "requires_dist": null, "requires_python": null, "summary": "Library and CLI for storing numeric data frames in HDF5", "version": "0.1.5" }, "last_serial": 1685891, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "bf37383ef97bd1cd28e33cd74f71a7de", "sha256": "ad5ca28ce56c695fad4aa13fc697f03ea659650093cdc2c1db15141d43425dbc" }, "downloads": -1, "filename": "h5df-0.0.2.tar.gz", "has_sig": false, "md5_digest": "bf37383ef97bd1cd28e33cd74f71a7de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1472, "upload_time": "2014-11-25T02:03:55", "url": "https://files.pythonhosted.org/packages/55/cd/9ff21ad65fcc4ca5254aaa17624e913edf5976e04b40c0cac39a5d195a37/h5df-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "cd935fd46dc93bae19f012f061a81ec2", "sha256": "026010e43b81f4b305c6d8853ecfabfcd43dc99136bdc434e9afa5de8e1a1541" }, "downloads": -1, "filename": "h5df-0.0.3.tar.gz", "has_sig": false, "md5_digest": "cd935fd46dc93bae19f012f061a81ec2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1467, "upload_time": "2014-11-25T02:05:33", "url": "https://files.pythonhosted.org/packages/4d/f8/5b5d3f8285db3765b29aa64f0eeab1e6e10e6941a6927999cc98e4757036/h5df-0.0.3.tar.gz" } ], "0.0.3-dev": [ { "comment_text": "", "digests": { "md5": "f590a3e4e4cb1de338bb93ac43f6960d", "sha256": "8bded8937d9baaa11b0bbc89b8203ae6c4ca8b2cca0fffff52ff111fd0693b04" }, "downloads": -1, "filename": "h5df-0.0.3-dev.tar.gz", "has_sig": false, "md5_digest": "f590a3e4e4cb1de338bb93ac43f6960d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1475, "upload_time": "2014-11-25T02:03:26", "url": "https://files.pythonhosted.org/packages/dd/15/2ba58042f017865ccb8e3eabd21fedd8604e803b39603299014c962d2d8d/h5df-0.0.3-dev.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "142e5bb05de52dd1b55d03bf6546883c", "sha256": "7ceab78d19c05d6408e9befb602477569ff8e3b5120d55d4c9c72dc49dc71b2d" }, "downloads": -1, "filename": "h5df-0.0.4.tar.gz", "has_sig": false, "md5_digest": "142e5bb05de52dd1b55d03bf6546883c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1469, "upload_time": "2014-11-25T17:14:37", "url": "https://files.pythonhosted.org/packages/5f/05/a1bf0df13f676b1a919321d3b42c11a0485938f73616ed13d9009c0135fb/h5df-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "49975e74305bf0c2753151f64b9f6a40", "sha256": "62a2bf99f659a2311cafcf7239e5f172bfeefea947e2ade8b7ae0b7f9789acf1" }, "downloads": -1, "filename": "h5df-0.0.5.tar.gz", "has_sig": false, "md5_digest": "49975e74305bf0c2753151f64b9f6a40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1477, "upload_time": "2014-11-25T17:18:29", "url": "https://files.pythonhosted.org/packages/10/04/8f7408b6d1c4b9d0a3805a823f78196085e5bb5a356d41776debcb90bb05/h5df-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "d6248497e434d54e4672367b9ccb3301", "sha256": "b4eeaf576d3d1ebcdc3577e928c7aedca5aaa4f5a79654e732139b66be2e5909" }, "downloads": -1, "filename": "h5df-0.0.6.tar.gz", "has_sig": false, "md5_digest": "d6248497e434d54e4672367b9ccb3301", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3459, "upload_time": "2014-11-25T17:19:31", "url": "https://files.pythonhosted.org/packages/e0/15/18dc5c5c2fedb0fc919362bb83c6cc67c34ea66a1f13d14297c996cc1b7d/h5df-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "b1896d1d3c2568c7c316b13dd4adfc30", "sha256": "7c2318a51c7a1465c092a99762f8e28372558e36406cb79cbb6ce6d2b034613f" }, "downloads": -1, "filename": "h5df-0.0.7.tar.gz", "has_sig": false, "md5_digest": "b1896d1d3c2568c7c316b13dd4adfc30", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3524, "upload_time": "2014-11-25T17:24:22", "url": "https://files.pythonhosted.org/packages/e3/52/2b6f7b880a598cee476964d69303b0203c7e05a827f487afd2a10354b1c5/h5df-0.0.7.tar.gz" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "c82d8a97e76c8d1a8cfdab71d7298a61", "sha256": "9ab68c29502333765258b9caa6152f8834a8dff947318b86261042c296785394" }, "downloads": -1, "filename": "h5df-0.1.0.tar.gz", "has_sig": false, "md5_digest": "c82d8a97e76c8d1a8cfdab71d7298a61", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5466, "upload_time": "2015-07-07T00:34:39", "url": "https://files.pythonhosted.org/packages/c9/0b/347fb8b46c7ed3e2f89121db7b7e1b5d9cb371d89aa1a68ee5cda443b0b8/h5df-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "57332d78ce693a959f1f80b07753a18e", "sha256": "eb09e51319f108d2b2d5c0ac82b2509a151be55273736dc4ca4152c5fc598b59" }, "downloads": -1, "filename": "h5df-0.1.1.tar.gz", "has_sig": false, "md5_digest": "57332d78ce693a959f1f80b07753a18e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6044, "upload_time": "2015-07-07T21:55:07", "url": "https://files.pythonhosted.org/packages/f7/ef/702f49ae2ad7946f2be7d041d6d5192939efae3aa64c0fe21d0ad3cacb23/h5df-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "dba1709fc1125420f05df635cb930781", "sha256": "907267e0566e5edaa8c8063b84b871bf5783d265b130e8218f794e1f7f4b0725" }, "downloads": -1, "filename": "h5df-0.1.2.tar.gz", "has_sig": false, "md5_digest": "dba1709fc1125420f05df635cb930781", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6038, "upload_time": "2015-07-07T21:56:26", "url": "https://files.pythonhosted.org/packages/3b/a5/992f777a0019d591681c525142003e32a8547a82f28e74e1658f44bc19ff/h5df-0.1.2.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "9b8b7cc53e14b60c1f620211e533da47", "sha256": "c1cbdc2723a1416abb0a58c69b6be4931b10c25f74962f6d5e48c149b8c08032" }, "downloads": -1, "filename": "h5df-0.1.5.tar.gz", "has_sig": false, "md5_digest": "9b8b7cc53e14b60c1f620211e533da47", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7784, "upload_time": "2015-08-20T15:50:31", "url": "https://files.pythonhosted.org/packages/db/f8/702d2970c166c3a17e16080806ada7e088e603a3e7825561ef52dec5e5f8/h5df-0.1.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9b8b7cc53e14b60c1f620211e533da47", "sha256": "c1cbdc2723a1416abb0a58c69b6be4931b10c25f74962f6d5e48c149b8c08032" }, "downloads": -1, "filename": "h5df-0.1.5.tar.gz", "has_sig": false, "md5_digest": "9b8b7cc53e14b60c1f620211e533da47", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7784, "upload_time": "2015-08-20T15:50:31", "url": "https://files.pythonhosted.org/packages/db/f8/702d2970c166c3a17e16080806ada7e088e603a3e7825561ef52dec5e5f8/h5df-0.1.5.tar.gz" } ] }