{ "info": { "author": "Matthew Rocklin", "author_email": "mrocklin@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Castra\n======\n\n|Build Status|\n\nCastra is an on-disk, partitioned, compressed, column store.\nCastra provides efficient columnar range queries.\n\n* **Efficient on-disk:** Castra stores data on your hard drive in a way that you can load it quickly, increasing the comfort of inconveniently large data.\n* **Partitioned:** Castra partitions your data along an index, allowing rapid loads of ranges of data like \"All records between January and March\"\n* **Compressed:** Castra uses Blosc_ to compress data, increasing effective disk bandwidth and decreasing storage costs\n* **Column-store:** Castra stores columns separately, drastically reducing I/O costs for analytic queries\n* **Tabular data:** Castra plays well with Pandas and is an ideal fit for append-only applications like time-series\n\nExample\n-------\n\nConsider some Pandas DataFrames\n\n.. code-block:: python\n\n In [1]: import pandas as pd\n In [2]: A = pd.DataFrame({'price': [10.0, 11.0], 'volume': [100, 200]},\n ...: index=pd.DatetimeIndex(['2010', '2011']))\n\n In [3]: B = pd.DataFrame({'price': [12.0, 13.0], 'volume': [300, 400]},\n ...: index=pd.DatetimeIndex(['2012', '2013']))\n\nWe create a Castra with a filename and a template dataframe from which to get\ncolumn name, index, and dtype information\n\n.. code-block:: python\n\n In [4]: from castra import Castra\n In [5]: c = Castra('data.castra', template=A)\n\nThe castra starts empty but we can extend it with new dataframes:\n\n.. code-block:: python\n\n In [6]: c.extend(A)\n\n In [7]: c[:]\n Out[7]:\n price volume\n 2010-01-01 10 100\n 2011-01-01 11 200\n\n In [8]: c.extend(B)\n\n In [9]: c[:]\n Out[9]:\n price volume\n 2010-01-01 10 100\n 2011-01-01 11 200\n 2012-01-01 12 300\n 2013-01-01 13 400\n\nWe can select particular columns\n\n.. 
code-block:: python\n\n In [10]: c[:, 'price']\n Out[10]:\n 2010-01-01 10\n 2011-01-01 11\n 2012-01-01 12\n 2013-01-01 13\n Name: price, dtype: float64\n\nParticular ranges\n\n.. code-block:: python\n\n In [12]: c['2011':'2013']\n Out[12]:\n price volume\n 2011-01-01 11 200\n 2012-01-01 12 300\n 2013-01-01 13 400\n\nOr both\n\n.. code-block:: python\n\n In [13]: c['2011':'2013', 'volume']\n Out[13]:\n 2011-01-01 200\n 2012-01-01 300\n 2013-01-01 400\n Name: volume, dtype: int64\n\nStorage\n-------\n\nCastra stores your dataframes as they arrived. You can see the divisions along\nwhich your data is divided.\n\n.. code-block:: python\n\n In [14]: c.partitions\n Out[14]:\n 2011-01-01 2009-12-31T16:00:00.000000000-0800--2010-12-31...\n 2013-01-01 2011-12-31T16:00:00.000000000-0800--2012-12-31...\n dtype: object\n\nEach column in each partition lives in a separate compressed file::\n\n $ ls -a data.castra/2011-12-31T16:00:00.000000000-0800--2012-12-31T16:00:00.000000000-0800\n . .. .index price volume\n\nRestrictions\n------------\n\nCastra is both fast and restrictive.\n\n* You must always give it dataframes that match its template (same column\n names, index type, dtypes).\n* You can only give castra dataframes with **increasing index values**. For\n example you can give it one dataframe a day for values on that day. You\n cannot go back and update previous days.\n\nText and Categoricals\n---------------------\n\nCastra tries to encode text and object dtype columns with\nmsgpack_, using the implementation found in\nthe Pandas library. It falls back to `pickle` with a high protocol if that\nfails.\n\nAlternatively, Castra can categorize your data as it receives it\n\n.. 
code-block:: python\n\n >>> c = Castra('data.castra', template=df, categories=['list', 'of', 'columns'])\n\n or\n\n >>> c = Castra('data.castra', template=df, categories=True) # all object dtype columns\n\nCategorizing columns that have repetitive text, like ``'sex'`` or\n``'ticker-symbol'`` can greatly improve both read times and computational\nperformance with Pandas. See this blogpost_ for more information.\n\n.. _msgpack: http://msgpack.org/index.html\n\n\nDask dataframe\n--------------\n\nCastra interoperates smoothly with dask.dataframe_\n\n.. code-block:: python\n\n >>> import dask.dataframe as dd\n >>> df = dd.read_csv('myfiles.*.csv')\n >>> df.set_index('timestamp', compute=False).to_castra('myfile.castra', categories=True)\n\n >>> df = dd.from_castra('myfile.castra')\n\nWork in Progress\n----------------\n\nCastra is immature and largely for experimental use.\n\nThe developers do not promise backwards compatibility with future versions.\nYou should treat castra as a very efficient temporary format and archive your\ndata with some other system.\n\n\n\n.. _Blosc: https://github.com/Blosc\n\n.. _dask.dataframe: https://dask.pydata.org/en/latest/dataframe.html\n\n.. _blogpost: http://matthewrocklin.com/blog/work/2015/06/18/Categoricals/\n\n.. 
|Build Status| image:: https://travis-ci.org/blaze/castra.svg\n :target: https://travis-ci.org/blaze/castra", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/blaze/Castra/", "keywords": "", "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "castra", "package_url": "https://pypi.org/project/castra/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/castra/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/blaze/Castra/" }, "release_url": "https://pypi.org/project/castra/0.1.7/", "requires_dist": null, "requires_python": null, "summary": "On-disk partitioned store", "version": "0.1.7" }, "last_serial": 2002025, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9fe277b64a8ba19d312c8fd737295bb4", "sha256": "3a9989b50b2fa44c4977438b15c0951ad6895e2c40b35a240de5e63df1209ded" }, "downloads": -1, "filename": "castra-0.1.0-py2.7.egg", "has_sig": false, "md5_digest": "9fe277b64a8ba19d312c8fd737295bb4", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 12654, "upload_time": "2015-09-18T00:32:41", "url": "https://files.pythonhosted.org/packages/a8/ea/70939f0818ccd26c2c160f195e5351b8da52e658299d176883941b7e9df2/castra-0.1.0-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "03d0f9ef416283118aaa2054c40e26be", "sha256": "b90d1cec076f542f55c27666882178278ed4ec8de3b579a668f86663108e2ae6" }, "downloads": -1, "filename": "castra-0.1.0.tar.gz", "has_sig": false, "md5_digest": "03d0f9ef416283118aaa2054c40e26be", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5051, "upload_time": "2015-06-18T18:25:22", "url": "https://files.pythonhosted.org/packages/d4/09/9303da7e6eeb87f4f0f6e258908abceff54629840c40cdd736360418fa93/castra-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": 
"8db058b03e6d429d498f7dc1ff6aedfc", "sha256": "03a3911226cb35e3936e6123c74a0e1ed7897d2dbbe92a67218496938fe94097" }, "downloads": -1, "filename": "castra-0.1.1.tar.gz", "has_sig": false, "md5_digest": "8db058b03e6d429d498f7dc1ff6aedfc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6089, "upload_time": "2015-06-30T18:27:54", "url": "https://files.pythonhosted.org/packages/93/aa/c9b369723ca8363db9eed7422398130a8d28af9a43103c640d0cfabff657/castra-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "090b36acec6a68c49f85d1ea3558b938", "sha256": "6a75e694f2b104167d24a1eb95f50282455549ebe37c6a598b026eb7fc177a18" }, "downloads": -1, "filename": "castra-0.1.2.tar.gz", "has_sig": false, "md5_digest": "090b36acec6a68c49f85d1ea3558b938", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7190, "upload_time": "2015-07-23T21:42:57", "url": "https://files.pythonhosted.org/packages/10/c1/47dd4254ccc22bbaea87e6d11d1b735596050bd7533a871f3d54472b47b8/castra-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "27013bda535f7903d360e397b9cf7dee", "sha256": "7d853355d46f11f1157c6e0a55e41a7a6690cad303bb05190472c861342ba12e" }, "downloads": -1, "filename": "castra-0.1.3.tar.gz", "has_sig": false, "md5_digest": "27013bda535f7903d360e397b9cf7dee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7525, "upload_time": "2015-08-14T16:20:12", "url": "https://files.pythonhosted.org/packages/02/c1/4499f5f0c51f29aac62b36d4effd9b84137593f2426ff10d6aaf15a749f6/castra-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "e32ebb372a6be8ff2bdbefbed62a36af", "sha256": "b790843850f6c6a102a9742e109150245aec45ec86b2a1e50e934f5a864d3f7f" }, "downloads": -1, "filename": "castra-0.1.4.tar.gz", "has_sig": false, "md5_digest": "e32ebb372a6be8ff2bdbefbed62a36af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
10540, "upload_time": "2015-08-28T02:11:01", "url": "https://files.pythonhosted.org/packages/f5/17/d34bc54cd63c47f8691bb005392d6f1d24abf88dd015216b50e7a13230d7/castra-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "3301169caacd1ed6590c94d2334b9d6e", "sha256": "e4b69624a4ece1d7fd2bfa1446af8c7cba83af0205e6846222cb7968edba2c3b" }, "downloads": -1, "filename": "castra-0.1.5.tar.gz", "has_sig": false, "md5_digest": "3301169caacd1ed6590c94d2334b9d6e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11089, "upload_time": "2015-09-04T14:18:26", "url": "https://files.pythonhosted.org/packages/4a/f8/613c44e2b3c6e1b81d6e213ef6ef242dd911b9158cd0dfcbb2ee95b38510/castra-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "f8c1f73ac5329390fd1c2cea62d06f96", "sha256": "da97349df91ed62e6f109cd32571fa0eef3a74a3e754edb86b38d735e2c03c33" }, "downloads": -1, "filename": "castra-0.1.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "f8c1f73ac5329390fd1c2cea62d06f96", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 14821, "upload_time": "2015-09-18T00:32:37", "url": "https://files.pythonhosted.org/packages/b3/69/77cb6e51ad446d0256926bca3f87e964507b009840c45417cafce255cd3b/castra-0.1.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "42a698b19c2cfe531bc3a219c9e85d8f", "sha256": "1f647bb554dbce4e51bd5ee3f9ff26726762a9fae895119063b77c6b328947ae" }, "downloads": -1, "filename": "castra-0.1.6.tar.gz", "has_sig": false, "md5_digest": "42a698b19c2cfe531bc3a219c9e85d8f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13245, "upload_time": "2015-09-18T00:34:44", "url": "https://files.pythonhosted.org/packages/8e/5e/db6aa005371976f0e9cee742a2ca41f5d4f00e748ce48d210285f4480d43/castra-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "built for Darwin-14.5.0", "digests": { "md5": "20aadfae1b488ccf5e24594a397bd5ec", 
"sha256": "7b0ad8bb8b2ec07cfc93dbfadcd7a31604b93fca23849e2a35043887f4ca291d" }, "downloads": -1, "filename": "castra-0.1.7.macosx-10.5-x86_64.tar.gz", "has_sig": false, "md5_digest": "20aadfae1b488ccf5e24594a397bd5ec", "packagetype": "bdist_dumb", "python_version": "any", "requires_python": null, "size": 26119, "upload_time": "2016-03-11T20:33:34", "url": "https://files.pythonhosted.org/packages/60/cf/5411e7d6391f5503f54926a0955092b544b22721a13185aa4dfd27326365/castra-0.1.7.macosx-10.5-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": "fc25b84394e8a600873a318205e3043d", "sha256": "76178775fc5f1fc25c68693bcfe75b7d20c745c69979bd8db3db694f6d22d48a" }, "downloads": -1, "filename": "castra-0.1.7.tar.gz", "has_sig": false, "md5_digest": "fc25b84394e8a600873a318205e3043d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12019, "upload_time": "2016-03-11T20:33:28", "url": "https://files.pythonhosted.org/packages/f1/f6/c8cd85d0c671cc7c314bbca85b638dba4250931146efdbab374a6044874c/castra-0.1.7.tar.gz" } ] }, "urls": [ { "comment_text": "built for Darwin-14.5.0", "digests": { "md5": "20aadfae1b488ccf5e24594a397bd5ec", "sha256": "7b0ad8bb8b2ec07cfc93dbfadcd7a31604b93fca23849e2a35043887f4ca291d" }, "downloads": -1, "filename": "castra-0.1.7.macosx-10.5-x86_64.tar.gz", "has_sig": false, "md5_digest": "20aadfae1b488ccf5e24594a397bd5ec", "packagetype": "bdist_dumb", "python_version": "any", "requires_python": null, "size": 26119, "upload_time": "2016-03-11T20:33:34", "url": "https://files.pythonhosted.org/packages/60/cf/5411e7d6391f5503f54926a0955092b544b22721a13185aa4dfd27326365/castra-0.1.7.macosx-10.5-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": "fc25b84394e8a600873a318205e3043d", "sha256": "76178775fc5f1fc25c68693bcfe75b7d20c745c69979bd8db3db694f6d22d48a" }, "downloads": -1, "filename": "castra-0.1.7.tar.gz", "has_sig": false, "md5_digest": "fc25b84394e8a600873a318205e3043d", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 12019, "upload_time": "2016-03-11T20:33:28", "url": "https://files.pythonhosted.org/packages/f1/f6/c8cd85d0c671cc7c314bbca85b638dba4250931146efdbab374a6044874c/castra-0.1.7.tar.gz" } ] }