{ "info": { "author": "Martin Durant", "author_email": "mdurant@continuum.io", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: System Administrators", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython" ], "description": "fastparquet\n===========\n\n.. image:: https://travis-ci.org/jcrobak/parquet-python.svg?branch=master\n :target: https://github.com/dask/fastparquet\n\nfastparquet is a python implementation of the `parquet\nformat `_, aiming integrate\ninto python-based big data work-flows.\n\nNot all parts of the parquet-format have been implemented yet or tested\ne.g. see the Todos linked below. With that said,\nfastparquet is capable of reading all the data files from the\n`parquet-compatability `_\nproject.\n\nIntroduction\n------------\n\nDetails of this project can be found in the documentation_.\n\n.. _documentation: https://fastparquet.readthedocs.io\n\nThe original plan listing expected features can be found in\n`this issue`_.\nPlease feel free to comment on that list as to missing items and priorities,\nor raise new issues with bugs or requests.\n\n.. _this issue: https://github.com/dask/fastparquet/issues/1\n\n\n\nRequirements\n------------\n\n(all development is against recent versions in the default anaconda channels)\n\nRequired:\n\n- numba (requires `LLVM 4.0.x`_)\n- numpy\n- pandas\n- cython\n- six\n\n.. _LLVM 4.0.x: https://github.com/llvm-mirror/llvm \n\nOptional (compression algorithms; gzip is always available):\n\n- snappy (aka python-snappy)\n- lzo\n- brotli\n- lz4\n- zstandard\n\n\nInstallation\n------------\n\nInstall using conda::\n\n conda install -c conda-forge fastparquet\n\ninstall from pypi::\n\n pip install fastparquet\n\nor install latest version from github::\n\n pip install git+https://github.com/dask/fastparquet\n\nFor the pip methods, numba must have been previously installed (using conda).\n\nUsage\n-----\n\n*Reading*\n\n.. code-block:: python\n\n from fastparquet import ParquetFile\n pf = ParquetFile('myfile.parq')\n df = pf.to_pandas()\n df2 = pf.to_pandas(['col1', 'col2'], categories=['col1'])\n\nYou may specify which columns to load, which of those to keep as categoricals\n(if the data uses dictionary encoding). The file-path can be a single file,\na metadata file pointing to other data files, or a directory (tree) containing\ndata files. The latter is what is typically output by hive/spark.\n\n*Writing*\n\n.. code-block:: python\n\n from fastparquet import write\n write('outfile.parq', df)\n write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],\n compression='GZIP', file_scheme='hive')\n\nThe default is to produce a single output file with a single row-group\n(i.e., logical segment) and no compression. At the moment, only simple\ndata-types and plain encoding are supported, so expect performance to be\nsimilar to *numpy.savez*.\n\nHistory\n-------\n\nSince early October 2016, this fork of `parquet-python`_ has been\nundergoing considerable redevelopment. The aim is to have a small and simple\nand performant library for reading and writing the parquet format from python.\n\n.. _parquet-python: https://github.com/jcrobak/parquet-python", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dask/fastparquet/", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "workbenchdata-fastparquet", "package_url": "https://pypi.org/project/workbenchdata-fastparquet/", "platform": "", "project_url": "https://pypi.org/project/workbenchdata-fastparquet/", "project_urls": { "Homepage": "https://github.com/dask/fastparquet/" }, "release_url": "https://pypi.org/project/workbenchdata-fastparquet/0.1.6a2/", "requires_dist": null, "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*", "summary": "Python support for Parquet file format", "version": "0.1.6a2" }, "last_serial": 4142088, "releases": { "0.1.6a1": [ { "comment_text": "", "digests": { "md5": "10a041c98584a0d6d6bc8fc6a2737838", "sha256": "e5ec1c5cd34c2bdb16e06a8f14808814198f01ef45d7a82f307360e71a52a313" }, "downloads": -1, "filename": "workbenchdata-fastparquet-0.1.6a1.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "10a041c98584a0d6d6bc8fc6a2737838", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*", "size": 343371, "upload_time": "2018-08-06T22:13:40", "url": "https://files.pythonhosted.org/packages/12/f8/504dd99013ba538b2fb3ed2d8896fa883e00630cb64ddc4da583959efe7a/workbenchdata-fastparquet-0.1.6a1.linux-x86_64.tar.gz" } ], "0.1.6a2": [ { "comment_text": "", "digests": { "md5": "07974b6bad748182db91903316feed70", "sha256": "76deb9a96f4ba6064b992ee30132a14f7bb9cef3757d778de9354adb2a5744ed" }, "downloads": -1, "filename": "workbenchdata-fastparquet-0.1.6a2.tar.gz", "has_sig": false, "md5_digest": "07974b6bad748182db91903316feed70", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*", "size": 145250, "upload_time": "2018-08-06T22:15:23", "url": "https://files.pythonhosted.org/packages/91/73/f580613cd6fea4f097aa1d065ff9b96156d217e4cdd1f20566f92cc5a9c9/workbenchdata-fastparquet-0.1.6a2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "07974b6bad748182db91903316feed70", "sha256": "76deb9a96f4ba6064b992ee30132a14f7bb9cef3757d778de9354adb2a5744ed" }, "downloads": -1, "filename": "workbenchdata-fastparquet-0.1.6a2.tar.gz", "has_sig": false, "md5_digest": "07974b6bad748182db91903316feed70", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*", "size": 145250, "upload_time": "2018-08-06T22:15:23", "url": "https://files.pythonhosted.org/packages/91/73/f580613cd6fea4f097aa1d065ff9b96156d217e4cdd1f20566f92cc5a9c9/workbenchdata-fastparquet-0.1.6a2.tar.gz" } ] }