{ "info": { "author": "Ray Holder", "author_email": null, "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Topic :: Internet", "Topic :: Utilities" ], "description": "csv2es\n=========================\n\n.. image:: https://img.shields.io/pypi/v/csv2es.svg\n :target: https://pypi.python.org/pypi/csv2es\n\n.. image:: https://img.shields.io/travis/rholder/csv2es.svg\n :target: https://travis-ci.org/rholder/csv2es\n\n.. image:: https://img.shields.io/pypi/dm/csv2es.svg\n :target: https://pypi.python.org/pypi/csv2es\n\nThe csv2es project is an Apache 2.0 licensed commandline utility, written in\nPython, to load a CSV (or TSV) file into an Elasticsearch instance. That's\npretty much it. That's all it does. The first row of the file should contain\nthe field names intended to be used for Elasticsearch documents otherwise things\nwill get weird. There's a little trick documented below to add a header row in\ncase the file is missing it.\n\n\nFeatures\n--------\n\n- Minimal commandline interface\n- Load CSV's or TSV's\n- Customize the delimiter to something else\n- Uses the Elasticsearch bulk API\n- Parallel bulk uploads\n- Retry on errors with exponential backoff\n\n\nInstallation\n------------\n\nTo install csv2es, simply:\n\n.. code-block:: bash\n\n $ pip install csv2es\n\n\nUsage\n-----\n::\n\n Usage: csv2es [OPTIONS]\n\n Bulk import a delimited file into a target Elasticsearch instance. Common\n delimited files include things like CSV and TSV.\n\n Load a CSV file:\n csv2es --index-name potatoes --doc-type potato --import-file potatoes.csv\n\n For a TSV file, note the tab delimiter option\n csv2es --index-name tomatoes --doc-type tomato --import-file tomatoes.tsv --tab\n\n For a nifty pipe-delimited file (delimiters must be one character):\n csv2es --index-name pipes --doc-type pipe --import-file pipes.psv --delimiter '|'\n\n Options:\n --index-name TEXT Index name to load data into [required]\n --doc-type TEXT The document type (like user_records) [required]\n --import-file TEXT File to import (or '-' for stdin) [required]\n --mapping-file TEXT JSON mapping file for index\n --delimiter TEXT The field delimiter to use, defaults to CSV\n --tab Assume tab-separated, overrides delimiter\n --host TEXT The Elasticsearch host (http://127.0.0.1:9200/)\n --docs-per-chunk INTEGER The documents per chunk to upload (5000)\n --bytes-per-chunk INTEGER The bytes per chunk to upload (100000)\n --parallel INTEGER Parallel uploads to send at once, defaults to 1\n --delete-index Delete existing index if it exists\n --quiet Minimize console output\n --version Show the version and exit.\n --help Show this message and exit.\n\n\nExamples\n--------\n\nLet's say we've got a potatoes.csv file with a nice header that looks like this::\n\n potato_id,potato_type,description\n 33,sweet,\"kinda oval\"\n 17,regular,bumpy\n 91,regular,\"perfectly round\"\n 18,sweet,delightful\n 42,fried,crispy\n 37,\"extra special\",crispy\n\nNow we can stuff it into Elasticsearch:\n\n.. code-block:: bash\n\n csv2es --index-name potatoes --doc-type potato --import-file potatoes.csv\n\nBut what if it was tomatoes.tsv and separated by tabs? Well, we can do this:\n\n.. code-block:: bash\n\n csv2es --index-name tomatoes --doc-type tomato --import-file tomatoes.tsv --tab\n\n\nAdvanced Examples\n-----------------\n\nWhat if we have a super cool pipe-delimited file and want to wipe out the\nexisting \"pipes\" index every time we load it up? This ought to handle that case:\n\n.. code-block:: bash\n\n csv2es --index-name pipes --delete-index --doc-type pipe --import-file pipes.psv --delimiter '|'\n\nElasticsearch is great, but it's doing something strange to our documents when\nwe try to facet by certain fields. Let's create our own custom mapping file to\nspecify the fields used in Elasticsearch for that potatoes.csv called\npotatoes.mapping.json:\n\n.. code-block:: json\n\n {\n \"dynamic\": \"true\",\n \"properties\": {\n \"potato_id\": {\"type\": \"long\"},\n \"potato_type\": {\"type\": \"string\", \"index\" : \"not_analyzed\"},\n \"description\": {\"type\": \"string\", \"index\" : \"not_analyzed\"},\n }\n }\n\nNow let's load the data with a custom mapping file:\n\n.. code-block:: bash\n\n csv2es --index-name potatoes --doc-type potato --mapping-file potatoes.mapping.json --import-file potatoes.csv\n\nWhat if my file is missing the header row, and it's super huge because there are\nso many potatoes in it, and everything is terrible? We can use sed to tack on a\nnice header with something like this:\n\n.. code-block:: bash\n\n sed -i 1i\"potato_id,potato_type,description\" potatoes.csv\n\nAs long as you have more disk space than the size of the file, this should be fine.\n\n\nContribute\n----------\n\n#. Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.\n#. Fork `the repository`_ on GitHub to start making your changes to the **master** branch (or branch off of it).\n#. Write a test which shows that the bug was fixed or that the feature works as expected.\n#. Send a pull request and bug the maintainer until it gets merged and published. :) Make sure to add yourself to AUTHORS_.\n\n.. _`the repository`: https://github.com/rholder/csv2es\n.. _AUTHORS: https://github.com/rholder/csv2es/blob/master/AUTHORS.rst\n\n\n.. :changelog:\n\nHistory\n-------\n\n1.0.1 (2015-06-02)\n++++++++++++++++++\n- Add option to stream from stdin\n\n1.0.0 (2015-04-23)\n++++++++++++++++++\n- Add retrying support with exponential backoff per chunk for bulk uploads\n- Add parallel bulk uploading via joblib\n- Stable release\n\n1.0.0.dev3 (2015-04-19)\n+++++++++++++++++++++++\n- Switch over to Click for handling executable\n- Fix --delete-index flag\n- Add --version option\n\n1.0.0.dev2 (2015-04-19)\n+++++++++++++++++++++++\n- Fix import errors\n\n1.0.0.dev1 (2015-04-18)\n+++++++++++++++++++++++\n- Tinkering with documentation and PyPI updates\n\n1.0.0.dev0 (2015-04-18)\n+++++++++++++++++++++++\n- First dev version now exists\n- Apache 2.0 license applied\n- Finalize commandline interface\n- Sanitizing some setup.py and test suite running\n- Added Travis CI support", "description_content_type": null, "docs_url": null, "download_url": null, "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/rholder/csv2es", "keywords": "elasticsearch es pyelasticsearch csv tsv bulk import kibana", "license": "Apache 2.0", "maintainer": null, "maintainer_email": null, "name": "csv2es", "package_url": "https://pypi.org/project/csv2es/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/csv2es/", "project_urls": { "Homepage": "https://github.com/rholder/csv2es" }, "release_url": "https://pypi.org/project/csv2es/1.0.1/", "requires_dist": null, "requires_python": null, "summary": "Bulk import a CSV or TSV into Elastic Search", "version": "1.0.1" }, "last_serial": 1575985, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "48504319baa0d05507210bc6e7b040ee", "sha256": "ec379f28936dbd2418e1c5de7b54be03610c261678e1845e961539850bfcabf4" }, "downloads": -1, "filename": "csv2es-1.0.0.tar.gz", "has_sig": false, "md5_digest": "48504319baa0d05507210bc6e7b040ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11230, "upload_time": "2015-04-24T00:06:30", "url": "https://files.pythonhosted.org/packages/e5/1d/9ddafcca595d21dc481433ad547ea376e6dbc59204258c11ae0a5a5b142f/csv2es-1.0.0.tar.gz" } ], "1.0.0.dev0": [ { "comment_text": "", "digests": { "md5": "655d1dcf82716a6051ed88107b091694", "sha256": "af2a3afbd4f2976439564dacd4e722baa1a7e5f0a256817452b0fb97d58f703b" }, "downloads": -1, "filename": "csv2es-1.0.0.dev0.tar.gz", "has_sig": false, "md5_digest": "655d1dcf82716a6051ed88107b091694", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9834, "upload_time": "2015-04-19T04:39:00", "url": "https://files.pythonhosted.org/packages/ac/dc/1b3e76cdd35035bdc75f508864f333361e26375e2084ed6fd6c949c444c2/csv2es-1.0.0.dev0.tar.gz" } ], "1.0.0.dev1": [ { "comment_text": "", "digests": { "md5": "90e33579f459e95834943d296c78f94f", "sha256": "cfb388bd3a7dfd8ad68ffd45fc797f886a3b3b73c81429c6ddce5903bcf1ce5b" }, "downloads": -1, "filename": "csv2es-1.0.0.dev1.tar.gz", "has_sig": false, "md5_digest": "90e33579f459e95834943d296c78f94f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9829, "upload_time": "2015-04-19T04:50:01", "url": "https://files.pythonhosted.org/packages/34/b8/c73dc94fb9eb9c1c353aafe1dbe136439f266561b4040c292f5e0a69da42/csv2es-1.0.0.dev1.tar.gz" } ], "1.0.0.dev3": [ { "comment_text": "", "digests": { "md5": "22617d178d354b48c638a1cf3f2241c0", "sha256": "2b61bc5f0e05a414af70e02ad219ecc94b05450cece1a003c8a8d48b20a81685" }, "downloads": -1, "filename": "csv2es-1.0.0.dev3.tar.gz", "has_sig": false, "md5_digest": "22617d178d354b48c638a1cf3f2241c0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9980, "upload_time": "2015-04-19T06:17:19", "url": "https://files.pythonhosted.org/packages/74/67/db387772d74c1bcd61c8a5cf6c13587ff6682bf005aff5ebe58ec180b4a8/csv2es-1.0.0.dev3.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "e36c44cc974e99338e622872dd170f95", "sha256": "bf85e165d684696aa4a91a3585892b60f20575b1efe58fac0223973ee7b1bf81" }, "downloads": -1, "filename": "csv2es-1.0.1.tar.gz", "has_sig": false, "md5_digest": "e36c44cc974e99338e622872dd170f95", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11299, "upload_time": "2015-06-02T22:42:05", "url": "https://files.pythonhosted.org/packages/76/24/211496cc1f2159b12fbea157156f06d1ba9272d93d3a2cc97d4e797d4b6b/csv2es-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e36c44cc974e99338e622872dd170f95", "sha256": "bf85e165d684696aa4a91a3585892b60f20575b1efe58fac0223973ee7b1bf81" }, "downloads": -1, "filename": "csv2es-1.0.1.tar.gz", "has_sig": false, "md5_digest": "e36c44cc974e99338e622872dd170f95", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11299, "upload_time": "2015-06-02T22:42:05", "url": "https://files.pythonhosted.org/packages/76/24/211496cc1f2159b12fbea157156f06d1ba9272d93d3a2cc97d4e797d4b6b/csv2es-1.0.1.tar.gz" } ] }