{ "info": { "author": "Tim Sherratt", "author_email": "tim@discontents.com.au", "bugtrack_url": null, "classifiers": [], "description": "# TroveHarvester\n\nThis is a tool for harvesting large quantities of digitised newspaper articles from [Trove](http://trove.nla.gov.au).\n\nIt has been tested on MacOS and Windows 7, and should work ok with Python 2.7 and Python 3.\n\n## Installation options\n\n### No installation required!\n\nIf you want to use the harvester without installing anything, just head over to the [Trove Newspaper Harvester](https://github.com/GLAM-Workbench/trove-newspaper-harvester) repository in my GLAM Workbench.\n\n### Installation via Docker\n\nAssuming you have Docker installed and running, just spin up a troveharvester container:\n\n``` bash\n\n $ docker run -v $(pwd):/troveharvester/data -it wragge/troveharvester /bin/bash\n\n```\n\nNote that this will store the harvested data in the current working directory on your local filesystem.\n\n### Installation via pip\n\nAssuming you have Python and [Virtualenv](https://virtualenv.pypa.io/en/latest/) installed just:\n\n``` bash\n $ virtualenv mytroveharvests\n $ cd mytroveharvests\n $ source bin/activate\n $ pip install troveharvester\n```\n\nOn Windows it should be:\n\n``` bash\n\n > virtualenv mytroveharvests\n > cd mytroveharvests\n > Scripts\\activate\n > pip install troveharvester\n```\n\n## Basic usage\n\nBefore you do any harvesting you need to get yourself a [Trove API key](http://help.nla.gov.au/trove/building-with-trove/api).\n\nThere are three basic commands:\n\n* **start** -- start a new harvest\n* **restart** -- restart a stalled harvest\n* **report** -- view harvest details\n\n### Start a harvest\n\nTo start a new harvest you can just do:\n\n``` bash\n $ cd mytroveharvests\n $ source bin/activate\n $ troveharvester start \"[Trove query]\" [Trove API key]\n```\n\nOr on Windows:\n\n``` bash\n > cd mytroveharvests\n > Scripts\\activate\n > troveharvester start \"[Trove query]\" [Trove 
API key]\n```\n\nThe Trove query can either be a URL copied and pasted from a search in the [Trove web interface](http://trove.nla.gov.au/newspaper/), or a Trove API query URL constructed using something like the [Trove API Console](https://troveconsole.herokuapp.com/). Enclose the URL in double quotes.\n\nA `data` directory will be automatically created to hold all of your harvests. Each harvest will be saved into a directory named with a current timestamp. Details of harvested articles are written to a CSV file named `results.csv`. The harvest configuration details are also saved to a `metadata.json` file.\n\nOptions:\n\n--max [integer]\n specify a maximum number of articles to harvest (multiples of 20)\n\n--pdf\n save a copy of each article as a PDF (this makes the harvest a *lot* slower as you have to allow a couple of seconds for each PDF to generate)\n\n--text\n save the OCRd text of each article into a separate ``.txt`` file\n\n### Restart a harvest\n\nThings go wrong and harvests get interrupted. If your harvest stops before it should, you can just do:\n\n``` bash\n $ troveharvester restart\n```\n\nBy default the script will try to restart the most recent harvest. You can also restart an earlier harvest:\n\n``` bash\n $ troveharvester restart --harvest [harvest timestamp]\n```\n\n### Get a summary of a harvest\n\nIf you'd like to quickly check the status of a harvest, just try:\n\n``` bash\n $ troveharvester report\n```\n\nBy default the script will report on the most recent harvest. 
You can get a summary for an earlier harvest:\n\n``` bash\n $ troveharvester report --harvest [harvest timestamp]\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/wragge/troveharvester", "keywords": "", "license": "CC0", "maintainer": "", "maintainer_email": "", "name": "troveharvester", "package_url": "https://pypi.org/project/troveharvester/", "platform": "", "project_url": "https://pypi.org/project/troveharvester/", "project_urls": { "Homepage": "https://github.com/wragge/troveharvester" }, "release_url": "https://pypi.org/project/troveharvester/0.2.2/", "requires_dist": [ "unicodecsv", "requests", "arrow", "tqdm" ], "requires_python": "", "summary": "Tool for harvesting Trove digitised newspaper articles.", "version": "0.2.2" }, "last_serial": 4714447, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "4ec5214a8fc5e04bf6189e9c93de4163", "sha256": "48cd48d9f5b73810a58b7992afb742dc75688ef25eac5ca93ed8c79380650ca0" }, "downloads": -1, "filename": "troveharvester-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4ec5214a8fc5e04bf6189e9c93de4163", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9695, "upload_time": "2016-04-23T12:14:27", "url": "https://files.pythonhosted.org/packages/89/ee/f9c3ce6c40da65bfe632d6df224bd1dd75def469302e8b1419c6d466f119/troveharvester-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "3941b430fde6d095fa0944d247b5858d", "sha256": "23941be5c6751a15b060dff7ea0790ed4711f48bca79e242369095bee2d68a9e" }, "downloads": -1, "filename": "troveharvester-0.1.1.tar.gz", "has_sig": false, "md5_digest": "3941b430fde6d095fa0944d247b5858d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12688, "upload_time": "2016-04-24T01:27:13", "url": 
"https://files.pythonhosted.org/packages/8e/02/0b79db5a1b781caff8a2aa59b1c6b5208a61e75d1c3cca41aa862ee73a71/troveharvester-0.1.1.tar.gz" } ], "0.1.10": [ { "comment_text": "", "digests": { "md5": "8fc47f684ce0c9d6ec654dcb41be7eef", "sha256": "e1c9d86c92046a0508943cfe1d14234bdfe1833b688e5539f07e777da5e2c6ca" }, "downloads": -1, "filename": "troveharvester-0.1.10.tar.gz", "has_sig": false, "md5_digest": "8fc47f684ce0c9d6ec654dcb41be7eef", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13277, "upload_time": "2017-04-14T06:03:20", "url": "https://files.pythonhosted.org/packages/54/21/e16351124746d011b9812dca27960a9b12d45e13d51336190bd687b46007/troveharvester-0.1.10.tar.gz" } ], "0.1.11": [ { "comment_text": "", "digests": { "md5": "4e14550f22c2323c85bee570b4ed1234", "sha256": "c3bf38c0cc5afa9557934b26c8f0d8682fb3f93540de2a44ba866165018868d6" }, "downloads": -1, "filename": "troveharvester-0.1.11.tar.gz", "has_sig": false, "md5_digest": "4e14550f22c2323c85bee570b4ed1234", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13470, "upload_time": "2018-02-23T05:33:28", "url": "https://files.pythonhosted.org/packages/bd/d7/b14dfe054863d1094083f95719c06041021d516ba71ddc9c5afffdeb699b/troveharvester-0.1.11.tar.gz" } ], "0.1.12": [ { "comment_text": "", "digests": { "md5": "06d4504673c326ad66817cb5623fdf1b", "sha256": "027f065c0822f89c01a2a2154a7fe0ce0315903d5f3c713a907304c783829a93" }, "downloads": -1, "filename": "troveharvester-0.1.12.tar.gz", "has_sig": false, "md5_digest": "06d4504673c326ad66817cb5623fdf1b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13488, "upload_time": "2018-02-23T05:44:54", "url": "https://files.pythonhosted.org/packages/1e/ac/0a9203027344f8e95816943030c2f81eaab13e8cfb6a03581f3c6e72b859/troveharvester-0.1.12.tar.gz" } ], "0.1.13": [ { "comment_text": "", "digests": { "md5": "76e86d26656d2f61ffc6374fd905c42b", "sha256": 
"d1deb1fea48664c5172c5badd943038fbb865280765ec4ebd36f9ae090883c7e" }, "downloads": -1, "filename": "troveharvester-0.1.13.tar.gz", "has_sig": false, "md5_digest": "76e86d26656d2f61ffc6374fd905c42b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13629, "upload_time": "2018-05-10T06:00:28", "url": "https://files.pythonhosted.org/packages/cc/1b/f5384967d020f5d6d5536fde23eb2e8ffe404157f6eac62bf268994cc09d/troveharvester-0.1.13.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "7f852128e8869c0b71693cd5b702c071", "sha256": "5e011333dbd1066a21daf3b0abe2c3dc0b520b57d0327d096024b9e9c1407c6a" }, "downloads": -1, "filename": "troveharvester-0.1.2.tar.gz", "has_sig": false, "md5_digest": "7f852128e8869c0b71693cd5b702c071", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12786, "upload_time": "2016-04-25T04:25:47", "url": "https://files.pythonhosted.org/packages/0a/ec/9b6fe31dd3d6bc4e80703570ccb8a7c57202cc16f8415b4653aabeb1148f/troveharvester-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "3fd7752e09dc8ec4ec85dd9afc3baeb6", "sha256": "606d8e6b0b3fc810824265a1e2ba24962838ec666d2e694fd242496ac4186d97" }, "downloads": -1, "filename": "troveharvester-0.1.3.tar.gz", "has_sig": false, "md5_digest": "3fd7752e09dc8ec4ec85dd9afc3baeb6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13006, "upload_time": "2016-04-25T05:34:03", "url": "https://files.pythonhosted.org/packages/23/be/4e910566d560f2ebe641797da027d8a26ca6c4ac365ea350508af2f58070/troveharvester-0.1.3.tar.gz" } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "b50c6f10e2cc15d1684211b4ec08dba8", "sha256": "c61b9e13f2afdc7778b6f24d0fdc03f150c8cdeb0337d3e4fd4b88b104dbbea6" }, "downloads": -1, "filename": "troveharvester-0.1.4.tar.gz", "has_sig": false, "md5_digest": "b50c6f10e2cc15d1684211b4ec08dba8", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 12999, "upload_time": "2016-04-25T05:50:30", "url": "https://files.pythonhosted.org/packages/aa/4a/7d1b08964e4729cecc3e2fe5b5f23c9b1b7961091b734882757b3dd4685f/troveharvester-0.1.4.tar.gz" } ], "0.1.5": [ { "comment_text": "", "digests": { "md5": "cb8b170e5ee962cbd896f26da406874b", "sha256": "0eafa3b35b11ee8184be0a89c234d68db9eb7d31c12f5e21771703f42b8cabb4" }, "downloads": -1, "filename": "troveharvester-0.1.5.tar.gz", "has_sig": false, "md5_digest": "cb8b170e5ee962cbd896f26da406874b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13086, "upload_time": "2016-05-23T08:48:06", "url": "https://files.pythonhosted.org/packages/b1/89/0c60520c014b34a91fd291092e88ac40039e369d362fb5149ad3e12610c5/troveharvester-0.1.5.tar.gz" } ], "0.1.6": [ { "comment_text": "", "digests": { "md5": "a5848550091d32db20704aa0ced59405", "sha256": "2db64e1c37b931c790e9bf993f727d13a8a4f586a5cb36bf1a54992be2035cac" }, "downloads": -1, "filename": "troveharvester-0.1.6.tar.gz", "has_sig": false, "md5_digest": "a5848550091d32db20704aa0ced59405", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13124, "upload_time": "2016-11-05T11:15:51", "url": "https://files.pythonhosted.org/packages/84/98/4a52ed64fdf626987deae93a30bb08401de52a2e28494990690b978fa5d6/troveharvester-0.1.6.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "39b6ad4035f432a6fb58075fcc10e536", "sha256": "81d1b0f0f038d3a8fd34b103b7fb1d404c0298b03a796b220e008fa7cb05e64f" }, "downloads": -1, "filename": "troveharvester-0.1.7.tar.gz", "has_sig": false, "md5_digest": "39b6ad4035f432a6fb58075fcc10e536", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13200, "upload_time": "2016-12-03T00:51:09", "url": "https://files.pythonhosted.org/packages/a8/87/1476c1db885f184f62aef07c02468885e0efbc44b355c6ac0159f78e8247/troveharvester-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": 
"8d1c85a0c0b17872b5a3e1cd00fec0bc", "sha256": "a522ff0f90c7fbed17eae80454958c0111d650b0f66d3d407abeeffa84036556" }, "downloads": -1, "filename": "troveharvester-0.1.8.tar.gz", "has_sig": false, "md5_digest": "8d1c85a0c0b17872b5a3e1cd00fec0bc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13269, "upload_time": "2016-12-03T01:29:48", "url": "https://files.pythonhosted.org/packages/84/29/14433ecdf566fbb7550d35744807f4168763424703c6a4fd25efd4865e3e/troveharvester-0.1.8.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "f37c6c10cd4b3a89d1bc9b110bfbd06c", "sha256": "b9807db54ec91e8b0d277f39b061f979f720cf7c31638c8fd4dd653013ab14ef" }, "downloads": -1, "filename": "troveharvester-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "f37c6c10cd4b3a89d1bc9b110bfbd06c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7479, "upload_time": "2019-01-18T10:25:01", "url": "https://files.pythonhosted.org/packages/e3/d9/f42b49a40a2df2341308b4dd7e424863f506873ebd5e133edfd7e5ef7307/troveharvester-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1b1855da20e04a1191911713d395b11e", "sha256": "564edf71cf1211985d73a7e5fcdd913d1e12f186a93b5f30772c8695d3ee11fa" }, "downloads": -1, "filename": "troveharvester-0.2.1.tar.gz", "has_sig": false, "md5_digest": "1b1855da20e04a1191911713d395b11e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10680, "upload_time": "2019-01-18T10:25:03", "url": "https://files.pythonhosted.org/packages/ed/53/9d53259b10f5cc5d225f758dd2ec950edbf98ec125b4e901dfebd60bf986/troveharvester-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "144b042e97f9639d0de4a25a3782f255", "sha256": "b372bad765ed18e423ace7e792e919b4af516b8ba3ce4e0fcee60838452a214d" }, "downloads": -1, "filename": "troveharvester-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "144b042e97f9639d0de4a25a3782f255", 
"packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8834, "upload_time": "2019-01-19T03:01:45", "url": "https://files.pythonhosted.org/packages/00/12/57284df3a8555ce2e227058017a631b6ea142be698a92e32a58bdb3a299a/troveharvester-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3b8f3ba89b39ffbb5be40c5a7c4f728c", "sha256": "1e790a44c79534f9b5ba31a57704b4e73410dbde0eb800a63394866ee49bbe53" }, "downloads": -1, "filename": "troveharvester-0.2.2.tar.gz", "has_sig": false, "md5_digest": "3b8f3ba89b39ffbb5be40c5a7c4f728c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12079, "upload_time": "2019-01-19T03:01:47", "url": "https://files.pythonhosted.org/packages/a8/8c/dd6c4aa77d72c144c0197080a44057b3f390e5447fff199b66d06c419ccc/troveharvester-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "144b042e97f9639d0de4a25a3782f255", "sha256": "b372bad765ed18e423ace7e792e919b4af516b8ba3ce4e0fcee60838452a214d" }, "downloads": -1, "filename": "troveharvester-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "144b042e97f9639d0de4a25a3782f255", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8834, "upload_time": "2019-01-19T03:01:45", "url": "https://files.pythonhosted.org/packages/00/12/57284df3a8555ce2e227058017a631b6ea142be698a92e32a58bdb3a299a/troveharvester-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3b8f3ba89b39ffbb5be40c5a7c4f728c", "sha256": "1e790a44c79534f9b5ba31a57704b4e73410dbde0eb800a63394866ee49bbe53" }, "downloads": -1, "filename": "troveharvester-0.2.2.tar.gz", "has_sig": false, "md5_digest": "3b8f3ba89b39ffbb5be40c5a7c4f728c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12079, "upload_time": "2019-01-19T03:01:47", "url": 
"https://files.pythonhosted.org/packages/a8/8c/dd6c4aa77d72c144c0197080a44057b3f390e5447fff199b66d06c419ccc/troveharvester-0.2.2.tar.gz" } ] }