{ "info": { "author": "Clemens Wolff", "author_email": "clemens.wolff+pypi@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Utilities" ], "description": "*********\nGutenberg\n*********\n\n.. image:: https://travis-ci.org/c-w/gutenberg.svg?branch=master\n :target: https://travis-ci.org/c-w/gutenberg\n\n.. image:: https://clewolff.visualstudio.com/gutenberg/_apis/build/status/c-w.gutenberg?branchName=master\n :target: https://clewolff.visualstudio.com/gutenberg/\n\n.. image:: https://codecov.io/gh/c-w/gutenberg/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/c-w/gutenberg\n\n.. image:: https://img.shields.io/pypi/v/gutenberg.svg\n :target: https://pypi.python.org/pypi/gutenberg/\n\n.. image:: https://img.shields.io/pypi/pyversions/gutenberg.svg\n :target: https://pypi.python.org/pypi/gutenberg/\n\n\nOverview\n========\n\nThis package contains a variety of scripts to make working with the `Project\nGutenberg `_ body of public domain texts easier.\n\nThe functionality provided by this package includes:\n\n* Downloading texts from Project Gutenberg.\n* Cleaning the texts: removing all the crud, leaving just the text behind.\n* Making meta-data about the texts easily accessible.\n\nThe package has been tested with Python 2.7 and 3.5+.\n\nAn HTTP interface to this package exists too.\n`Try it out! `_\n\n\nInstallation\n============\n\nThis project is on `PyPI `_, so I'd\nrecommend that you just install everything from there using your favourite\nPython package manager.\n\n.. 
sourcecode :: sh\n\n pip install gutenberg\n\nIf you want to install from source or modify the package, you'll need to clone\nthis repository:\n\n.. sourcecode :: sh\n\n git clone https://github.com/c-w/Gutenberg.git\n\nNow, you should probably install the dependencies for the package and verify\nyour checkout by running the tests.\n\n.. sourcecode :: sh\n\n cd Gutenberg\n\n virtualenv --no-site-packages virtualenv\n source virtualenv/bin/activate\n pip install -r requirements-dev.pip\n pip install .\n\n nose2\n\nAlternatively, you can also run the project via Docker:\n\n.. sourcecode :: sh\n\n docker build -t gutenberg -f Dockerfile-py3 .\n\n docker run -it -v /some/mount/path:/data gutenberg python\n\n\nPython 3\n--------\n\nThis package depends on BSD-DB. The bsddb module was removed from the Python\nstandard library in version 3.0. This means that if you wish to use gutenberg\non Python 3, you will need to manually install BSD-DB.\n\nLinux\n*****\n\nOn Linux, you can usually install BSD-DB using your distribution's package\nmanager. For example, on Ubuntu, you can use apt-get:\n\n.. sourcecode :: sh\n\n sudo apt-get install libdb++-dev\n export BERKELEYDB_DIR=/usr\n pip install .\n\nMacOS\n*****\n\nOn Mac, you can install BSD-DB using `homebrew `_:\n\n.. sourcecode :: sh\n\n brew install berkeley-db4\n pip install .\n\nWindows\n*******\n\nOn Windows, it's easiest to download a pre-compiled version of BSD-DB from\n`pythonlibs `_.\n\nFor example, if you have Python 3.5 on a 64-bit version of Windows, you\nshould download :code:`bsddb3-6.2.1-cp35-cp35m-win_amd64.whl`.\n\nAfter you download the wheel, install it and you're good to go:\n\n.. sourcecode :: bash\n\n pip install bsddb3-6.2.1-cp35-cp35m-win_amd64.whl\n pip install .\n\nLicense conflicts\n*****************\n\nSince its v6.x releases, BSD-DB switched to the `AGPL3 `_\nlicense which is stricter than this project's `Apache v2 `_\nlicense. 
This means that unless you're happy to comply with the `terms `_\nof the AGPL3 license, you'll have to install an earlier version of BSD-DB\n(anything between 4.8.30 and 5.x should be fine). If you are happy to use this\nproject under AGPL3 (or if you have a commercial license for BSD-DB), set the\nfollowing environment variable before attempting to install BSD-DB:\n\n.. sourcecode :: bash\n\n YES_I_HAVE_THE_RIGHT_TO_USE_THIS_BERKELEY_DB_VERSION=1\n\n\nApache Jena Fuseki\n------------------\n\nAs an alternative to the BSD-DB backend, this package can also use `Apache Jena Fuseki `_\nfor the metadata store. The Apache Jena Fuseki backend is activated by\nsetting the :code:`GUTENBERG_FUSEKI_URL` environment variable to the HTTP\nendpoint at which Fuseki is listening. If the Fuseki server has HTTP basic\nauthentication enabled, the username and password can be provided via the\n:code:`GUTENBERG_FUSEKI_USER` and :code:`GUTENBERG_FUSEKI_PASSWORD` environment\nvariables.\n\nFor local development, the Fuseki server can be run via Docker:\n\n.. sourcecode :: bash\n\n docker run \\\n --detach \\\n --publish 3030:3030 \\\n --env ADMIN_PASSWORD=some-password \\\n --volume /some/mount/location:/fuseki \\\n stain/jena-fuseki:3.6.0 \\\n /jena-fuseki/fuseki-server --loc=/fuseki --update /ds\n\n export GUTENBERG_FUSEKI_URL=http://localhost:3030/ds\n export GUTENBERG_FUSEKI_USER=admin\n export GUTENBERG_FUSEKI_PASSWORD=some-password\n\n\nUsage\n=====\n\nDownloading a text\n------------------\n\n.. sourcecode :: python\n\n from gutenberg.acquire import load_etext\n from gutenberg.cleanup import strip_headers\n\n text = strip_headers(load_etext(2701)).strip()\n print(text) # prints 'MOBY DICK; OR THE WHALE\\n\\nBy Herman Melville ...'\n\n.. sourcecode :: sh\n\n python -m gutenberg.acquire.text 2701 moby-raw.txt\n python -m gutenberg.cleanup.strip_headers moby-raw.txt moby-clean.txt\n\n\nLooking up meta-data\n--------------------\n\nA bunch of meta-data about ebooks can be queried:\n\n.. 
sourcecode :: python\n\n from gutenberg.query import get_etexts\n from gutenberg.query import get_metadata\n\n print(get_metadata('title', 2701)) # prints frozenset([u'Moby Dick; Or, The Whale'])\n print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])\n\n print(get_etexts('title', 'Moby Dick; Or, The Whale')) # prints frozenset([2701, ...])\n print(get_etexts('author', 'Melville, Hermann')) # prints frozenset([2701, ...])\n\nYou can get a full list of the meta-data that can be queried by calling:\n\n.. sourcecode :: python\n\n from gutenberg.query import list_supported_metadatas\n\n print(list_supported_metadatas()) # prints (u'author', u'formaturi', u'language', ...)\n\nBefore you use one of the :code:`gutenberg.query` functions you must populate the\nlocal metadata cache. This one-off process will take quite a while to complete\n(18 hours on my machine) but once it is done, any subsequent calls to\n:code:`get_etexts` or :code:`get_metadata` will be *very* fast. If you fail to populate the\ncache, the calls will raise an exception.\n\nTo populate the cache:\n\n.. sourcecode :: python\n\n from gutenberg.acquire import get_metadata_cache\n cache = get_metadata_cache()\n cache.populate()\n\n\nIf you need more fine-grained control over the cache (e.g. where it's stored or\nwhich backend is used), you can use the :code:`set_metadata_cache` function to switch\nout the backend of the cache before you populate it. For example, to use the\nSqlite cache backend instead of the default Sleepycat backend and store the\ncache at a custom location, you'd do the following:\n\n.. sourcecode :: python\n\n from gutenberg.acquire import set_metadata_cache\n from gutenberg.acquire.metadata import SqliteMetadataCache\n\n cache = SqliteMetadataCache('/my/custom/location/cache.sqlite')\n cache.populate()\n set_metadata_cache(cache)\n\n\nLimitations\n===========\n\nThis project *deliberately* does not include any natural language processing\nfunctionality. 
Consuming and processing the text is the responsibility of the\nclient; this library merely focuses on offering a simple and easy to use\ninterface to the works in the Project Gutenberg corpus. Any linguistic\nprocessing can easily be done client-side e.g. using the `TextBlob\n`_ library.", "description_content_type": "", "docs_url": null, "download_url": "https://pypi.python.org/pypi/Gutenberg", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/c-w/Gutenberg", "keywords": "", "license": "Apache Software License", "maintainer": "", "maintainer_email": "", "name": "Gutenberg", "package_url": "https://pypi.org/project/Gutenberg/", "platform": "", "project_url": "https://pypi.org/project/Gutenberg/", "project_urls": { "Download": "https://pypi.python.org/pypi/Gutenberg", "Homepage": "https://github.com/c-w/Gutenberg" }, "release_url": "https://pypi.org/project/Gutenberg/0.8.0/", "requires_dist": null, "requires_python": ">=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*", "summary": "Library to interface with Project Gutenberg", "version": "0.8.0" }, "last_serial": 5725563, "releases": { "0.0.0": [], "0.1.0": [ { "comment_text": "", "digests": { "md5": "4ca8af323220626ab3eb95791ee2ee50", "sha256": "0655e466766a51ee07cfb9e82c438427f94a8feecd134b1c27eb9e946101c2d7" }, "downloads": -1, "filename": "Gutenberg-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4ca8af323220626ab3eb95791ee2ee50", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20831, "upload_time": "2014-08-03T05:52:18", "url": "https://files.pythonhosted.org/packages/b5/c9/45e865b52f79ac6d22365fb7f12b7e3d18859fbd01addd78036272446630/Gutenberg-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "76697c27c6ac68d5a5e3b6a0e49d887c", "sha256": "58bcc83d909d1d1d90ca71c531b259f2240e7d44c7bf69156567b0c4285f46ff" }, "downloads": -1, "filename": "Gutenberg-0.1.1.tar.gz", "has_sig": false, "md5_digest": 
"76697c27c6ac68d5a5e3b6a0e49d887c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25035, "upload_time": "2014-08-03T16:35:08", "url": "https://files.pythonhosted.org/packages/23/3f/c9da6c9244f18caa1a431851610bee08ae7d682d6912849f94d192ba4daa/Gutenberg-0.1.1.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "98d554fb03fbd4a883859d9b6c107d06", "sha256": "ac6e7bce651be7f41278f1f9772fc7dfb9a17b5e4a09f565c4d8d80fd3ae702b" }, "downloads": -1, "filename": "Gutenberg-0.2.0.tar.gz", "has_sig": false, "md5_digest": "98d554fb03fbd4a883859d9b6c107d06", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15761, "upload_time": "2014-09-29T07:04:38", "url": "https://files.pythonhosted.org/packages/07/87/db25ab2b95573b4c56786449a7b418d262ba06b3c728a58ab63d4ebd495f/Gutenberg-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7e22ca98cb14b7a6b3f58aaece617046", "sha256": "2e5dd76fba71f9f13b0c60a5c2cf4e162c62724285a0f1374abb2467f2c1c224" }, "downloads": -1, "filename": "Gutenberg-0.2.1.tar.gz", "has_sig": false, "md5_digest": "7e22ca98cb14b7a6b3f58aaece617046", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15724, "upload_time": "2014-11-18T22:33:13", "url": "https://files.pythonhosted.org/packages/ef/0a/79659f52ec0dd05818a3754bef7c9b4ddc8e8258189c43e8c54a74f76668/Gutenberg-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "ddc3064d7fa51c41a1e1e4dcb9f1a2fa", "sha256": "dca85f33f03ae0822ab2c37ce8ff44ffda6c7a0d94adc2032941a074101e3dc9" }, "downloads": -1, "filename": "Gutenberg-0.2.2.tar.gz", "has_sig": false, "md5_digest": "ddc3064d7fa51c41a1e1e4dcb9f1a2fa", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16594, "upload_time": "2015-01-03T13:14:02", "url": 
"https://files.pythonhosted.org/packages/28/e1/fe593b5a2c7a6aa03a60bf98633ef9d998e9f2dd7e1fa2271126a41c0d04/Gutenberg-0.2.2.tar.gz" } ], "0.3": [], "0.3.1": [ { "comment_text": "", "digests": { "md5": "22ac02a95da346a1a62c234cad5f4664", "sha256": "b1c992180ce15f5fb4871499ed0c65fcb34e48297cacb18539832accb9c7c122" }, "downloads": -1, "filename": "Gutenberg-0.3.1.tar.gz", "has_sig": false, "md5_digest": "22ac02a95da346a1a62c234cad5f4664", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10248, "upload_time": "2015-02-28T01:32:16", "url": "https://files.pythonhosted.org/packages/c3/66/4a8b2ad5971b5dacd60447a50a68307283c8679f0d2d9011455252841d59/Gutenberg-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "2c6540213f3f0162baa27530c1e975b4", "sha256": "09675c5110e18a2ccbe5871de8d08901d2165e9e5428699f81933d64f050812f" }, "downloads": -1, "filename": "Gutenberg-0.3.2.tar.gz", "has_sig": false, "md5_digest": "2c6540213f3f0162baa27530c1e975b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10256, "upload_time": "2015-02-28T01:34:32", "url": "https://files.pythonhosted.org/packages/c0/27/9f817a81defb64c5b1d766394cf1e4bf90231f467d9e95895a1f7f8ae13a/Gutenberg-0.3.2.tar.gz" } ], "0.3.3": [ { "comment_text": "", "digests": { "md5": "6cf431cb3ca69c00c549159c97ec62c2", "sha256": "0b47ccc03235a1df0005e269e5d69b9897e0ce0e59ab596d2e18c9ddceddaf10" }, "downloads": -1, "filename": "Gutenberg-0.3.3.tar.gz", "has_sig": false, "md5_digest": "6cf431cb3ca69c00c549159c97ec62c2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14289, "upload_time": "2015-02-28T01:37:11", "url": "https://files.pythonhosted.org/packages/b6/50/8e2f99c5133d6518e50034354e44a59735d9a049d2d178f99b3241603429/Gutenberg-0.3.3.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "0e359008237d5f2b6306289aba8f0d49", "sha256": 
"990e1b2c2a7e7ebb1c9f29a04f4a9449130fd126793039a005f1f1c23830118c" }, "downloads": -1, "filename": "Gutenberg-0.4.0.tar.gz", "has_sig": false, "md5_digest": "0e359008237d5f2b6306289aba8f0d49", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14392, "upload_time": "2015-03-11T23:35:25", "url": "https://files.pythonhosted.org/packages/22/1d/748812d70f05cb07f31f7e8c533fc5f17ebeeab78003e22e21110e99569c/Gutenberg-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "1fc39f6e550917dbd18af799cde5315b", "sha256": "286525e599303c09a803616426647229bef65bca0cfe94e222b751ccbb591230" }, "downloads": -1, "filename": "Gutenberg-0.4.1.tar.gz", "has_sig": false, "md5_digest": "1fc39f6e550917dbd18af799cde5315b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9933, "upload_time": "2015-12-01T03:11:13", "url": "https://files.pythonhosted.org/packages/6e/52/9d00f84986e30f39d3926b9e58ca418431f0497ace818b30b1bf8b6bb3c5/Gutenberg-0.4.1.tar.gz" } ], "0.4.2": [ { "comment_text": "", "digests": { "md5": "1ca983181424476f023d120f2896049a", "sha256": "b3601a2fa2d32df76cfefd4851eca622747212edc2ed2646876bfd3626fc559b" }, "downloads": -1, "filename": "Gutenberg-0.4.2.tar.gz", "has_sig": false, "md5_digest": "1ca983181424476f023d120f2896049a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10805, "upload_time": "2016-01-09T17:42:49", "url": "https://files.pythonhosted.org/packages/f2/5b/759ff8d1732f8e2d5334a0918b256fbef9f66530cab1f0e470551b842f5d/Gutenberg-0.4.2.tar.gz" } ], "0.4.4": [ { "comment_text": "", "digests": { "md5": "a3e75e1cb01ef53d4919e28601159402", "sha256": "0ad7f200ce14017849e7912d76842dc41bfcc4425413736f54a3463019399390" }, "downloads": -1, "filename": "Gutenberg-0.4.4.tar.gz", "has_sig": false, "md5_digest": "a3e75e1cb01ef53d4919e28601159402", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14556, "upload_time": 
"2017-02-19T07:51:28", "url": "https://files.pythonhosted.org/packages/8a/66/4924f00056d0377e2b73190ba28e64328ecfb2a210a1808681c36ac4a120/Gutenberg-0.4.4.tar.gz" } ], "0.4.5": [ { "comment_text": "", "digests": { "md5": "22b3a6c53fd71bd817373a11871f1ece", "sha256": "8065df2fcf6bfdd5686b68cb31664be9f1639b29da0a5e3596da8358d7f39caf" }, "downloads": -1, "filename": "Gutenberg-0.4.5-py3.6.egg", "has_sig": false, "md5_digest": "22b3a6c53fd71bd817373a11871f1ece", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 44823, "upload_time": "2018-01-11T00:44:10", "url": "https://files.pythonhosted.org/packages/a7/98/0a5ad3e75fc099759ff2737db08d8e7ec9f0142e26fd4781c4b4331a423b/Gutenberg-0.4.5-py3.6.egg" }, { "comment_text": "", "digests": { "md5": "0ec2391bdd14e2d7e22537361639c353", "sha256": "08feb04f77d46bff52732d0551a678f2c16151ab94cfa3148c9da1a6680d9c85" }, "downloads": -1, "filename": "Gutenberg-0.4.5.tar.gz", "has_sig": false, "md5_digest": "0ec2391bdd14e2d7e22537361639c353", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14548, "upload_time": "2017-02-19T07:53:55", "url": "https://files.pythonhosted.org/packages/f8/4c/97c1854f619c21b6a9d6b0ab0e3502e0aa82d827448cb60252333b52c76a/Gutenberg-0.4.5.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "09e9df79671d3af07d7fcd8a1e8198e5", "sha256": "9d6dab6b57a79033f4a63145aabe5ffe01a542ea6e82e643f4f9f8dc776cc709" }, "downloads": -1, "filename": "Gutenberg-0.5.0.tar.gz", "has_sig": false, "md5_digest": "09e9df79671d3af07d7fcd8a1e8198e5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16841, "upload_time": "2017-04-28T04:24:07", "url": "https://files.pythonhosted.org/packages/08/ca/94eee4b5ffef01f973c31fe66c9e3c0f1b3bd786a8a67f38c8c113b281d4/Gutenberg-0.5.0.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "1c66bd7d56baf0f515926d1e7ad98612", "sha256": 
"351256dc33319a2b371fa68ec9019be409823cde5f3fd0b318ea50ccfc41c6d8" }, "downloads": -1, "filename": "Gutenberg-0.6.1.tar.gz", "has_sig": false, "md5_digest": "1c66bd7d56baf0f515926d1e7ad98612", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*", "size": 18085, "upload_time": "2018-01-11T01:13:29", "url": "https://files.pythonhosted.org/packages/f0/1e/46097d6f4240533c6d8a98161da0ebfa7b03fd8e61fed50866ed7686df2e/Gutenberg-0.6.1.tar.gz" } ], "0.7.0": [ { "comment_text": "", "digests": { "md5": "f995b1f646a6fb30932eb82a0373f632", "sha256": "f033d038ca77a9fa367d4b655be89280758b039bbd80fdcc433ef2462d7b52db" }, "downloads": -1, "filename": "Gutenberg-0.7.0.tar.gz", "has_sig": false, "md5_digest": "f995b1f646a6fb30932eb82a0373f632", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*", "size": 18474, "upload_time": "2018-05-18T16:32:00", "url": "https://files.pythonhosted.org/packages/14/b1/6e99867c38e70d46366966a0a861c580377f38312cf9dbad38b82ed1823d/Gutenberg-0.7.0.tar.gz" } ], "0.8.0": [ { "comment_text": "", "digests": { "md5": "a1a080a53398dc22422a915f33286b64", "sha256": "a6b5f9ec3df9ab9362653942cf87efc7c27ce16fd6b46224cfd36642b18ecd4b" }, "downloads": -1, "filename": "Gutenberg-0.8.0.tar.gz", "has_sig": false, "md5_digest": "a1a080a53398dc22422a915f33286b64", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*", "size": 20117, "upload_time": "2019-08-24T23:13:21", "url": "https://files.pythonhosted.org/packages/73/55/1f3065df8299ebaf5df50ca70da885bf0c9f7e358a3b4166b4aeb9d7ced7/Gutenberg-0.8.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a1a080a53398dc22422a915f33286b64", "sha256": "a6b5f9ec3df9ab9362653942cf87efc7c27ce16fd6b46224cfd36642b18ecd4b" }, "downloads": -1, "filename": "Gutenberg-0.8.0.tar.gz", "has_sig": false, "md5_digest": 
"a1a080a53398dc22422a915f33286b64", "packagetype": "sdist", "python_version": "source", "requires_python": ">=2.7.*,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*", "size": 20117, "upload_time": "2019-08-24T23:13:21", "url": "https://files.pythonhosted.org/packages/73/55/1f3065df8299ebaf5df50ca70da885bf0c9f7e358a3b4166b4aeb9d7ced7/Gutenberg-0.8.0.tar.gz" } ] }