{ "info": { "author": "Radu Angelescu", "author_email": "raduangelescu+pypi@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "***********\nGutenbergPy\n***********\n\n\nOverview\n========\n\n.. image:: https://github.com/raduangelescu/gutenbergpy/blob/master/dblogos.png\n :alt: MONGODB\n :align: center\n :width: 100%\n \nThis package makes it easier to filter and retrieve information from `Project\nGutenberg `_ in Python.\n\nIts target audience is machine learning practitioners who need data for their projects,\nbut it may be freely used by anybody.\n\nThe package:\n\n- Generates a local cache of all Project Gutenberg information that you can query to get book ids. The local cache may be SQLite (the default) or MongoDB (for which you need to have the pymongo package installed)\n\n- Downloads and cleans raw text from Project Gutenberg books\n\n\nThe package has been tested with Python 2.7 on both Windows and Linux.\nIt is a faster, smaller, and less third-party-dependent alternative to https://github.com/c-w/Gutenberg \n\nAbout development:\nhttp://www.raduangelescu.com/gutenbergpy.html\n\nInstallation\n============\n\n\n.. sourcecode :: sh\n\n pip install gutenbergpy\n\nor install it from source (it is pure Python code)\n\n.. sourcecode :: sh\n\n git clone https://github.com/raduangelescu/gutenbergpy\n cd gutenbergpy\n python setup.py install\n \nUsage\n=====\n\nDownloading a text\n------------------\n\n.. sourcecode :: python\n\n import gutenbergpy.textget\n #get a book by its Gutenberg id\n raw_book = gutenbergpy.textget.get_text_by_id(1000)\n print(raw_book)\n #strip the Project Gutenberg headers from the book\n clean_book = gutenbergpy.textget.strip_headers(raw_book)\n print(clean_book)\n\nQuery the cache\n---------------\n\nTo do this you first need to create the cache (this is a one-time step per OS, until you decide to rebuild it)\n\n.. 
sourcecode :: python\n\n #GutenbergCacheTypes is assumed to be importable from the same module\n from gutenbergpy.gutenbergcache import GutenbergCache, GutenbergCacheTypes\n #for sqlite\n GutenbergCache.create()\n #for mongodb\n GutenbergCache.create(type=GutenbergCacheTypes.CACHE_TYPE_MONGODB)\n \nFor debugging or finer control, create accepts these boolean options:\n\n - *refresh* deletes the old cache\n - *download* downloads the RDF file from Project Gutenberg\n - *unpack* unpacks it\n - *parse* parses it in memory\n - *cache* writes the cache\n\n.. sourcecode :: python\n \n GutenbergCache.create(refresh=True, download=True, unpack=True, parse=True, cache=True, deleteTemp=True)\n\nFor even finer control you may set the GutenbergCacheSettings:\n\n - *CacheFilename*\n - *CacheUnpackDir*\n - *CacheArchiveName*\n - *ProgressBarMaxLength*\n - *CacheRDFDownloadLink*\n - *TextFilesCacheFolder*\n - *MongoDBCacheServer*\n\n.. sourcecode :: python\n\n GutenbergCacheSettings.set( CacheFilename=\"\", CacheUnpackDir=\"\", \n CacheArchiveName=\"\", ProgressBarMaxLength=\"\", CacheRDFDownloadLink=\"\", TextFilesCacheFolder=\"\", MongoDBCacheServer=\"\")\n\nCreating the cache takes about 5 minutes, depending on your internet speed and computer power\n(on an i7 with a gigabit connection and an SSD it finishes in about 1 minute).\n\nGet the cache\n\n.. sourcecode :: python\n\n #for mongodb\n cache = GutenbergCache.get_cache(GutenbergCacheTypes.CACHE_TYPE_MONGODB)\n #for sqlite\n cache = GutenbergCache.get_cache()\n\nNow you can run queries.\n\nGet the unique Gutenberg book ids with the query function.\n\nStandard query fields:\n\n - languages\n - authors \n - types \n - titles \n - subjects \n - publishers \n - bookshelves \n - downloadtype\n \n.. sourcecode :: python\n\n print(cache.query(downloadtype=['application/plain','text/plain','text/html; charset=utf-8']))\n\nOr run a native query against the underlying database\n\n.. 
sourcecode :: python\n\n #sqlite\n cache.native_query(\"SELECT * FROM books\")\n #mongodb\n cache.native_query({'type': 'Text'})\n \nFor custom SQLite queries, take a look at the SQLite database schema:\n\n.. image:: https://github.com/raduangelescu/gutenbergpy/blob/master/sqlitecheme.png\n :alt: SQLite database schema\n :width: 100%\n :align: center\n \nFor MongoDB queries you have access to the books collection. Each book has the following fields:\n\n - book(publisher, rights, language, book_shelf, gutenberg_book_id, date_issued, num_downloads, titles, subjects, authors, files, type)", "description_content_type": null, "docs_url": null, "download_url": "http://pypi.python.org/pypi/GutenbergPy", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/raduangelescu/gutenbergpy", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "GutenbergPy", "package_url": "https://pypi.org/project/GutenbergPy/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/GutenbergPy/", "project_urls": { "Download": "http://pypi.python.org/pypi/GutenbergPy", "Homepage": "https://github.com/raduangelescu/gutenbergpy" }, "release_url": "https://pypi.org/project/GutenbergPy/0.2.0/", "requires_dist": null, "requires_python": null, "summary": "Library to create and interrogate local cache for Project Gutenberg", "version": "0.2.0" }, "last_serial": 2672428, "releases": { "0.1.6": [ { "comment_text": "", "digests": { "md5": "12938ba941ccc271883b95c267daab4d", "sha256": "8bedd694de84273ef820145f5cdc753c23234f8b4f409c5b4e7f8ae4ba1f46b7" }, "downloads": -1, "filename": "GutenbergPy-0.1.6.zip", "has_sig": false, "md5_digest": "12938ba941ccc271883b95c267daab4d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24563, "upload_time": "2017-02-19T21:44:36", "url": 
"https://files.pythonhosted.org/packages/cb/00/d9f98158b7b4f51dab0cb7454f57b27880da25cce70ff6ef8ad87ad83a51/GutenbergPy-0.1.6.zip" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "3d9cce1d0edff3433ae09b8c26cf4a26", "sha256": "85ff12a00c1f50efd8e889298a4166196e0419f81f572a592b8db7a30c40328a" }, "downloads": -1, "filename": "GutenbergPy-0.1.7.zip", "has_sig": false, "md5_digest": "3d9cce1d0edff3433ae09b8c26cf4a26", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25301, "upload_time": "2017-02-26T05:48:02", "url": "https://files.pythonhosted.org/packages/33/f5/b5728ba15a0855c658c2f464f42f620f69029373ed8600028b219cf0b6a6/GutenbergPy-0.1.7.zip" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "39391fc9701afcf9ab5adfa8c4915b76", "sha256": "e668263bf34e5fb8a635bcd4ab892fd65bd8a81a7da735638ecbb8b99078edc6" }, "downloads": -1, "filename": "GutenbergPy-0.2.0.zip", "has_sig": false, "md5_digest": "39391fc9701afcf9ab5adfa8c4915b76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25404, "upload_time": "2017-02-28T05:34:59", "url": "https://files.pythonhosted.org/packages/61/06/5a5b69ba76d8c143cf6c7411b447f86cb20a5dc2515d07059acd3ce25061/GutenbergPy-0.2.0.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "39391fc9701afcf9ab5adfa8c4915b76", "sha256": "e668263bf34e5fb8a635bcd4ab892fd65bd8a81a7da735638ecbb8b99078edc6" }, "downloads": -1, "filename": "GutenbergPy-0.2.0.zip", "has_sig": false, "md5_digest": "39391fc9701afcf9ab5adfa8c4915b76", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25404, "upload_time": "2017-02-28T05:34:59", "url": "https://files.pythonhosted.org/packages/61/06/5a5b69ba76d8c143cf6c7411b447f86cb20a5dc2515d07059acd3ce25061/GutenbergPy-0.2.0.zip" } ] }