{ "info": { "author": "Felipe Ochoa", "author_email": "felipeochoa@github.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3 :: Only" ], "description": "minecart: A Pythonic interface to PDF documents\n===============================================\n\n|Travis CI build status (Linux)| |Coverage Status| |Req-status|\n\n``minecart`` is a Python package that simplifies the extraction of text,\nimages, and shapes from a PDF document. It provides a very Pythonic\ninterface to extract positioning, color, and font metadata for all of\nthe objects in the PDF. It is a pure-Python package (it depends on\n|pdfminer|_ for the low-level parsing). ``minecart`` takes\ninspiration from Tim McNamara\u2019s |slate|_, but aims to provide more\ndetailed information:\n\n.. code:: python\n\n >>> pdffile = open('example.pdf', 'rb')\n >>> doc = minecart.Document(pdffile)\n >>> page = doc.get_page(3)\n >>> for shape in page.shapes.iter_in_bbox((0, 0, 100, 200)):\n ... print shape.path, shape.fill.color.as_rgb()\n >>> im = page.images[0].as_pil() # requires pillow\n >>> im.show()\n\nInstallation\n------------\n\nAs of version ``0.3.0``, only Python 3 is support, using |pdfminer3k|.\n\n1. The easy way: ``pip install minecart``\n2. The hard way: download the source code, change into the working\n directory, and run ``python setup.py install``\n\n**For CJK languages**: Supporting the CJK languages requires an\naddtional step, as detailed_ in |pdfminer|.\n\nFeatures\n--------\n\n- **Shapes**: You can extract path information, bounding box, stroke\n parameters, and stroke/fill colors. Color support is fairly robust,\n allowing the simple ``.as_rgb()`` in most cases. (To be concrete,\n ``minecart`` supports the ``DeviceRGB``, ``DeviceCMYK``,\n ``DeviceGray``, and ``CIE-based`` color spaces. ``Indexed`` colors\n are supported if they index into one of the above.)\n- **Images**: ``minecart`` can easily extract images to ``PIL.Image``\n objects.\n- **Text**: (Called ``Lettering`` in the source) In addition to\n extracting plain text from the PDF, you can access the\n position/bounding box information and the font used.\n\nIf there\u2019s a feature you\u2019d like to extract from a PDF that\u2019s not\ncurrently supported, open up an issue or submit a pull request! I\u2019m\nespecially interested in hearing whether there are many PDFs using color\nspaces outside of the ones currently supported.\n\nDocumentation\n-------------\n\nThe main entry point will always be ``minecart.Document``, which accepts\na single parameter, an open file-like object which will be read to\ncreate the document. The ``Document`` has two primary methods for\naccessing its contents: ``.get_page(num)`` and ``.iter_pages()``. Both\nmethods return ``minecart.Page`` objects, which provide access to the\ngraphical elements found on the page. ``Page`` objects have three main\nattributes:\n\n- ``.images``: A list of all the ``minecart.Image`` objects found on\n the page.\n\n- ``.letterings``: A list of all the text objects found on the page, as\n ``Lettering`` objects. ``Lettering`` is a ``unicode`` subclass which\n adds bounding box and font information (using ``.get_bbox()`` or\n ``.font``).\n\n- ``.shapes``: A list of all the squares, circles, lines, etc. found on\n the page as ``Shape`` objects. ``Shape`` objects have three main\n attributes of interest:\n\n - ``.stroke``: An object containing the stroke parameters used to\n draw the shape. ``.stroke`` has ``.color``, ``.linewidth``,\n ``.linecap``, ``.linejoin``, ``.miterlimit``, and ``.dash``\n attributes. If the shape was not stroked, ``.stroke`` will be\n ``None``.\n\n - ``.fill``: An object containing the fill parameters used to draw\n the shape. Right now, ``.fill`` only has a ``.color``\\ parameter.\n\n - ``.path``: A list with the coordinates used to defined the shape,\n as well as the type of line segment each set of coordinates\n defines. Refer to the ``minecart.Shape`` documentation for more\n details\n\n**Note on color**: The PDF spec spends a fair amount of time dealing\nwith color specifications, defining color spaces, and transforms and\nthe like. ``minecart``'s approach is to simplify things down with sensible\ndefaults, so that every color has an ``.as_rgb()`` method, which returns\na 3-tuple with component values between 0 (black) and 1 (white). If you\nare interested in extracting colorspace families and parameters, you can\ndo that too, though!\n\nWe try to keep docstrings complete and up to date, so you can read\nthrough the source or use ``dir`` and ``help`` to see what methods are\navailable.\n\nSupport\n-------\n\nIf you are having trouble working with ``minecart``, feel free to create\na new issue.\n\nContributing\n------------\n\nBug reports are always welcome (using the GitHub tracker) as are feature\nrequests. The PDF spec has so many corners, it is hard to\nprioritize which features to implement. If there\u2019s\nsomething you\u2019d like to extract from a document but isn\u2019t currently\nsupported, please `create a new issue`_.\n\nIf you\u2019d like to contribute code, you can either create an issue and\ninclude a patch (if the changes are small) or fork the project and\ncreate a pull request.\n\nLicense\n-------\n\nThis project is licensed under the MIT license.\n\n.. _create a new issue: https://github.com/felipeochoa/minecart/issues/new\n.. _pdfminer: https://github.com/euske/pdfminer\n.. _slate: https://github.com/timClicks/slate\n.. _pdfminer3k: https://github.com/jaepil/pdfminer3k\n.. _detailed: https://github.com/euske/pdfminer#for-cjk-languages\n.. |Travis CI build status (Linux)| image:: https://travis-ci.org/felipeochoa/minecart.svg?branch=master\n :target: https://travis-ci.org/felipeochoa/minecart\n.. |Coverage Status| image:: https://coveralls.io/repos/felipeochoa/minecart/badge.svg\n :target: https://coveralls.io/r/felipeochoa/minecart\n.. |Req-status| image:: https://requires.io/github/felipeochoa/minecart/requirements.svg?branch=master\n :target: https://requires.io/github/felipeochoa/minecart/requirements/?branch=master\n.. |pdfminer| replace:: ``pdfminer``\n.. |slate| replace:: ``slate``\n.. |pdfminer3k| replace:: ``pdfminer3k``\n.. |contact email| replace:: minecart@googlegroups.com\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/felipeochoa/minecart/tarball/0.3.0", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/felipeochoa/minecart", "keywords": "pdf pdfminer extract mining images", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "minecart", "package_url": "https://pypi.org/project/minecart/", "platform": "", "project_url": "https://pypi.org/project/minecart/", "project_urls": { "Download": "https://github.com/felipeochoa/minecart/tarball/0.3.0", "Homepage": "https://github.com/felipeochoa/minecart" }, "release_url": "https://pypi.org/project/minecart/0.3.0/", "requires_dist": [ "pdfminer3k", "six", "Pillow; extra == 'PIL'" ], "requires_python": "", "summary": "Simple, Pythonic extraction of images, text, and shapes from PDFs", "version": "0.3.0" }, "last_serial": 3606534, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "ef7cf49531df9cee5e6e1c19bd10501c", "sha256": "872ed0e0d91e148ef3d8f2a4a625f78a7f4afa97e7b70d794986152a070c5f0c" }, "downloads": -1, "filename": "minecart-0.2-py2-none-any.whl", "has_sig": false, "md5_digest": "ef7cf49531df9cee5e6e1c19bd10501c", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 21937, "upload_time": "2015-06-16T21:52:32", "url": "https://files.pythonhosted.org/packages/43/d7/0ae2e3f70665f7cda42cbe61126d68b7d577bd0036589fef8aeee071c25c/minecart-0.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c331dd1ee19e1cdbfc77a6d268d9418c", "sha256": "b38331bb3a25551e6c56d1b31875a1b61c6bff775f80bf0e88a5770fd94d93d1" }, "downloads": -1, "filename": "minecart-0.2.zip", "has_sig": false, "md5_digest": "c331dd1ee19e1cdbfc77a6d268d9418c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24796, "upload_time": "2015-06-16T21:52:35", "url": "https://files.pythonhosted.org/packages/62/fb/14e8cfb8455db1d902276cce5e1984d45d474628cfb8a70f94fa1bc762fb/minecart-0.2.zip" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7e9b08016ba0fcec2c30bb17617136f5", "sha256": "eda78bb6dabf7e172fe538b39c43f6aa4bb308ea33760eef0cab34f98b0a1cbc" }, "downloads": -1, "filename": "minecart-0.2.1.zip", "has_sig": false, "md5_digest": "7e9b08016ba0fcec2c30bb17617136f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25884, "upload_time": "2015-12-03T05:13:00", "url": "https://files.pythonhosted.org/packages/7a/f7/aa85c79eca9c6f01bcae7847f97af3e35419b31a5db4ed6c35325bed5e22/minecart-0.2.1.zip" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "90846201ce234c3d2005e2b905a02725", "sha256": "dc3a257bc8ad7998ec4ce52f3039b91c7bd5f220ba3e9eaed915089a356a6a7c" }, "downloads": -1, "filename": "minecart-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "90846201ce234c3d2005e2b905a02725", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 23013, "upload_time": "2018-02-22T19:58:35", "url": "https://files.pythonhosted.org/packages/14/a9/6a59d0902138e33e17974bfbc377089364bcf4e3fa0f6d0f8d0578c7326f/minecart-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b1486efc4a7216e8b4dee6a1964e4672", "sha256": "980e2f85af868d865580ea70b79ead9a97cd76c9fa8b7334024dd6241685381f" }, "downloads": -1, "filename": "minecart-0.3.0.tar.gz", "has_sig": false, "md5_digest": "b1486efc4a7216e8b4dee6a1964e4672", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18188, "upload_time": "2018-02-22T19:58:36", "url": "https://files.pythonhosted.org/packages/60/0e/59ca7b33a812d652fdd0faadbc701f9dd04ec1faaed29ea0b80ed98604e2/minecart-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "90846201ce234c3d2005e2b905a02725", "sha256": "dc3a257bc8ad7998ec4ce52f3039b91c7bd5f220ba3e9eaed915089a356a6a7c" }, "downloads": -1, "filename": "minecart-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "90846201ce234c3d2005e2b905a02725", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 23013, "upload_time": "2018-02-22T19:58:35", "url": "https://files.pythonhosted.org/packages/14/a9/6a59d0902138e33e17974bfbc377089364bcf4e3fa0f6d0f8d0578c7326f/minecart-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b1486efc4a7216e8b4dee6a1964e4672", "sha256": "980e2f85af868d865580ea70b79ead9a97cd76c9fa8b7334024dd6241685381f" }, "downloads": -1, "filename": "minecart-0.3.0.tar.gz", "has_sig": false, "md5_digest": "b1486efc4a7216e8b4dee6a1964e4672", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18188, "upload_time": "2018-02-22T19:58:36", "url": "https://files.pythonhosted.org/packages/60/0e/59ca7b33a812d652fdd0faadbc701f9dd04ec1faaed29ea0b80ed98604e2/minecart-0.3.0.tar.gz" } ] }