{
    "info": {
        "author": "Andrew Marks",
        "author_email": "ajmarks@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Natural Language :: English",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3.3",
            "Programming Language :: Python :: 3.4",
            "Programming Language :: Python :: 3.5",
            "Programming Language :: Python :: 3 :: Only",
            "Topic :: Text Processing",
            "Topic :: Utilities"
        ],
        "description": "Gymnast: It's not Acrobat\r\n----------------------------\r\n\r\n|GitHub license| |Code Issues|\r\n\r\nPDF parser written in Python 3 (backport to 2.7 in the works). This was\r\ndesigned to provide a Pythonic interface to access (and, eventually,\r\nwrite) Adobe PDF files. Some of attributes have non-Pythonic\r\ncapitalization, but that is to match the underlying structure of the PDF\r\ndocument (doing otherwise would get very confusing).\r\n\r\nUsage\r\n-----\r\n\r\n.. code:: python\r\n\r\n    import io\r\n    from gymnast          import PdfDocument\r\n    from gymnast.renderer import PdfBaseRenderer\r\n\r\n    class PdfSimpleRenderer(PdfBaseRenderer):\r\n        \"\"\"Simple renderer example that just extracts text with no processing\"\"\"\r\n        def __init__(self, page):\r\n            super().__init__(page)\r\n            self._text = io.StringIO()\r\n        def _render_text(self, text, new_state):\r\n            self._text.write(self.active_font.decode_string(text))\r\n        def _return(self):\r\n            return self._text.getvalue()\r\n\r\n    fname = '/path/to/file.pdf'\r\n    pdf   = PdfDocument(fname).parse()\r\n    text  = SimpleRenderer(pdf.Pages[-3]).render()\r\n\r\nTODO (in no particular order)\r\n-----------------------------\r\n\r\n-  **Features and functionality**\r\n-  [x] Rewrite the parser and document class to lazy-load the document\r\n   based on the xrefs table\r\n-  [x] Complete the base page renderer\r\n-  [ ] Page Rendering\r\n\r\n   -  [x] Getting the ``BaseRenderer`` class working\r\n   -  [x] Implement a proof of concept extractor that just dumps strings\r\n   -  [ ] Get a bit fancier, assigning textblocks to lines and such\r\n\r\n-  [ ] Handle page numbering more fully\r\n\r\n   -  [ ] Add a method to ``PdfDocument`` to get a page by number\r\n   -  [ ] Add propreties to ``PdfPage`` for the page number (both as an\r\n      ``int`` and a formatted ``str`` according to\r\n      ``PdfDocument.Root.PageLabels['Nums']``)\r\n\r\n-  [ ] Backport to Python 2.7 (about 80% done or so)\r\n-  [ ] Font stuff\r\n\r\n   -  [x] Carve the ``PdfFont`` class into an abstract ``PdfBaseFont``\r\n      and a ``PdfType1Font`` implementation\r\n   -  [x] ``PdfFont.__new__`` will pick the correct subclass based on\r\n      the font's Subtype element\r\n   -  [x] PdfBasefFont class will also have an abstract method for the\r\n      glyph space to text space transformation\r\n   -  [ ] Add subcless for Type3 fonts\r\n   -  [x] Add subcless for TrueType fonts\r\n   -  [ ] Add subcless for composite fonts\r\n   -  [x] Add legacy support for the 14 standard fonts\r\n   -  [ ] Font-to-unicode CMAPs\r\n\r\n-  [ ] Implement the remaining ``StreamFilter``\\ s (will probably have\r\n   the image ones return a ``PIL.Image``)\r\n\r\n   -  [ ] ``RunLengthDecode``\r\n   -  [ ] ``CCITTFaxDecode``\r\n   -  [ ] ``JBIG2Decode``\r\n   -  [ ] ``DCTDecode``\r\n   -  [ ] ``JPXDecode``\r\n   -  [ ] ``Crypt``\r\n\r\n-  [ ] Implement remaining object types\r\n\r\n   -  [ ] ``ObjStm``\r\n   -  [x] ``XRef``\r\n   -  [ ] ``Filespec``\r\n   -  [ ] ``EmbeddedFile``\r\n   -  [ ] ``CollectionItem`` / ``CollectionSubitem``\r\n   -  [ ] ``XObject``\r\n\r\n-  [ ] Handle document encryption\r\n-  [ ] Start on graphics stuff (maybe)\r\n-  [ ] Interactive forms (AcroForms)\r\n-  **Administrative**\r\n-  [ ] Write tests for existing code\r\n-  [x] Come up with a better name\r\n-  [ ] Document everything much, much better internally\r\n-  [ ] Package it up neatly and pypi it\r\n-  [ ] Write some proper documentation\r\n\r\n.. |GitHub license| image:: https://img.shields.io/github/license/mashape/apistatus.svg\r\n   :target: https://github.com/ajmarks/pdf_parser/blob/master/LICENSE\r\n.. |Code Issues| image:: https://www.quantifiedcode.com/api/v1/project/d0106c63f4f8467586aae7498f148e94/badge.svg\r\n   :target: https://www.quantifiedcode.com/app/project/d0106c63f4f8467586aae7498f148e94",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "https://github.com/ajmarks/gymnast/tarball/0.1a5",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/ajmarks/gymnast/",
        "keywords": "pdf,acrobat",
        "license": "MIT License",
        "maintainer": "",
        "maintainer_email": "",
        "name": "gymnast",
        "package_url": "https://pypi.org/project/gymnast/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/gymnast/",
        "project_urls": {
            "Download": "https://github.com/ajmarks/gymnast/tarball/0.1a5",
            "Homepage": "https://github.com/ajmarks/gymnast/"
        },
        "release_url": "https://pypi.org/project/gymnast/0.1a5/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "Gymnast: PDF document parser in Python 3",
        "version": "0.1a5"
    },
    "last_serial": 1824960,
    "releases": {
        "0.1a5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "09ad7634c63246c7bc128d55ec9abae4",
                    "sha256": "66eeb12762d7af83acacf2c3af69ab0af282cb7150167106ac6aef2e05de0b51"
                },
                "downloads": -1,
                "filename": "gymnast-0.1a5.zip",
                "has_sig": false,
                "md5_digest": "09ad7634c63246c7bc128d55ec9abae4",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 212024,
                "upload_time": "2015-11-18T23:22:10",
                "url": "https://files.pythonhosted.org/packages/0d/47/2bf455d7817f0fe22de578d3467892fbb9eca551684b3cc6ae963153f96c/gymnast-0.1a5.zip"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "09ad7634c63246c7bc128d55ec9abae4",
                "sha256": "66eeb12762d7af83acacf2c3af69ab0af282cb7150167106ac6aef2e05de0b51"
            },
            "downloads": -1,
            "filename": "gymnast-0.1a5.zip",
            "has_sig": false,
            "md5_digest": "09ad7634c63246c7bc128d55ec9abae4",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 212024,
            "upload_time": "2015-11-18T23:22:10",
            "url": "https://files.pythonhosted.org/packages/0d/47/2bf455d7817f0fe22de578d3467892fbb9eca551684b3cc6ae963153f96c/gymnast-0.1a5.zip"
        }
    ]
}