{ "info": { "author": "Pankaj Rawat", "author_email": "pankajr141@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# pdfutil [Under Development]\nThe library provides many operations over PDF/Image files.\n\n## Input and Output \nThe library exposes each function with a standard set of arguments, which are the same for every function.\n```\nimport pdfutil\ncoordinates = pdfutil.detect_*(pdf_location, [save_result=False], [show_result=False], [result_location='.'], [args={}])\n```\n\n| Name | Description |\n| --- | --- |\n|**pdf_location** | Location of the input PDF. An image can also be passed; the library will autodetect it.|\n|**save_result**| Default False. If True, saves the result PDF/image in the location specified by result_location.|\n|**show_result**| Default False. Used for debugging only: when True, pops up a matplotlib plot highlighting the detected regions with their corresponding labels.|\n|**result_location**| Default is the current directory. Location where the output is saved; ignored if save_result is False.|\n|**args**| Custom set of arguments, as a dictionary, specific to each function.|\n|**coordinates**| Output returned by the function call, containing JSON output in the following format:|\n```\n[\n {\n \"type\": \"text\",\n \"output\": {\n \"coord\": [\n [\"pageno_1\", \"startx_1\", \"starty_1\", \"width_1\", \"height_1\"],\n [\"pageno_2\", \"startx_2\", \"starty_2\", \"width_2\", \"height_2\"]\n ]\n }\n },\n {\n \"type\": \"table\",\n \"output\": {\n \"coord\": [\n [\"pageno_1\", \"startx_1\", \"starty_1\", \"width_1\", \"height_1\"]\n ]\n }\n }\n]\n```\n\n## Operations\n\n### Detecting Tables\n```\nimport pdfutil\ncoordinates = pdfutil.detect_tables(pdf_location)\n```\n\n### Detecting Text Regions [Paragraphs / Unstructured Content]\n```\nimport pdfutil\ncoordinates = 
pdfutil.detect_text(pdf_location)\n```\n\n### Detecting Non-Text Regions [Images / Logos]\n```\nimport pdfutil\ncoordinates = pdfutil.detect_non_text(pdf_location)\n```\n\n### Detecting Language\n```\nimport pdfutil\ncoordinates = pdfutil.detect_non_language(pdf_location)\n```\n\n### Detecting Key Value Pairs\n```\nimport pdfutil\ncoordinates = pdfutil.detect_key_value_pairs(pdf_location)\n```\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pankajr141/pdfutil", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "pdfutil", "package_url": "https://pypi.org/project/pdfutil/", "platform": "", "project_url": "https://pypi.org/project/pdfutil/", "project_urls": { "Homepage": "https://github.com/pankajr141/pdfutil" }, "release_url": "https://pypi.org/project/pdfutil/0.0.1/", "requires_dist": [ "pdf2jpg (==0.0.9)" ], "requires_python": "", "summary": "Library provides useful operations over PDF/Image", "version": "0.0.1" }, "last_serial": 5539063, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "98c30cc98ba8035411120b2da97d7d6a", "sha256": "83b635a34207db8d4d4b6f57956cb00d54ce63ba2ab0d3c0380d7bcb6cf8f749" }, "downloads": -1, "filename": "pdfutil-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "98c30cc98ba8035411120b2da97d7d6a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1938, "upload_time": "2019-07-16T07:58:19", "url": "https://files.pythonhosted.org/packages/06/71/b94fa5e1cd14ce1cf1296dbd59a362ca81ecd41fd7e52d1e71fd57ae7022/pdfutil-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cf4ad266aafb06816cd6764e21d35244", "sha256": "1cf034e71e888c7993ece8a84d4ee0721ecb2aa6a276504a7720af9b00b8de99" }, "downloads": -1, "filename": "pdfutil-0.0.1.tar.gz", "has_sig": false, "md5_digest": 
"cf4ad266aafb06816cd6764e21d35244", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2004, "upload_time": "2019-07-16T07:58:21", "url": "https://files.pythonhosted.org/packages/47/be/a0c846f9d6976ce892a176319b1cf2fc9ea40b81f825b179f2a457de2740/pdfutil-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "98c30cc98ba8035411120b2da97d7d6a", "sha256": "83b635a34207db8d4d4b6f57956cb00d54ce63ba2ab0d3c0380d7bcb6cf8f749" }, "downloads": -1, "filename": "pdfutil-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "98c30cc98ba8035411120b2da97d7d6a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 1938, "upload_time": "2019-07-16T07:58:19", "url": "https://files.pythonhosted.org/packages/06/71/b94fa5e1cd14ce1cf1296dbd59a362ca81ecd41fd7e52d1e71fd57ae7022/pdfutil-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cf4ad266aafb06816cd6764e21d35244", "sha256": "1cf034e71e888c7993ece8a84d4ee0721ecb2aa6a276504a7720af9b00b8de99" }, "downloads": -1, "filename": "pdfutil-0.0.1.tar.gz", "has_sig": false, "md5_digest": "cf4ad266aafb06816cd6764e21d35244", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2004, "upload_time": "2019-07-16T07:58:21", "url": "https://files.pythonhosted.org/packages/47/be/a0c846f9d6976ce892a176319b1cf2fc9ea40b81f825b179f2a457de2740/pdfutil-0.0.1.tar.gz" } ] }