{
    "info": {
        "author": "Riccardo Di Maio",
        "author_email": "riccardodimaio11@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 5 - Production/Stable",
            "Environment :: Console",
            "Intended Audience :: Developers",
            "Intended Audience :: Information Technology",
            "Intended Audience :: Other Audience",
            "Intended Audience :: Science/Research",
            "Intended Audience :: System Administrators",
            "License :: OSI Approved :: MIT License",
            "Operating System :: POSIX",
            "Programming Language :: Python :: 2.7",
            "Programming Language :: Python :: 3",
            "Topic :: Text Processing",
            "Topic :: Utilities"
        ],
        "description": "<div align=\"center\">\n&nbsp;&nbsp;\n  <a href=\"https://github.com/rdimaio/parsa\">\n    <img src=\"parsa/img/logo.png?raw=true\" alt=\"Logo\"/>\n  </a>\n\n  <strong>A text parser that doesn't care about your file extensions</strong>\n\n  <!-- Build Status -->\n  <a href=\"https://travis-ci.com/rdimaio/parsa\">\n    <img src=\"https://travis-ci.com/rdimaio/parsa.svg?branch=master\"\n      alt=\"Build Status\" />\n  </a>\n  <!-- Code Coverage -->\n  <a href=\"https://codecov.io/gh/rdimaio/parsa\">\n    <img src=\"https://codecov.io/gh/rdimaio/parsa/branch/master/graph/badge.svg\"\n      alt=\"Code Coverage\" />\n  </a>\n  <!-- SemVer Version -->\n  <a href=\"https://github.com/rdimaio/parsa\">\n    <img src=\"https://img.shields.io/badge/Version-1.1.5-blue.svg\"\n      alt=\"SemVer Version\" />\n  </a>\n\n  <a href=\"#key-features\">Key Features</a> \u2022\n  <a href=\"#supported-formats\">Supported Formats</a> \u2022\n  <a href=\"#installation\">Installation</a> \u2022\n  <a href=\"#usage\">Usage</a> \u2022\n  <a href=\"#related-projects\">Related projects</a> \u2022\n  <a href=\"#contributing\">Contributing</a> \u2022\n  <a href=\"#license\">MIT License</a>\n</div>\n\n![Demo GIF](parsa/img/demo.gif?raw=true \"Demo GIF\")\n\n<p style=\"text-align: center;\">\n  <strong>\n    Parsa is a <a href=\"https://github.com/deanmalmgren/textract\">textract</a>-based CLI text parser that supports multiple file extensions.\n    It takes any number of inputs, and outputs them to .txt files in a directory of choice, preserving the structure of the original text.\n  </strong>\n</p>\n\n# Key features\n- Extends [textract](https://github.com/deanmalmgren/textract)'s functionalities to work with multiple inputs and to automatically save the output\n- Takes an arbitrary number of inputs of different filetypes, and processess them all equally when supported\n- Outputs the parsed text from the input files individually to corresponding .txt files, with the option of selecting a custom output path\n- Includes a naming system that always avoids overwriting existing files, instead naming new files in a simple manner\n- Supports over 20 of the most common formats (see [Supported formats](#supported-formats) for more)\n- Preserves the structure of document file formats (.docx, .pdf, ...)\n- Supports audio formats (.wav, .mp3, ...) via the speech recognition tools [sox](https://github.com/chirlu/sox), [SpeechRecognition](https://github.com/Uberi/speech_recognition) and [pocketsphinx](https://github.com/cmusphinx/pocketsphinx/)\n- Supports image formats (.jpg, .png, ...), via the optical character recognition (OCR) tool [tesseract-ocr](https://github.com/tesseract-ocr/tesseract)\n- Prompts the user for an input file's extension if it's not explicitly present; this feature can be turned off via `--noprompt`\n\n# Supported formats\nSee [this page](https://textract.readthedocs.io/en/stable/#currently-supporting) from textract's documentation for a full list of the supported formats and their linked dependencies.\n\n# Installation\n## System requirements\n- Linux\n- Python 2.7/3.x (any Python 3 version)\n## Linux\nVia `pip`:\n```bash\n$ pip install parsa\n```\n\nOr, if you prefer, you can install it from source:\n```bash\n# Clone the repository\n$ git clone https://github.com/rdimaio/parsa\n\n# Go into the parsa folder\n$ cd parsa\n\n# Install parsa\n$ python setup.py install\n```\n\n### Tests\n```bash\n$ python -m unittest discover tests\n```\n\n# Usage\n## Single input\n```bash\n# Basic usage\n$ parsa path/to/input_file\n# The output will be saved inside the input file's parent folder.\n```\n\n## Multi input\n```bash\n# Basic usage\n$ parsa path/to/input_folder\n# The output will be saved inside a folder named `parsaoutput` in the input folder.\n```\n\n### Optional: custom output folder\n```bash\n# Basic usage\n$ parsa path/to/input -o path/to/output_folder\n# Works with both single and multi input.\n```\n\n### Optional: ignore files without an explicit extension\n```bash\n# Basic usage\n$ parsa --noprompt path/to/input\n# Useful for situations where your input includes log/system files without an extension.\n```\n\n## Full help message\n```\n$ parsa --help\nusage: parsa [-h] [--noprompt] [--output [OUTPUT]] input\n\nTextract-based text parser that supports most text file extensions. Parsa can\nparse multiple formats at once, writing them to .txt files in the directory of\nchoice.\n\npositional arguments:\n  input                 input file or folder; if a folder is passed as input,\n                        parsa will scan every file inside it recursively\n                        (scanning subfolders as well)\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --noprompt, -n        ignore files without an extension and don't prompt the\n                        user to input their extension\n  --output [OUTPUT], -o [OUTPUT]\n                        folder where the output files will be stored. The default folder is:\n                        (a) the input file's parent folder, if the input is a file, or\n                        (b) a folder named 'parsaoutput' located in the input folder, if the input is a folder.\n```\n\n# Related projects\n- [parsa-gui](https://github.com/rdimaio/parsa-gui) - Graphical version of parsa (WIP)\n- [xparsa](https://github.com/rdimaio/xparsa) - Extended parsa, enhanced with statistics about the parsed files (WIP)\n- [xparsa-gui](https://github.com/rdimaio/xparsa-gui) - GUI for xparsa (WIP)\n\n# Contributing\nPull requests are welcome! If you would like to include/remove/change a major feature, please open an issue first.\n\n# License\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "http://github.com/rdimaio/parsa",
        "keywords": "parsa",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "parsa",
        "package_url": "https://pypi.org/project/parsa/",
        "platform": "",
        "project_url": "https://pypi.org/project/parsa/",
        "project_urls": {
            "Homepage": "http://github.com/rdimaio/parsa"
        },
        "release_url": "https://pypi.org/project/parsa/1.1.5/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "A multiformat text parser",
        "version": "1.1.5"
    },
    "last_serial": 4613030,
    "releases": {
        "1.1.5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "26065fb2f539792912bb6beadf2c84f9",
                    "sha256": "443f77b3ca810d4faadad70d03a2d68b3b1928076fdcb27547e7458a2620ae07"
                },
                "downloads": -1,
                "filename": "parsa-1.1.5.tar.gz",
                "has_sig": false,
                "md5_digest": "26065fb2f539792912bb6beadf2c84f9",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 7211,
                "upload_time": "2018-12-18T17:06:29",
                "url": "https://files.pythonhosted.org/packages/c8/ec/2ef5d03b49665f10109934bbf095178ec6728c3a77700d4ab3f89f2a85b8/parsa-1.1.5.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "26065fb2f539792912bb6beadf2c84f9",
                "sha256": "443f77b3ca810d4faadad70d03a2d68b3b1928076fdcb27547e7458a2620ae07"
            },
            "downloads": -1,
            "filename": "parsa-1.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "26065fb2f539792912bb6beadf2c84f9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 7211,
            "upload_time": "2018-12-18T17:06:29",
            "url": "https://files.pythonhosted.org/packages/c8/ec/2ef5d03b49665f10109934bbf095178ec6728c3a77700d4ab3f89f2a85b8/parsa-1.1.5.tar.gz"
        }
    ]
}