{ "info": { "author": "Alejandro Saucedo", "author_email": "a@e-x.io", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4" ], "description": "# WordCount Python - [ wc.py ]\n\nUtility tool to count the occurrences of words in text using an NLP tokenizer.\n\n## Overview\n\nThis repository contains the CLI and SDK for the WordCount Python [ wc.py ].\n\n`wc.py` provides a set of tools to analyse the number of occurrences of words across one or more documents. It can be accessed through the CLI, or directly through the SDK provided by the `WCExtractor` and the `WCCore` classes in the `wcpy` module.\n\nFor the **CLI interface quickstart** please refer to the **User Guide below**.\n\nFor the **SDK interface quickstart** please refer to the **SDK Interface below**.\n\nFor more advanced documentation please refer to the official [WCPY documentation](https://axsauze.github.io/wcpy/).\n\n# Installation\n\nYou can install it from pip by running:\n\n```\npip install wc.py\n```\n\nThis will install the script on your computer so you'll be able to call it directly with `wc.py`.\n\n# CLI User Guide\n\n## Usage\n\nAfter installing it, you can view the usage and options with `wc.py -h`:\n\n```\nusage: wc.py [-h] [-v] [--limit LIMIT] [--reverse]\n [--filter-words FILTER_WORDS [FILTER_WORDS ...]]\n [--file-ext FILE_EXT] [--truncate TRUNCATE]\n [--columns COLUMNS [COLUMNS ...]] [--output-file OUTPUT_FILE]\n paths [paths ...]\n\nCount the number of words in the files on a folder\n\npositional arguments:\n paths (REQUIRED) Path(s) to folders and/or files to count words from\n\noptional arguments:\n -h, --help show this help message and exit\n -v, --version show program's version number and exit\n --limit LIMIT (Optional) Limit the number of results that you would like to display.\n --reverse (Optional) List is sorted in ascending order by 
default, use this flag to reverse sorting to descending order.\n --filter-words FILTER_WORDS [FILTER_WORDS ...]\n (Optional) You can get results filtered to only the list of words provided.\n --file-ext FILE_EXT (Optional) This is the default file extention for the files being used\n --truncate TRUNCATE (Optional) Output is often quite large, you can truncate the output by passing a number greater than 5\n --columns COLUMNS [COLUMNS ...]\n (Optional) This argument allows you to choose the columns to be displayed in the output. Options are: word, count, files and sentences.\n --output-file OUTPUT_FILE\n (Optional) Define an output file to save the output\n\nEXAMPLE USAGE:\n wc.py ./\n wc.py ./ --limit 10\n wc.py doc1.txt doc2.txt --filter-words tool awesome an\n wc.py docs/ tests/ --truncate 100 --columns word count\n wc.py ./ --filter-words tool awesome an --truncate 50 --output output.txt\n```\n\n## Examples\n\n#### Counts of word occurrences in documents in this folder recursively\n\n```\nwc.py ./\n```\n\n#### Word occurrences in this folder, limited to the top 10\n\n```\nwc.py ./ --limit 10\n```\n\n#### Word occurrences in multiple files showing only specific words\n\n```\nwc.py doc2.txt doc1.txt --filter-words tool awesome an\n```\n\n#### Word occurrences in a folder with output truncated and only 2 columns\n\n```\nwc.py tests/test_data/ --truncate 20 --columns word count\n```\n\n#### Saving output to file\n\n```\nwc.py ./ --filter-words tool awesome an --truncate 50 --output output.txt\n```\n\n#### Get the current version\n\n```\nwc.py -v\n```\n\n# SDK Interface\n\nIt is possible to interact with the SDK at multiple levels. The two most common use cases are:\n\n* WCCore class - Interact with filepaths\n* WCExtractor class - Interact with files and text\n\n## WCCore class\n\n### generate_wc_dict(self, paths)\n\nThis function finds all the files in a given set of paths, and builds a dictionary with the following structure:\n\n\n### generate_wc_list(self, 
paths)\n\nThis function finds all the files in a given set of paths, and builds a sorted list (by word count) of the following structure\n\n\n## WCExtractor class\n\n### extract_wc_from_file\n\nThis function extracts all the text from a file and builds a wc_dict object\n\n### extract_wc_from_line\n\nThis function extracts all the words from a line and adds them to a wc_dict object\n\n## WCExtractorProcessor class\n\nThis class does all the processing to convert a WC Dict into a sorted WCList object.\n\n### process_dict_wc_to_list\n\nAs the function name suggests, this function converts a WCDict object into a sorted WCList object.\n\n## Core WC Types\n\n### WCDict\n\n```\n{\n : {\n word_count: ,\n files: {\n : [\n ,\n ,\n \u2026\n ]\n },\n {\n \u2026\n }\n },\n : ...\n}\n```\n\n### WCList\n\n```\n[\n {\n \"word\": ,\n\n word_count: ,\n files: {\n : [\n ,\n ,\n \u2026\n ]\n }\n },\n {\n \"word\": ,\n ...\n\n }\n]\n```\n\n# Contributing\n\nIf you'd like to contribute, feel free to submit a pull request, open bugs/issues and join the discussion.\n\n## Install VirtualEnv and Requirements\n\nPython 3.X is used, and it's strongly recommended to set up the project in a virtual environment:\n\n```\nvirtualenv --no-site-packages -p python3 venv\n```\n\nThen install it using the setup.py command\n\n```\npython setup.py install_data\n```\n\nYou can also install the requirements directly by running\n\n```\npip install -r requirements.txt\n```\n\n## NLTK\n\nThis package uses the NLTK `english.pickle` dataset. 
The dataset is included in both the repository and the PyPI package; however, if you want to download more of the languages, you can do so with the following command:\n\n```\npython -c \"import nltk; nltk.download('punkt')\"\n```\n\n## Testing\n\n`py.test` is used to run the tests. To run them, simply run:\n\n```\npython setup.py test\n```\n\n## Cleaning\n\nTo clean all the files generated during runtime, simply run:\n\n```\npython setup.py clean\n```\n\n# Roadmap\n\n* Support multiple types of documents", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/axsauze/wcpy", "keywords": "Word count (wcpy) on steroids", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "wcpy", "package_url": "https://pypi.org/project/wcpy/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/wcpy/", "project_urls": { "Homepage": "https://github.com/axsauze/wcpy" }, "release_url": "https://pypi.org/project/wcpy/1.2/", "requires_dist": [ "mock (==2.0.0)", "nltk (==3.2.4)", "pbr (==3.1.1)", "py (==1.4.34)", "pytest (==3.1.2)", "pytest-runner (==2.11.1)", "six (==1.10.0)" ], "requires_python": "", "summary": "WordCount in Python with a lot more functionality", "version": "1.2" }, "last_serial": 2977824, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "6e50805735b37f3ce34fd258f316052d", "sha256": "dc6f759801e7367a69d3312fb30c4ac088c6114919bb05c4ae6837b4c4d6cb14" }, "downloads": -1, "filename": "wcpy-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "6e50805735b37f3ce34fd258f316052d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12651, "upload_time": "2017-06-24T12:35:44", "url": "https://files.pythonhosted.org/packages/31/16/9a7e332d555dd7bd64f9bb4e0005232aa53feba1e401bafabbd984317faa/wcpy-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"ac12ab4678d9227721fa714062f572f3", "sha256": "e0c97cd44a2a83a4f904c4b1f8f7ed10f120914568ba8147a1a52f3436c04d9c" }, "downloads": -1, "filename": "wcpy-1.0.tar.gz", "has_sig": false, "md5_digest": "ac12ab4678d9227721fa714062f572f3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10778, "upload_time": "2017-06-24T12:35:46", "url": "https://files.pythonhosted.org/packages/d2/31/8907fa268fbe8c8eec18e294ddb0bf62162f787b47ee98abc9ff71bb53c6/wcpy-1.0.tar.gz" } ], "1.1": [ { "comment_text": "", "digests": { "md5": "2330e90eb27908464579795e1bc2ecf6", "sha256": "d793787604ededa728e8ae05a27a7e31d29f4aec8f6493a198e3aed29fdd4fb0" }, "downloads": -1, "filename": "wcpy-1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "2330e90eb27908464579795e1bc2ecf6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 349504, "upload_time": "2017-06-24T13:22:32", "url": "https://files.pythonhosted.org/packages/ee/df/09002962ad81822188c03a3aeb162b6a514db4cbba35e353b922e548c44b/wcpy-1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1f443515cb1fce1c867afd4e81af8169", "sha256": "7414399eef1fd0fc0762774cb63f07cf555e880e6e6d4238b43d7fe9a6d8d98b" }, "downloads": -1, "filename": "wcpy-1.1.tar.gz", "has_sig": false, "md5_digest": "1f443515cb1fce1c867afd4e81af8169", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 346271, "upload_time": "2017-06-24T13:22:34", "url": "https://files.pythonhosted.org/packages/3d/3a/67833f1837a77e73193cae4e2fd4570a2dc419c0b0a7be664f51414172a4/wcpy-1.1.tar.gz" } ], "1.2": [ { "comment_text": "", "digests": { "md5": "e8f5f7c2fb8549592cc730e8786c880f", "sha256": "d4d8d97c1b87589b8c1a708ed7688a9327881231fd460841687dd14c5a7870e3" }, "downloads": -1, "filename": "wcpy-1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e8f5f7c2fb8549592cc730e8786c880f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 
352599, "upload_time": "2017-06-25T18:19:27", "url": "https://files.pythonhosted.org/packages/d8/bd/b5720a66061c8b9fb118c1d317833ec94d219da427de4cf1f5d1c1e4e0b5/wcpy-1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3bed6fddf5e75ed3d668fd6d2e1f543e", "sha256": "e08d65c07a00605894f233da25c0fb79d02ba49792eddd1854dbde2ca9da078c" }, "downloads": -1, "filename": "wcpy-1.2.tar.gz", "has_sig": false, "md5_digest": "3bed6fddf5e75ed3d668fd6d2e1f543e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 349320, "upload_time": "2017-06-25T18:19:30", "url": "https://files.pythonhosted.org/packages/17/11/1b08765fcf0b4f3cb23cc95352709ee1e1dbf5d186f6f6d4e8a7006ea44a/wcpy-1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e8f5f7c2fb8549592cc730e8786c880f", "sha256": "d4d8d97c1b87589b8c1a708ed7688a9327881231fd460841687dd14c5a7870e3" }, "downloads": -1, "filename": "wcpy-1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "e8f5f7c2fb8549592cc730e8786c880f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 352599, "upload_time": "2017-06-25T18:19:27", "url": "https://files.pythonhosted.org/packages/d8/bd/b5720a66061c8b9fb118c1d317833ec94d219da427de4cf1f5d1c1e4e0b5/wcpy-1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3bed6fddf5e75ed3d668fd6d2e1f543e", "sha256": "e08d65c07a00605894f233da25c0fb79d02ba49792eddd1854dbde2ca9da078c" }, "downloads": -1, "filename": "wcpy-1.2.tar.gz", "has_sig": false, "md5_digest": "3bed6fddf5e75ed3d668fd6d2e1f543e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 349320, "upload_time": "2017-06-25T18:19:30", "url": "https://files.pythonhosted.org/packages/17/11/1b08765fcf0b4f3cb23cc95352709ee1e1dbf5d186f6f6d4e8a7006ea44a/wcpy-1.2.tar.gz" } ] }