{ "info": { "author": "Nheeng", "author_email": "contact@nheeng.com", "bugtrack_url": null, "classifiers": [], "description": "\n# Aruana NLP for Machine Learning\n\n* [Aruana](https://aruana.nheeng.com/)\n\n* [Nhe'eng](https://nheeng.com/)\n\nAruana is a collection of methods that can be used for simple NLP tasks. It can be used for tasks involving text preprocessing for machine learning. Aruana works mainly with text strings and lists of strings. \n\nThe library is developed in Python 3.\n\n## Installing Aruana\n\n### pip\n\n $ pip3 install aruana\n\nIf you want, you can also install Aruana in a virtual environment:\n\n\t$ python -m venv .env\n\n\t$ source .env/bin/activate\n\n\t$ pip3 install aruana\n\n\n### Prerequisites\n\nAruana uses following external Python libraries:\n\n- nltk (3.3)\n\n- tqdm (4.19.5)\n\n- pdoc (0.5.1)\n\nThey are all documented in the [requirements.txt](requirements.txt) file.\n\n\n## Usage examples\n\nTo use Aruana, initialize it by choosing one of the three available languages ('en', 'fr', 'pt-br')\n\n\taruana_en = Aruana('en')\n\n### Quick preprocessing\n\nAruana has the preprocess method, which applies commonly used preprocessed steps on you text.\n\n\ttext = \"At the end of the day, you're solely responsible for your success and your failure. And the sooner you realize that, you accept that, and integrate that into your work ethic, you will start being successful. As long as you blame others for the reason you aren't where you want to be, you will always be a failure.\"\n\tpreprocessed_text = aruana_en.preprocess(text)\n\tprint(preprocessed_text)\n\n\t>>> ['at', 'the', 'end', 'of', 'the', 'day', 'you', 'are', 'sole', 'respons', 'for', 'your', 'success', 'and', 'your', 'failur', 'and', 'the', 'sooner', 'you', 'realiz', 'that', 'you', 'accept', 'that', 'and', 'integr', 'that', 'into', 'your', 'work', 'ethic', 'you', 'will', 'start', 'be', 'success', 'as', 'long', 'as', 'you', 'blame', 'other', 'for', 'the', 'reason', 'you', 'are', 'not', 'where', 'you', 'want', 'to', 'be', 'you', 'will', 'alway', 'be', 'a', 'failur']\n\nIf you prefer, you can choose to:\n\n- tokenize the sentence\n\n- stem it\n\n- remove stop words\n\n- pos tag the portuguese sentences\n\n\t\ttext = \"At the end of the day, you're solely responsible for your success and your failure. And the sooner you realize that, you accept that, and integrate that into your work ethic, you will start being successful. As long as you blame others for the reason you aren't where you want to be, you will always be a failure.\"\n\n\t\tpreprocessed_text = aruana_en.preprocess(text, stem=False, remove_stopwords=True)\n\n\t\tprint(preprocessed_text)\n\n\t\t['end', 'day', 'solely', 'responsible', 'success', 'failure', 'sooner', 'realize', 'that', 'accept', 'that', 'integrate', 'work', 'ethic', 'start', 'successful', 'long', 'blame', 'others', 'reason', 'want', 'be', 'always', 'failure']\n\n### List preprocessing\n\nIf you have a list of sentences or you are using Pandas, you can pass the entire list for preprocessing by using the preprocess_list method.\n\n\tlist_of_strings = ['I love you',\n\t\t\t\t\t\t'Please, never leave me alone',\n\t\t\t\t\t\t'If you go, I will die',\n\t\t\t\t\t\t'I am watching a lot of romantic comedy lately',\n\t\t\t\t\t\t'I have to eat icecream' ]\n\n\tlist_processed = aruana_en.preprocess_list(list_of_strings, stem=False, remove_stopwords=True)\n\n\tprint(list_processed)\n\n\t>>> [['love'], ['please', 'never', 'leave', 'alone'], ['go', 'die'], ['watching', 'lot', 'romantic', 'comedy', 'lately'], ['eat', 'icecream']]\n\n### Defining your own pipeline\n\nUse the single available methods to create a custom pipeline instead of using the quick preprocessing function.\n\n\ttext = \"At the end of the day, @john you're solely responsible for your #success and your #failure. And the sooner you realize that, you accept that, and integrate that into your work ethic, you will start being #successful.\"\n\ttext = aruana_en.lower_remove_white(text)\n\ttext = aruana_en.expand_contractions(text)\n\ttext = aruana_en.replace_handles(text, 'HANDLE')\n\ttext = aruana_en.replace_hashtags(text, 'HASHTAG')\n\ttext = aruana_en.remove_stopwords(text)\n\ttext = aruana_en.replace_punctuation(text, placeholder='PUNCTUATION')\n\ttext = aruana_en.tokenize(text)\n\tprint(text)\n\n\t>>> ['end', 'day', 'PUNCTUATION', 'HANDLE', 'solely', 'responsible', 'HASHTAG', 'HASHTAG', 'PUNCTUATION', 'sooner', 'realize', 'that', 'PUNCTUATION', 'accept', 'that', 'PUNCTUATION', 'integrate', 'work', 'ethic', 'PUNCTUATION', 'start', 'HASHTAG', 'PUNCTUATION']\n\n## Development\n\n### Testing\n\n1. Create a clean test environment\n\n2. Navigate to aruana project on your computer and generate a package using bdist_wheel\n\n\t\t$ python3 setup.py sdist bdist_wheel\n\n3. Install the package\n\n\t\t$ python3 setup.py install\n\n### Docs\n\nNavigate to aruana/aruana and type:\n\n\t$ pdoc --html aruana\n\n### Release\n\nFollow the steps below before releasing a new version:\n\n1. Update all necessary documents\n\n2. Generate the package using bdist\n\n3. Install the new version on a clean environment for testing\n\n4. If everything is ok, generate the doc using pdoc\n\n## Contributing\n\nPlease read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.\n\n## Versioning\n\nUse [SemVer](http://semver.org/) for versioning.\n\n## Authors\n\n* **Wilame Vallantin** - *Initial work* - [Nhe'eng](https://nheeng.com/)\n\n## License\n\nThis project is licensed under the Apache License - see the [LICENSE.md](LICENSE.md) file for details\n\n## V. 1.1.1\n\n### New features\n\n- Adds the random_classification method, useful for random text classification for testing model accuracy\n- Adds the replace_with_blob method, useful for creating blobs from a corpus for testing method accuracy\n- adds the strings module, with a list of punctuation and diacritic strings\n- adds an internal tokenizer\n- adds a pos-tagger for portuguese (experimental, version 0.0.1)\n\n### Improvements\n- expand_contractions recognizes now more words for portuguese\n- Preprocess text now converts emojis to text instead of completely removing them\n- Removes NLTK tokenizer and replaces it for an internal tokenizer\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://aruana.nheeng.com", "keywords": "", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "Aruana", "package_url": "https://pypi.org/project/Aruana/", "platform": "", "project_url": "https://pypi.org/project/Aruana/", "project_urls": { "Homepage": "https://aruana.nheeng.com" }, "release_url": "https://pypi.org/project/Aruana/1.1.1/", "requires_dist": null, "requires_python": "", "summary": "Aruana is a collection of methods that can be used for simple NLP tasks and for machine learning text preprocessing.", "version": "1.1.1" }, "last_serial": 4794794, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "2285f223795b601c82f539e30688aa8d", "sha256": "2d3d26307cf1d6a817d0671c70063ad85d7ea3e081586291b574a1241502c648" }, "downloads": -1, "filename": "Aruana-0.0.1-py3.5.egg", "has_sig": false, "md5_digest": "2285f223795b601c82f539e30688aa8d", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 433489, "upload_time": "2019-01-21T13:52:41", "url": "https://files.pythonhosted.org/packages/7b/d9/d41b7b261f4369f33f5c7c6964d4a16f1b315f1adcfbff64180d6876e135/Aruana-0.0.1-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "6ce7abbfa456f3f8f02756b04e4cf34f", "sha256": "97d0280e579460db4fab14565ee54a849dac88a4a4a04a4bd51e0f34c1a11596" }, "downloads": -1, "filename": "Aruana-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "6ce7abbfa456f3f8f02756b04e4cf34f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 186881, "upload_time": "2019-01-21T13:52:39", "url": "https://files.pythonhosted.org/packages/87/9f/266bd55054b155ba0a961fdb4aa385f1decc3e88e7365477dcb393f47f52/Aruana-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "58e303a55a43bce9a174afc9e1ad9b55", "sha256": "d8629ae418c9562279d260064c0de64582acca42ee764bb401918e38fcc816ff" }, "downloads": -1, "filename": "Aruana-0.0.1.tar.gz", "has_sig": false, "md5_digest": "58e303a55a43bce9a174afc9e1ad9b55", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 180673, "upload_time": "2019-01-21T13:52:43", "url": "https://files.pythonhosted.org/packages/9f/ef/41ef3dd1302dc9c401e49bd0c7fb2397545c98c2737ec38a93163bfc6830/Aruana-0.0.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "09a7124be1e098509f4d5e9c8b2e3515", "sha256": "d76fa3b6cdd35e6562e523104d3e1252b28f653487927dbc867e09f36ab64976" }, "downloads": -1, "filename": "Aruana-1.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "09a7124be1e098509f4d5e9c8b2e3515", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 186871, "upload_time": "2019-01-21T21:43:04", "url": "https://files.pythonhosted.org/packages/aa/65/902082430b98beeed6fa1a3cb44e5b5d1f65d299a1db1d59194472027a93/Aruana-1.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "836594a9c2f07f1388e8ea630739ea45", "sha256": "639e15497c04f816f838913b9d29a9a09efe28ab75c78963fdbe8ddc93b0a6d9" }, "downloads": -1, "filename": "Aruana-1.0.0.tar.gz", "has_sig": false, "md5_digest": "836594a9c2f07f1388e8ea630739ea45", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 180668, "upload_time": "2019-01-21T21:43:07", "url": "https://files.pythonhosted.org/packages/25/6b/cee81be188add4f488de327340d0ad9410444a60e0afbca64a9ad06635b2/Aruana-1.0.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "e679690a9826d2501a75669f5bbd419c", "sha256": "d83cb6fd6ef129272e30ef5ff09ae9e2ecd468130558557659ab5291240429fb" }, "downloads": -1, "filename": "Aruana-1.1.1-py3.5.egg", "has_sig": false, "md5_digest": "e679690a9826d2501a75669f5bbd419c", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 1114979, "upload_time": "2019-02-08T09:23:15", "url": "https://files.pythonhosted.org/packages/ee/84/88fd147fc05890a8671a2ac81d17b2b3f207a99e498dfc39be82ffe52448/Aruana-1.1.1-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "ad2b98bef0ca212b3e1f0513749e51c0", "sha256": "ef4ecde8ae8083131e9773f7204b3bb0ae33d6c38c8e1cd8c98964ac1f3491ce" }, "downloads": -1, "filename": "Aruana-1.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "ad2b98bef0ca212b3e1f0513749e51c0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 466075, "upload_time": "2019-02-08T09:23:12", "url": "https://files.pythonhosted.org/packages/48/82/22f17f45e45c47ad4896c097705786344df46ecc29b79d56ff9b1f340a7e/Aruana-1.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "791e087546a88de0aea0de3cb9f56daf", "sha256": "cb618ea6e5d1815a0b514a489359e380f82990403f05757be6fc39ec7324951f" }, "downloads": -1, "filename": "Aruana-1.1.1.tar.gz", "has_sig": false, "md5_digest": "791e087546a88de0aea0de3cb9f56daf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 451497, "upload_time": "2019-02-08T09:23:18", "url": "https://files.pythonhosted.org/packages/50/fe/6a4764a1e334bf73b30ab4ea01cf216723e83259bcdc7893d7cf4b50a633/Aruana-1.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e679690a9826d2501a75669f5bbd419c", "sha256": "d83cb6fd6ef129272e30ef5ff09ae9e2ecd468130558557659ab5291240429fb" }, "downloads": -1, "filename": "Aruana-1.1.1-py3.5.egg", "has_sig": false, "md5_digest": "e679690a9826d2501a75669f5bbd419c", "packagetype": "bdist_egg", "python_version": "3.5", "requires_python": null, "size": 1114979, "upload_time": "2019-02-08T09:23:15", "url": "https://files.pythonhosted.org/packages/ee/84/88fd147fc05890a8671a2ac81d17b2b3f207a99e498dfc39be82ffe52448/Aruana-1.1.1-py3.5.egg" }, { "comment_text": "", "digests": { "md5": "ad2b98bef0ca212b3e1f0513749e51c0", "sha256": "ef4ecde8ae8083131e9773f7204b3bb0ae33d6c38c8e1cd8c98964ac1f3491ce" }, "downloads": -1, "filename": "Aruana-1.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "ad2b98bef0ca212b3e1f0513749e51c0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 466075, "upload_time": "2019-02-08T09:23:12", "url": "https://files.pythonhosted.org/packages/48/82/22f17f45e45c47ad4896c097705786344df46ecc29b79d56ff9b1f340a7e/Aruana-1.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "791e087546a88de0aea0de3cb9f56daf", "sha256": "cb618ea6e5d1815a0b514a489359e380f82990403f05757be6fc39ec7324951f" }, "downloads": -1, "filename": "Aruana-1.1.1.tar.gz", "has_sig": false, "md5_digest": "791e087546a88de0aea0de3cb9f56daf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 451497, "upload_time": "2019-02-08T09:23:18", "url": "https://files.pythonhosted.org/packages/50/fe/6a4764a1e334bf73b30ab4ea01cf216723e83259bcdc7893d7cf4b50a633/Aruana-1.1.1.tar.gz" } ] }