{ "info": { "author": "Everton Tomalok", "author_email": "evertontomalok123@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# preprocessingtext\n \nA short but very useful tool to help with pre-processing text data.\n\n## How to Install\n\n >> pip install --user preprocessingtext\n\n## Usage\n\n#### Using stem_sentence()\n\n >> from preprocessingtext import CleanSentence\n \n >> cleaner = CleanSentence(idiom='portuguese')\n \n >> cleaner.stem_sentence(sentence=\"String\", remove_stop_words=True, remove_punctuation=True, normalize_text=True, replace_garbage=True)\n\nTo initialize the class, pass the idiom you want to work with. The default value is \"portuguese\".\n\nAs shown above, you can instantiate a new CleanSentence object and call the stem_sentence method. You can choose whether to use \n\"remove_stop_words\" (pass True or False) to remove stop words, \"remove_punctuation\" (True or False) to strip punctuation, \n\"replace_garbage\" (True or False) to remove garbage values from the data, and \"normalize_text\" (True or False) to normalize the text.\n \n#### Usage of list_to_replace\nYou can customize what is replaced (cleaned) in your data. You can use \"cleaner.list_to_replace.append('what_you_need_to_add')\",\nor you can assign a new list of values: cleaner.list_to_replace = ['item1', 'item2', 'item3']\n\n # Default value of list_to_replace\n >> cleaner.list_to_replace\n ['https://', 'http://', 'R$', '$']\n \n # Adding new values\n >> cleaner.list_to_replace.append('item1')\n ['https://', 'http://', 'R$', '$', 'item1']\n \n # Replacing the whole list\n >> cleaner.list_to_replace = ['item1', 'item2', 'item3']\n \n\n#### Using tokenizer()\n\n >> cleaner.tokenizer('Um exemplo de tokens.')\n \n >> ['Um', 'exemplo', 'de', 'tokens']\n\n## Example\n \n ## Using all parameters of stem_sentence()\n >> string = \"Eu sou uma senten\u00e7a comum. 
Serei pr\u00e9-processada com este modulo, veremos a serguir usando os m\u00e9todos disponiveis\"\n >> cleaner.stem_sentence(sentence=string,\n remove_stop_words=True,\n remove_punctuation=True,\n normalize_text=True,\n replace_garbage=True\n )\n >> sentenc comum pre-process modul ver segu us metod disponi\n \n ## Without remove_stop_words\n >> print(cleaner.stem_sentence(sentence=string,\n remove_stop_words=False,\n remove_punctuation=True,\n normalize_text=True,\n replace_garbage=True\n )\n )\n >> eu sou uma sentenc comum ser pre-process com est modul ver a segu us os metod disponi\n \n ## Tokenizer\n >> print(cleaner.tokenizer('Um exemplo de tokens.'))\n >> ['Um', 'exemplo', 'de', 'tokens']\n \n ## Cleaning garbage words\n >> string_web = 'Acesse esses links para ganhar dinheiro: https://easymoney.com.net and http://falselink.com'\n >> cleaner.stem_sentence(sentence=string_web,\n remove_stop_words=False,\n remove_punctuation=True,\n replace_garbage=True\n )\n >> acess ess link par ganh dinh easymoney.com.net and falselink.com\n \n## English example\n >> en_cleaner = CleanSentence(idiom='english')\n \n >> string_web = 'Access these links to gain money: https://easymoney.com.net and http://falselink.com'\n >> print(en_cleaner.stem_sentence(sentence=string_web,\n remove_stop_words=True,\n remove_punctuation=True,\n replace_garbage=True\n )\n )\n >> acc link gain money easymoney.com.net falselink.com\n\n# Author\n{\n 'name': 'Everton Tomalok',\n 'email': 'evertontomalok123@gmail.com'\n}", "description_content_type": "", "docs_url": null, "download_url": "https://github.com/EvertonTomalok/preprocessingtext/archive/master.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/EvertonTomalok/preprocessingtext", "keywords": "pre-processing text", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "preprocessingtext", "package_url": "https://pypi.org/project/preprocessingtext/", "platform": "", 
"project_url": "https://pypi.org/project/preprocessingtext/", "project_urls": { "Download": "https://github.com/EvertonTomalok/preprocessingtext/archive/master.zip", "Homepage": "https://github.com/EvertonTomalok/preprocessingtext" }, "release_url": "https://pypi.org/project/preprocessingtext/0.0.4/", "requires_dist": null, "requires_python": "", "summary": "A set of methods to help with pre-processing of text in general, such as stemming, tokenizing and more.", "version": "0.0.4" }, "last_serial": 4232800, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "af086772775d57e401eefd2e782def40", "sha256": "bbbd642ddbea1d3f28e5194630a80b08831d6dfa054fdadecf014fe07082c3d8" }, "downloads": -1, "filename": "preprocessingtext-0.0.1.tar.gz", "has_sig": false, "md5_digest": "af086772775d57e401eefd2e782def40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3185, "upload_time": "2018-09-02T02:54:58", "url": "https://files.pythonhosted.org/packages/f8/06/18118cb55592ff43680ca72843b44328dcf9a56fe54ed0fdf351fc2e8dba/preprocessingtext-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "d6a7bc6ac5aace0bb62e6d1f56f2f093", "sha256": "5151e77193a4a50044bd44827b4b15352ba42b0d4891ac7cd132d681299b114a" }, "downloads": -1, "filename": "preprocessingtext-0.0.2.tar.gz", "has_sig": false, "md5_digest": "d6a7bc6ac5aace0bb62e6d1f56f2f093", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3261, "upload_time": "2018-09-02T03:24:37", "url": "https://files.pythonhosted.org/packages/29/50/2f20934af43649a99eeb5235deb0231341aaf13c6567d6b86003c2789c42/preprocessingtext-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "9abb8cf031474906edfadc1d190708e3", "sha256": "e4063e1d61928d6de8971f928b2b0851ee9ca24c44fb9ef1378f4663eeb20dc3" }, "downloads": -1, "filename": "preprocessingtext-0.0.3.tar.gz", "has_sig": false, "md5_digest": "9abb8cf031474906edfadc1d190708e3", 
"packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3829, "upload_time": "2018-09-02T23:25:04", "url": "https://files.pythonhosted.org/packages/95/c1/315ed87f95ca4939d799ae8345f99f092eee3e95715bc9ea2c269da533ff/preprocessingtext-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "78270cd411bb65baaaa6bc2b7e88c064", "sha256": "81a3365bf106b901bf3bd968d75e9ba0190b363fd943d6e140e76d84f7ff2e75" }, "downloads": -1, "filename": "preprocessingtext-0.0.4.tar.gz", "has_sig": false, "md5_digest": "78270cd411bb65baaaa6bc2b7e88c064", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3866, "upload_time": "2018-09-02T23:31:49", "url": "https://files.pythonhosted.org/packages/35/33/01f321ffff1e01f90fd97974fc6bd1e0eb1ad35d8dd35b795d54eeda3c69/preprocessingtext-0.0.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "78270cd411bb65baaaa6bc2b7e88c064", "sha256": "81a3365bf106b901bf3bd968d75e9ba0190b363fd943d6e140e76d84f7ff2e75" }, "downloads": -1, "filename": "preprocessingtext-0.0.4.tar.gz", "has_sig": false, "md5_digest": "78270cd411bb65baaaa6bc2b7e88c064", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3866, "upload_time": "2018-09-02T23:31:49", "url": "https://files.pythonhosted.org/packages/35/33/01f321ffff1e01f90fd97974fc6bd1e0eb1ad35d8dd35b795d54eeda3c69/preprocessingtext-0.0.4.tar.gz" } ] }