{ "info": { "author": "Michal Szczepanski", "author_email": "michal@vane.pl", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3" ], "description": "prosecco\n====\n\n[![GitHub](https://img.shields.io/github/license/vane/prosecco)](https://github.com/vane/prosecco/blob/master/LICENSE)\n[![pypi](https://img.shields.io/pypi/v/prosecco)](https://pypi.org/project/prosecco/)\n[![GitHub commits since tagged version](https://img.shields.io/github/commits-since/vane/prosecco/0.0.7)](https://github.com/vane/prosecco/compare/0.0.7...master)\n[![GitHub last commit](https://img.shields.io/github/last-commit/vane/prosecco)](https://github.com/vane/prosecco)\n\n\n## Description\n\nNLP engine with text extraction capabilities that can be easily extended to desired needs.\n\nCan be used to build chat bots, question answer machines (see [example/qa.py](https://github.com/vane/prosecco/blob/master/example/qa.py)), text converters.\n\nExtract words or even whole sentences in ordered manner. \nGet position of found text. \nUse ```Condition``` class and mark data using regex or string comparasion. \nExtend each part of it in easy manner. ( see [example/custom_condition_class.py](https://github.com/vane/prosecco/blob/master/example/custom_condition_class.py)).\n\n## Install\n```bash\npip install prosecco\n```\n## Usage\n\n### Basic\n[example/basic.py](https://github.com/vane/prosecco/blob/master/example/basic.py)\n```python\nfrom prosecco import Prosecco, Condition, EnglishWordNormalizer\n\n# Read wikipedia https://en.wikipedia.org/wiki/Superhero\nwith open(\"superhero.txt\") as f:\n text = f.read()\n\n# 1. Create conditions with hero names\nconditions = [\n Condition(lemma_type=\"hero|dc\", compare=[\"batman\", \"superman\", \"wonder woman\"], lower=True),\n Condition(lemma_type=\"hero|marvel\", normalizer=EnglishWordNormalizer(),\n compare=[\"spiderman\", \"iron man\", \"black panther\"], lower=True)\n]\n# 2. Create prosecco\np = Prosecco(conditions=conditions)\n# 3. Let's drink and print output\np.drink(text, progress=True)\nlemmas = set(p.get_lemmas(type=\"hero\"))\nprint(\" \".join(map(str, lemmas)))\n```\n\n### Output\n```Batman[hero|dc][start:1089] Wonder Woman[hero|dc][start:2100] Iron Man[hero|marvel][start:2184] Superman[hero|dc][start:2070] Spider-Man[hero|marvel][start:2080] Black Panther[hero|marvel][start:17690]```\n\n### Advanced\n[example/advanced.py](https://github.com/vane/prosecco/blob/master/example/advanced.py)\n```python\nfrom prosecco import *\n\ntext = \"\"\"Chrz\u0105szcz brzmi w trzcinie w Szczebrzeszynie.\nZ\u0105b zupa z\u0119bowa, d\u0105b zupa d\u0119bowa.\nGdzie Rzym, gdzie Krym. W Pacanowie kozy kuj\u0105.\nTak, je\u015bli mam szcz\u0119\u015bliwy by\u0107, to w Gda\u0144sku musz\u0119 \u017cy\u0107! \n\"\"\"\n\n# 1. Create condition with city names\ncities = [\"szczebrzeszyn\", \"pacanow\", \"gdansk\", \"rzym\", \"krym\"]\nanimals = [\"koz\", \"chrzaszcz\"]\n# 2. Normalizer to remove polish specific charset\nn = CharsetNormalizer(Charset.PL_EN)\n# 3. Stemmer to remove suffix\ns = SuffixStemmer(language=\"pl\")\n# 4. Conditions for city and animal\ncity_condition = Condition(lemma_type=\"city\", compare=cities, normalizer=n, stemmer=s, lower=True)\nanimal_condition = Condition(lemma_type=\"animal\", compare=animals, normalizer=n, stemmer=s, lower=True)\nconditions = [city_condition, animal_condition]\n# 5. Create tokenizer for polish charset\ntokenizer = LanguageTokenizer(Charset.PL)\n# 6. Get list of tokens\ntokens = tokenizer.tokenize(text)\n# 7. Create visitor with conditions provided in step 1\nvisitor = Visitor(conditions=conditions)\n# 8. Parse tokens based on visitor conditions\nlexer = Lexer(tokens=tokens, visitor=visitor)\n# 9. Get list of lemmas\nlemmas = lexer.lex()\n# 10. filter found cities and print output\nfound = filter(lambda l: l.type == \"city\", lemmas)\nprint(\" \".join(map(str, found)))\n# 11. filter found anumals and print output\nfound = list(filter(lambda l: l.type == \"animal\", lemmas))\nprint(\" \".join(map(str, found)))\n# 12. print exact words from text\nfor l in list(found):\n print(text[l.start:l.start+len(l.sentence)])\n``` \n\n### Output\n```bash\nSzczebrzeszynie[city][start:29] Rzym[city][start:86] Krym[city][start:98] Pacanowie[city][start:106] Gda\u0144sku[city][start:163]\nChrz\u0105szcz[animal][start:0] kozy[animal][start:116]\nChrz\u0105szcz\nkozy\n```\n\n### QA ( question answer machine )\n[example/qa.py](https://github.com/vane/prosecco/blob/master/example/qa.py)\n```python\nfrom datetime import datetime\nfrom prosecco import Prosecco, Condition, EnglishWordNormalizer, SuffixStemmer\n\n\nmessages = \"\"\"Whats the time ?\nHow long boil egg?\n100 miles to km\n30,3 celsius to farenheit\"\"\"\n\n# create conditions\nquestion = (\"what\", \"whats\", \"how\")\nmeasure = (\"celsius\", \"farenheit\", \"mile\", \"km\", \"kilometer\", \"time\", \"long\")\ncooking = (\"boil\",\"cook\", \"fry\")\nfood = (\"egg\",)\nconditions = [\n Condition(lemma_type=\"question\", compare=question, normalizer=EnglishWordNormalizer(), lower=True),\n Condition(lemma_type=\"measure\", compare=measure,\n normalizer=EnglishWordNormalizer(),\n stemmer=SuffixStemmer(language=\"en\"),\n lower=True),\n Condition(lemma_type=\"cooking\", compare=cooking, normalizer=EnglishWordNormalizer(), lower=True),\n Condition(lemma_type=\"food\", compare=food,\n normalizer=EnglishWordNormalizer(),\n stemmer=SuffixStemmer(language=\"en\"),\n lower=True),\n Condition(lemma_type=\"number\", compare=r\"\\d+([\\.\\,]\\d+)?\", regex=True, until_character=\" \"),\n]\n\ndef printer(data):\n print(\"Robot : \", data)\n\n# time condition\ndef resolve_time(p):\n printer(datetime.now())\n\n# cooking condition\ndef resolve_cooking(p):\n if check_condition(p.get_lemmas(\"cooking|food\"), [\"boil\", \"egg\"]):\n printer(\"\"\"\nHard for 9-15 minutes.\nSoft for 6-8 minutes.\"\"\")\n return True\n\ndef resolve_measure(p):\n measures = p.get_lemmas(\"measure\")\n fr = measures[0]\n to = measures[1]\n numbers = p.get_lemmas(\"number\")\n if len(numbers) == 0:\n printer(\"No number for conversion provided\")\n return True\n value = float(numbers[0].sentence.replace(\",\", \".\"))\n if fr.condition == \"mile\" and to.condition == \"km\":\n printer(value / 0.62137119)\n return True\n elif fr.condition == \"km\" and to.condition == \"mile\":\n printer(value * 0.62137119)\n return True\n elif fr.condition == \"celsius\" and to.condition == \"farenheit\":\n print(9/5 * value + 32)\n return True\n elif fr.condition == \"farenheit\" and to.condition == \"celsius\":\n print((value - 32) * 5/9)\n return True\n return False\n\ndef check_condition(lemmas, conditions):\n for l in lemmas:\n for c in conditions:\n if l.condition == c:\n conditions.remove(c)\n return len(conditions) == 0\n\ndef resolve(p, m):\n if len(p.get_lemmas(\"question\")) > 0:\n if check_condition(p.get_lemmas(\"measure\"), [\"time\"]):\n resolve_time(p)\n return True\n elif len(p.get_lemmas(\"cooking\")) > 0:\n return resolve_cooking(p)\n elif len(p.get_lemmas(\"measure\")) > 0:\n return resolve_measure(p)\n return False\n\nfor m in messages.split('\\n'):\n print(\"Question : \", m)\n p = Prosecco(conditions=conditions)\n p.drink(m)\n if not resolve(p, m):\n print(\"Unsupported resolver : \", p.lemmas)\n```\n\n### Output\n\n```bash\nQuestion : Whats the time ?\nRobot : 2019-08-13 20:38:06.948720\nQuestion : How long boil egg?\nRobot : \nHard for 9-15 minutes.\nSoft for 6-8 minutes.\nQuestion : 100 miles to km\nRobot : 160.93440057946685\nQuestion : 30,3 celsius to farenheit\n86.53999999999999\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/vane/prosecco", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "prosecco", "package_url": "https://pypi.org/project/prosecco/", "platform": "", "project_url": "https://pypi.org/project/prosecco/", "project_urls": { "Homepage": "https://github.com/vane/prosecco" }, "release_url": "https://pypi.org/project/prosecco/0.0.7/", "requires_dist": null, "requires_python": "", "summary": "Simple, extendable nlp engine that can extract data based on provided conditions.", "version": "0.0.7" }, "last_serial": 5674443, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "c1dc1db1725df83fad66c4e44e9399f9", "sha256": "0f602c87785447d73c482140665e585a178be0d6a3ebc54807b2bd98533884f5" }, "downloads": -1, "filename": "prosecco-0.0.2-py2.7.egg", "has_sig": false, "md5_digest": "c1dc1db1725df83fad66c4e44e9399f9", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 1791, "upload_time": "2019-08-10T23:28:07", "url": "https://files.pythonhosted.org/packages/ab/5b/048c3ff2720e7cb7f21039b5f6fda25f332c5c6cd54a7bb211ac62d62e70/prosecco-0.0.2-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "df7e1f717f1fff70afecc5482e530afe", "sha256": "43cc147b0b5b6f95f19d1e710d113409eaa954d6d99553a0662c070d1a0e7b14" }, "downloads": -1, "filename": "prosecco-0.0.2-py3.7.egg", "has_sig": false, "md5_digest": "df7e1f717f1fff70afecc5482e530afe", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 1791, "upload_time": "2019-08-10T23:28:09", "url": "https://files.pythonhosted.org/packages/00/84/809ad719e5407933c5fb49e74c95d2923a8eb0a592319cb640f229aea7a6/prosecco-0.0.2-py3.7.egg" }, { "comment_text": "", "digests": { "md5": "a9d469ae9660981b369455f3d96afcf6", "sha256": "541362749b46361548de1740794155d21da4e0e9a5f527e44285fc1a627a787c" }, "downloads": -1, "filename": "prosecco-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "a9d469ae9660981b369455f3d96afcf6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 2754, "upload_time": "2019-08-10T23:29:45", "url": "https://files.pythonhosted.org/packages/ce/43/3be122ef84e69a87c3eb508b8fbdc54dbf56304d40e57f3d34e555d96327/prosecco-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "59c42358b29a693944a7f54b2db52341", "sha256": "8ed142f90239dc34993ee196b6d27fa24321610dfe455ae730773e47a352441f" }, "downloads": -1, "filename": "prosecco-0.0.2.tar.gz", "has_sig": false, "md5_digest": "59c42358b29a693944a7f54b2db52341", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1975, "upload_time": "2019-08-10T23:29:46", "url": "https://files.pythonhosted.org/packages/2f/90/c438325421dd04cf1fc406240f4746ff9b2885c25ff2c0cae4cebb96dbee/prosecco-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "23b205b67babd79ceb3c89fcba3f974c", "sha256": "2704d8fd0615df17bbe942d163a205dabc57a12df4c61e2e973c0daf748a8919" }, "downloads": -1, "filename": "prosecco-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "23b205b67babd79ceb3c89fcba3f974c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 3011, "upload_time": "2019-08-11T13:18:07", "url": "https://files.pythonhosted.org/packages/f9/fa/24b3280a1fa752513e8e529ab2a9c3eb6f4fb7f556973fef7852ef7117f2/prosecco-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "e5e191f86a42716d85b35e2bc1790728", "sha256": "751e639d6a195f8fb8f4bb3ad7545e16ff6d93ba33c72b5f7da9001de3ccd280" }, "downloads": -1, "filename": "prosecco-0.0.3.tar.gz", "has_sig": false, "md5_digest": "e5e191f86a42716d85b35e2bc1790728", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2287, "upload_time": "2019-08-11T13:18:09", "url": "https://files.pythonhosted.org/packages/42/a5/9e56ac5aea984904f7aad98f31e30320456899f1fb5c16e466094ba846bb/prosecco-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "04e557caee90bc0de194ecad01b02cb2", "sha256": "aad1e04f75b89175e16925ea44842785a8c5c8711d36552539510098c9ddfa5b" }, "downloads": -1, "filename": "prosecco-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "04e557caee90bc0de194ecad01b02cb2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5013, "upload_time": "2019-08-11T20:56:43", "url": "https://files.pythonhosted.org/packages/15/94/9cc032a8082b6b638ed40c14ff421fe877118e24c92373409ac103a20d4b/prosecco-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "2e62a6b86ddae90b3331c88bd082eca3", "sha256": "25f6f626ec1423bbf9c2d4452ee2d6bdf230905728ac25c5633331a9f66b3500" }, "downloads": -1, "filename": "prosecco-0.0.4.tar.gz", "has_sig": false, "md5_digest": "2e62a6b86ddae90b3331c88bd082eca3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3724, "upload_time": "2019-08-11T20:56:45", "url": "https://files.pythonhosted.org/packages/62/e4/abba096c4a6267b5b4a8d120409ef4461d379f7dfe1944e042153172a865/prosecco-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "1023230ac11187e81cb6f93caeb594c5", "sha256": "7b1889f2a760f041b5f0fa173e558f089386fee3c51d0c04957a76dee010618f" }, "downloads": -1, "filename": "prosecco-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "1023230ac11187e81cb6f93caeb594c5", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5499, "upload_time": "2019-08-12T17:53:16", "url": "https://files.pythonhosted.org/packages/ca/20/46ff4d285c46b3873f98ae0f8e179b39a69d2a80f640f84c324886e802c2/prosecco-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "efbcc4c2a88e451d9ce38d990b64f8cf", "sha256": "06ccff93d7358915fb5365ae0960e4ba3983200a2f1f1577b9ce610511525cd1" }, "downloads": -1, "filename": "prosecco-0.0.5.tar.gz", "has_sig": false, "md5_digest": "efbcc4c2a88e451d9ce38d990b64f8cf", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 4282, "upload_time": "2019-08-12T17:53:17", "url": "https://files.pythonhosted.org/packages/f2/d7/89a6f65b8c59babea75b0fe8864b05c6d5f6865e8ec6de0773181fea79bf/prosecco-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "c18393bf97693b6359ba7732b46f580b", "sha256": "197224d1e1206b64b33183e1ee43d10ec778c232fdb86283c58ff494c307e6f8" }, "downloads": -1, "filename": "prosecco-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "c18393bf97693b6359ba7732b46f580b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 33067, "upload_time": "2019-08-14T01:24:10", "url": "https://files.pythonhosted.org/packages/b4/8a/b006b86483df44018d5b6201925bf0dece4bf49f82b49532186bddf98e8d/prosecco-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f896c4138745c81c854453f838ee06b6", "sha256": "472277ef1012b2cdd8cba7479a1b3f6f71d3227bfaa7eec8d163f48ffaaae122" }, "downloads": -1, "filename": "prosecco-0.0.6.tar.gz", "has_sig": false, "md5_digest": "f896c4138745c81c854453f838ee06b6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29449, "upload_time": "2019-08-14T01:24:11", "url": "https://files.pythonhosted.org/packages/25/d1/798da27aa28dc504fbc50015ae4d30339f24244a49f276acd51a2ce84cc0/prosecco-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "d24ddab1704ea455328f4486d2cd09a6", "sha256": "6c387a7f2128edb77817e49de45463b7558e5ca2a34864b75f3ce2e80a91a4f2" }, "downloads": -1, "filename": "prosecco-0.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d24ddab1704ea455328f4486d2cd09a6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 32567, "upload_time": "2019-08-14T02:25:05", "url": "https://files.pythonhosted.org/packages/8c/15/088adaf5316639c3f74325dcbc6ef9371efd07d5b31a79588f7a67351595/prosecco-0.0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b3cbf57ab0e7b98ff1247cf3e00ce739", "sha256": "38ea9c640dbe8123b6fbe1e67b1c98a7bc5296d37d7de83de202ff9c6b4ca7d0" }, "downloads": -1, "filename": "prosecco-0.0.7.tar.gz", "has_sig": false, "md5_digest": "b3cbf57ab0e7b98ff1247cf3e00ce739", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31250, "upload_time": "2019-08-14T02:25:07", "url": "https://files.pythonhosted.org/packages/be/4f/88592b3fc7703277e5e8e37c6918006d4c11c6c40253c643952274db0e7a/prosecco-0.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d24ddab1704ea455328f4486d2cd09a6", "sha256": "6c387a7f2128edb77817e49de45463b7558e5ca2a34864b75f3ce2e80a91a4f2" }, "downloads": -1, "filename": "prosecco-0.0.7-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d24ddab1704ea455328f4486d2cd09a6", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 32567, "upload_time": "2019-08-14T02:25:05", "url": "https://files.pythonhosted.org/packages/8c/15/088adaf5316639c3f74325dcbc6ef9371efd07d5b31a79588f7a67351595/prosecco-0.0.7-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b3cbf57ab0e7b98ff1247cf3e00ce739", "sha256": "38ea9c640dbe8123b6fbe1e67b1c98a7bc5296d37d7de83de202ff9c6b4ca7d0" }, "downloads": -1, "filename": "prosecco-0.0.7.tar.gz", "has_sig": false, "md5_digest": "b3cbf57ab0e7b98ff1247cf3e00ce739", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 31250, "upload_time": "2019-08-14T02:25:07", "url": "https://files.pythonhosted.org/packages/be/4f/88592b3fc7703277e5e8e37c6918006d4c11c6c40253c643952274db0e7a/prosecco-0.0.7.tar.gz" } ] }