{ "info": { "author": "Philipp Thomann", "author_email": "philipp.thomann@d-one.ai", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing :: Markup :: HTML" ], "description": "[![Travis Build Status]()](https://travis-ci.org/d-one/nlpeasy)\n[![pypi Version](https://img.shields.io/pypi/v/nlpeasy.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/nlpeasy/)\n\nNLPeasy\n=======\n\nBuild NLP pipelines the easy way\n\n> **Disclaimer:** This is in Alpha stage, lot of things can go wrong.\n> It could possibly change your Elasticsearch Data the API is not fixed yet\n> and even the name NLPeasy might change.\n\n* Free software: Apache Software License 2.0\n\n\nUsage\n-----\n\nFor this example to completely work you need to have Python at least in Version 3.6 installed.\nAlso you need to have install and start either\n- **Docker** , direct download links for\n [Mac (DMG)](https://download.docker.com/mac/stable/Docker.dmg) and\n [Windows (exe)](https://download.docker.com/win/stable/Docker%20for%20Windows%20Installer.exe).\n- **Elasticsearch** and **Kibana**:\n or\n (pure Apache licensed version)\n\nThen on the terminal issue:\n```bash\npython -m venv venv\nsource venv/bin/activate\npip install nlpeasy scikit-learn\npython -m spacy download en_core_web_md\n```\nThe package `scikit-learn` is just used in this example to get the newsgroups data and preprocess it.\nThe last command downloads a spacy model for the english language -\nfor the following you need to have at least it's `md` (=medium) version which has wordvectors.\n\n```python\nimport pandas as pd\nimport nlpeasy as ne\nfrom sklearn.datasets import fetch_20newsgroups\n\n# connect to running elastic or else start an Open Source stack on your docker\nelk = ne.connect_elastic(dockerPrefix='nlp', elkVersion='7.4.0', mountVolumePrefix=None)\n# If it is started on docker it will on the first time pull the images (1.3GB)!\n# Setting mountVolumePrefix=\"./elastic-data/\" would keep the data of elastic in your\n# filesystems and then the data survives container restarts\n\n# read data as Pandas data frame\nnews_raw = fetch_20newsgroups(remove=('headers', 'footers', 'quotes'))\nnews_groups = [news_raw['target_names'][i] for i in news_raw['target']]\nnews = pd.DataFrame({'newsgroup': news_groups, 'message': news_raw['data']})\n\n# setup NLPeasy pipeline with name for the elastic index and set the text column\npipeline = ne.Pipeline(index='news', textCols=['message'], tagCols=['newsgroup'], elk=elk)\n\npipeline += ne.VaderSentiment('message', 'sentiment')\npipeline += ne.SpacyEnrichment(nlp='en_core_web_md', cols=['message'], vec=True)\n\n# do the pipeline - just for first 100, the whole thing would take 10 minutes\nnews_enriched = pipeline.process(news.head(10000000), writeElastic=True)\n\n# Create Kibana Dashboard of all the columns\npipeline.create_kibana_dashboard()\n\n# open Kibana in webbrowser\nelk.show_kibana()\n```\n\nLet's have some fun outside of Elastic/Kibana - but this needs `pip install matplotlib`\n```python\nimport numpy as np\nfrom scipy.cluster.hierarchy import dendrogram, linkage\nimport matplotlib.pyplot as plt\ngrouped = news_enriched.loc[~news_enriched.message_vec.isna()].groupby('newsgroup')\ngroup_vec = grouped.apply(lambda z: np.stack(z.message_vec.values).mean(axis=0))\nclust = linkage(np.stack(group_vec), 'ward')\n# calculate full dendrogram\nplt.figure(figsize=(10, 10))\nplt.title('Hierarchical Clustering Dendrogram Newsgroups')\nplt.xlabel('sample index')\nplt.ylabel('distance')\ndendrogram(\n clust,\n leaf_rotation=0., # rotates the x axis labels\n leaf_font_size=8., # font size for the x axis labels\n labels=group_vec.index,\n orientation='left'\n)\nplt.show()\n```\n\nInstallation\n------------\n\nPrerequisites:\n- Python 3 (we use Python 3.7)\n- Elastic: Several possibilities\n - Have Docker installed - needs to have the docker package installed (see below).\n - Install and start Elasticsearch and Kibana:\n or\n (pure Apache licensed version)\n - Use any running Elasticsearch and Kibana (on premise or cloud)...\n- Pretrained Models: See below for Spacy Language Models and WordVectors\n\nIt is recommended to use a virtual environment:\n```bash\ncd $PROJECT_DIR\npython -m venv venv\nsource venv/bin/activate\n```\nThe source statement has to be repeated whenever you open a new terminal.\n\nThen install\n```bash\npip install nlpeasy\n```\nOr the development version from GitHub:\n```bash\npip install --upgrade git+https://github.com/d-one/nlpeasy\n```\n\nIf you want to use spaCy language models download them (90-200 MB), e.g.\n```bash\npython -m spacy download en_core_web_md\n# and/or\npython -m spacy download de_core_news_md\n```\nIf you want to use pretrained [FastText-Wordvectors](https://fasttext.cc/docs/en/pretrained-vectors.html) (each ~7GB):\n```bash\ncurl -O https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.en.zip\ncurl -O https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.de.zip\n```\n\nIf you want to use Jupyter, install it to the virtual environment:\n```bash\npip install jupyterlab\n```\n\n### Development\nTo install this module in Dev-mode, i.e. change files and reload module:\n```bash\ngit clone https://github.com/d-one/nlpeasy\ncd nlpeasy\n```\n\nIt is recommended to use a virtual environment:\n```bash\npython -m venv venv\nsource venv/bin/activate\n```\n\nInstall the version in edit mode:\n```bash\npip install -e .\n```\n\nIn Jupyter you can have reloaded code when you change the files as in:\n```python\n%load_ext autoreload\n%autoreload 2\n```\n\nFeatures\n--------\n\n* Pandas based pipeline\n* Support for any extensions - now includes some for Regex, spaCy, VaderSentiment\n* Write results to ElasticSearch\n* Automatic Kibana dashboard generation\n* Have Elastic started in Docker if it is not installed locally or remotely\n* Apache License 2.0\n\nCredits\n-------\n\nThis package was created with [Cookiecutter]() and the [`audreyr/cookiecutter-pypackage`] project template.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/d-one/nlpeasy", "keywords": "nlpeasy", "license": "Apache Software License 2.0", "maintainer": "", "maintainer_email": "", "name": "nlpeasy", "package_url": "https://pypi.org/project/nlpeasy/", "platform": "", "project_url": "https://pypi.org/project/nlpeasy/", "project_urls": { "Homepage": "https://github.com/d-one/nlpeasy" }, "release_url": "https://pypi.org/project/nlpeasy/0.6.2/", "requires_dist": null, "requires_python": "", "summary": "Easy Peasy Language Squeezy", "version": "0.6.2" }, "last_serial": 5969075, "releases": { "0.6.0": [ { "comment_text": "", "digests": { "md5": "f1e438643afad349ef148f260d54e07e", "sha256": "040d7a662177b4805914e76adc247ec7ccef2e8b0208a01978b52fb4b06dcd9c" }, "downloads": -1, "filename": "nlpeasy-0.6.0.tar.gz", "has_sig": false, "md5_digest": "f1e438643afad349ef148f260d54e07e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23745, "upload_time": "2019-10-10T14:34:00", "url": "https://files.pythonhosted.org/packages/0c/0c/920e87b91d66ef3ee7afef7fe13026f2f9ba47081f5fa81eb906c5f82b03/nlpeasy-0.6.0.tar.gz" } ], "0.6.1": [ { "comment_text": "", "digests": { "md5": "ccad599cc5ff48bb736e184bb18678a1", "sha256": "23843df46d03e7e480676e3c872515d68d93a693cb57e3cbd7ea6b339bdb4442" }, "downloads": -1, "filename": "nlpeasy-0.6.1.tar.gz", "has_sig": false, "md5_digest": "ccad599cc5ff48bb736e184bb18678a1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 22454, "upload_time": "2019-10-11T01:49:31", "url": "https://files.pythonhosted.org/packages/66/28/dc0af2361fbe3fa6df92075b754affd5c4db3ab8f7b73eb841f5a2de4f6f/nlpeasy-0.6.1.tar.gz" } ], "0.6.2": [ { "comment_text": "", "digests": { "md5": "19290e9358382f0aef09ab4771ba94ba", "sha256": "90356e863628155dff4bcecd2646df20e52adb56efb4aafdc44d7f5ad80b87bb" }, "downloads": -1, "filename": "nlpeasy-0.6.2.tar.gz", "has_sig": false, "md5_digest": "19290e9358382f0aef09ab4771ba94ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24263, "upload_time": "2019-10-13T22:58:41", "url": "https://files.pythonhosted.org/packages/91/b6/0891b6aa3e834ca2e1884711d4d887870580985339b2674e4e71d791cc80/nlpeasy-0.6.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "19290e9358382f0aef09ab4771ba94ba", "sha256": "90356e863628155dff4bcecd2646df20e52adb56efb4aafdc44d7f5ad80b87bb" }, "downloads": -1, "filename": "nlpeasy-0.6.2.tar.gz", "has_sig": false, "md5_digest": "19290e9358382f0aef09ab4771ba94ba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24263, "upload_time": "2019-10-13T22:58:41", "url": "https://files.pythonhosted.org/packages/91/b6/0891b6aa3e834ca2e1884711d4d887870580985339b2674e4e71d791cc80/nlpeasy-0.6.2.tar.gz" } ] }