{ "info": { "author": "Bureaucratic Labs", "author_email": "hello@b-labs.pro", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "# Dostoevsky [![Build Status](https://travis-ci.org/bureaucratic-labs/dostoevsky.svg?branch=master)](https://travis-ci.org/bureaucratic-labs/dostoevsky) [![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky?ref=badge_shield)\n\n\n\nSentiment analysis library for russian language\n\n## Install\n\nPlease note that `Dostoevsky` supports only Python 3.6+\n\n```bash\n$ pip install dostoevsky\n```\n\n## Social network model [FastText]\n\nThis model was trained on [RuSentiment dataset](https://github.com/text-machine-lab/rusentiment) and achieves up to ~0.71 F1 score. 
\nHyperparameters used for training:\n```\nepoch = 10\nlr = 0.21909\ndim = 64\nminCount = 1\nwordNgrams = 3\nminn = 2\nmaxn = 5\nbucket = 259929\ndsub = 2\nloss = one-vs-all\n```\n\n### Usage\n\nFirst of all, you'll need to download the binary model:\n\n```bash\n$ dostoevsky download fasttext-social-network-model\n```\n\nThen you can use the sentiment analyzer:\n\n```python\nfrom dostoevsky.tokenization import RegexTokenizer\nfrom dostoevsky.models import FastTextSocialNetworkModel\n\ntokenizer = RegexTokenizer()\ntokens = tokenizer.split('\u0432\u0441\u0451 \u043e\u0447\u0435\u043d\u044c \u043f\u043b\u043e\u0445\u043e') # [('\u0432\u0441\u0451', None), ('\u043e\u0447\u0435\u043d\u044c', None), ('\u043f\u043b\u043e\u0445\u043e', None)]\n\nmodel = FastTextSocialNetworkModel(tokenizer=tokenizer)\n\nmessages = [\n '\u043f\u0440\u0438\u0432\u0435\u0442',\n '\u044f \u043b\u044e\u0431\u043b\u044e \u0442\u0435\u0431\u044f!!',\n '\u043c\u0430\u043b\u043e\u043b\u0435\u0442\u043d\u0438\u0435 \u0434\u0435\u0431\u0438\u043b\u044b'\n]\n\nresults = model.predict(messages, k=2)\n\nfor message, sentiment in zip(messages, results):\n \"\"\"\n \u043f\u0440\u0438\u0432\u0435\u0442 -> {'speech': 1.0000100135803223, 'skip': 0.0020607432816177607}\n \u044f \u043b\u044e\u0431\u043b\u044e \u0442\u0435\u0431\u044f!! -> {'positive': 0.9886782765388489, 'skip': 0.005394937004894018}\n \u043c\u0430\u043b\u043e\u043b\u0435\u0442\u043d\u0438\u0435 \u0434\u0435\u0431\u0438\u043b\u044b -> {'negative': 0.9525841474533081, 'neutral': 0.13661839067935944}\n \"\"\"\n print(message, '->', sentiment)\n```\n\n## Social network model [CNN]\n\nThis model was also trained on the RuSentiment dataset, but it uses pretrained embeddings from the RuSentiment dataset and achieves up to ~0.70 F1 score. Also, this model is implemented using Keras, so it can run on a GPU. 
\n![](https://i.imgur.com/bGAEWvg.png)\n\n### Usage\n\nFirst of all, you'll need to download pretrained word embeddings and model:\n\n```bash\n$ dostoevsky download vk-embeddings cnn-social-network-model\n```\n\nThen, we can build our pipeline: `text -> tokenizer -> word embeddings -> CNN`\n\n```python\nfrom dostoevsky.tokenization import UDBaselineTokenizer, RegexTokenizer\nfrom dostoevsky.embeddings import SocialNetworkEmbeddings\nfrom dostoevsky.models import SocialNetworkModel\n\ntokenizer = UDBaselineTokenizer() or RegexTokenizer()\ntokens = tokenizer.split('\u0432\u0441\u0451 \u043e\u0447\u0435\u043d\u044c \u043f\u043b\u043e\u0445\u043e') # [('\u0432\u0441\u0451', 'ADJ'), ('\u043e\u0447\u0435\u043d\u044c', 'ADV'), ('\u043f\u043b\u043e\u0445\u043e', 'ADV')]\n\nembeddings_container = SocialNetworkEmbeddings()\n\nvectors = embeddings_container.get_word_vectors(tokens)\nvectors.shape # (3, 300) - three words/vectors with dim=300\n\nmodel = SocialNetworkModel(\n tokenizer=tokenizer,\n embeddings_container=embeddings_container,\n lemmatize=False,\n)\n\nmessages = [\n '\u043d\u0430\u0441\u0442\u0443\u043f\u0438\u043b\u0438 \u043d\u0430 \u043d\u043e\u0433\u0443',\n '\u0432\u0441\u0451 \u0441\u0443\u043f\u0435\u0440\u0441\u043a\u0438',\n]\n\nresults = model.predict(messages)\n\nfor message, sentiment in zip(messages, results):\n print(message, '->', sentiment) # \u043d\u0430\u0441\u0442\u0443\u043f\u0438\u043b\u0438 \u043d\u0430 \u043d\u043e\u0433\u0443 -> negative\n```\n\n\n## License\n[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky?ref=badge_large)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/bureaucratic-labs/dostoevsky", "keywords": "natural language processing,sentiment 
analysis", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "dostoevsky", "package_url": "https://pypi.org/project/dostoevsky/", "platform": "", "project_url": "https://pypi.org/project/dostoevsky/", "project_urls": { "Homepage": "https://github.com/bureaucratic-labs/dostoevsky" }, "release_url": "https://pypi.org/project/dostoevsky/0.3.0/", "requires_dist": [ "b-labs-models (==2017.8.22)", "razdel (==0.4.0)", "gensim (==3.8.0)", "Keras (==2.2.5)", "fasttext (==0.9.1)", "pymorphy2 (==0.8)", "pytest (==5.1.2)", "russian-tagsets (==0.6)", "scikit-learn (==0.21.3)", "tensorflow (==1.14.0)" ], "requires_python": "", "summary": "Sentiment analysis library for russian language", "version": "0.3.0" }, "last_serial": 5788708, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "0f4c51f01389c576e1f1118d8ae0ee35", "sha256": "0a1e84d6848bd0379ec723c75dc1861dfb803dba9eba20140c034f039fd75559" }, "downloads": -1, "filename": "dostoevsky-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0f4c51f01389c576e1f1118d8ae0ee35", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 1453, "upload_time": "2018-05-09T15:13:13", "url": "https://files.pythonhosted.org/packages/33/20/96e7a4222ae4898f655f081f3026483e8cdeabbc78959e1294c177d2ed1a/dostoevsky-0.0.1-py2.py3-none-any.whl" } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "c8e125d338aa957411f14693120e17bd", "sha256": "e76ee388b4b946ec4ca0e39a7675605f71e4c33d4865016238993de74508fb19" }, "downloads": -1, "filename": "dostoevsky-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "c8e125d338aa957411f14693120e17bd", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 11715, "upload_time": "2018-12-04T15:36:51", "url": "https://files.pythonhosted.org/packages/5f/b9/aecb4cdd44b764262e9c4abaf08d49c69d05476b2a129625c07e805a1b9c/dostoevsky-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"bd4e5067d1bd76cfe7eb7fed7a329c4f", "sha256": "d1abea91142eb1dcc3aa0b6ccdbe85c85a9ff58e2ac8278105dac9567b370598" }, "downloads": -1, "filename": "dostoevsky-0.1.0.tar.gz", "has_sig": false, "md5_digest": "bd4e5067d1bd76cfe7eb7fed7a329c4f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8536, "upload_time": "2018-12-04T15:41:21", "url": "https://files.pythonhosted.org/packages/33/ab/d2bd464629e8c2dd4458e4f213cb51e2cd1f7b2cbf06772981517c6de85a/dostoevsky-0.1.0.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "bb728281b8eb069c00ee6710be860435", "sha256": "992987c4f06a64c4050545d96b382d7fc83548e3d228a8cbeea1e1bd636bbb3d" }, "downloads": -1, "filename": "dostoevsky-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "bb728281b8eb069c00ee6710be860435", "packagetype": "bdist_wheel", "python_version": "3.6", "requires_python": null, "size": 12181, "upload_time": "2018-12-04T15:51:41", "url": "https://files.pythonhosted.org/packages/65/19/fb6ec63aeed788a9d0744ae076e2578e670fe93ccd013ec089da22fc562e/dostoevsky-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f72bf0f19c2bf4d03f2f8ab17ba94989", "sha256": "04378e4ab5ecd8d0e3ce55b8d050a414b64a638d21e8b9b650cab4c806893144" }, "downloads": -1, "filename": "dostoevsky-0.1.1.tar.gz", "has_sig": false, "md5_digest": "f72bf0f19c2bf4d03f2f8ab17ba94989", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8703, "upload_time": "2018-12-04T15:51:39", "url": "https://files.pythonhosted.org/packages/15/03/50f70c5d71941e7645cd1b90125421101cd5ed1383df82bd229f95a4a571/dostoevsky-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "0884d300bcdac766a53fb5f912687fe3", "sha256": "bd6ec9c6bfd61842690661c9a1fa34131442392281de6ed4db3563e93a9fbd07" }, "downloads": -1, "filename": "dostoevsky-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "0884d300bcdac766a53fb5f912687fe3", "packagetype": "bdist_wheel", 
"python_version": "3.6", "requires_python": null, "size": 12744, "upload_time": "2018-12-05T11:01:31", "url": "https://files.pythonhosted.org/packages/48/72/14eebbd6247972214d18c406fcff560e3851fb683db347f4d059fb462849/dostoevsky-0.1.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a0d126fdb98274e47b12fe24f5ad7eb6", "sha256": "3df95782f3fc675b3ee2fdbb59966f59d1efbd6f764923d52032a50a36c3c832" }, "downloads": -1, "filename": "dostoevsky-0.1.2.tar.gz", "has_sig": false, "md5_digest": "a0d126fdb98274e47b12fe24f5ad7eb6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8770, "upload_time": "2018-12-05T11:01:29", "url": "https://files.pythonhosted.org/packages/44/b8/15c0e6c8ce129f1ec9cc25cb3ab64d79853143ca489927b5c8659b96473f/dostoevsky-0.1.2.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "1ca44b637210e022fb553ff855192199", "sha256": "dac7e8cdfdb818441b81a20eaa79b2ef0d5f695ccbdc3d4aef9ff99616598c12" }, "downloads": -1, "filename": "dostoevsky-0.2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "1ca44b637210e022fb553ff855192199", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12181, "upload_time": "2019-08-12T10:10:12", "url": "https://files.pythonhosted.org/packages/57/e3/a1ab98e9f22be97c3f770567d18b03cea17ce3fd68aa7bf2e23e956b53e6/dostoevsky-0.2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3a67cc902acf2e0cdac0b4538adfbead", "sha256": "4f2a29d69d2d337aeffa2fa91901074dcef97d629155356bcecbb2c592af87d2" }, "downloads": -1, "filename": "dostoevsky-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "3a67cc902acf2e0cdac0b4538adfbead", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12023, "upload_time": "2019-07-18T22:41:10", "url": "https://files.pythonhosted.org/packages/4e/f7/04894ebfbfc0a04244cbfcc6e3b102144c87a54667e66a5d127a7ec42ead/dostoevsky-0.2.0-py3-none-any.whl" }, { 
"comment_text": "", "digests": { "md5": "d4c0d0a88f585a76a2f420508a876fa0", "sha256": "6a7a5d17ae0a180ddbb7073ccd24c93ea607166c44883d998df6d65540d0ad0b" }, "downloads": -1, "filename": "dostoevsky-0.2.0.tar.gz", "has_sig": false, "md5_digest": "d4c0d0a88f585a76a2f420508a876fa0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8948, "upload_time": "2019-07-18T22:41:11", "url": "https://files.pythonhosted.org/packages/74/6b/ae812a62398a713c755a7bf630ecce2357a0cfb1457bfa24c52609722c6d/dostoevsky-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "90c603b9e41dbea9a5becc44bc11c64b", "sha256": "0417e7fa552b67a5dc988ef182e23eda446918fe333cd653bf39a6d6af24f6e5" }, "downloads": -1, "filename": "dostoevsky-0.2.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "90c603b9e41dbea9a5becc44bc11c64b", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12493, "upload_time": "2019-08-12T10:10:51", "url": "https://files.pythonhosted.org/packages/c8/1a/af50d8aa216bbefd5b3b617de7f336f3bcd3fbaa86f21836c6bdb07b64f9/dostoevsky-0.2.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a5d5a5836eee62d6a4bce3b990d6e559", "sha256": "03a1a2ad9bf6363733ae1a1a752e1360a8bce410cab49480d6f87992f9a289db" }, "downloads": -1, "filename": "dostoevsky-0.2.1.tar.gz", "has_sig": false, "md5_digest": "a5d5a5836eee62d6a4bce3b990d6e559", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8000, "upload_time": "2019-08-12T10:10:52", "url": "https://files.pythonhosted.org/packages/7f/74/d1219651dcca278c0a5a824229c40451d434fb8b2fe31dcacdd0d2570c93/dostoevsky-0.2.1.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "78a0269c146c99341c7a25c950495ca4", "sha256": "18061348dd1cd167d05729748e76b0aef65cd505b327916889ccc54be2f5c714" }, "downloads": -1, "filename": "dostoevsky-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": 
"78a0269c146c99341c7a25c950495ca4", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13331, "upload_time": "2019-09-05T22:14:50", "url": "https://files.pythonhosted.org/packages/18/58/b52bb3af231e584f030405e285183bd7fc9e057af57893e5a340ee421067/dostoevsky-0.3.0-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "78a0269c146c99341c7a25c950495ca4", "sha256": "18061348dd1cd167d05729748e76b0aef65cd505b327916889ccc54be2f5c714" }, "downloads": -1, "filename": "dostoevsky-0.3.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "78a0269c146c99341c7a25c950495ca4", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 13331, "upload_time": "2019-09-05T22:14:50", "url": "https://files.pythonhosted.org/packages/18/58/b52bb3af231e584f030405e285183bd7fc9e057af57893e5a340ee421067/dostoevsky-0.3.0-py2.py3-none-any.whl" } ] }