{ "info": { "author": "Alex Olieman", "author_email": "alex@olieman.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing", "Topic :: Text Processing :: Linguistic" ], "description": "Wayward\n=======\n\n.. image:: https://readthedocs.org/projects/wayward/badge/?version=latest\n :target: https://wayward.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation status\n\n.. image:: https://badge.fury.io/py/wayward.svg\n :target: https://pypi.org/project/wayward/\n :alt: PyPI package version\n\n\n.. docs-inclusion-marker\n\n**Wayward** is a Python package that helps to identify characteristic terms from\nsingle documents or groups of documents. It can be used for keyword extraction\nand several related tasks, and can create efficient sparse representations for\nclassifiers. It was originally created to provide term weights for word clouds.\n\nRather than use simple term frequency to estimate the importance of words and\nphrases, it weighs terms by statistical models known as *parsimonious language\nmodels*. These models are good at picking up the terms that distinguish a text\ndocument from other documents in a collection.\n\nFor this to work, a preferably large amount of documents is needed\nto serve as a background collection, to compare the documents of interest to.\nThis could be a random sample of newspaper articles, for instance, but for many\napplications it works better to take a natural collection, such as a periodical\npublication, and to fit the model for separate parts (e.g. individual issues,\nor yearly groups of issues).\n\nSee the `References`_ section for more information about parsimonious\nlanguage models and their applications.\n\nWayward does not do visualization of word clouds. For that, you can paste\nits output into a tool like http://wordle.net or the `IBM Word-Cloud Generator\n`_.\n\n\nInstallation\n------------\n\nEither install the latest release from PyPI::\n\n $ pip install wayward\n\nor clone the git repository, and use `Poetry `_\nto install the package in editable mode::\n\n $ git clone https://github.com/aolieman/wayward.git\n $ cd wayward/\n $ poetry install\n\nUsage\n-----\n>>> quotes = [\n... \"Love all, trust a few, Do wrong to none\",\n... ...\n... \"A lover's eyes will gaze an eagle blind. \"\n... \"A lover's ear will hear the lowest sound.\",\n... ]\n>>> doc_tokens = [\n... re.sub(r\"[.,:;!?\\\"\u2018\u2019]|'s\\b\", \" \", quote).lower().split()\n... for quote in quotes\n... ]\n\nThe ``ParsimoniousLM`` is initialized with all document tokens as a\nbackground corpus, and subsequently takes a single document's tokens\nas input. Its ``top()`` method returns the top terms and their probabilities:\n\n>>> from wayward import ParsimoniousLM\n>>> plm = ParsimoniousLM(doc_tokens, w=.1)\n>>> plm.top(10, doc_tokens[-1])\n[('lover', 0.1538461408077277),\n ('will', 0.1538461408077277),\n ('eyes', 0.0769230704038643),\n ('gaze', 0.0769230704038643),\n ('an', 0.0769230704038643),\n ('eagle', 0.0769230704038643),\n ('blind', 0.0769230704038643),\n ('ear', 0.0769230704038643),\n ('hear', 0.0769230704038643),\n ('lowest', 0.0769230704038643)]\n\nThe ``SignificantWordsLM`` is similarly initialized with a background corpus,\nbut subsequently takes a group of document tokens as input. Its ``group_top``\nmethod returns the top terms and their probabilities:\n\n>>> from wayward import SignificantWordsLM\n>>> swlm = SignificantWordsLM(doc_tokens, lambdas=(.7, .1, .2))\n>>> swlm.group_top(10, doc_tokens[-2:], fix_lambdas=True)\n[('much', 0.09077675276900632),\n ('lover', 0.06298706244865138),\n ('will', 0.06298706244865138),\n ('you', 0.04538837638450315),\n ('your', 0.04538837638450315),\n ('rhymes', 0.04538837638450315),\n ('speak', 0.04538837638450315),\n ('neither', 0.04538837638450315),\n ('rhyme', 0.04538837638450315),\n ('nor', 0.04538837638450315)]\n\nSee |example/dickens.py|_ for a runnable example with more realistic data.\n\n.. |example/dickens.py| replace:: ``example/dickens.py``\n.. _example/dickens.py: https://github.com/aolieman/wayward/blob/master/example/dickens.py\n\nOrigin and Relaunch\n-------------------\nThis package started out as WeighWords_,\nwritten by Lars Buitinck at the University of Amsterdam. It provides an efficient\nparsimonious LM implementation, and a very accessible API.\n\nA recent innovation in language modeling, Significant Words Language\nModels, led to the addition of a two-way parsimonious language model to this package.\nThis new version targets python\u00a03.x, and after a long slumber deserved a fresh name.\nThe name \"Wayward\" was chosen because it is a near-homophone of WeighWords, and as\na nod to parsimonious language modeling: it uncovers which terms \"depart\" most from\nthe background collection. The parsimonization algorithm discounts terms that are\nalready well explained by the background model, until the most wayward terms come\nout on top.\n\nSee the Changelog_ for an overview of the most important changes.\n\n.. _WeighWords: https://github.com/larsmans/weighwords/\n.. _Changelog: https://wayward.readthedocs.io/en/develop/changelog.html\n\nReferences\n----------\nD. Hiemstra, S. Robertson, and H. Zaragoza (2004). `Parsimonious Language Models\nfor Information Retrieval\n`_.\nProc. SIGIR'04.\n\nR. Kaptein, D. Hiemstra, and J. Kamps (2010). `How different are Language Models\nand word clouds? `_.\nProc. ECIR'10.\n\nM. Dehghani, H. Azarbonyad, J. Kamps, D. Hiemstra, and M. Marx (2016).\n`Luhn Revisited: Significant Words Language Models\n`_.\nProc. CKIM'16.\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/aolieman/wayward", "keywords": "natural language processing,language modeling,parsimonious language model,significant words language model,word cloud", "license": "LGPL-3.0", "maintainer": "Alex Olieman", "maintainer_email": "alex@olieman.net", "name": "wayward", "package_url": "https://pypi.org/project/wayward/", "platform": "", "project_url": "https://pypi.org/project/wayward/", "project_urls": { "Documentation": "https://wayward.readthedocs.io", "Homepage": "https://github.com/aolieman/wayward", "Repository": "https://github.com/aolieman/wayward" }, "release_url": "https://pypi.org/project/wayward/0.3.2/", "requires_dist": [ "numpy (>=1.15,<2.0)" ], "requires_python": ">=3.7,<4.0", "summary": "Wayward is a Python package that helps to identify characteristic terms from single documents or groups of documents.", "version": "0.3.2" }, "last_serial": 5378731, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": "5ea26c3c067f3501c1aef35b7e882f28", "sha256": "0f08297e36dfbb05f837608846d45b2c26043fdc01daa44e66ea58c95dd62136" }, "downloads": -1, "filename": "wayward-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5ea26c3c067f3501c1aef35b7e882f28", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.7,<4.0", "size": 10629, "upload_time": "2019-06-04T21:44:00", "url": "https://files.pythonhosted.org/packages/c0/71/c31758007b7882ec97bfa44fde29e5b7574dea9d2d0b78b030f503ac0600/wayward-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "692d206a91da3829b5d3c0e5a73241c7", "sha256": "e94b7ab7a8733c3ba1d1a300865b83f4df556a0ade23ff249a59797bba888f18" }, "downloads": -1, "filename": "wayward-0.3.0.tar.gz", "has_sig": false, "md5_digest": "692d206a91da3829b5d3c0e5a73241c7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.7,<4.0", "size": 8080, "upload_time": "2019-06-04T21:44:02", "url": "https://files.pythonhosted.org/packages/1f/82/9f3ff72912fec132f77a743c3e3aba7c43b9623d10394c1f5ed47cbb7e3b/wayward-0.3.0.tar.gz" } ], "0.3.1": [ { "comment_text": "", "digests": { "md5": "449589186aa74af6e2b9404b8d0a9867", "sha256": "ffd7507f40297dcdb5ec0dac802bb8a1451df585ce6f7ffe2c1550ee2dfa0bf4" }, "downloads": -1, "filename": "wayward-0.3.1-py3-none-any.whl", "has_sig": false, "md5_digest": "449589186aa74af6e2b9404b8d0a9867", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.7,<4.0", "size": 12617, "upload_time": "2019-06-05T20:12:32", "url": "https://files.pythonhosted.org/packages/a6/ab/2ff2775433fee869e74738c43a45c70352ec01f5f8f39f0e6c7216cb1a3d/wayward-0.3.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0bfe3ca8877027cab017066680bb14ab", "sha256": "a7e98253443086a7419b707aba0c89fa826e7f9e69d5a51bf3d32615aa86e30e" }, "downloads": -1, "filename": "wayward-0.3.1.tar.gz", "has_sig": false, "md5_digest": "0bfe3ca8877027cab017066680bb14ab", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.7,<4.0", "size": 11434, "upload_time": "2019-06-05T20:12:34", "url": "https://files.pythonhosted.org/packages/81/29/07a7d35270d4e49a182f5d952e5d3be20ec21f5c2925d789f17f793be5af/wayward-0.3.1.tar.gz" } ], "0.3.2": [ { "comment_text": "", "digests": { "md5": "c1d3213e6f8afde0d5057d01171b4631", "sha256": "7b7d9319aaaddab7f4015da45793ba70d26747a137e1a8587ae4904466b42937" }, "downloads": -1, "filename": "wayward-0.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "c1d3213e6f8afde0d5057d01171b4631", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.7,<4.0", "size": 12798, "upload_time": "2019-06-09T21:13:13", "url": "https://files.pythonhosted.org/packages/77/69/0db62d672f2d798fb6769e72a6d63acc800ed6d8a5ee1cd918d86344d71d/wayward-0.3.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "eff701230941c7d09db1fd6698134c27", "sha256": "ebb680e12025591af59ed3766db4ccd30d2c684123865fc8e53278f090eaeada" }, "downloads": -1, "filename": "wayward-0.3.2.tar.gz", "has_sig": false, "md5_digest": "eff701230941c7d09db1fd6698134c27", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.7,<4.0", "size": 12005, "upload_time": "2019-06-09T21:13:15", "url": "https://files.pythonhosted.org/packages/d7/e6/fc5cec12e69ee9e25d2dcc46f4fa4fd4cce708d406a8cfaa2eb6275f2206/wayward-0.3.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c1d3213e6f8afde0d5057d01171b4631", "sha256": "7b7d9319aaaddab7f4015da45793ba70d26747a137e1a8587ae4904466b42937" }, "downloads": -1, "filename": "wayward-0.3.2-py3-none-any.whl", "has_sig": false, "md5_digest": "c1d3213e6f8afde0d5057d01171b4631", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.7,<4.0", "size": 12798, "upload_time": "2019-06-09T21:13:13", "url": "https://files.pythonhosted.org/packages/77/69/0db62d672f2d798fb6769e72a6d63acc800ed6d8a5ee1cd918d86344d71d/wayward-0.3.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "eff701230941c7d09db1fd6698134c27", "sha256": "ebb680e12025591af59ed3766db4ccd30d2c684123865fc8e53278f090eaeada" }, "downloads": -1, "filename": "wayward-0.3.2.tar.gz", "has_sig": false, "md5_digest": "eff701230941c7d09db1fd6698134c27", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.7,<4.0", "size": 12005, "upload_time": "2019-06-09T21:13:15", "url": "https://files.pythonhosted.org/packages/d7/e6/fc5cec12e69ee9e25d2dcc46f4fa4fd4cce708d406a8cfaa2eb6275f2206/wayward-0.3.2.tar.gz" } ] }