{ "info": { "author": "Michael Aquilina", "author_email": "michaelaquilina@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "===============================\nhashedindex\n===============================\n\n|TravisCI| |AppVeyor| |CodeCov| |PyPi|\n\n\nFast and simple InvertedIndex implementation using hash lists (python dictionaries).\n\nSupports Python 3.5+\n\nFree software: BSD license\n\n* Installing_\n* Features_\n* `Text Parsing`_\n* `Integration with Numpy and Pandas`_\n* `Reporting Bugs`_\n\n\nInstalling\n----------\n\nThe easiest way to install hashindex is through pypi\n\n::\n\n pip install hashedindex\n\n\nFeatures\n--------\n\n``hashedindex`` provides a simple to use inverted index structure that is flexible enough to work with all kinds of use cases.\n\nBasic Usage:\n\n.. code-block:: python\n\n import hashedindex\n index = hashedindex.HashedIndex()\n\n index.add_term_occurrence('hello', 'document1.txt')\n index.add_term_occurrence('world', 'document1.txt')\n\n index.get_documents('hello')\n Counter({'document1.txt': 1})\n\n index.items()\n {'hello': Counter({'document1.txt': 1}),\n 'world': Counter({'document1.txt': 1})}\n\n example = 'The Quick Brown Fox Jumps Over The Lazy Dog'\n\n for term in example.split():\n index.add_term_occurrence(term, 'document2.txt')\n\n``hashedindex`` is not limited to strings, any hashable object can be indexed.\n\n.. code-block:: python\n\n index.add_term_occurrence('foo', 10)\n index.add_term_occurrence(('fire', 'fox'), 90.2)\n\n index.items()\n {'foo': Counter({10: 1}), ('fire', 'fox'): Counter({90.2: 1})}\n\nText Parsing\n------------\n\nThe ``hashedindex`` module comes included with a powerful textparser module with methods to split\ntext into tokens.\n\n.. code-block:: python\n\n from hashedindex import textparser\n list(textparser.word_tokenize(\"hello cruel world\"))\n [('hello',), ('cruel',), ('world',)]\n\nTokens are wrapped within tuples due to the ability to specify any number of n-grams required:\n\n.. code-block:: python\n\n list(textparser.word_tokenize(\"Life is about making an impact, not making an income.\", ngrams=2))\n [(u'life', u'is'), (u'is', u'about'), (u'about', u'making'), (u'making', u'an'), (u'an', u'impact'),\n (u'impact', u'not'), (u'not', u'making'), (u'making', u'an'), (u'an', u'income')]\n\nTake a look at the function's docstring for information on how to use ``stopwords``, specify a\n``min_length`` or ``ignore_numeric`` terms.\n\nIntegration with Numpy and Pandas\n---------------------------------\n\nThe idea behind ``hashedindex`` is to provide a really quick and easy way of generating\nmatrices for machine learning with the additional use of numpy, pandas and scikit-learn.\nFor example:\n\n.. code-block:: python\n\n from hashedindex import textparser\n import hashedindex\n import numpy as np\n\n index = hashedindex.HashedIndex()\n\n documents = ['spam1.txt', 'ham1.txt', 'spam2.txt']\n for doc in documents:\n with open(doc, 'r') as fp:\n for term in textparser.word_tokenize(fp.read()):\n index.add_term_occurrence(term, doc)\n\n # You *probably* want to use scipy.sparse.csr_matrix for better performance\n X = np.as_matrix(index.generate_feature_matrix(mode='tfidf'))\n\n y = []\n for doc in index.documents():\n y.append(1 if 'spam' in doc else 0)\n y = np.asarray(doc)\n\n from sklearn.svm import SVC\n classifier = SVC(kernel='linear')\n classifier.fit(X, y)\n\nYou can also extend your feature matrix to a more verbose pandas DataFrame:\n\n.. code-block:: python\n\n import pandas as pd\n X = index.generate_feature_matrix(mode='tfidf')\n df = pd.DataFrame(X, columns=index.terms(), index=index.documents())\n\nAll methods within the code have high test coverage so you can be sure everything works as expected.\n\nReporting Bugs\n--------------\n\nFound a bug? Nice, a bug found is a bug fixed. Open an Issue or better yet, open a pull request.\n\n.. |TravisCI| image:: https://travis-ci.org/MichaelAquilina/hashedindex.svg?branch=master\n :target: https://travis-ci.org/MichaelAquilina/hashedindex\n\n.. |AppVeyor| image:: https://ci.appveyor.com/api/projects/status/qkhn4bub2pye7skm?svg=true\n :target: https://ci.appveyor.com/project/MichaelAquilina/hashedindex\n\n.. |PyPi| image:: https://badge.fury.io/py/hashedindex.svg\n :target: https://badge.fury.io/py/hashedindex\n\n.. |CodeCov| image:: https://codecov.io/gh/MichaelAquilina/hashedindex/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/MichaelAquilina/hashedindex\n\n\n\n\nHistory\n-------\n\n0.5.0 (2019-07-21)\n--------------------\n* Drop support for python 2.7 and 3.4\n\n0.1.0 (2015-01-11)\n---------------------\n\n* First release on PyPI.\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/MichaelAquilina/hashedindex", "keywords": "hashedindex", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "hashedindex", "package_url": "https://pypi.org/project/hashedindex/", "platform": "", "project_url": "https://pypi.org/project/hashedindex/", "project_urls": { "Homepage": "https://github.com/MichaelAquilina/hashedindex" }, "release_url": "https://pypi.org/project/hashedindex/0.5.0/", "requires_dist": null, "requires_python": "", "summary": "InvertedIndex implementation using hash lists (dictionaries)", "version": "0.5.0" }, "last_serial": 5564396, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "5bdffe94be7c96618b80af5c5c35aaa6", "sha256": "2be05c3de3ab43804a9c470b0b2025227bababeb27933b01675c490060cd3816" }, "downloads": -1, "filename": "hashedindex-0.1.0.tar.gz", "has_sig": false, "md5_digest": "5bdffe94be7c96618b80af5c5c35aaa6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14686, "upload_time": "2015-06-03T22:27:15", "url": "https://files.pythonhosted.org/packages/e7/8d/c5c314f09ed3f635046365b6f5ada84137d035c413512df3a720bb3fe468/hashedindex-0.1.0.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "a464d24195ee1cfda51ab40bd13ddece", "sha256": "571fed81acc5be51b91b3329da4ddef1fa25f73bc287e3327cd2c5e7ebae8a75" }, "downloads": -1, "filename": "hashedindex-0.1.3.tar.gz", "has_sig": false, "md5_digest": "a464d24195ee1cfda51ab40bd13ddece", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14611, "upload_time": "2015-06-06T16:19:30", "url": "https://files.pythonhosted.org/packages/a6/6e/64e7607dd816af7824423ee90dfa9e2228beeadc608cafef7e506a007c32/hashedindex-0.1.3.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "318d99f9b40794081c87fe4795fd0bba", "sha256": "fa740349612452643002f2dabe7c9d70771b9eefc3a3e9b0318c067eed821390" }, "downloads": -1, "filename": "hashedindex-0.2.0.tar.gz", "has_sig": false, "md5_digest": "318d99f9b40794081c87fe4795fd0bba", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15864, "upload_time": "2015-07-06T13:42:48", "url": "https://files.pythonhosted.org/packages/36/50/ead60921e687710b91c612992dcaf1eeb1ffd379a5c2bed1d50f3c47ab4c/hashedindex-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "7d825677c2403b5593b951c19ddd4650", "sha256": "0ec982df682c26c648a52c689592a7c2769b282e792fc8209b03dfabdfee4e6c" }, "downloads": -1, "filename": "hashedindex-0.2.1.tar.gz", "has_sig": false, "md5_digest": "7d825677c2403b5593b951c19ddd4650", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15917, "upload_time": "2015-07-06T13:57:19", "url": "https://files.pythonhosted.org/packages/9c/e3/c7d939867bee29d8b2e99066c4f330109b0481c6d262b1d2321bb0327dbc/hashedindex-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "85c291eb11a94b2ec2877c14da70a065", "sha256": "5219188a874618bc96dcb16ec4f6550dd69a532f0285fb4f871894ebb700c110" }, "downloads": -1, "filename": "hashedindex-0.2.2.tar.gz", "has_sig": false, "md5_digest": "85c291eb11a94b2ec2877c14da70a065", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16166, "upload_time": "2015-07-07T13:50:36", "url": "https://files.pythonhosted.org/packages/06/2d/885ebeaec5cc9abe1977b7af9ffa3e01f3ffa392ccfb64ec8b198b6666da/hashedindex-0.2.2.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "ff5867f470888e58ca735492945ea05b", "sha256": "c2039d5145ea97bbd79b0bec76c90401d45e4cea9614d18919aa519afef9f3ff" }, "downloads": -1, "filename": "hashedindex-0.3.0.tar.gz", "has_sig": false, "md5_digest": "ff5867f470888e58ca735492945ea05b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17481, "upload_time": "2015-07-13T19:38:22", "url": "https://files.pythonhosted.org/packages/c9/26/834cb6ef6f2a370195621444c5a6fdb440b0bd849d28a97767aef96ee622/hashedindex-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "e83b05d60707de3aedda8d742edf8c40", "sha256": "9358954e8c4a148c656195f68f14f6a444de1b6a3fb53dc497555ced2789cf2b" }, "downloads": -1, "filename": "hashedindex-0.4.0.tar.gz", "has_sig": false, "md5_digest": "e83b05d60707de3aedda8d742edf8c40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20037, "upload_time": "2015-07-13T19:57:55", "url": "https://files.pythonhosted.org/packages/d3/00/f69af1763eb943ab3e430da495369f65c6b0620425924487fa8f8f13ffca/hashedindex-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "b1aadc87c77fc2bdfb13807d27516250", "sha256": "94b6eeac2a62b42e5fb6a2c00251d6cf18d8bbea4b756ab9d962e8d1ec43ff19" }, "downloads": -1, "filename": "hashedindex-0.4.1.tar.gz", "has_sig": false, "md5_digest": "b1aadc87c77fc2bdfb13807d27516250", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20213, "upload_time": "2017-12-21T20:54:53", "url": "https://files.pythonhosted.org/packages/5e/f2/158e7881332d28ba231a71a3e8f24cebb5565328b05605173e509a478651/hashedindex-0.4.1.tar.gz" } ], "0.4.2": [ { "comment_text": "", "digests": { "md5": "66bba344df4949f6c019f0022d1fc89a", "sha256": "480a9274ff4db1bc98dab0a414757e7d97b30f48e6b8ad1db7cc97e1f6ef03ca" }, "downloads": -1, "filename": "hashedindex-0.4.2.tar.gz", "has_sig": false, "md5_digest": "66bba344df4949f6c019f0022d1fc89a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20607, "upload_time": "2018-03-26T22:18:55", "url": "https://files.pythonhosted.org/packages/83/fa/e1a6fe41af326017fc3e1aa756708512fa4b54038239cbcd97f3567bba3d/hashedindex-0.4.2.tar.gz" } ], "0.4.3": [ { "comment_text": "", "digests": { "md5": "4aa2620192812760ad24c077946a48c1", "sha256": "3dc41bcc7970b379f7ef551c771bd44570c75025b777a77fc4b4dec339447172" }, "downloads": -1, "filename": "hashedindex-0.4.3.tar.gz", "has_sig": false, "md5_digest": "4aa2620192812760ad24c077946a48c1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20693, "upload_time": "2018-04-05T09:56:09", "url": "https://files.pythonhosted.org/packages/41/d8/6085c63d8a807716774887cba1e74921e495d3699bd9f5a01bfabcae233a/hashedindex-0.4.3.tar.gz" } ], "0.4.4": [ { "comment_text": "", "digests": { "md5": "fa2b33697433f7052d71eacb163ce9c7", "sha256": "23317c2eedec19920ae0ae0d74230bdbc7156e9f37cfa61e16885e8912baaf42" }, "downloads": -1, "filename": "hashedindex-0.4.4.tar.gz", "has_sig": false, "md5_digest": "fa2b33697433f7052d71eacb163ce9c7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21077, "upload_time": "2018-04-08T12:40:02", "url": "https://files.pythonhosted.org/packages/ee/10/1a09d0339ac3cb999f2bea791c0d8cc54eb1ca4b3e6217abfbf15f23354b/hashedindex-0.4.4.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "d5c7d553700436ee566b36a08147c4eb", "sha256": "289362277c3f18d7f3a9ed449d3db6ca80d251220e7fafb7919ce6b4a95bdaf6" }, "downloads": -1, "filename": "hashedindex-0.5.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d5c7d553700436ee566b36a08147c4eb", "packagetype": "bdist_wheel", "python_version": "3.7", "requires_python": null, "size": 8089, "upload_time": "2019-07-21T20:07:03", "url": "https://files.pythonhosted.org/packages/b6/46/a167cc4b846e6e69d0d66fd7c3b98a519691c5641fa7c0a73f4487c89078/hashedindex-0.5.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8f7983acb575abf39632369060929374", "sha256": "f56275317219bd5f28d7d8cc7758473f8e27c17916096f633d3eb304763ae806" }, "downloads": -1, "filename": "hashedindex-0.5.0.tar.gz", "has_sig": false, "md5_digest": "8f7983acb575abf39632369060929374", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21018, "upload_time": "2019-07-21T20:07:01", "url": "https://files.pythonhosted.org/packages/ab/92/869f8c0b18c889e73f9f5ea4a5b8e556bd0dd4b098f3aebd08fd2cf8e92e/hashedindex-0.5.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "d5c7d553700436ee566b36a08147c4eb", "sha256": "289362277c3f18d7f3a9ed449d3db6ca80d251220e7fafb7919ce6b4a95bdaf6" }, "downloads": -1, "filename": "hashedindex-0.5.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "d5c7d553700436ee566b36a08147c4eb", "packagetype": "bdist_wheel", "python_version": "3.7", "requires_python": null, "size": 8089, "upload_time": "2019-07-21T20:07:03", "url": "https://files.pythonhosted.org/packages/b6/46/a167cc4b846e6e69d0d66fd7c3b98a519691c5641fa7c0a73f4487c89078/hashedindex-0.5.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8f7983acb575abf39632369060929374", "sha256": "f56275317219bd5f28d7d8cc7758473f8e27c17916096f633d3eb304763ae806" }, "downloads": -1, "filename": "hashedindex-0.5.0.tar.gz", "has_sig": false, "md5_digest": "8f7983acb575abf39632369060929374", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21018, "upload_time": "2019-07-21T20:07:01", "url": "https://files.pythonhosted.org/packages/ab/92/869f8c0b18c889e73f9f5ea4a5b8e556bd0dd4b098f3aebd08fd2cf8e92e/hashedindex-0.5.0.tar.gz" } ] }