{ "info": { "author": "Daniel Lindsley", "author_email": "daniel@toastdriven.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Internet :: WWW/HTTP :: Indexing/Search" ], "description": "===========\nmicrosearch\n===========\n\n\nA small search library.\n\nPrimarily intended to be a learning tool to teach the fundamentals of search.\n\nUseful for embedding into Python apps where you don't want/need something\nas complex as Lucene.\n\nPart of my (upcoming) 2012 PyCon talk - https://us.pycon.org/2012/schedule/presentation/66/\n\n\nRequirements\n============\n\n* Python 2.5+ or Python 3.2+\n* (Optional) simplejson\n* (Optional) unittest2 (Python 2.5 - for runnning the tests)\n\n\nUsage\n=====\n\nExample::\n\n import microsearch\n\n # Create an instance, pointing it to where the data should be stored.\n ms = microsearch.Microsearch('/tmp/microsearch')\n\n # Index some data.\n ms.index('email_1', {'text': \"Peter,\\n\\nI'm going to need those TPS reports on my desk first thing tomorrow! And clean up your desk!\\n\\nLumbergh\"})\n ms.index('email_2', {'text': 'Everyone,\\n\\nM-m-m-m-my red stapler has gone missing. H-h-has a-an-anyone seen it?\\n\\nMilton'})\n ms.index('email_3', {'text': \"Peter,\\n\\nYeah, I'm going to need you to come in on Saturday. Don't forget those reports.\\n\\nLumbergh\"})\n ms.index('email_4', {'text': 'How do you feel about becoming Management?\\n\\nThe Bobs'})\n\n # Search on it.\n ms.search('Peter')\n ms.search('tps report')\n\n\nShortcomings\n============\n\nThis library is meant to help others learn. While it has full test coverage,\nit may not be suitable for production use. Reasons you may not want to use it\nin Real Code(tm):\n\n* No concurrency support\n\n * Tries to work atomically with files\n * But there are no locks\n * So it's possible for writes to overlap between processes\n\n* Maybe thread-safe?\n\n * Pretty much everything is on an instance\n * But I haven't tested it extensively with threading\n\n* No support for deleting documents\n\n * If an existing document changes or gets deleted, stale data will be left\n in the index\n * A workaround would be blowing away the index directory, moving the docs out\n and reindexing them :/\n\n* Only n-grams are supported\n\n * Because writing a full Porter or Snowball stemmer is beyond the needs\n of this library\n\n* No clue on performance at scale\n\n * This is a proof-of-concept & learning tool, *not* Lucene!\n * With a 2011 MBP on the first 1.2K docs of the Enron corpus:\n\n * Indexing is pretty slow at ~1 document per second\n * Search is pretty fast at ~0.007 sec per query\n * RAM never exceeded 15Mb when indexing, 10Mb when searching\n * Script in the source repo as ``enron_bench.py``.\n\n\nRunning Tests\n=============\n\nWith a source checkout, run:\n\nIn Python 2:\n\n python -m unittest2 tests\n\nIn Python 3:\n\n python -m unittest tests\n\nTests should be passing at all times under both Python 2.7 & Python 3.2.\n\n\nContributions\n=============\n\nIf you wish to contribute to improving ``microsearch``, the code you submit\nmust:\n\n* Be your own work & BSD-licensed\n* Include a working fix/feature\n* Follow the existing style of the codebase\n* Include passing test coverage of the new code\n* If it's user-facing, must include documentation\n\nOther submissions are welcome, but won't get merged until all of these\nrequirements are met.\n\n\n:author: Daniel Lindsley \n:date: 2011/02/22", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/toastdriven/microsearch", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "microsearch", "package_url": "https://pypi.org/project/microsearch/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/microsearch/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/toastdriven/microsearch" }, "release_url": "https://pypi.org/project/microsearch/1.0.0/", "requires_dist": null, "requires_python": null, "summary": "A small search library.", "version": "1.0.0" }, "last_serial": 635631, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9394172d1aa60f64974f9c760dc13bb0", "sha256": "64798c6b2078e471c7a78f64eaf83f41341ac3bf05d000a8a00f589ae9073d5a" }, "downloads": -1, "filename": "microsearch-0.1.0.tar.gz", "has_sig": false, "md5_digest": "9394172d1aa60f64974f9c760dc13bb0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3268, "upload_time": "2011-12-10T09:43:09", "url": "https://files.pythonhosted.org/packages/a7/b2/b764f09012a1af9e76f3ec72d0d2bf8640b5c024ef388ed3d6990fb082a0/microsearch-0.1.0.tar.gz" } ], "0.9.0": [ { "comment_text": "", "digests": { "md5": "abdf37da75ac0e8cbaf95379b2152f2c", "sha256": "c0268729ff909ea6717ec89a115a3f0097acd513a857a27171b07f742c32af45" }, "downloads": -1, "filename": "microsearch-0.9.0.tar.gz", "has_sig": false, "md5_digest": "abdf37da75ac0e8cbaf95379b2152f2c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6764, "upload_time": "2012-02-22T18:26:36", "url": "https://files.pythonhosted.org/packages/8c/64/771940150b54096a9523f9237e3a4e42c4dffc0b0d55e3fd83be74e4299f/microsearch-0.9.0.tar.gz" } ], "0.9.1": [ { "comment_text": "", "digests": { "md5": "77c8302260d2b9cdb7e0197c613ad860", "sha256": "134b5594795f951b7d52fdc76bb33ca5e42b3afaa4d04614391a5ea35af3c845" }, "downloads": -1, "filename": "microsearch-0.9.1.tar.gz", "has_sig": false, "md5_digest": "77c8302260d2b9cdb7e0197c613ad860", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6957, "upload_time": "2012-02-23T05:24:43", "url": "https://files.pythonhosted.org/packages/5c/c3/0eef823b9b46a3d410d24e10b2fa6d9740008e76d217a127c26d650eb45e/microsearch-0.9.1.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "cfc750c8832b9582fe6ea3e492b684f2", "sha256": "731aa64f2e2cffed60c97c4d051ba118b80c655ca25d15e001341d7cb035c4a4" }, "downloads": -1, "filename": "microsearch-1.0.0.tar.gz", "has_sig": false, "md5_digest": "cfc750c8832b9582fe6ea3e492b684f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9508, "upload_time": "2012-02-23T07:51:29", "url": "https://files.pythonhosted.org/packages/98/a4/4392a153b9babde4d15405f2e74c326b6a69c0299cba33d0e2bd3029ec26/microsearch-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cfc750c8832b9582fe6ea3e492b684f2", "sha256": "731aa64f2e2cffed60c97c4d051ba118b80c655ca25d15e001341d7cb035c4a4" }, "downloads": -1, "filename": "microsearch-1.0.0.tar.gz", "has_sig": false, "md5_digest": "cfc750c8832b9582fe6ea3e492b684f2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9508, "upload_time": "2012-02-23T07:51:29", "url": "https://files.pythonhosted.org/packages/98/a4/4392a153b9babde4d15405f2e74c326b6a69c0299cba33d0e2bd3029ec26/microsearch-1.0.0.tar.gz" } ] }