{ "info": { "author": "Kapil Thangavelu", "author_email": "kapil.foss@gmail.com", "bugtrack_url": null, "classifiers": [ "Framework :: Zope3", "Intended Audience :: Developers" ], "description": "----------\nore.xapian\n----------\n\nThe package provides a content indexing framework for a multi-threaded\npython application. It utilizes xapian for its indexing library, and the\nzope component architecture for flexibility. It operates primarily as a\nframework wrapper for xapian core search facilities.\n\nfeatures \n\n - processes all indexing operations asynchronously.\n\n - mechanisms for indexing/resolving content from multiple data stores.\n\n - easy to customize indexing behavior via adaptation.\n\n - transaction aware modifications, aggregates operations for content\n within a transaaction scope.\n\nContent\n-------\n\nLet's create some content to work with. The only responsibility on\ncontent for purposes of integration with indexing is that they\nimplement the IIndexable marker interface. \n\n >>> class Content( object ):\n ... implements( interfaces.IIndexable )\n ... __parent__ = None\n ... @property\n ... def __name__( self ): return self.title\n ... def __init__( self, **kw): self.__dict__.update(kw)\n ... def __hash__( self ): return hash(self.title)\n >>>\n >>> rabbit = Content( title=u\"rabbit\", description=\"furry little creatures\", keywords=(\"skin\",) )\n >>> elephant = Content( title=u\"elephant\", description=\"large mammals with memory\", keywords=(\"apple\",) )\n >>> snake = Content( title=u\"snake\", description=\"reptile with scales\", keywords=(\"skin\", \"apple\") )\n >>>\n\nResolvers\n---------\n\nResolvers allow us to index content from multiple data stores. ie. we\ncould have content from a relational database, and content from a\nsubversion, and content from the fs, that we'd like to index into\nxapian. Resolvers allow us to unambigously identify object via an\nidentifier, and to retrieve an object given its identifier. Resolvers\nare structured as named utilities, with the utility name corresponding\nto the resolving strategy. \n\nOne key requirement, is that we need to be able to load the content\nasynchronously in a different thread in order for the indexing\nmachinery to work with it.\n\nFor the purposes of testing we'll construct a simple resolver scheme\nand some sample content here:\n\n >>> class ContentResolver( object ):\n ... implements( interfaces.IResolver )\n ... scheme = \"\" # name of resolver utility ( optionally \"\" for default )\n ... map = dict( rabbit=rabbit, elephant=elephant, snake=snake )\n ...\n ... def id( self, object ): return object.title\n ... def resolve( self, id ): return self.map[id]\n ...\n >>> component.provideUtility( ContentResolver() )\n\nCatalog Definition\n------------------\n\na core responsibility of any application utilizing this package, is to\ndefine the application specific fields of interest to index. \n\nan application does this via constructing a xapian index connection\nand adding additional fields:\n\n >>> import xappy\n >>> indexer = xappy.IndexerConnection('tmp.idx')\n >>> indexer.add_field_action('resolver', xappy.FieldActions.INDEX_EXACT )\n >>> indexer.add_field_action('resolver', xappy.FieldActions.STORE_CONTENT )\n >>> indexer.add_field_action('object_type', xappy.FieldActions.INDEX_EXACT )\n >>> indexer.add_field_action('object_type', xappy.FieldActions.STORE_CONTENT )\n >>> indexer.add_field_action('title', xappy.FieldActions.INDEX_FREETEXT )\n >>> indexer.add_field_action('title', xappy.FieldActions.STORE_CONTENT )\n >>> indexer.add_field_action('title', xappy.FieldActions.STORE_CONTENT )\n >>> indexer.add_field_action('title', xappy.FieldActions.SORTABLE )\n >>> indexer.add_field_action('description', xappy.FieldActions.INDEX_FREETEXT )\n >>> indexer.add_field_action('keyword', xappy.FieldActions.INDEX_EXACT )\n\n\nQueue Processor\n---------------\n\nNow we can startup our asynchronous indexing thread, with this\nindex connection. Note we shouldn't attempt to perform any indexing\ndirectly in the application threads with this indexer, as no locking is\nperformed by xapian. Instead, write operations are routed to the queue\nprocessor which performs all modifications to the index in a separate\nthread/process. For the purposes of testing, we'll also lower the time\nthreshold for index flushes (default 60s):\n\nFor test purposes, we set the poll timeout to 0.1 seconds.\n\n >>> from ore.xapian import queue\n >>> queue.QueueProcessor.POLL_TIMEOUT = 0.1\n >>> queue.QueueProcessor.FLUSH_THRESHOLD = 1 \n\nLet's start the indexing queue. We typically do this in ZCML, but its not \nrequired, and for testing purposes we'll do it directly from python.\n\n >>> queue.QueueProcessor.start( indexer )\n \n\nVerify that the queue is running.\n \n >>> queue.QueueProcessor.indexer_running\n True\n\nIndexing\n--------\n\nContent indexing is automatically provided via event integration. Event \nsubscribers for object modified, object added, and object removed are\nutilized to generate index operations which are processed asynchronously \nby the queue processor. \n\nOperations\n==========\n\nHowever in order for the proper resolver to be associated with the\nindex operations for each object we need to construct an operation\nfactory thats associated to the resolver. The appropriate operation \nfactory for an object will be found via adaptation:\n\n >>> from ore.xapian.operation import OperationFactory\n >>> class MyOperationFactory( OperationFactory ):\n ... resolver_id = ContentResolver.scheme\n >>> component.provideAdapter( MyOperationFactory, (interfaces.IIndexable,) )\n\nThe operation factory is used by the various event handlers to create\noperations for the index queue. The default implementation already\nprovides an appropriate generic implementation for the creation of\noperations, our customization is only to ensure that the factory uses\nthe specified resolver.\n\nContent Integration\n===================\n\nApplications will be typically be indexing many types of objects\ncorresponding to different interfaces and with different attribute\nvalues. An index however tries to index object attributes into a\ncommon set of fields appropriate for generic application usage and \nsearch interfaces. Therefore a common application need is to customize\nthe representation of an object that is indexed. \n\n >>> class ContentIndexer( object ):\n ... implements( interfaces.IIndexer )\n ... def __init__( self, context): self.ob = context\n ... def document( self, connection ):\n ... doc = xappy.UnprocessedDocument()\n ... doc.fields.append( xappy.Field( 'title', self.ob.title ))\n ... doc.fields.append( xappy.Field( 'description', self.ob.description ))\n ... doc.fields.append( xappy.Field( 'object_type', self.ob.__class__.__name__ ) )\n ... for kw in self.ob.keywords:\n ... \t doc.fields.append( xappy.Field( 'keyword', kw ) )\n ... return doc \n >>> \n >>> component.provideAdapter( ContentIndexer, (interfaces.IIndexable,) ) \n\nNow let's generate some events to kickstart the indexing:\n \n >>> from zope.event import notify\n >>> from zope.app.container.contained import ObjectAddedEvent\n >>>\n >>> notify( ObjectAddedEvent( rabbit ) )\n >>> notify( ObjectAddedEvent( elephant ) )\n >>> notify( ObjectAddedEvent( snake ) )\n\nIn order to have the indexer process these events, we need to commit the\ntransaction.\n\n >>> transaction.commit()\n >>> import time\n >>> time.sleep(0.1) \n\nSearching\n--------- \n\nSearch Utilities are analagous to xapian search connections. To allow\nfor reuse of a connection and avoid passing constructor arguments, \nwe construct a search gateway which functions as a container for \npooling search connections and which we register as a utility for\neasy access:\n \n >>> from ore.xapian import search\n >>> search_connections = search.ConnectionHub('tmp.idx')\n\nWe can get a search connection from a gateway by calling it:\n\n >>> searcher = search_connections.get()\n >>> query = searcher.query_parse('rabbit')\n >>> results = searcher.search( query, 0, 30)\n >>> len(results)\n 1\n \nWe can retrieve the object indexed by calling the object() method on \na individual search result:\n\n >>> results[0].object() is rabbit\n True\n \n >>> query = searcher.query_parse('mammals')\n >>> results = searcher.search( query, 0, 30 )\n >>> len(results) == 1\n True\n\nWe can search across the object type index to retrieve all indexed of\nthe same type.\n\n >>> query = searcher.query_field( 'object_type', 'Content')\n >>> results = searcher.search( query, 0, 30 )\n >>> len(results) == 3\n True\n\nLet's try a keyword search, we indexed the content objects across two keywords,\n\"apple\" and \"skin\".\n\n >>> query = searcher.query_field('keyword', 'apple')\n >>> results = searcher.search( query, 0, 30 )\n >>> print [i.object().title for i in results]\n [u'snake', u'elephant']\n\nLet's sort on title as well.\n\n >>> query = searcher.query_field('keyword', 'skin')\n >>> results = searcher.search( query, 0, 30, sortby=\"title\" )\n >>> print [i.object().title for i in results]\n [u'rabbit', u'snake']\n\nContent Integration Redux\n-------------------------\n\nFor verification let's test modification and deletion as well.\n\n >>> from zope.lifecycleevent import ObjectModifiedEvent\n >>> from zope.app.container.contained import ObjectRemovedEvent\n\nWe'll give the rabbit a new description.\n \n >>> rabbit.description = 'hairy little animal'\n >>> notify(ObjectModifiedEvent(rabbit))\n\nAnd delete the snake-object.\n\n >>> notify(ObjectRemovedEvent(snake))\n\nWait a bit and reopen the search connection.\n\n >>> transaction.commit()\n >>> time.sleep(0.1) \n >>> searcher.reopen()\n \nVerify search results:\n\n >>> query = searcher.query_parse('hairy')\n >>> len(searcher.search(query, 0, 30))\n 1\n\n >>> query = searcher.query_parse('snake')\n >>> len(searcher.search(query, 0, 30))\n 0\n \nCleanup\n-------\n\nTo be a good testing citizen, we cleanup our queue processing thread.\n \n >>> queue.QueueProcessor.stop()\n\nCaveats\n-------\n\nThere are several caveats to using an indexing against relational\ncontent, the primary one of concern is the use of non index aware\napplications, performing modifications of the database structure.\n\nthere are additional ways to deal with this, if the index queue\nis moved directly into the database, then modifying applications\ncan insert operations directly into the index queue. additionally\nmost databases support trigger operations that can perform this\nfunctionality directly in the schema structure.\n\nthe additional constraint with using database based operations is\nthat additional properties of the domain model may be lost, or \nhard to capture for other appliacations or database triggers.\n\n\n\nChanges\n-------\n0.5.0 - November 11th, 2008\n\n- add extensive optional logging options\n\n- don't let a bad op kill the indexing thread\n \n- log if an object can't be resolved\n\n- enable synchronous mode for integration testing\n\n- allow for multiple zcml defs for a queue processor\n\n0.4.2 - May 2nd, 2008\n\n- add license headers\n\n0.4.1 - May 1st, 2008\n=====================\n\n- packaging fix, not a zip safe package (includes zcml)\n\n0.4 - April 30th, 2008 \n======================\n\n- transactional operation buffer for feeding into operation queue.\n also performs aggregation of operations for a given piece of\n content within a transaction scope.\n\n- zcml support for starting an indexer\n \n- additional test coverage and bug fixes\n\n- rename flush timeout to poll timeout on queue processor.\n\n- transaction package dependency\n\n0.3 - February 10th, 2008 \n=========================\n\nFirst Release", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://svn.objectrealms.net/svn/public/ore.xapian/", "keywords": "zope3 index search xapian xappy", "license": "GPL", "maintainer": null, "maintainer_email": null, "name": "ore.xapian", "package_url": "https://pypi.org/project/ore.xapian/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/ore.xapian/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://svn.objectrealms.net/svn/public/ore.xapian/" }, "release_url": "https://pypi.org/project/ore.xapian/0.5.0/", "requires_dist": null, "requires_python": null, "summary": "A Xapian Content Indexing/Searching Framework for Zope3", "version": "0.5.0" }, "last_serial": 795836, "releases": { "0.3.0": [ { "comment_text": "", "digests": { "md5": "2f7338a0b3cedf19012c1a931b181373", "sha256": "99c7651fdff0bbb93abe69a25abc8ea1a20c9593c60cd942b849848865394a80" }, "downloads": -1, "filename": "ore.xapian-0.3.0-py2.5.egg", "has_sig": false, "md5_digest": "2f7338a0b3cedf19012c1a931b181373", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 20450, "upload_time": "2008-02-14T18:05:42", "url": "https://files.pythonhosted.org/packages/93/66/424d33b6c0df531b0aaee18ba7cfb29823d9190ebe6cda49a1f1fe6c33d4/ore.xapian-0.3.0-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "d7b9d4131b8d246302a831dab3a4dcb8", "sha256": "6a5cd11b1de93143bcd7c14d4d1c46bb108428e35fc1985431234f15e69f2d7d" }, "downloads": -1, "filename": "ore.xapian-0.3.0.tar.gz", "has_sig": false, "md5_digest": "d7b9d4131b8d246302a831dab3a4dcb8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10730, "upload_time": "2008-02-14T18:06:16", "url": "https://files.pythonhosted.org/packages/86/ac/79f026ada29e2983e9798a8f1f1075cf285328f0a80e1718fb9824f35e20/ore.xapian-0.3.0.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "d8bc9aa6915e30a5dd5a0b5a5eab0fef", "sha256": "848ef94dad186514fc38a674bdb0a428e666e17145f8fcab5c51f1c1c4f7e915" }, "downloads": -1, "filename": "ore.xapian-0.4.0-py2.5.egg", "has_sig": false, "md5_digest": "d8bc9aa6915e30a5dd5a0b5a5eab0fef", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 27470, "upload_time": "2008-04-30T05:43:23", "url": "https://files.pythonhosted.org/packages/1a/8e/6d5164db86e3008c4cef3332da05798a6c8be4c25ca1ea7d30ac75149a61/ore.xapian-0.4.0-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "09adbc949dadd6932497248f85841c5f", "sha256": "5c5e98afb7cdcb36d99d71564fd5211642d1f9a9ef187699e15508bda5ae2666" }, "downloads": -1, "filename": "ore.xapian-0.4.0.tar.gz", "has_sig": false, "md5_digest": "09adbc949dadd6932497248f85841c5f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16816, "upload_time": "2008-04-30T05:43:16", "url": "https://files.pythonhosted.org/packages/b3/98/f3cb02c207f4f5ce8888d7ce80a6fdcc2cfabdfae9b1f8f6a8c99cb37a3d/ore.xapian-0.4.0.tar.gz" } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "b22ca92f45e9dbddfc861a08d32bed1b", "sha256": "fdc66a2f5187cbc0be3d08243d667bb1778b5a9a11f8d69bafaee7adf48b0ba3" }, "downloads": -1, "filename": "ore.xapian-0.4.1-py2.5.egg", "has_sig": false, "md5_digest": "b22ca92f45e9dbddfc861a08d32bed1b", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 27491, "upload_time": "2008-05-01T04:07:44", "url": "https://files.pythonhosted.org/packages/45/a5/37a2d02448c5a5a16fad2cab234e1b10a0606748d49a4a68e52815aec1d2/ore.xapian-0.4.1-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "61fa1e87708c8e1d6e14394d6c0ee93e", "sha256": "6ac0780f88b02253f6d005d9c326b7daf8519cca2b31eafe6df153a424cf247e" }, "downloads": -1, "filename": "ore.xapian-0.4.1.tar.gz", "has_sig": false, "md5_digest": "61fa1e87708c8e1d6e14394d6c0ee93e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15955, "upload_time": "2008-05-01T04:07:43", "url": "https://files.pythonhosted.org/packages/8d/e7/04ce46cb5de23bba6274a4e4d74576189ad3963698c2f059d112e619a0f5/ore.xapian-0.4.1.tar.gz" } ], "0.4.2": [ { "comment_text": "", "digests": { "md5": "b7d06c369a889b93f94e14d78d3f2a6d", "sha256": "a6de069b11fd3af12d037f30f8f978e082715031bce209f600f70db59f45ffa7" }, "downloads": -1, "filename": "ore.xapian-0.4.2-py2.5.egg", "has_sig": false, "md5_digest": "b7d06c369a889b93f94e14d78d3f2a6d", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 30822, "upload_time": "2008-05-02T17:52:53", "url": "https://files.pythonhosted.org/packages/80/69/126c740af3e9c0f4aed0130976cd598ba3cf98ed9985e65d0b222fbc79b4/ore.xapian-0.4.2-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "301696c8ca021d10e312d9596174fb68", "sha256": "fc4d64843666708758eb21d48256cce4f37ca6af099be6d6522f54cdd978cb24" }, "downloads": -1, "filename": "ore.xapian-0.4.2.tar.gz", "has_sig": false, "md5_digest": "301696c8ca021d10e312d9596174fb68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16399, "upload_time": "2008-05-02T17:52:52", "url": "https://files.pythonhosted.org/packages/86/fa/dc13718fd009612926941348db8da84cac78f1bc6f3f5d2ee88871167cf8/ore.xapian-0.4.2.tar.gz" } ], "0.5.0": [ { "comment_text": "", "digests": { "md5": "b24d31ae914ca1e81fd1b48fc8fe3c9b", "sha256": "405e00fbddbdc886927112f087ef17ed4eae43592d2ac8828b250879f8eb6299" }, "downloads": -1, "filename": "ore.xapian-0.5.0-py2.5.egg", "has_sig": false, "md5_digest": "b24d31ae914ca1e81fd1b48fc8fe3c9b", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 33710, "upload_time": "2008-11-11T07:43:29", "url": "https://files.pythonhosted.org/packages/9d/03/e91b1e63107a8b62577aed72137cdd094821be2b17428f144822cb118a54/ore.xapian-0.5.0-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "4b9db83a61a4d3589f0ea756246beea3", "sha256": "f38d5e44183166affcb5d660054290227bbb798ac2cd8a66e988c1de5d417ff7" }, "downloads": -1, "filename": "ore.xapian-0.5.0.tar.gz", "has_sig": false, "md5_digest": "4b9db83a61a4d3589f0ea756246beea3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18254, "upload_time": "2008-11-11T07:43:28", "url": "https://files.pythonhosted.org/packages/f6/fe/fc1db344b3f1572f3939b227a679575abc11957f92453606acf1982b9985/ore.xapian-0.5.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b24d31ae914ca1e81fd1b48fc8fe3c9b", "sha256": "405e00fbddbdc886927112f087ef17ed4eae43592d2ac8828b250879f8eb6299" }, "downloads": -1, "filename": "ore.xapian-0.5.0-py2.5.egg", "has_sig": false, "md5_digest": "b24d31ae914ca1e81fd1b48fc8fe3c9b", "packagetype": "bdist_egg", "python_version": "2.5", "requires_python": null, "size": 33710, "upload_time": "2008-11-11T07:43:29", "url": "https://files.pythonhosted.org/packages/9d/03/e91b1e63107a8b62577aed72137cdd094821be2b17428f144822cb118a54/ore.xapian-0.5.0-py2.5.egg" }, { "comment_text": "", "digests": { "md5": "4b9db83a61a4d3589f0ea756246beea3", "sha256": "f38d5e44183166affcb5d660054290227bbb798ac2cd8a66e988c1de5d417ff7" }, "downloads": -1, "filename": "ore.xapian-0.5.0.tar.gz", "has_sig": false, "md5_digest": "4b9db83a61a4d3589f0ea756246beea3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18254, "upload_time": "2008-11-11T07:43:28", "url": "https://files.pythonhosted.org/packages/f6/fe/fc1db344b3f1572f3939b227a679575abc11957f92453606acf1982b9985/ore.xapian-0.5.0.tar.gz" } ] }