===============================
Django/Xappy search integration
===============================

Bridges Xappy (an interface to the Xapian search engine) with Django.

While other projects, such as the GSoC 2008 project [1], try to be
generic and support a common set of functionality, this one lets you
take full advantage of the features provided by Xappy. On the downside,
it is Xappy-specific.


[1] http://code.google.com/p/djangosearch/


Dependencies
============

Just Python 2.5, Django and Xappy. Xappy should be a recent version;
the app is currently written against revision 252.


Usage
=====

django-xappy was originally designed for a project with an index
spanning multiple models. Keep in mind that if your use case is
simpler, usage may currently not be as straightforward as it could be.

In the case that one index does include multiple models, the official
Django search-api branch, as well as some other projects, for example
``djapian``, use a proxy model that mirrors all documents in the index.
For an example, see:

    http://code.google.com/p/djapian/wiki/IndexingManyModelsAtOnce

We adapt that approach. However, instead of maintaining an additional
model with its own rows duplicating all other models, the proxy is
simply a non-model object that defines the fields of the index and the
fields of each particular model they map to.

Defining an index
-----------------

The first step is to define the index. This primarily entails the fields
that the index is supposed to have, and the Xappy actions to apply to
each field:

    import django_xappy as search
    from django_xappy import action, FieldActions

    class MyIndex(search.Index):
        location = '/var/search/index'

        class Data:
            @action(FieldActions.INDEX_FREETEXT)
            def name(self):
                return "index this!"

First, note that we specify the location attribute directly in the class.
This may seem counter-intuitive at first if you expect that to be
instance data, but note that your index class is not a template for just
some index, but, like each model represents a database table, it
represents an actual physical search index that you intend to maintain.

Now, every method of the inner ``Data`` class that has at least one
action applied to it is considered a field of the index.
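
To illustrate the idea, here is a minimal, self-contained sketch of how a
decorator could tag methods with actions, and how the tagged methods could
then be collected as fields. It does not use django-xappy: the names
``sketch_action`` and ``collect_fields`` and the plain string
``"INDEX_FREETEXT"`` are stand-ins made up for this example, and the
library's real implementation may well work differently.

```python
# Illustration only: NOT django-xappy's implementation.
# `sketch_action` and `collect_fields` are hypothetical names.

def sketch_action(action_name):
    """Return a decorator that records `action_name` on the function."""
    def decorator(func):
        # Keep any actions recorded by decorators that were applied earlier.
        func.actions = getattr(func, "actions", []) + [action_name]
        return func
    return decorator


def collect_fields(data_class):
    """Return the sorted names of all methods with at least one action."""
    return sorted(
        name
        for name, attr in vars(data_class).items()
        if callable(attr) and getattr(attr, "actions", None)
    )


class Data:
    @sketch_action("INDEX_FREETEXT")
    def name(self):
        return "index this!"

    def helper(self):
        # No action applied, so this is not treated as a field.
        return "ignored"


print(collect_fields(Data))  # prints ['name']
```

Only ``name`` is reported, because ``helper`` carries no action.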

Remember that while an index can store the content of multiple models
with clashing field names, its own field names must be unique. For this
reason, you define fields as methods and return the appropriate value for
the model instance in ``self.content_object`` (your ``Data`` class is
the proxy that wraps around the objects to be indexed).

Example:

    @action(FieldActions.INDEX_FREETEXT)
    @action(FieldActions.STORE_CONTENT)
    def name(self):
        # Check the type of the wrapped object, not of the proxy itself.
        obj = self.content_object
        if isinstance(obj, Book):
            return obj.title
        elif isinstance(obj, auth.User):
            return obj.username

This field would belong to an index that searches both ``Book`` and
``User`` objects. It maps to ``Book.title`` or ``User.username``,
depending on the type of the object being indexed.
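
The mapping can be tried out without Django or Xappy. The following is a
rough sketch in which plain classes stand in for the models and for the
proxy; ``Book``, ``User`` and ``DataSketch`` here are hypothetical
stand-ins, not the real classes.

```python
# Illustration only: plain classes stand in for Django models and for
# the proxy object that wraps each indexed instance.

class Book:
    def __init__(self, title):
        self.title = title


class User:
    def __init__(self, username):
        self.username = username


class DataSketch:
    """Hypothetical proxy holding the wrapped object in `content_object`."""

    def __init__(self, content_object):
        self.content_object = content_object

    def name(self):
        # Dispatch on the type of the wrapped object.
        obj = self.content_object
        if isinstance(obj, Book):
            return obj.title
        elif isinstance(obj, User):
            return obj.username
        return None  # unknown type: this sketch yields no value


print(DataSketch(Book("Dune")).name())   # prints Dune
print(DataSketch(User("alice")).name())  # prints alice
```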

Registering the models
----------------------

Once your index is defined, you must tell it which models it handles.
Note that a model can be registered with multiple indexes:

    MyIndex.register(Book)
    MyIndex.register(auth.User)

This causes all changes to those models to be logged, so make sure it
runs before you start working with any of the affected models.

Putting it in an app's ``models.py`` file works best. For larger
projects I usually create a separate ``search`` application with its
own ``models.py`` file, and define the index there.

Alternatively, using an application's ``__init__.py`` works as well.

Using the index
---------------

To connect to your index, simply create an instance:

    index = MyIndex()

.. admonition:: Note

    If you want to open your index at a location other than the default,
    the following works as well:

        index = MyIndex('/some/other/place')

    Just remember that django-xappy's own code will always open the
    default location (for example, the update code), so this is really
    only useful in rare cases.

To search, just do:

    results = index.search('who am i')

This will give you the first ten results.

    results = index.search('who am i', page=3, num_per_page=5)

Now, the result set includes 5 documents from page 3.

See the **Advanced Usage** section for more about pagination.

.. admonition:: Note

    You can also modify the index, although you usually don't need to
    (and shouldn't) do this. Use the provided update scripts instead.
    For example, to add a document:

        f = Film.objects.get(pk=1)
        index.add(f)
        index.flush()

.. admonition:: Note

    The Xappy separation between a search and an indexer connection is
    hidden by the index class. If possible, though, you should use any
    given instance either for modifying or for searching, not both.

In templates
------------

Usually, you would pass the results collection that is returned by
``search()`` into your template.

There, you can simply iterate over it:

    {% if results %}
        {% for result in results %}
            {{ result.content_object }}
        {% endfor %}
    {% endif %}

``result.content_object`` gives you access to the original model
instance. If you used the ``STORE_CONTENT`` action on some of your
fields, you may instead access those values using one of:

    {{ result.some_field }}
    {{ result.highlighted.some_field }}
    {{ result.summarised.some_field }}

Keeping your index up-to-date
-----------------------------

Since django-xappy logs all changes to your models instead of applying
them directly, you need to update your index in regular intervals.

A management command is available to help you with this. Provided you
have **django-xappy** in your ``INSTALLED_APPS`` list, you can do:

    $ ./manage.py index --update

for an incremental update, and

    $ ./manage.py index --full-rebuild

to rebuild all indexes from scratch.

To apply changes on a regular basis, you would normally just set up a
cron job to run ``manage.py index --update -q``.

.. admonition:: Note on using multiple indexes

    Due to the way the model change log is stored (with only one
    record per change), it is currently not possible to update
    indexes selectively. There is no way to track which change has
    already been applied to which index.
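
The limitation can be pictured with a toy model of a shared change log.
This is only a sketch under assumptions: the record layout, the list-based
"indexes" and the function ``apply_all`` are invented for illustration and
do not reflect how django-xappy actually stores or applies changes.

```python
# Illustration only: a toy model of a shared change log, NOT the
# storage django-xappy actually uses. All names are hypothetical.

change_log = [
    ("update", "Book", 1),
    ("delete", "User", 7),
]


def apply_all(log, indexes):
    """Apply every logged change to every index, then clear the log."""
    for operation, model_name, pk in log:
        for index in indexes:
            index.append((operation, model_name, pk))
    # A record carries no "already applied to index X" marker, so it can
    # only be dropped once all indexes have seen it.
    del log[:]


first_index = []
second_index = []
apply_all(change_log, [first_index, second_index])
print(len(first_index), len(second_index), len(change_log))  # prints 2 2 0
```

Updating only ``first_index`` would leave no way to tell later that
``second_index`` still needs the same records.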


Advanced usage
==============

Pagination
----------

While technically, you have to use pagination (the ``search()`` function
always returns a paged subset of the results), there currently isn't good
support for pagination with respect to display, i.e. rendering **next**
and **previous** links etc.

You can, however, use an external paginator for this, such as the one
built into Django:

    from django.core.paginator import Paginator
    Paginator(results, num_per_page).page(page)

Just make sure that the ``num_per_page`` and ``page`` values are the same
ones that you passed into ``search()``.
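
For reference, the arithmetic behind **next** and **previous** links is
small enough to sketch by hand. The helper below is hypothetical and not
part of django-xappy or Django; it assumes that pages are numbered from 1
and that you already know the total number of matches.

```python
# Illustration only: `page_window` is a hypothetical helper, and pages
# are assumed to be numbered starting from 1.

def page_window(page, num_per_page, total):
    """Describe one page of `total` results, `num_per_page` at a time."""
    # Ceiling division; an empty result set still has one (empty) page.
    num_pages = max(1, -(-total // num_per_page))
    if not 1 <= page <= num_pages:
        raise ValueError("page %d is out of range 1..%d" % (page, num_pages))
    start = (page - 1) * num_per_page        # zero-based, inclusive
    end = min(start + num_per_page, total)   # zero-based, exclusive
    return {
        "start": start,
        "end": end,
        "num_pages": num_pages,
        "previous": page - 1 if page > 1 else None,
        "next": page + 1 if page < num_pages else None,
    }


window = page_window(page=3, num_per_page=5, total=23)
print(window["start"], window["end"], window["previous"], window["next"])
# prints 10 15 2 4
```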

Custom update scripts
---------------------

If you would rather not use the management command, you can create a
standalone update script. A default script is provided that you
can easily wrap:

    # 1) SETUP DJANGO
    ...

    # 2) RUN SCRIPT
    from django_xappy.scripts import update
    update.main()

Keep in mind that you **have** to do step 1 and set up your project's
Django environment for this script. For information on how to do this,
see:

    http://www.b-list.org/weblog/2007/sep/22/standalone-django-scripts/

Also, all modules that define an index need to be loaded, or
``update.main`` won't know **what** to update.

``examples\simple\scripts\update_index.py`` shows how this might look.

If you want to further customize things: ``update.main`` wraps around
the lower-level functions ``apply_changes`` and ``rebuild``, which you
can call directly. Of course, you can also modify the index manually to
your liking, using ``index.update()``, ``index.delete()`` etc.


TODO
====
    * Simplify usage for simple cases where an index does not
      span multiple models.
    * Port tests from critify project
    * Fail if a data class does not define any fields/actions?
    * Provide some kind of support for "unapproved" items, which would
      not be included in the index.
    * Add a "search" management command for some simple index testing.
    * Allow disabling of search result database resolving - when
      outputting the search results, instead of using a resolved model
      instance, one would have to use STORE_CONTENT index fields
      instead. On the plus side, performance would likely improve.
    * Improve the example project with respect to search display
      (model-specific results, result highlighting, ...)
    * Better pagination features. There is no reason why one would have
      to use an external paginator.