.. contents::

Introduction
============

``bg.crawler`` is a command-line frontend for feeding a tree of files (a directory)
into a Solr server for indexing.

Usage
=====

Command-line options::

    blackmoon:~/src/bg.crawler> bin/solr-crawler --help
    usage: solr-crawler [-h] [--solr-url SOLR_URL] [--max-depth MAX_DEPTH]
                        [--batch-size BATCH_SIZE] [--tag TAG] [--clear-all]
                        [--clear-tag SOLR_CLEAR_TAG] [--verbose] [--no-type-check]
                        <directory>

    Commandline parser

    positional arguments:
      <directory>           Directory to be crawled

    optional arguments:
      -h, --help            show this help message and exit
      --solr-url SOLR_URL   SOLR server URL
      --max-depth MAX_DEPTH
                            maximum folder depth
      --batch-size BATCH_SIZE
                            Solr batch size
      --tag TAG             Solr import tag
      --clear-all           Clear the Solr indexes before crawling
      --clear-tag SOLR_CLEAR_TAG
                            Remove all items from Solr indexed tagged with the
                            given tag
      --verbose             Verbose logging
      --no-type-check       Apply extension filter while crawling


* ``--solr-url`` sets the URL of the Solr server

* ``--max-depth`` limits the crawler to a given folder depth

* ``--batch-size`` adds N documents in one batch before
  sending a commit to Solr (by default, every single
  add to the Solr index is committed)

* ``--tag`` tags the imported document(s) with a string
  (this is useful when importing different document sources
  into Solr, since results can then be filtered by tag
  at query time)

* ``--clear-all`` clears the complete Solr index before running
  the import

* ``--clear-tag`` removes all documents with the given tag before
  running the import

* ``--verbose`` enables extensive logging

* ``--no-type-check`` disables the file type filtering, so that
  all file types are passed to Solr
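The interplay of ``--max-depth``, the type check and ``--batch-size`` can be illustrated with a minimal Python sketch. It is not the ``bg.crawler`` implementation: the extension whitelist, the depth convention (depth 0 means only files directly inside ``<directory>``) and the ``add``/``commit`` callbacks standing in for the real Solr calls are all assumptions made for illustration.

```python
import os
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical extension whitelist; the filter bg.crawler really applies is not documented here.
DEFAULT_EXTENSIONS = {".pdf", ".doc", ".docx", ".html", ".txt"}


def iter_files(root: str, max_depth: Optional[int] = None,
               extensions: Optional[set] = DEFAULT_EXTENSIONS) -> Iterator[str]:
    """Yield file paths below root, honouring a depth limit and an extension filter.

    max_depth=None means unlimited depth (like omitting --max-depth);
    extensions=None disables filtering (like --no-type-check).
    """
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
        if max_depth is not None and depth >= max_depth:
            dirnames[:] = []  # prune: os.walk will not descend any further
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if extensions is not None and ext not in extensions:
                continue  # type check: skip files with unknown extensions
            yield os.path.join(dirpath, name)


def index_files(paths: Iterable[str], add: Callable[[str], None],
                commit: Callable[[], None], batch_size: int = 1) -> int:
    """Call add() for each path and commit() after every batch_size adds.

    A final commit flushes a partially filled batch. Returns the number of files added.
    """
    pending = 0
    count = 0
    for path in paths:
        add(path)
        count += 1
        pending += 1
        if pending >= batch_size:
            commit()
            pending = 0
    if pending:
        commit()
    return count
```

With ``batch_size=1`` the sketch commits after every add, which corresponds to the default behaviour described above; a larger value trades index freshness for fewer commits.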
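``--clear-all`` and ``--clear-tag`` map naturally onto a Solr delete-by-query. The sketch below only builds the XML update message and is not taken from ``bg.crawler``; the field name ``tag`` is a hypothetical choice, since the field the crawler actually uses for tags is not documented here.

```python
from typing import Optional
from xml.sax.saxutils import escape


def delete_by_query_xml(tag: Optional[str] = None) -> str:
    """Build a Solr XML delete-by-query message.

    tag=None matches every document (like --clear-all); otherwise it matches
    documents whose (assumed) 'tag' field equals the given tag (like --clear-tag).
    """
    if tag is None:
        query = "*:*"
    else:
        # Quote the tag as a phrase, escaping backslashes and double quotes
        # so the query parser treats them literally.
        quoted = tag.replace("\\", "\\\\").replace('"', '\\"')
        query = 'tag:"%s"' % quoted
    # XML-escape &, < and > so the message stays well-formed.
    return "<delete><query>%s</query></delete>" % escape(query)
```

A message like this would be POSTed to the core's ``/update`` handler (``Content-Type: text/xml``) and followed by a commit before the new import starts.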

Licence
=======

``bg.crawler`` is published under the GNU General Public License version 2 (GPL 2).

Credits
=======

``bg.crawler`` is sponsored by BG Phoenics.

Author
======

Written by:

| ZOPYX Ltd.
| c/o Andreas Jung
| Charlottenstr. 37/1
| D-72070 Tuebingen
| Germany
| info@zopyx.com
| www.zopyx.com
