Metadata-Version: 1.0
Name: bg.crawler
Version: 0.1
Summary: Solr crawler for BG Phoenics
Home-page: http://pypi.python.org/pypi/bg.crawler
Author: Andreas Jung
Author-email: info@zopyx.com
License: GPL
Description: .. contents::
        
        Introduction
        ============
        
        ``bg.crawler`` is a command-line frontend for feeding a tree of files (a directory)
        into a Solr server for indexing.
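        
        A minimal sketch of how such a crawl could walk a directory tree while
        honouring a depth limit like the one set by ``--max-depth``, using only
        the standard library. The helper name ``iter_files`` and the convention
        that depth 0 means "files directly inside the start directory" are
        assumptions for illustration, not ``bg.crawler``'s actual code::
        
            import os
        
        
            def iter_files(root, max_depth=None):
                """Yield file paths below ``root`` (hypothetical helper).
        
                Depth 0 is the start directory itself; ``max_depth=None``
                means no depth limit.
                """
                root = os.path.abspath(root)
                base_depth = root.rstrip(os.sep).count(os.sep)
                for dirpath, dirnames, filenames in os.walk(root):
                    depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
                    if max_depth is not None and depth >= max_depth:
                        # Prune in place so os.walk descends no further.
                        dirnames[:] = []
                    for name in sorted(filenames):
                        yield os.path.join(dirpath, name)
        
        Pruning ``dirnames`` in place works because ``os.walk`` walks top-down
        by default and only descends into the directories left in that list.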
        
        Usage
        =====
        
        Command line options::
        
            blackmoon:~/src/bg.crawler> bin/solr-crawler --help
            usage: solr-crawler [-h] [--solr-url SOLR_URL] [--max-depth MAX_DEPTH]
                                [--batch-size BATCH_SIZE] [--tag TAG] [--clear-all]
                                [--clear-tag SOLR_CLEAR_TAG] [--verbose] [--no-type-check]
                                <directory>
        
            Commandline parser
        
            positional arguments:
              <directory>           Directory to be crawled
        
            optional arguments:
              -h, --help            show this help message and exit
              --solr-url SOLR_URL   SOLR server URL
              --max-depth MAX_DEPTH
                                    maximum folder depth
              --batch-size BATCH_SIZE
                                    Solr batch size
              --tag TAG             Solr import tag
              --clear-all           Clear the Solr indexes before crawling
              --clear-tag SOLR_CLEAR_TAG
                                    Remove all items from Solr indexed tagged with the
                                    given tag
              --verbose             Verbose logging
              --no-type-check       Apply extension filter while crawling
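        
        The help text above has the layout produced by Python's ``argparse``
        module. A parser offering the same options could be sketched as
        follows; the default values and the ``dest`` names (``solr_clear_tag``
        is inferred from the ``SOLR_CLEAR_TAG`` metavar) are assumptions for
        illustration, not taken from ``bg.crawler``'s source::
        
            import argparse
        
        
            def build_parser():
                # Hypothetical reconstruction of the options shown by --help;
                # all defaults below are assumptions.
                parser = argparse.ArgumentParser(
                    prog="solr-crawler", description="Commandline parser")
                parser.add_argument("directory", metavar="<directory>",
                                    help="Directory to be crawled")
                parser.add_argument("--solr-url",
                                    default="http://localhost:8983/solr",
                                    help="Solr server URL")
                parser.add_argument("--max-depth", type=int, default=None,
                                    help="maximum folder depth")
                parser.add_argument("--batch-size", type=int, default=1,
                                    help="Solr batch size")
                parser.add_argument("--tag", default=None,
                                    help="Solr import tag")
                parser.add_argument("--clear-all", action="store_true",
                                    help="Clear the Solr index before crawling")
                # dest="solr_clear_tag" yields the SOLR_CLEAR_TAG metavar.
                parser.add_argument("--clear-tag", dest="solr_clear_tag",
                                    default=None,
                                    help="Remove all documents with this tag")
                parser.add_argument("--verbose", action="store_true",
                                    help="Verbose logging")
                parser.add_argument("--no-type-check", action="store_true",
                                    help="Pass all file types to Solr")
                return parser
        
        For example, parsing ``--batch-size 50 /data`` with this sketch yields a
        namespace with ``args.batch_size == 50`` and ``args.directory == "/data"``.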
        
        
        * ``--solr-url`` defines the URL of the Solr server
        
        * ``--max-depth`` limits the crawler to a given folder depth
        
        * ``--batch-size`` groups N documents into one batch before
          sending a commit to Solr (by default, every single document
          added to the Solr index is committed individually)
        
        * ``--tag`` tags the imported documents with the given string
          (useful when importing documents from different sources into
          Solr, since results can then be filtered by tag at query time)
        
        * ``--clear-all`` clears the complete Solr index before running
          the import
        
        * ``--clear-tag`` removes all documents with the given tag before
          running the import
        
        * ``--verbose`` enables extensive logging
        
        * ``--no-type-check`` disables type-check filtering, so files of
          all types are passed to Solr
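        
        The import options above could be combined in an indexing loop along
        the following lines. This is a sketch, not ``bg.crawler``'s
        implementation: it assumes a hypothetical client object with
        ``add(docs)``, ``commit()`` and ``delete_by_query(query)`` methods, a
        hypothetical ``tag`` field in the Solr schema, and an illustrative
        extension whitelist. It commits once per ``batch_size`` added
        documents, matching the description of ``--batch-size``::
        
            import os
        
            # Illustrative whitelist used when type checking is on (assumption).
            ALLOWED_EXTENSIONS = {".pdf", ".doc", ".docx", ".html", ".txt"}
        
        
            def index_files(client, paths, batch_size=1, tag=None,
                            clear_tag=None, type_check=True):
                """Send documents to a hypothetical Solr client in batches.
        
                ``client`` must provide add(docs), commit() and
                delete_by_query(query). Returns the number of documents added.
                """
                if clear_tag is not None:
                    # Roughly what --clear-tag does; quotes inside the tag
                    # are not escaped in this sketch.
                    client.delete_by_query('tag:"%s"' % clear_tag)
                    client.commit()
                batch = []
                added = 0
                for path in paths:
                    ext = os.path.splitext(path)[1].lower()
                    if type_check and ext not in ALLOWED_EXTENSIONS:
                        continue  # skipped by the extension filter
                    # A real import would also send the file content.
                    doc = {"id": path}
                    if tag is not None:
                        doc["tag"] = tag
                    batch.append(doc)
                    if len(batch) >= batch_size:
                        client.add(batch)
                        client.commit()
                        added += len(batch)
                        batch = []
                if batch:
                    # Flush the final, partially filled batch.
                    client.add(batch)
                    client.commit()
                    added += len(batch)
                return added
        
        With ``batch_size=1`` every document is committed individually, which
        corresponds to the default behaviour described above; larger batches
        reduce the number of commits sent to Solr.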
        
        Licence
        =======
        
        ``bg.crawler`` is published under the GNU General Public License, version 2 (GPL 2).
        
        Author
        ======
        
        Written by 
        
        | ZOPYX Ltd.
        | c/o Andreas Jung
        | Charlottenstr. 37/1
        | D-72070 Tuebingen
        | Germany
        | info@zopyx.com
        | www.zopyx.com
        
        Contributors
        ============
        
        Changelog
        =========
        
        0.1 (2011-11-11)
        -------------------
        
        - initial release
          [ajung]
        
        
Keywords: Solr Python
Platform: UNKNOWN
Classifier: Programming Language :: Python
