{ "info": { "author": "Sanhe Hu", "author_email": "husanhe@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: MacOS", "Operating System :: Microsoft :: Windows", "Operating System :: Unix", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6" ], "description": ".. image:: https://travis-ci.org/MacHu-GWU/crawlib-project.svg?branch=master\n :target: https://travis-ci.org/MacHu-GWU/crawlib-project?branch=master\n\n.. image:: https://codecov.io/gh/MacHu-GWU/crawlib-project/branch/master/graph/badge.svg\n :target: https://codecov.io/gh/MacHu-GWU/crawlib-project\n\n.. image:: https://img.shields.io/pypi/v/crawlib.svg\n :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/pypi/l/crawlib.svg\n :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/pypi/pyversions/crawlib.svg\n :target: https://pypi.python.org/pypi/crawlib\n\n.. image:: https://img.shields.io/badge/Star_Me_on_GitHub!--None.svg?style=social\n :target: https://github.com/MacHu-GWU/crawlib-project\n\n\nWelcome to ``crawlib`` Documentation\n==============================================================================\n``crawlib`` provides building blocks for crawler projects. It simplifies:\n\n1. URL encoding.\n2. HTML parsing.\n3. Error handling.\n4. Downloading HTML and files.\n5. Request caching.\n6. Duplicate filtering.\n7. Breadth-first crawl strategy.\n\nIn addition, it is a web crawling framework for breadth-first crawling.\n\nFor example, suppose the target data is organized in a tree structure, such as State -> City -> Zipcode -> Street -> Address. 
Then ``crawlib`` is a natural fit.\n\nHere is an `Example Project `_ for scraping data from https://crawlib.readthedocs.io/_static/state-list.html.\n\n\nQuick Links\n------------------------------------------------------------------------------\n\n- .. image:: https://img.shields.io/badge/Link-Document-red.svg\n :target: https://crawlib.readthedocs.io/index.html\n\n- .. image:: https://img.shields.io/badge/Link-API_Reference_and_Source_Code-red.svg\n :target: API reference and source code