{ "info": { "author": "Grammy Jiang", "author_email": "grammy.jiang@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Environment :: Plugins", "Framework :: Scrapy", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "===========================\nFrontera-SeedLoader-MongoDB\n===========================\n\n.. image:: https://img.shields.io/pypi/v/frontera-seedloader-mongodb.svg\n :target: https://pypi.python.org/pypi/frontera-seedloader-mongodb\n :alt: PyPI Version\n\n.. image:: https://img.shields.io/travis/grammy-jiang/frontera-seedloader-mongodb/master.svg\n :target: http://travis-ci.org/grammy-jiang/frontera-seedloader-mongodb\n :alt: Build Status\n\n.. image:: https://img.shields.io/badge/wheel-yes-brightgreen.svg\n :target: https://pypi.python.org/pypi/frontera-seedloader-mongodb\n :alt: Wheel Status\n\n.. image:: https://img.shields.io/codecov/c/github/grammy-jiang/frontera-seedloader-mongodb/master.svg\n :target: http://codecov.io/github/grammy-jiang/frontera-seedloader-mongodb?branch=master\n :alt: Coverage report\n\nOverview\n========\n\nFrontera is a great framework for broad crawling, especially working with\nscrapy. This package provides a seed loader from MongoDB in a sync ways for\nfrontera:\n\n* Querying seeds can be customized\n\nRequirements\n============\n\n* pymongo\n\n* Tests on Python 3.5\n\n* Tests on Linux, but it's a pure python module, should work on other platforms\n with official python and Twisted supported\n\nInstallation\n============\n\nThe quick way::\n\n pip install frontera-seedloader-mongodb\n\nOr put this middleware just beside the scrapy project.\n\nDocumentation\n=============\n\nSet Seed Loader in ``SPIDER_MIDDLEWARES`` in ``settings.py``, for example::\n\n # -----------------------------------------------------------------------------\n # FRONTERA SEEDLOADER MONGODB ASYNC\n # -----------------------------------------------------------------------------\n\n SPIDER_MIDDLEWARES.update({\n 'frontera_seedloader_mongodb.contrib.scrapy.middlewares.seeds.mongodb.MongoDBAsyncSeedLoader': 500,\n })\n\n SEEDS_MONGODB_USERNAME = 'user'\n SEEDS_MONGODB_PASSWORD = 'password'\n SEEDS_MONGODB_HOST = 'localhost'\n SEEDS_MONGODB_PORT = 27017\n SEEDS_MONGODB_DATABASE = 'test_mongodb_async_db'\n SEEDS_MONGODB_COLLECTION = 'test_mongodb_async_coll'\n\n # SEEDS_MONGODB_OPTIONS_ = 'SEEDS_MONGODB_OPTIONS_'\n\n SEEDS_MONGODB_SEEDS_QUERY = {\n 'filter': {'websites': {'$exists': 1}}\n }\n SEEDS_MONGODB_SEEDS_BATCH_SIZE = 1000\n\n SEEDS_MONGODB_SEEDS_PREPARE = 'scrapy_project.utils.seeds_prepara'\n\nSettings Reference\n==================\n\nSEEDS_MONGODB_USERNAME\n----------------------\n\nA string of the username of the database.\n\nSEEDS_MONGODB_PASSWORD\n----------------------\n\nA string of the password of the database.\n\nSEEDS_MONGODB_HOST\n------------------\n\nA string of the ip address or the domain of the database.\n\nSEEDS_MONGODB_PORT\n------------------\n\nA int of the port of the database.\n\nSEEDS_MONGODB_DATABASE\n----------------------\n\nA string of the name of the database.\n\nSEEDS_MONGODB_COLLECTION\n------------------------\n\nA list of the indexes to create on the collection.\n\nSEEDS_MONGODB_OPTIONS_*\n-----------------------\n\nOptions can be attached when the seed loader start to connect to MongoBD.\n\nIf any options are needed, the option can be set with the prefix\n``SEEDS_MONGODB_OPTIONS_``, the pipeline will parse it.\n\nFor example:\n\n+---------------+-------------------------------------+\n| option name | in ``settings.py`` |\n+---------------+-------------------------------------+\n| authMechanism | SEEDS_MONGODB_OPTIONS_authMechanism |\n+---------------+-------------------------------------+\n\nFor more options, please refer to the page:\n\n* `Connection String URI Format \u2014 MongoDB Manual 3.4`_\n\n.. _`Connection String URI Format \u2014 MongoDB Manual 3.4`: https://docs.mongodb.com/manual/reference/connection-string/#connections-standard-connection-string-format\n\n\nSEEDS_MONGODB_SEEDS_QUERY\n-------------------------\n\nA dictionary as the keyword arguments to query. The default value is::\n\n SEEDS_MONGODB_SEEDS_QUERY = {\n 'filter': None,\n 'projection': None,\n 'skip': 0,\n 'limit': 0,\n 'no_cursor_timeout': False,\n 'cursor_type': CursorType.NON_TAILABLE,\n 'sort': None,\n 'allow_partial_results': False,\n 'oplog_replay': False,\n 'modifiers': None,\n 'manipulate': True\n }\n\nThe keys and values in this setting is followed the keyword arguments of the\nmethod ``find`` of ``collection`` in ``pymongo``. Please refer to\nthe following documents for more details.\n\n* `collection \u2013 Collection level operations \u2014 PyMongo 3.5.1 documentation`_\n\n.. _`collection \u2013 Collection level operations \u2014 PyMongo 3.5.1 documentation`: http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find\n\nSEEDS_MONGODB_SEEDS_BATCH_SIZE\n------------------------------\n\nA int of The batch size that each query will return, the default value is 100.\n\nSEEDS_MONGODB_SEEDS_PREPARE\n---------------------------\n\nA string of the module path to the function to process the task (seed) from\nMongoDB.\n\nThe input will be the document queried from the collection set in\n``SEEDS_MONGODB_COLLECTION``, and the output should be a iterator which will\nreturn tuples with two elements: ``(url, doc)``. The ``url`` will be the\nargument ``url`` of ``Request``, and the ``doc`` will be given to\n``request.meta``.\nAs an example, the default function in this middleware::\n\n class MongoDBSeedLoader(SeedLoader):\n\n ...\n\n def open_spider(self, spider: Spider):\n try:\n if self.settings.get(SEEDS_MONGODB_SEEDS_PREPARE):\n self.prepare = load_object(\n self.settings.get(SEEDS_MONGODB_SEEDS_PREPARE))\n else:\n self.prepare = lambda x: map(\n lambda y: (y, {'seed': x}),\n x['websites'])\n except:\n raise NotConfigured\n\n ...\n\n ...\n\nNOTE\n====\n\nThe database drivers may have different api for the same operation, this\nseed loader adopts pymongo as the sync driver for MongoDB. If you want to\ncustomize this seed loader, please read the following documents for more\ndetails.\n\n* `pymongo \u2013 Python driver for MongoDB`_\n\n.. _`pymongo \u2013 Python driver for MongoDB`: http://api.mongodb.com/python/current/api/pymongo/\n\nTODO\n====\n\n* add an async way to load the seed from MongoDB\n* split the MongoDB to backend, make this seed load more flexible\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/grammy-jiang/frontera-seedloader-mongodb", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "frontera-seedloader-mongodb", "package_url": "https://pypi.org/project/frontera-seedloader-mongodb/", "platform": "", "project_url": "https://pypi.org/project/frontera-seedloader-mongodb/", "project_urls": { "Homepage": "https://github.com/grammy-jiang/frontera-seedloader-mongodb" }, "release_url": "https://pypi.org/project/frontera-seedloader-mongodb/0.0.6/", "requires_dist": [ "frontera", "scrapy (>=1.4.0)", "txmongo" ], "requires_python": "", "summary": "", "version": "0.0.6" }, "last_serial": 3330904, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "a76e4cebb9cb2478e5a57855ccc43cc2", "sha256": "14c1efb3bf0a963a5539d055041f3361ca860bde1fa4b1f2a8fe02c8b9d5ba16" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "a76e4cebb9cb2478e5a57855ccc43cc2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11972, "upload_time": "2017-11-14T06:18:15", "url": "https://files.pythonhosted.org/packages/f9/70/e1297319c8584ab296088334edea344bde2675ac73c617a105a31216d302/frontera_seedloader_mongodb-0.0.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "72ada0a5d87388f0f82dbc7a57f95063", "sha256": "ac091da46d801fa4ca4d4a01349a10bda34226bc84f9752a17b9e4b87289e827" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.2.tar.gz", "has_sig": false, "md5_digest": "72ada0a5d87388f0f82dbc7a57f95063", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24088, "upload_time": "2017-11-14T06:18:17", "url": "https://files.pythonhosted.org/packages/7f/57/39ad488bbcac010d3de85c2b085f2b5a96567c3bbb848102e31e954bdb8a/frontera-seedloader-mongodb-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "733571bcf41fe0bc46b7692590a535e9", "sha256": "84cdb6676351a33f41a14de678974f9ff6f6d12cf4fa4bbb76a8338f8d85ac1c" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "733571bcf41fe0bc46b7692590a535e9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12209, "upload_time": "2017-11-14T06:44:44", "url": "https://files.pythonhosted.org/packages/be/fd/2cb8e903f37edac544fc2a1200149af02161991e61dc61abb86c35bd5915/frontera_seedloader_mongodb-0.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0b452e1959b241364754d2478e4c4789", "sha256": "4418511d1a8a912f8a8a3f9975a9b6c7a72d29a2292218b50625e454b8959b0c" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.3.tar.gz", "has_sig": false, "md5_digest": "0b452e1959b241364754d2478e4c4789", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24325, "upload_time": "2017-11-14T06:44:45", "url": "https://files.pythonhosted.org/packages/74/f5/8eea01fd6a350ea8064cbc2ee976d1b8b184e08298b8dd53265cae9afb5b/frontera-seedloader-mongodb-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "ce02d543f65b7113b273d1a5798b4794", "sha256": "e5ada7cf00c9ff8acd2b2818a6074a3777bac36c750aee47eaffcac803c35269" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "ce02d543f65b7113b273d1a5798b4794", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12159, "upload_time": "2017-11-14T07:52:10", "url": "https://files.pythonhosted.org/packages/9c/1b/6ef681edef131783e348d07f260bad7166c8239403cdb2fb00c492f2de19/frontera_seedloader_mongodb-0.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4ce753cf252a62ff6f07fc82eaad9e0a", "sha256": "d4299ddef329b713496958db74d603bf182936933841b9a7cbe7af3a32fc0291" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.4.tar.gz", "has_sig": false, "md5_digest": "4ce753cf252a62ff6f07fc82eaad9e0a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24280, "upload_time": "2017-11-14T07:52:11", "url": "https://files.pythonhosted.org/packages/e8/0b/b566638f03826c1b972b7a2b105ee14fb75a0edc97627d11c10d88998523/frontera-seedloader-mongodb-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "b48571a2064dc0dee80c3f14c8faa888", "sha256": "285c451e8e237f4881b6b9b46554bcfdd8317fa71312cfa607d46f6aae35d796" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "b48571a2064dc0dee80c3f14c8faa888", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12151, "upload_time": "2017-11-14T08:08:28", "url": "https://files.pythonhosted.org/packages/b3/9c/9e961ee6fbe5593b685b4ffc45f55675e3c4e55f0605d70a68f519a82dbb/frontera_seedloader_mongodb-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7fcf2ca1557544fa75781bfd1781b8af", "sha256": "a1ff3d87d7b80eb63ca6885b1a0252d8722bbe4082e33b0bd0e5b9c8e8fd3c3f" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.5.tar.gz", "has_sig": false, "md5_digest": "7fcf2ca1557544fa75781bfd1781b8af", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24291, "upload_time": "2017-11-14T08:08:29", "url": "https://files.pythonhosted.org/packages/52/dc/f1713b36c918b29883ed587641afbde6e3a4c9b6fd3c3c83ae85277f2342/frontera-seedloader-mongodb-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "e6bc3615738bc2b923e8db5706764fe1", "sha256": "a07575e32009fe41488d8940f13ae1eee6b2043c9d876ccecccaa3b28844c42a" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "e6bc3615738bc2b923e8db5706764fe1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12495, "upload_time": "2017-11-14T08:39:16", "url": "https://files.pythonhosted.org/packages/19/b6/4be15fb8894648dcb8ea48bb68cead16efbc0f429d97dea46b6f5cc6978d/frontera_seedloader_mongodb-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ecf61d728547e2192c1d47e93938195", "sha256": "97a15a93e194a3613c9ce2309345b40ae8d2e331f9e0ef97deb36b3aab9be7e7" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.6.tar.gz", "has_sig": false, "md5_digest": "0ecf61d728547e2192c1d47e93938195", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24557, "upload_time": "2017-11-14T08:39:18", "url": "https://files.pythonhosted.org/packages/c0/e7/447e6bebc46b032b62370464239304ad096394f269d5ea6bfc7832e15859/frontera-seedloader-mongodb-0.0.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e6bc3615738bc2b923e8db5706764fe1", "sha256": "a07575e32009fe41488d8940f13ae1eee6b2043c9d876ccecccaa3b28844c42a" }, "downloads": -1, "filename": "frontera_seedloader_mongodb-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "e6bc3615738bc2b923e8db5706764fe1", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12495, "upload_time": "2017-11-14T08:39:16", "url": "https://files.pythonhosted.org/packages/19/b6/4be15fb8894648dcb8ea48bb68cead16efbc0f429d97dea46b6f5cc6978d/frontera_seedloader_mongodb-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ecf61d728547e2192c1d47e93938195", "sha256": "97a15a93e194a3613c9ce2309345b40ae8d2e331f9e0ef97deb36b3aab9be7e7" }, "downloads": -1, "filename": "frontera-seedloader-mongodb-0.0.6.tar.gz", "has_sig": false, "md5_digest": "0ecf61d728547e2192c1d47e93938195", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24557, "upload_time": "2017-11-14T08:39:18", "url": "https://files.pythonhosted.org/packages/c0/e7/447e6bebc46b032b62370464239304ad096394f269d5ea6bfc7832e15859/frontera-seedloader-mongodb-0.0.6.tar.gz" } ] }