{ "info": { "author": "Noah Levitt", "author_email": "nlevitt@archive.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Internet :: WWW/HTTP", "Topic :: System :: Archiving" ], "description": ".. image:: https://travis-ci.org/internetarchive/brozzler.svg?branch=1.4\n :target: https://travis-ci.org/internetarchive/brozzler\n\n.. |logo| image:: https://cdn.rawgit.com/internetarchive/brozzler/1.1b12/brozzler/dashboard/static/brozzler.svg\n :width: 60px\n\n|logo| brozzler\n===============\n\"browser\" \\| \"crawler\" = \"brozzler\"\n\nBrozzler is a distributed web crawler (\u722c\u866b) that uses a real browser (Chrome\nor Chromium) to fetch pages and embedded URLs and to extract links. It employs\n`youtube-dl `_ to enhance media capture\ncapabilities and `rethinkdb `_ to\nmanage crawl state.\n\nBrozzler is designed to work in conjunction with warcprox for web archiving.\n\nRequirements\n------------\n\n- Python 3.4 or later\n- RethinkDB deployment\n- Chromium or Google Chrome >= version 64\n\nNote: The browser requires a graphical environment to run. When brozzler is run\non a server, this may require deploying some additional infrastructure,\ntypically X11. Xvnc4 and Xvfb are X11 variants that are suitable for use on a\nserver, because they don't display anything to a physical screen. The `vagrant\nconfiguration `_ in the brozzler repository has an example setup\nusing Xvnc4. (When last tested, chromium on Xvfb did not support screenshots,\nso Xvnc4 is preferred at this time.)\n\nGetting Started\n---------------\n\nThe easiest way to get started with brozzler for web archiving is with\n``brozzler-easy``. 
Brozzler-easy runs brozzler-worker, warcprox, brozzler\nwayback, and brozzler-dashboard, configured to work with each other in a single\nprocess.\n\nMac instructions:\n\n::\n\n # install and start rethinkdb\n brew install rethinkdb\n # no brew? try rethinkdb's installer: https://www.rethinkdb.com/docs/install/osx/\n rethinkdb &>>rethinkdb.log &\n\n # install brozzler with special dependencies pywb and warcprox\n pip install brozzler[easy] # in a virtualenv if desired\n\n # queue a site to crawl\n brozzler-new-site http://example.com/\n\n # or a job\n brozzler-new-job job1.yml\n\n # start brozzler-easy\n brozzler-easy\n\nAt this point brozzler-easy will start archiving your site. Results will be\nimmediately available for playback in pywb at http://localhost:8880/brozzler/.\n\n*Brozzler-easy demonstrates the full brozzler archival crawling workflow, but\ndoes not take advantage of brozzler's distributed nature.*\n\nInstallation and Usage\n----------------------\n\nTo install brozzler only::\n\n pip install brozzler # in a virtualenv if desired\n\nLaunch one or more workers: [*]_ ::\n\n brozzler-worker --warcprox-auto\n\nSubmit jobs::\n\n brozzler-new-job myjob.yaml\n\nSubmit sites not tied to a job::\n\n brozzler-new-site --time-limit=600 http://example.com/\n\n.. [*] A note about ``--warcprox-auto``: this option tells brozzler to\n look for a healthy warcprox instance in the `rethinkdb service registry\n `_. For\n this to work you need to have at least one instance of warcprox running,\n with the ``--rethinkdb-services-url`` option pointing to the same rethinkdb\n services table that brozzler is using. Using ``--warcprox-auto`` is\n recommended for clustered deployments.\n\nJob Configuration\n-----------------\n\nBrozzler jobs are defined using YAML files. Options may be specified either at\nthe top level or on individual seeds. At least one seed URL must be specified;\neverything else is optional. 
For details, see ``_.\n\n::\n\n id: myjob\n time_limit: 60 # seconds\n proxy: 127.0.0.1:8000 # point at warcprox for archiving\n ignore_robots: false\n warcprox_meta: null\n metadata: {}\n seeds:\n - url: http://one.example.org/\n - url: http://two.example.org/\n time_limit: 30\n - url: http://three.example.org/\n time_limit: 10\n ignore_robots: true\n scope:\n surt: http://(org,example,\n\nBrozzler Dashboard\n------------------\n\nBrozzler comes with a rudimentary web application for viewing crawl job status.\nTo install brozzler with the dependencies required to run this app, run\n\n::\n\n pip install brozzler[dashboard]\n\n\nTo start the app, run\n\n::\n\n brozzler-dashboard\n\nAt this point Brozzler Dashboard will be accessible at http://localhost:8000/.\n\n.. image:: Brozzler-Dashboard.png\n\nSee ``brozzler-dashboard --help`` for configuration options.\n\nBrozzler Wayback\n----------------\n\nBrozzler comes with a customized version of `pywb\n`_, which supports using the rethinkdb\n\"captures\" table (populated by warcprox) as its index.\n\nTo use it, first install the dependencies.\n\n::\n\n pip install brozzler[easy]\n\nWrite a configuration file pywb.yml.\n\n::\n\n # 'archive_paths' should point to the output directory of warcprox\n archive_paths: warcs/ # pywb will fail without a trailing slash\n collections:\n brozzler:\n index_paths: !!python/object:brozzler.pywb.RethinkCDXSource\n db: brozzler\n table: captures\n servers:\n - localhost\n enable_auto_colls: false\n enable_cdx_api: true\n framed_replay: true\n port: 8880\n\nRun pywb like so:\n\n::\n\n $ PYWB_CONFIG_FILE=pywb.yml brozzler-wayback\n\nThen browse http://localhost:8880/brozzler/.\n\n.. 
image:: Brozzler-Wayback.png\n\nHeadless Chrome (experimental)\n------------------------------\n\nBrozzler is known to work nominally with Chrome/Chromium in headless mode, but\nthis has not yet been extensively tested.\n\nLicense\n-------\n\nCopyright 2015-2018 Internet Archive\n\nLicensed under the Apache License, Version 2.0 (the \"License\"); you may\nnot use this software except in compliance with the License. You may\nobtain a copy of the License at\n\n::\n\n http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/internetarchive/brozzler", "keywords": "", "license": "Apache License 2.0", "maintainer": "", "maintainer_email": "", "name": "brozzler", "package_url": "https://pypi.org/project/brozzler/", "platform": "", "project_url": "https://pypi.org/project/brozzler/", "project_urls": { "Homepage": "https://github.com/internetarchive/brozzler" }, "release_url": "https://pypi.org/project/brozzler/1.5.7/", "requires_dist": null, "requires_python": "", "summary": "Distributed web crawling with browsers", "version": "1.5.7" }, "last_serial": 5827304, "releases": { "1.1.dev10": [ { "comment_text": "", "digests": { "md5": "81a6ac3ad960cd82967dd761f41ba777", "sha256": "e8f7e98e916ed1b785d0ab117bd9ab7199a40b2d4a4507f7c93974661ca5449a" }, "downloads": -1, "filename": "brozzler-1.1.dev10.tar.gz", "has_sig": false, "md5_digest": "81a6ac3ad960cd82967dd761f41ba777", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30197, "upload_time": "2016-05-11T18:50:09", "url": 
"https://files.pythonhosted.org/packages/88/07/f666af7199c98495bc7882318100b5f08a801f2ddf50b0353074ccb318e5/brozzler-1.1.dev10.tar.gz" } ], "1.1.dev11": [ { "comment_text": "", "digests": { "md5": "3cc234636b3be2c02ea6e80a80edc68b", "sha256": "8ed24bf578dcaea6a1ac13267ec5755596c5a5abc8e6c381e1330ecb478f0eea" }, "downloads": -1, "filename": "brozzler-1.1.dev11.tar.gz", "has_sig": false, "md5_digest": "3cc234636b3be2c02ea6e80a80edc68b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30039, "upload_time": "2016-05-11T19:06:04", "url": "https://files.pythonhosted.org/packages/70/58/bfb692a09f144a421a9ea474eb4bbb03f145ebda455ac925743963151df8/brozzler-1.1.dev11.tar.gz" } ], "1.1.dev12": [ { "comment_text": "", "digests": { "md5": "0487d72a9919b85c62f2fa07967fa90e", "sha256": "aace3b322ed3cc61624c8288d5001b22781dd739fdcd7237cb7ca8bcf986ce0a" }, "downloads": -1, "filename": "brozzler-1.1.dev12.tar.gz", "has_sig": false, "md5_digest": "0487d72a9919b85c62f2fa07967fa90e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30043, "upload_time": "2016-05-11T19:10:30", "url": "https://files.pythonhosted.org/packages/d8/46/8043381457f25e3f2405b2a98ff97d574791ab3d16a63dec3dc1302afae6/brozzler-1.1.dev12.tar.gz" } ], "1.1b1": [ { "comment_text": "", "digests": { "md5": "c7f04b90bce6f7f8d22e89ae3b651449", "sha256": "296bacfff4ee9de707c745e71e365b198f0bc02ff94cb1a8bf8f0d3bbd9121de" }, "downloads": -1, "filename": "brozzler-1.1b1.tar.gz", "has_sig": false, "md5_digest": "c7f04b90bce6f7f8d22e89ae3b651449", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30918, "upload_time": "2016-06-16T19:12:34", "url": "https://files.pythonhosted.org/packages/b3/95/29af1b464f97f266b8c1782ece57470e84b85606884d9ee5c904cbf97ef6/brozzler-1.1b1.tar.gz" } ], "1.1b10": [ { "comment_text": "", "digests": { "md5": "cd54128bc74847874c1ec2da3131927d", "sha256": 
"86e46118142453c32846e1315664dc8a27a88a4aa4b4696471aacff535e10625" }, "downloads": -1, "filename": "brozzler-1.1b10.tar.gz", "has_sig": false, "md5_digest": "cd54128bc74847874c1ec2da3131927d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 668080, "upload_time": "2017-03-22T23:12:10", "url": "https://files.pythonhosted.org/packages/64/b4/d1a504f2308b7280b790a4e21858bbcde4515916708a732561f3a018c3f0/brozzler-1.1b10.tar.gz" } ], "1.1b11": [ { "comment_text": "", "digests": { "md5": "1105d820039052bea16cea36054b8d74", "sha256": "9d44bf412c13c6c05c7c0324a6cbb940b6b46d2d81a996059034700f049f189f" }, "downloads": -1, "filename": "brozzler-1.1b11.tar.gz", "has_sig": false, "md5_digest": "1105d820039052bea16cea36054b8d74", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 671028, "upload_time": "2017-06-09T00:31:17", "url": "https://files.pythonhosted.org/packages/3d/99/b10185866f3126e6e8eeedb223a5228415b21176b69df7f0cdd2f74d0133/brozzler-1.1b11.tar.gz" } ], "1.1b12": [ { "comment_text": "", "digests": { "md5": "4a6bd5cfcbacd21c819bc7d2f9c09e8b", "sha256": "72a3f21b82fedcb2a05365365babf468d69068ae855847be46520d50b8d056f5" }, "downloads": -1, "filename": "brozzler-1.1b12.tar.gz", "has_sig": false, "md5_digest": "4a6bd5cfcbacd21c819bc7d2f9c09e8b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 674840, "upload_time": "2018-02-03T00:53:38", "url": "https://files.pythonhosted.org/packages/3c/6e/c3ef4f3675bf84fdbca51d0e40dbdff060ba2e65b8678d3c5543d6c68181/brozzler-1.1b12.tar.gz" } ], "1.1b2": [ { "comment_text": "", "digests": { "md5": "bf6cca609085d01a5e18728c31f9b463", "sha256": "b86310d5e87ce25f02e8246310a6abac6aac2288b05187069a35c9150669fcf0" }, "downloads": -1, "filename": "brozzler-1.1b2.tar.gz", "has_sig": false, "md5_digest": "bf6cca609085d01a5e18728c31f9b463", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30907, 
"upload_time": "2016-06-16T19:18:32", "url": "https://files.pythonhosted.org/packages/2d/63/88c32f44b8ae80111694c3a51373a29d45720d2228b4fddeb459171a6e9e/brozzler-1.1b2.tar.gz" } ], "1.1b3": [ { "comment_text": "", "digests": { "md5": "09c3c66ab271d5b7c0d5052e7917d52c", "sha256": "77c556504ca0f3702db7725266152381fd9774fb462e1b40636ebefb4b4cb9e9" }, "downloads": -1, "filename": "brozzler-1.1b3.tar.gz", "has_sig": false, "md5_digest": "09c3c66ab271d5b7c0d5052e7917d52c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 37621, "upload_time": "2016-07-27T21:55:07", "url": "https://files.pythonhosted.org/packages/74/48/1bbb0895682b0257c7b2af8fd4c098e6783a6352021359986624d78e2ae3/brozzler-1.1b3.tar.gz" } ], "1.1b4": [ { "comment_text": "", "digests": { "md5": "054c191e8cf63b21438e3ebfc55578b2", "sha256": "7484aff52ae1c732af7e66920aaaf378b157e89366ebaf4c121b06d17a217c41" }, "downloads": -1, "filename": "brozzler-1.1b4.tar.gz", "has_sig": false, "md5_digest": "054c191e8cf63b21438e3ebfc55578b2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 652689, "upload_time": "2016-08-04T22:56:17", "url": "https://files.pythonhosted.org/packages/fe/93/29461eee24a402f616ee191d80f2a9d67e3e04e5f057ad5b428079329273/brozzler-1.1b4.tar.gz" } ], "1.1b5": [ { "comment_text": "", "digests": { "md5": "3206a8c9db4a6d665c9b873f87d44742", "sha256": "27644e659aad397fb8672b14c9883c5339979feb7b31bfea33e872d035a67d3d" }, "downloads": -1, "filename": "brozzler-1.1b5.tar.gz", "has_sig": false, "md5_digest": "3206a8c9db4a6d665c9b873f87d44742", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 652695, "upload_time": "2016-08-05T00:34:06", "url": "https://files.pythonhosted.org/packages/49/9a/5c2589cf1183689dfe33948c5158fb2548fc1fd2296b26967dfbaaf7dd81/brozzler-1.1b5.tar.gz" } ], "1.1b6": [ { "comment_text": "", "digests": { "md5": "8b2a7892f7093567d6f11c371ac2d8da", "sha256": 
"b325a5ba0cb693882ab0b4613c9311aa6b7858cd8ec8a2b7ad285ad7b6918e37" }, "downloads": -1, "filename": "brozzler-1.1b6.tar.gz", "has_sig": false, "md5_digest": "8b2a7892f7093567d6f11c371ac2d8da", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 50895, "upload_time": "2016-10-13T22:09:05", "url": "https://files.pythonhosted.org/packages/ae/62/6f50fffa100223c5bcd98db01306a66b3f19e2a2a1972ed1d4b62cc6244f/brozzler-1.1b6.tar.gz" } ], "1.1b6.dev94": [], "1.1b7": [ { "comment_text": "", "digests": { "md5": "d36eed9fcb7b4c01e065575bbe79ee2e", "sha256": "6cf5591e0d565ca620f69287540df143cb47c081c3a14acce2716aa32ec22a0e" }, "downloads": -1, "filename": "brozzler-1.1b7.tar.gz", "has_sig": false, "md5_digest": "d36eed9fcb7b4c01e065575bbe79ee2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 661876, "upload_time": "2016-11-11T22:59:10", "url": "https://files.pythonhosted.org/packages/9c/eb/11d1116afa6107d7d7008140c615e9a08d9dda4449444eddec790c63845e/brozzler-1.1b7.tar.gz" } ], "1.1b8": [ { "comment_text": "", "digests": { "md5": "7cf3a11fde5b69f54e1543e288580092", "sha256": "5db163e01270ae7f695a8a252dbb527ded2ed0e92dd3ae1a7d86a53eecea0df5" }, "downloads": -1, "filename": "brozzler-1.1b8.tar.gz", "has_sig": false, "md5_digest": "7cf3a11fde5b69f54e1543e288580092", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 662667, "upload_time": "2016-12-15T20:35:56", "url": "https://files.pythonhosted.org/packages/a2/6a/eb8b2e0d80a55a460daed218888871e18e429ba6afb87d72caad47123527/brozzler-1.1b8.tar.gz" } ], "1.1b9": [ { "comment_text": "", "digests": { "md5": "90405bb10f97310f750442e6728b5e7d", "sha256": "22b7c99b4305bd7b2e5e8b456b6a8a9548104575ef360c6c2f4de0445f82f9b6" }, "downloads": -1, "filename": "brozzler-1.1b9.tar.gz", "has_sig": false, "md5_digest": "90405bb10f97310f750442e6728b5e7d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 668054, 
"upload_time": "2017-03-22T22:26:20", "url": "https://files.pythonhosted.org/packages/a3/64/0191c97ddf44f642a243ec12e14013d949182c06ac3661d566c20a564230/brozzler-1.1b9.tar.gz" } ], "1.3": [ { "comment_text": "", "digests": { "md5": "5b09bc5e414dffdb9b44bb89f95925cc", "sha256": "ff2d592e049ec4d705785f6c322cd6d4fc15a0fd31e207df34ae494d825435af" }, "downloads": -1, "filename": "brozzler-1.3.tar.gz", "has_sig": false, "md5_digest": "5b09bc5e414dffdb9b44bb89f95925cc", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 66322, "upload_time": "2018-06-25T19:19:17", "url": "https://files.pythonhosted.org/packages/c7/40/9e7e31b1f6d44e7d0e9e805027c389623e3d40b2611a12e2829d1832844d/brozzler-1.3.tar.gz" } ], "1.3.dev293": [ { "comment_text": "", "digests": { "md5": "c2f692aac1817e4a0068240c2e4dd146", "sha256": "484986c04a95b0af6b984c39ee6be7b86ef52126c5a3048974b8771b1890f341" }, "downloads": -1, "filename": "brozzler-1.3.dev293.tar.gz", "has_sig": false, "md5_digest": "c2f692aac1817e4a0068240c2e4dd146", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 66334, "upload_time": "2018-06-25T19:11:26", "url": "https://files.pythonhosted.org/packages/1c/2e/25b122dc6f68cbefb6195421b2d1e5a52e820be62e5f5e8cbac6f19474c3/brozzler-1.3.dev293.tar.gz" } ], "1.4": [ { "comment_text": "", "digests": { "md5": "3689888831c64d1c1b16ef1a33e29ce0", "sha256": "bea682dfd1855fa2af3f56d6a1030bd6ae66de11c64ecfaeac5b23e43cb6664f" }, "downloads": -1, "filename": "brozzler-1.4.tar.gz", "has_sig": false, "md5_digest": "3689888831c64d1c1b16ef1a33e29ce0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 677717, "upload_time": "2018-08-22T23:08:24", "url": "https://files.pythonhosted.org/packages/56/70/8d3f6cd43dac528ec50a81df7c6b769e7ecd0c963fed2c4747e8bbca2ebf/brozzler-1.4.tar.gz" } ], "1.5.3": [ { "comment_text": "", "digests": { "md5": "45742143ceb51935a9d0cd1b260b00d8", "sha256": 
"f7ecaf842c5657aea3f544b6e5e137b548075d43246cf6d65cbb2ecbaf7f46ca" }, "downloads": -1, "filename": "brozzler-1.5.3.tar.gz", "has_sig": false, "md5_digest": "45742143ceb51935a9d0cd1b260b00d8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 69542, "upload_time": "2019-04-11T23:06:06", "url": "https://files.pythonhosted.org/packages/d8/cf/9b586a9465526ace75663afb2be60083fc38b983fc091851dc38977f2320/brozzler-1.5.3.tar.gz" } ], "1.5.7": [ { "comment_text": "", "digests": { "md5": "0a4ad20d6042b93a92430700adc451eb", "sha256": "e6118da29393f2269e14ad12d110f152c2f2b798b4747c702191a2414490b9b2" }, "downloads": -1, "filename": "brozzler-1.5.7.tar.gz", "has_sig": false, "md5_digest": "0a4ad20d6042b93a92430700adc451eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 74457, "upload_time": "2019-09-13T18:57:06", "url": "https://files.pythonhosted.org/packages/22/2e/6981bd4c7fac1453acab3572a8e5008de5d87ac9dff7c73f3bb9aeb8abc1/brozzler-1.5.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0a4ad20d6042b93a92430700adc451eb", "sha256": "e6118da29393f2269e14ad12d110f152c2f2b798b4747c702191a2414490b9b2" }, "downloads": -1, "filename": "brozzler-1.5.7.tar.gz", "has_sig": false, "md5_digest": "0a4ad20d6042b93a92430700adc451eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 74457, "upload_time": "2019-09-13T18:57:06", "url": "https://files.pythonhosted.org/packages/22/2e/6981bd4c7fac1453acab3572a8e5008de5d87ac9dff7c73f3bb9aeb8abc1/brozzler-1.5.7.tar.gz" } ] }