{
"info": {
"author": "Nicholas Brochu",
"author_email": "nicholas@serpent.ai",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Natural Language :: English",
"Programming Language :: Python",
"Programming Language :: Python :: 2.7",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.2",
"Programming Language :: Python :: 3.3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3.6"
],
"description": "selenium-respectful\n===================\n\n`Selenium `__ is already a well-known player\nwhen it comes to browser automation and is usually a part of any serious\nintegration testing stack. That being said, it is also an increasingly\npopular choice for web scraping. Now APIs are usually the ones with\ndetailed rate-limiting systems but if you are the type of person that\nunderstands that web scraping is sometimes necessary and you intend to\nbe relatively gentle about it, *selenium-respectful* might be for you!\n\n***selenium-respectful***:\n\n- Is a minimalist wrapper for any *Selenium Webdriver* to work within\n rate limits of any amount of services simultaneously\n- Can scale out of a single thread, single process or even a single\n machine\n- Enables maximizing your allowed requests without ever going over set\n limits and having to handle the fallout\n- Overloads the Webdriver's *get* method and relays any other valid\n calls\n- Works with both Python 2 and 3 and is thoroughly tested\n- Is a sister library to the already established\n `requests-respectful `__\n\n**Typical *Selenium* get call**\n\n.. code:: python\n\n from selenium.webdriver.chrome.webdriver import WebDriver\n driver = WebDriver()\n\n driver.get(\"http://github.com\")\n element = driver.find_element_by_tag_name(\"body\")\n\n**Get call with *selenium-respectful***\n\n.. code:: python\n\n from selenium.webdriver.chrome.webdriver import WebDriver\n from selenium_respectful import RespectfulWebdriver\n\n driver = RespectfulWebdriver(webdriver=WebDriver())\n\n # This can be done elsewhere but the realm needs to be registered!\n driver.register_realm(\"Github\", max_requests=100, timespan=60)\n\n driver.get(\"http://github.com\", realms=[\"Github\"], wait=True)\n element = driver.find_element_by_tag_name(\"body\") # Works as usual!\n\nRequirements\n------------\n\n- `Redis `__ > 2.8.0 (See FAQ if you are rolling your\n eyes)\n\nInstallation\n------------\n\n.. code:: shell\n\n pip install selenium-respectful\n\nConfiguration\n-------------\n\nDefault Configuration Values\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n.. code:: python\n\n {\n \"redis\": {\n \"host\": \"localhost\",\n \"port\": 6379,\n \"database\": 0\n },\n \"safety_threshold\": 10\n }\n\nConfiguration Keys\n~~~~~~~~~~~~~~~~~~\n\n- **redis**: Provides the ``host``, ``port``\\ and ``database`` of the\n Redis instance\n- **safety\\_threshold**: A rate-limited exception will be raised at\n *(realm\\_max\\_requests - safety\\_threshold)*. Prevents going over the\n limit of services in scenarios where a large amount of requests are\n issued in parallel\n\nOverriding Configuration Values\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWith *selenium-respectful.config.yml*\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe library auto-detects the presence of a YAML file named\n*selenium-respectful.config.yml* at the root of your project and will\nattempt to load configuration values from it.\n\n**Example**:\n\nselenium-respectful.config.yml\n\n.. code:: yaml\n\n redis:\n host: 0.0.0.0\n port: 6379\n database: 5\n\n safety_threshold: 25\n\n**The resulting active configuration would be:**\n\n.. code:: python\n\n RespectfulWebdriver(webdriver=WebDriver()).config\n\n Out[1]: {\n \"redis\": {\n \"host\": \"0.0.0.0\",\n \"port\": 6379,\n \"database\": 5\n },\n \"safety_threshold\": 25\n }\n\nUsage\n-----\n\nIn your quest to use *selenium-respectful*, you should only ever have to\nbother with one class: *RespectfulWebdriver*. Instance this class and\nyou can perform all important operations.\n\nBefore each example, it is assumed that the following code has already\nbeen executed.\n\n.. code:: python\n\n from selenium.webdriver.chrome.webdriver import WebDriver\n from selenium_respectful import RespectfulWebdriver\n\n driver = RespectfulWebdriver(webdriver=WebDriver())\n\nRealms\n~~~~~~\n\nRealms are simply named containers that are provided with a maximum\nrequesting rate. You are responsible of the management (i.e. CRUD) of\nyour realms.\n\nRealms track the HTTP requests that are performed under them and will\nraise a catchable rate limit exception if you are over their allowed\nrequesting rate.\n\nFetching the list of Realms\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.fetch_registered_realms()\n\nThis returns a list of currently registered realm names.\n\nRegistering a Realm\n^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.register_realm(\"Google\", max_requests=10, timespan=1)\n driver.register_realm(\"Github\", max_requests=100, timespan=60)\n driver.register_realm(\"Twitter\", max_requests=150, timespan=300)\n\n # OR\n realm_tuples = [\n [\"Google\", 10, 1],\n [\"Github\", 100, 60],\n [\"Twitter\", 150, 300]\n ]\n\n driver.register_realms(realm_tuples)\n\nEither of these registers 3 realms: \\* *Google* at a maximum requesting\nrate of 10 requests per second \\* *Github* at a maximum requesting rate\nof 100 requests per minute \\* *Twitter* at a maximum requesting rate of\n150 requests per 5 minutes\n\nUpdating a Realm\n^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.update_realm(\"Google\", max_requests=25, timespan=5)\n\nThis updates the maximum requesting rate of *Google* to 25 requests per\n5 seconds.\n\nGetting the maximum requests value of a Realm\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.realm_max_requests(\"Google\")\n\nThis would return 25.\n\nGetting the timespan value of a Realm\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.realm_timespan(\"Google\")\n\nThis would return 5.\n\nUnregistering a Realm\n^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.unregister_realm(\"Google\")\n\nThis would unregister the *Google* realm, preventing further queries\nfrom executing on it.\n\nUnregistering multiple Realms\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code:: python\n\n driver.unregister_realms([\"Google\", \"Github\", \"Twitter\"])\n\nThis would unregister all 3 realms in one operation, preventing further\nqueries from executing on them.\n\nRequesting\n~~~~~~~~~~\n\nUsing the *Selenium Webdriver* get method\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nTo pilot your web browser to a given URL, just use the *get* method as\nyou would normally do with your WebDriver instance. The only major\ndifference is that a *realms* kwarg is expected. A *wait* boolean kwargs\ncan also be provided (the behavior is explained later).\n\nExample of a valid call:\n\n.. code:: python\n\n driver.get(\"http://github.com\", realms=[\"GitHub\"])\n\nIf not rate-limited, it would direct the browser to the provided URL.\n\nMultiple realms per request\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nYou can have a single request count against multiple realms if it makes\nsense in your use case.\n\n.. code:: python\n\n driver.get(\"http://github.com\", realms=[\"GitHub\", \"GitHubUser123\", \"GitHubServer3\"])\n\nHandling exceptions\n^^^^^^^^^^^^^^^^^^^\n\nExecuting these *get* calls will either perform the action in the\nbrowser or raise a SeleniumRespectfulRateLimitedError exception. This\nmeans that you'll likely want to catch and handle that exception.\n\n.. code:: python\n\n from selenium_respectful import SeleniumRespectfulRateLimitedError\n\n try:\n driver.get(\"http://github.com\", realm=\"GitHub\")\n except SeleniumRespectfulRateLimitedError:\n pass # Possibly requeue that call or wait.\n\nThe *wait* kwarg\n^^^^^^^^^^^^^^^^\n\nRequesting with a *get* call accepts a *wait* kwarg that defaults to\nFalse. If switched on and the realm is currently rate-limited, the\nprocess will block, wait until it is safe to send requests again and\nperform the requests then. Waiting is perfectly fine for scripts or\nsmaller operations but is discouraged for large, multi-realm, parallel\ntasks (i.e. Background Tasks like Celery workers).\n\nTests\n-----\n\n- Exist? ``Yes``\n- Exhaustive? ``Yes``\n- Facepalm tactics?\n ``Yes - Redis calls aren't mocked and google.com gets a few friendly calls``\n\nRun them with ``python -m pytest tests --spec``\n\nFAQ\n---\n\nWhoa, whoa, whoa! Redis?!\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYes. The use of Redis allows for *selenium-respectful* to go\nmulti-thread, multi-process and even multi-machine while still\nrespecting the maximum requesting rates of registered realms. Operations\nlike Redis' SETEX are key in designing and working with rate-limiting\nsystems. If you are doing Python development, there is a decent chance\nyou already work with Redis as it is one of the two options to use as\nCelery's backend and one of the 2 major caching options in Web\ndevelopment. If not, you can always keep things clean and use a `Docker\nContainer `__ or even `build it from\nsource `__. Redis has kept a\nconsistent record over the years of being lightweight, solid software.\n",
"description_content_type": null,
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/nbrochu/selenium-respectful",
"keywords": "",
"license": "Apache License v2",
"maintainer": "",
"maintainer_email": "",
"name": "selenium-respectful",
"package_url": "https://pypi.org/project/selenium-respectful/",
"platform": "",
"project_url": "https://pypi.org/project/selenium-respectful/",
"project_urls": {
"Homepage": "https://github.com/nbrochu/selenium-respectful"
},
"release_url": "https://pypi.org/project/selenium-respectful/0.1.0/",
"requires_dist": null,
"requires_python": "",
"summary": "Minimalist Selenium webdriver wrapper to work within rate limits of any amount of services simultaneously. Parallel processing friendly.",
"version": "0.1.0"
},
"last_serial": 2714847,
"releases": {
"0.1.0": [
{
"comment_text": "",
"digests": {
"md5": "e272bf5ebda23d07d501beff9f7b1456",
"sha256": "49bac0ca326f30810617e202c6a6f66e91de5370aca2a35729f432b4fd7a7533"
},
"downloads": -1,
"filename": "selenium-respectful-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e272bf5ebda23d07d501beff9f7b1456",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6734,
"upload_time": "2017-03-18T18:23:11",
"url": "https://files.pythonhosted.org/packages/de/f4/b94e1ed12e4a242e4db1b189531e09adae1b431d6498b6cf785e3ab7265c/selenium-respectful-0.1.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "e272bf5ebda23d07d501beff9f7b1456",
"sha256": "49bac0ca326f30810617e202c6a6f66e91de5370aca2a35729f432b4fd7a7533"
},
"downloads": -1,
"filename": "selenium-respectful-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "e272bf5ebda23d07d501beff9f7b1456",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 6734,
"upload_time": "2017-03-18T18:23:11",
"url": "https://files.pythonhosted.org/packages/de/f4/b94e1ed12e4a242e4db1b189531e09adae1b431d6498b6cf785e3ab7265c/selenium-respectful-0.1.0.tar.gz"
}
]
}