{ "info": { "author": "Neal Wong", "author_email": "qwang16@olivetuniversity.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Utilities" ], "description": "Proxy Scrape\n============\n\n.. image:: https://img.shields.io/travis/JaredLGillespie/proxyscrape.svg\n :alt: Travis\n :target: https://travis-ci.org/JaredLGillespie/proxyscrape\n.. image:: https://img.shields.io/coveralls/github/JaredLGillespie/proxyscrape.svg\n :alt: Coveralls github\n :target: https://coveralls.io/github/JaredLGillespie/proxyscrape\n.. image:: https://img.shields.io/pypi/v/proxyscrape.svg\n :alt: PyPI\n :target: https://pypi.org/project/proxyscrape/\n.. image:: https://img.shields.io/pypi/wheel/proxyscrape.svg\n :alt: PyPI - Wheel\n :target: https://pypi.org/project/proxyscrape/\n.. image:: https://img.shields.io/pypi/pyversions/proxyscrape.svg\n :alt: PyPI - Python Version\n :target: https://pypi.org/project/proxyscrape/\n.. image:: https://img.shields.io/pypi/l/proxyscrape.svg\n :alt: PyPI - License\n :target: https://pypi.org/project/proxyscrape/\n\nA library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).\n\n*NOTE: This library isn't designed for production use. It's advised to use your own proxies or purchase a service which\nprovides an API. These are merely free ones that are retrieved from sites and should only be used for development\nor testing purposes.*\n\n.. 
code-block:: python\n\n import proxyscrape\n\n collector = proxyscrape.create_collector('default', 'http') # Create a collector for http resources\n proxy = collector.get_proxy({'country': 'united states'}) # Retrieve a united states proxy\n\nInstallation\n------------\n\nThe latest version of proxyscrape is available via ``pip``:\n\n.. code-block:: bash\n\n $ pip install proxyscrape23\n\nAlternatively, you can download and install from source:\n\n.. code-block:: bash\n\n $ python setup.py install\n\nProvided Proxies\n----------------\nThe proxies currently provided are scraped from various sites that offer free HTTP, HTTPS, SOCKS4, and SOCKS5 proxies and\ndon't require headless browsers or Selenium to retrieve. The sites from which proxies are retrieved are listed below.\n\n+--------------------+----------------+--------------------------------------------------+\n| resource | resource type | url |\n+====================+================+==================================================+\n| anonymous-proxy | http, https | https://free-proxy-list.net/anonymous-proxy.html |\n+--------------------+----------------+--------------------------------------------------+\n| free-proxy-list | http, https | https://free-proxy-list.net |\n+--------------------+----------------+--------------------------------------------------+\n| proxy-daily-http | http | http://www.proxy-daily.com |\n| proxy-daily-socks4 | socks4 | |\n| proxy-daily-socks5 | socks5 | |\n+--------------------+----------------+--------------------------------------------------+\n| socks-proxy | socks4, socks5 | https://www.socks-proxy.net |\n+--------------------+----------------+--------------------------------------------------+\n| ssl-proxy | https | https://www.sslproxies.org |\n+--------------------+----------------+--------------------------------------------------+\n| uk-proxy | http, https | https://free-proxy-list.net/uk-proxy.html 
|\n+--------------------+----------------+--------------------------------------------------+\n| us-proxy | http, https | https://www.us-proxy.org |\n+--------------------+----------------+--------------------------------------------------+\n\nGetting Started\n---------------\n\nProxy Scrape is a library aimed at providing an efficient and easy means of retrieving proxies for web-scraping\npurposes. The proxies retrieved are available from sites providing free proxies. The proxies provided, as shown in the\nabove table, can be one of the following types (referred to as a `resource type`): http, https, socks4, and socks5.\n\nCollectors\n^^^^^^^^^^\nCollectors serve as the interface for retrieving proxies. They are instantiated at module level and can be retrieved\nand re-used in different parts of the application (similar to the Python `logging` library). Collectors can be created\nand retrieved via the `create_collector(...)` and `get_collector(...)` functions.\n\n.. code-block:: python\n\n from proxyscrape import create_collector, get_collector\n\n collector = create_collector('my-collector', ['socks4', 'socks5'])\n\n # Some other section of code\n collector = get_collector('my-collector')\n\nEach collector should have a unique name and be initialized only once. Typically, only a single collector of a given\nresource type should be utilized. Filters can then be applied to the proxies if specific criteria are desired.\n\nWhen given one or more resources, the collector will use those to retrieve proxies. If one or more resource types\nare given, the resources for each of the types will be used to retrieve proxies.\n\nOnce a collector is created, proxies can be retrieved via its `get_proxy(...)` function. This optionally takes a `filter_opts`\nparameter which can filter by the following:\n\n- ``code`` (us, ca, ...)\n- ``country`` (united states, canada, ...)\n- ``anonymous`` (True, False)\n- ``type`` (http, https, socks4, socks5, ...)\n\n.. 
code-block:: python\n\n from proxyscrape import create_collector\n\n collector = create_collector('my-collector', 'http')\n\n # Retrieve any http proxy\n proxy = collector.get_proxy()\n\n # Retrieve only 'us' proxies\n proxy = collector.get_proxy({'code': 'us'})\n\n # Retrieve only anonymous 'uk' or 'us' proxies\n proxy = collector.get_proxy({'code': ('us', 'uk'), 'anonymous': True})\n\nFilters can be applied to every proxy retrieval from the collector via `apply_filter(...)`. This is useful when the same\nfilter is expected for any proxy retrieved.\n\n.. code-block:: python\n\n from proxyscrape import create_collector\n\n collector = create_collector('my-collector', 'http')\n\n # Only retrieve 'us' proxies\n collector.apply_filter({'code': 'us'})\n\n # Filtered proxies\n proxy = collector.get_proxy()\n\n # Clear filter\n collector.clear_filter()\n\nNote that some filters may instead use specific resources to achieve the same results (e.g. 'us-proxy' or 'uk-proxy' for\n'us' and 'uk' proxies).\n\nBlacklists can be applied to a collector to prevent specific proxies from being retrieved. They accept one or more Proxy\nobjects and won't allow retrieval of matching proxies.\n\n.. code-block:: python\n\n from proxyscrape import create_collector\n\n collector = create_collector('my-collector', 'http')\n\n # Add proxy to blacklist\n collector.blacklist_proxy(Proxy('192.168.1.1', '80', None, None, None, 'http', 'my-resource'))\n\n # Blacklisted proxies won't be included\n proxy = collector.get_proxy()\n\n # Clear blacklist\n collector.clear_blacklist()\n\nInstead of permanently blacklisting a particular proxy, a proxy can instead be removed from internal memory. This\nallows it to be re-added to the pool upon a subsequent refresh.\n\n.. 
code-block:: python\n\n from proxyscrape import create_collector\n\n collector = create_collector('my-collector', 'http')\n\n # Remove proxy from internal pool\n collector.remove_proxy(Proxy('192.168.1.1', '80', None, None, None, 'http', 'my-resource'))\n\n\nApart from the automatic refreshes that occur when retrieving proxies, proxies can also be forcefully refreshed via the\n`refresh_proxies(...)` function.\n\n.. code-block:: python\n\n from proxyscrape import create_collector\n\n collector = create_collector('my-collector', 'http')\n\n # Forcefully refresh\n collector.refresh_proxies(force=True)\n\n # Refresh only if proxies not refreshed within `refresh_interval`\n collector.refresh_proxies(force=False)\n\nResources\n^^^^^^^^^\nA resource is a specific function that retrieves a set of proxies; the currently implemented resources all\nretrieve proxies by scraping a particular web site.\n\nAdditional user-defined resources can be added to the pool of proxy retrieval functions via the `add_resource(...)`\nfunction. Resources can belong to multiple resource types.\n\n.. code-block:: python\n\n from proxyscrape import add_resource\n\n def func():\n return {Proxy('192.168.1.1', '80', 'us', 'united states', False, 'http', 'my-resource'), }\n\n add_resource('my-resource', func, 'http')\n\nAs shown above, a resource doesn't necessarily have to scrape proxies from a web site. It can return a hard-coded\nlist of proxies, make a call to an API, read from a file, etc.\n\nThe set of library- and user-defined resources can be retrieved via the `get_resources(...)` function.\n\n.. code-block:: python\n\n from proxyscrape import get_resources\n resources = get_resources()\n\nResource Types\n^^^^^^^^^^^^^^\nResource types are groupings of resources that can be specified when defining a collector (as opposed to giving a\ncollection of resources).\n\nAdditional user-defined resource types can be added via the `add_resource_type(...)` function. 
Resources can optionally\nbe added to a resource type when defining it.\n\n.. code-block:: python\n\n from proxyscrape import add_resource_type\n add_resource_type('my-resource-type')\n add_resource_type('my-other-resource-type', 'my-resource') # Define resources for resource type\n\nThe set of library- and user-defined resource types can be retrieved via the `get_resource_types(...)` function.\n\n.. code-block:: python\n\n from proxyscrape import get_resource_types\n resources = get_resource_types()\n\nContribution\n------------\n\nContributions or suggestions are welcome! Feel free to `open an issue`_ if a bug is found or an enhancement is desired,\nor even a `pull request`_.\n\n.. _open an issue: https://github.com/jaredlgillespie/proxyscrape/issues\n.. _pull request: https://github.com/jaredlgillespie/proxyscrape/compare\n\nChangelog\n---------\n\nAll changes and versioning information can be found in the `CHANGELOG`_.\n\n.. _CHANGELOG: https://github.com/JaredLGillespie/proxyscrape/blob/master/CHANGELOG.rst\n\nLicense\n-------\n\nCopyright (c) 2018 Jared Gillespie. See `LICENSE`_ for details.\n\n.. 
_LICENSE: https://github.com/JaredLGillespie/proxyscrape/blob/master/LICENSE.txt\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/hyan15/proxyscrape", "keywords": "proxyscrape proxy scrape scraper", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "proxyscrape23", "package_url": "https://pypi.org/project/proxyscrape23/", "platform": "", "project_url": "https://pypi.org/project/proxyscrape23/", "project_urls": { "Homepage": "https://github.com/hyan15/proxyscrape" }, "release_url": "https://pypi.org/project/proxyscrape23/0.1.1/", "requires_dist": [ "BeautifulSoup4", "requests" ], "requires_python": "", "summary": "A library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).", "version": "0.1.1" }, "last_serial": 4513696, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "c34cb813f46a87749a25f94e1f4ed819", "sha256": "89a4eca3c2c259ff0e272cf57844f73e11a39556ae708cc8c639c652c7ad9d62" }, "downloads": -1, "filename": "proxyscrape23-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c34cb813f46a87749a25f94e1f4ed819", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15910, "upload_time": "2018-11-21T17:56:14", "url": "https://files.pythonhosted.org/packages/45/cb/45100429e7222ebc8495513e806d7365815ead32d1dd7e49cc06324b938c/proxyscrape23-0.1.1-py2.py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c34cb813f46a87749a25f94e1f4ed819", "sha256": "89a4eca3c2c259ff0e272cf57844f73e11a39556ae708cc8c639c652c7ad9d62" }, "downloads": -1, "filename": "proxyscrape23-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c34cb813f46a87749a25f94e1f4ed819", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 15910, "upload_time": "2018-11-21T17:56:14", "url": 
"https://files.pythonhosted.org/packages/45/cb/45100429e7222ebc8495513e806d7365815ead32d1dd7e49cc06324b938c/proxyscrape23-0.1.1-py2.py3-none-any.whl" } ] }