{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Software Development :: Libraries" ], "description": "# Scurl\n\n[![Build Status](https://travis-ci.org/scrapy/scurl.svg?branch=master)](https://travis-ci.org/scrapy/scurl)\n[![codecov](https://codecov.io/gh/scrapy/scurl/branch/master/graph/badge.svg)](https://codecov.io/gh/scrapy/scurl)\n\n\n## About Scurl\n\nScurl is a library that is meant to replace some functions in `urllib`, such as `urlparse`,\n`urlsplit` and `urljoin`. It is built using the Chromium url parse source, which is called GURL.\n\nIn addition, this library is built to support the Scrapy project (hence the name Scurl).\nTherefore, an additional function is built, which is `canonicalize_url`, the bottleneck\nfunction in Scrapy spiders. It uses the canonicalize function from GURL to canonicalize\nthe path, fragment and query of the urls.\n\nSince the library is built based on Chromium source, the performance is greatly\nincreased. The performance of `urlparse`, `urlsplit` and `urljoin` is 2-3 times faster than\nthe urllib.\n\nAt the moment, we run the tests from `urllib` and `w3lib`. Nearly all the tests\nfrom urllib have passed (we are still working on passing all the tests :) ).\n\n\n## Credits\n\nWe want to give special thanks to [`urlparse4`](https://github.com/commonsearch/urlparse4)\nsince this project is built based on it.\n\n\n## Supported functions\n\nSince scurl meant to replace those functions in urllib, these are supported by Scurl:\n`urlparse`, `urljoin`, `urlsplit` and `canonicalize_url`.\n\n\n## Installation\n\nScurl has not been deployed to pypi yet. Currently the only way to install Scurl is\ncloning this repository\n\n```\ngit clone https://github.com/scrapy/scurl\ncd scurl\npip install -r requirements.txt\nmake clean\nmake build_ext\nmake install\n```\n\n\n## Available `Make` commands\n\nMake commands create a shorter way to type commands while developing :)\n\n`make clean`\n\nThis will clean the build dir and the files that are generated when running `build_ext` command\n\n`make test`\n\nThis will run all the tests found in the `/tests` folder\n\n`make build_ext`\n\nThis will run the command `python setup.py build_ext --inplace`, which builds Cython code\nfor this project.\n\n`make sdist`\n\nThis will run `python setup.py sdist` command on this project.\n\n`make install`\n\nThis will run `python setup.py install` command on this project.\n\n`make develop`\n\nThis will run `python setup.py develop` command on this project.\n\n`make perf`\n\nRun the performance tests on `urlparse`, `urlsplit` and `urljoin`.\n\n`make cano`:\n\nRun the performance tests on `canonicalize_url`.\n\n\n## Profiling\n\nScurl repository has the built-in profiling tool, which you can turn on by adding this\nlines to the top of the `*.pyx` files in `scurl/scurl`:\n\n```\n# cython: profile=True\n```\n\nThen you can run `python benchmarks/cython_profile.py --func [function-name]` to get the\ncprofiling result. Currently, Scurl supports profiling `urlparse`, `urlsplit` and `canonicalize`.\n\nThis is not the most convenient way to profile Scurl with cprofiler, but we will come\nup with a way of improving this soon!\n\n\n## Benchmarking result report\n\n### urlparse, urlsplit and urljoin\nThis shows the performance difference between `urlparse`, `urlsplit` and `urljoin` from\n`urllib.parse` and those of `Scurl` (this is measured by running these functions with the\nurls from the file `chromiumUrls.txt`, which can also be found in this project):\n\nThe `chromiumUrls.txt` file contains ~83k urls. This measure the time it takes to run the\n`performance_test.py` test.\n\n| | urlparse | urlsplit | urljoin |\n| ------------- |:-------------:|:-----------:|:--------:|\n| urllib.parse | 0.52 sec | 0.39 sec | 1.33 sec |\n| Scurl | 0.19 sec | 0.10 sec | 0.17 sec |\n\n\n### Canonicalize urls\nThe speed of `canonicalize_url` from [scrapy/w3lib](https://github.com/scrapy/w3lib)\ncompared to the speed of `canonicalize_url` from Scurl (this is measured by running\n`canonicalize_url` with the urls from the file `chromiumUrls.txt`, which can also be\nfound in this project):\n\nThis measures the speed of both functions. The test can be found in `canonicalize_test.py`\nfile.\n\n| |canonicalize_url |\n| ------------- |:---------------:|\n| scrapy/w3lib | 22,757 items/sec|\n| Scurl | 46,199 items/sec|\n\n## Feedback\n\nAny feedback is highly appreciated :) Please feel free to submit any\nerror/feedback in the repository issue tab!\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/scrapy/scurl", "keywords": "urlparse", "license": "Apache License, Version 2.0", "maintainer": "", "maintainer_email": "", "name": "scurl", "package_url": "https://pypi.org/project/scurl/", "platform": "any", "project_url": "https://pypi.org/project/scurl/", "project_urls": { "Homepage": "https://github.com/scrapy/scurl" }, "release_url": "https://pypi.org/project/scurl/0.1.0/", "requires_dist": null, "requires_python": "", "summary": "", "version": "0.1.0" }, "last_serial": 4308893, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "034278303f5a4cff65fa64313cacab3e", "sha256": "24a0cc3b6a89f36b10dcc993a9b145bbe25fa275a7effe4733ffd2ad660eaf7b" }, "downloads": -1, "filename": "scurl-0.1.0.tar.gz", "has_sig": false, "md5_digest": "034278303f5a4cff65fa64313cacab3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2927503, "upload_time": "2018-08-01T18:55:46", "url": "https://files.pythonhosted.org/packages/01/a5/b27456fc09388391c360197e3f24a8d2a0b8d1e1720d93f6c69bf883ed2e/scurl-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "034278303f5a4cff65fa64313cacab3e", "sha256": "24a0cc3b6a89f36b10dcc993a9b145bbe25fa275a7effe4733ffd2ad660eaf7b" }, "downloads": -1, "filename": "scurl-0.1.0.tar.gz", "has_sig": false, "md5_digest": "034278303f5a4cff65fa64313cacab3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 2927503, "upload_time": "2018-08-01T18:55:46", "url": "https://files.pythonhosted.org/packages/01/a5/b27456fc09388391c360197e3f24a8d2a0b8d1e1720d93f6c69bf883ed2e/scurl-0.1.0.tar.gz" } ] }