{ "info": { "author": "jn8029", "author_email": "warren.y.cheng@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Ruten Seller Product Parser\n![PyPI Python versions](https://img.shields.io/pypi/pyversions/ruten-crawler.svg)\n![PyPI license](https://img.shields.io/pypi/l/ruten-crawler.svg)\n\nThis repository provides a `ProductCrawler` class that crawls a seller's product pages on Ruten and returns the product information in JSON format.\n\n```\nfrom ruten_crawler import ProductCrawler\n\n# Crawl the products listed by the Ruten seller \"hambergurs\"\nproduct_crawler = ProductCrawler(seller_id=\"hambergurs\")\nresults = product_crawler.get_crawl_result()\n```\n\n## Installation\nTo install [this version from PyPI](https://pypi.org/project/ruten-crawler/), run:\n```\npip install ruten-crawler\n```\n\nTo install the latest version from this repository (the project is in the alpha stage, so expect frequent updates), run:\n```\npip install git+https://github.com/jn8029/ruten_crawler.git\n```\n\n## Overview\n\n`class ProductCrawler` handles the overall crawling logic. It accepts the optional arguments `sleep_time` and `sleep_at_each_iteration`.\n\n`class ProductPageParser` extracts information from a single product page. Currently it extracts only the shipping information, the image URLs, and the product title; logic for extracting more fields can be added here.\n\n`class ProdcutListParser` parses the product list pages. Its main job is to extract the product URLs on each list page; these URLs are then passed to `ProductPageParser` to parse the product information.\n\n## To-do\n\n* Add more robust exception handling in ProductCrawler, since the crawling process is multi-threaded.\n* Add more product information extraction features in ProductCrawler, e.g. price, remaining time, description, etc.\n\n\n",
"description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jn8029/ruten_crawler", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "ruten-crawler", "package_url": "https://pypi.org/project/ruten-crawler/", "platform": "", "project_url": "https://pypi.org/project/ruten-crawler/", "project_urls": { "Homepage": "https://github.com/jn8029/ruten_crawler" }, "release_url": "https://pypi.org/project/ruten-crawler/0.0.6/", "requires_dist": [ "bs4", "requests" ], "requires_python": "", "summary": "A crawler for product information of sellers on Ruten.", "version": "0.0.6" }, "last_serial": 5011728, "releases": { "0.0.5": [ { "comment_text": "", "digests": { "md5": "6c007171a3c60bba78cdee9a84185554", "sha256": "a9d9368d146213008532ec58f155dfe38e9529d7adc7e4a3b16f574078ed1346" }, "downloads": -1, "filename": "ruten_crawler-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "6c007171a3c60bba78cdee9a84185554", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5454, "upload_time": "2019-04-01T07:11:50", "url": "https://files.pythonhosted.org/packages/cb/0b/1dc05494ce33163096cbe0f7ae6b44014d4be4bba68c5c77e00200ae4f2f/ruten_crawler-0.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "f15146c219547980825294706659e176", "sha256": "d9e3fe9636a96caf65586b64f8f3ed88a77595af98275d87b7249e387cb9d6dd" }, "downloads": -1, "filename": "ruten_crawler-0.0.5.tar.gz", "has_sig": false, "md5_digest": "f15146c219547980825294706659e176", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3695, "upload_time": "2019-04-01T07:11:52", "url": "https://files.pythonhosted.org/packages/b3/b6/2b0ee07e3efe2fc8e3060cecf0d81ac13bc4931ff97a1729367ee4e8fe9f/ruten_crawler-0.0.5.tar.gz" } ], "0.0.6": [ {
"comment_text": "", "digests": { "md5": "e4186eb9da3cb5519cd1142d9225eb7b", "sha256": "4b538f333e6484b231efa19ab157111e37dd0124f389e676b625f9509d4d0f67" }, "downloads": -1, "filename": "ruten_crawler-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "e4186eb9da3cb5519cd1142d9225eb7b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7998, "upload_time": "2019-04-01T07:14:30", "url": "https://files.pythonhosted.org/packages/3a/df/f081e5b889299889807e3d06e9f703ad91cbd889962ec3f6a2f1cc2faa0f/ruten_crawler-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3c98c4938b974f136d5aa75303e9f297", "sha256": "f314316b279d602405ebda7a934cedbd35a24bd6ae44534a023e6f0581ad7721" }, "downloads": -1, "filename": "ruten_crawler-0.0.6.tar.gz", "has_sig": false, "md5_digest": "3c98c4938b974f136d5aa75303e9f297", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3700, "upload_time": "2019-04-01T07:14:31", "url": "https://files.pythonhosted.org/packages/52/ed/7a432f85f561bb10d7c4a617d2e92270602cbafbc1c8fb2d481ee89fad63/ruten_crawler-0.0.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "e4186eb9da3cb5519cd1142d9225eb7b", "sha256": "4b538f333e6484b231efa19ab157111e37dd0124f389e676b625f9509d4d0f67" }, "downloads": -1, "filename": "ruten_crawler-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "e4186eb9da3cb5519cd1142d9225eb7b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 7998, "upload_time": "2019-04-01T07:14:30", "url": "https://files.pythonhosted.org/packages/3a/df/f081e5b889299889807e3d06e9f703ad91cbd889962ec3f6a2f1cc2faa0f/ruten_crawler-0.0.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3c98c4938b974f136d5aa75303e9f297", "sha256": "f314316b279d602405ebda7a934cedbd35a24bd6ae44534a023e6f0581ad7721" }, "downloads": -1, "filename": "ruten_crawler-0.0.6.tar.gz", "has_sig": false, "md5_digest": 
"3c98c4938b974f136d5aa75303e9f297", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3700, "upload_time": "2019-04-01T07:14:31", "url": "https://files.pythonhosted.org/packages/52/ed/7a432f85f561bb10d7c4a617d2e92270602cbafbc1c8fb2d481ee89fad63/ruten_crawler-0.0.6.tar.gz" } ] }