{ "info": { "author": "Queenferry", "author_email": "queensferry.me@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Grython\n\n## Introduction\n\nPackage *grython* is a *light* weight python crawler framework. It is designed for daily work, extracting articles or images for instance. The main features of *grython* are:\n\n1. Package *grython* only depends on *requests* & *beautifulsoup*, which means in most cases you are able to stay lighter and less complex than *scrapy*;\n2. Package *grython* supporta CSS selectors. Extracting contents from HTML here is easier than you think;\n\nThough *grython* is designed to be light, it could perfectly handle some larger projects as well.\n\n## Installation\n\nYou can easily install with `pip install grython`.\n\n## Usage\n\nHere a minimium example is given. The scene may seem familiar: you're hooked on the novel [*Desolate Era*](https://www.wuxiaworld.com/novel/desolate-era), you want to get a ebook copy in your local storage. Of course manually copying and pasting will be troublesome, but with *grython* all the problem can be resolved with a few short commands. For instance:\n```python\nimport grython\n\n# Collecting links for each chapter\nurl = 'https://www.wuxiaworld.com/novel/desolate-era'\nsoup = grython.require(url, encoding='utf-8')\npattern = grython.Pattern('li.chapter-item a')\nhrefs = ['https://www.wuxiaworld.com' + elt['href'] for elt in pattern.update(soup)]\n\n# Download extracted contents\nrecipe = grython.Recipe('desolate-era', {\n 'title': 'h4[1]',\n 'content': 'div.fr-view'\n})\nfor href in hrefs:\n soup = grython.require(href, encoding='utf-8')\n recipe.extract_txt(soup)\n print(f'url {href} extracted!')\n```\nAnd now all you have to do is counting on your fingers! \nBut *grython* has some more functions than that. For example: \n* Cutomize proxies, headers and cookies for `grython.require`;\n* Different data formats including `json`, `xml`, `txt` and `db` for `grython.Recipe`.\n\n## Limitation\n\n* Documents in detail remain to be written;\n* Threading security is not guaranteed;\n* No proper exception processing: one exception may cause the entire crawler to collapse.\n\nProblems above will be resolved as soon as possible.", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "grython", "package_url": "https://pypi.org/project/grython/", "platform": "", "project_url": "https://pypi.org/project/grython/", "project_urls": null, "release_url": "https://pypi.org/project/grython/0.0.0/", "requires_dist": null, "requires_python": "", "summary": "Light weight python crawler framework", "version": "0.0.0" }, "last_serial": 4095581, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "768b374df3dd071e63735e39afae0991", "sha256": "685e6d9559f1bbfe7eb5e5b7242b16768f5c68d74ff953c6959ffbbcfc7bc110" }, "downloads": -1, "filename": "grython-0.0.0.tar.gz", "has_sig": false, "md5_digest": "768b374df3dd071e63735e39afae0991", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6670, "upload_time": "2018-07-24T02:25:24", "url": "https://files.pythonhosted.org/packages/23/77/ff1a4423806bc61d26f14164b209c047ac0f3daa0d0ab12d04e6a396af04/grython-0.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "768b374df3dd071e63735e39afae0991", "sha256": "685e6d9559f1bbfe7eb5e5b7242b16768f5c68d74ff953c6959ffbbcfc7bc110" }, "downloads": -1, "filename": "grython-0.0.0.tar.gz", "has_sig": false, "md5_digest": "768b374df3dd071e63735e39afae0991", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6670, "upload_time": "2018-07-24T02:25:24", "url": "https://files.pythonhosted.org/packages/23/77/ff1a4423806bc61d26f14164b209c047ac0f3daa0d0ab12d04e6a396af04/grython-0.0.0.tar.gz" } ] }