{ "info": { "author": "Salas leVirve", "author_email": "gae.m.project@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Customer Service", "Intended Audience :: Developers", "Intended Audience :: System Administrators", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Internet :: WWW/HTTP", "Topic :: Software Development :: Libraries", "Topic :: Terminals", "Topic :: Text Processing", "Topic :: Utilities" ], "description": "dcard-spider\r\n============\r\n\r\n|Build Status| |Coverage Status| |PyPI| |Land Health|\r\n\r\nGet posts and forums resourses through Dcard practical API on website.\r\n\r\n *Originally this is a module of my another project dcard-lumberjack .*\r\n\r\n\r\nFeature\r\n-------\r\nEmbrace asynchronous tasks and multithreads. All works done in parallel or coroutine-like.\r\n**Spider needs for speed.**\r\n\r\nInstall\r\n------------\r\n::\r\n\r\n $ pip install dcard-spider\r\n\r\nDependencies\r\n------------\r\n* Python 2.7+, Python 3.4+\r\n\r\nExample\r\n-------\r\nDownload images from the posts in specific forum.\r\n\r\n* Through command\r\n.. code:: bash\r\n\r\n dcard download --forum photography --number 100\r\n\r\n* Through programmable API\r\n.. code:: python\r\n\r\n from dcard import Dcard\r\n\r\n\r\n def \u5148\u904e\u6ffe\u51fa\u6a19\u984c\u542b\u6709\u4f5c\u54c1\u95dc\u9375\u5b57(metas):\r\n return [meta for meta in metas if '#\u4f5c\u54c1' in meta['title']]\r\n\r\n\r\n if __name__ == '__main__':\r\n\r\n dcard = Dcard()\r\n\r\n metas = dcard.forums('photography').get_metas(num=100, callback=\u5148\u904e\u6ffe\u51fa\u6a19\u984c\u542b\u6709\u4f5c\u54c1\u95dc\u9375\u5b57)\r\n posts = dcard.posts(metas).get(comments=False, links=False)\r\n\r\n resources = posts.parse_resources()\r\n\r\n status, fails = posts.download(resources)\r\n print('\u6210\u529f\u4e0b\u8f09\uff01' if all(status) else '\u51fa\u4e86\u9ede\u932f\u4e0b\u8f09\u4e0d\u5b8c\u5168\u5594')\r\n\r\n.. image:: https://raw.githubusercontent.com/leVirve/dcard-spider/master/docs/img/snapshot.png\r\n :width: 600px\r\n\r\n\r\nUsage\r\n-----\r\n\r\nFull Commands\r\n~~~~~~~~~~~~\r\n.. code:: bash\r\n\r\n $ dcard -h\r\n\r\n usage: dcard [-h] [-f FORUM] [-n NUMBER] [-b BEFORE] [-likes LIKES_THRESHOLD]\r\n [-o OUTPUT] [-F] [-v] [-V]\r\n mode\r\n\r\n positional arguments:\r\n mode download / meta mode\r\n\r\n optional arguments:\r\n -h, --help show this help message and exit\r\n -f FORUM, --forum FORUM\r\n Specific which forum\r\n -n NUMBER, --number NUMBER\r\n Scan through how many posts\r\n -b BEFORE, --before BEFORE\r\n Scan through before specified post ID\r\n -likes LIKES_THRESHOLD, --likes_threshold LIKES_THRESHOLD\r\n Specific minimum like counts\r\n -o OUTPUT, --output OUTPUT\r\n Specific folder to store the resources\r\n -F, --flatten Option for flattening folders\r\n -v, --verbose Logging verbose information\r\n -V, --version show program's version number and exit\r\n\r\nCommand line\r\n~~~~~~~~~~~~\r\n::\r\n\r\n dcard download -f [forums name] -n [number of posts]\r\n\r\n (options:)\r\n -likes [likes threshold]\r\n -b [specified a starting post ID]\r\n -o [output download folder]\r\n -v [display and logging more info during execution]\r\n -F [flatten all subfolders]\r\n -V [show version of dcard-spider]\r\n\r\nBasic\r\n~~~~~\r\n\r\n- \u53d6\u5f97\u770b\u677f\u8cc7\u8a0a (metadata)\r\n\r\n - \u53ef\u7528\u53c3\u6578\\ ``no_school``\\ \u8abf\u6574\u662f\u5426\u53d6\u5f97\u5b78\u6821\u770b\u7248\u5167\u5bb9\u3002\r\n\r\n.. code:: python\r\n\r\n forums = dcard.forums.get()\r\n forums = dcard.forums.get(no_school=True)\r\n\r\n- \u53d6\u5f97\u770b\u677f\u6587\u7ae0\u8cc7\u8a0a (metadata)\r\n\r\n - \u53ef\u7528 ``num`` \u6307\u5b9a\u6587\u7ae0\u6578\u91cf\r\n - \u6587\u7ae0\u6392\u5e8f\u6709\u5169\u7a2e\u9078\u64c7: ``new`` / ``popular``\r\n\r\n.. code:: python\r\n\r\n ariticle_metas = dcard.forums('funny').get_metas(num=150, sort='new')\r\n ariticle_metas = dcard.forums('funny').get_metas(num=100, sort='popular')\r\n\r\n # get all the metas from forum\r\n ariticle_metas = dcard.forums('funny').get_metas(num=Forum.infinite_page, sort='popular')\r\n\r\n- \u63d0\u4f9b\u4e00\u6b21\u53d6\u5f97\u591a\u7bc7\u6587\u7ae0\u8a73\u7d30\u8cc7\u8a0a(\u5168\u6587\u3001\u5f15\u7528\u9023\u7d50\u3001\u6240\u6709\u7559\u8a00)\r\n\r\n.. code:: python\r\n\r\n # \u53ef\u653e\u5165 \u6587\u7ae0\u7de8\u865f/\u55ae\u4e00meta\u8cc7\u8a0a => return \u55ae\u7bc7\u6587\u7ae0 in list\r\n\r\n article = dcard.posts(224341009).get()\r\n article = dcard.posts(ariticle_metas[0]).get()\r\n\r\n # \u653e\u5165 \u8907\u6578\u6587\u7ae0\u7de8\u865f/\u591a\u500bmeta\u8cc7\u8a0a => return \u591a\u7bc7\u6587\u7ae0 in list\r\n\r\n ids = [meta['id'] for meta in ariticle_metas]\r\n articles = dcard.posts(ids).get()\r\n articles = dcard.posts(ariticle_metas).get()\r\n\r\n- \u64cd\u4f5c\u6587\u7ae0\u7d50\u679c `PostsResult` \u7269\u4ef6\r\n\r\n.. code:: python\r\n\r\n # \u5b58\u53d6 articles \u4e2d\u7684\u5167\u5bb9\r\n # 1. articles.results -> get a `generator()`\r\n\r\n for article in articles.results:\r\n # `article` is a Python dict() object\r\n\r\n # 2. articles.result() -> get a `list()`\r\n for article in articles.result():\r\n # `article` is a Python dict() object\r\n\r\n # 3. Dumps all articles data into file directly\r\n import json\r\n\r\n with open('output.json', 'w', encoding='utf-8') as f:\r\n json.dump(articles.result(), f, ensure_ascii=False)\r\n\r\n- \u4e0b\u8f09\u6587\u7ae0\u4e2d\u7684\u8cc7\u6e90 (\u76ee\u524d\u652f\u63f4\u6587\u4e2d imgur \u9023\u7d50\u7684\u5716\u7247)\r\n\r\n - \u9810\u8a2d\u6bcf\u7bc7\u5716\u7247\u5132\u5b58\u81f3 ``(#\u6587\u7ae0\u7de8\u865f) \u6587\u7ae0\u6a19\u984c`` \u70ba\u540d\u7684\u65b0\u8cc7\u6599\u593e\r\n - ``.download()`` \u6703\u56de\u50b3\u6bcf\u500b\u8cc7\u6e90\u4e0b\u8f09\u6210\u529f\u8207\u5426\r\n - ``fails`` \u662f\u4e00\u4e32\u4e0b\u8f09\u5931\u6557\u7684 URL\r\n\r\n.. code:: python\r\n\r\n resources = articles.parse_resources()\r\n status, fails = articles.download(resources)\r\n\r\n\r\nAdvanced\r\n~~~~~~~~\r\n\r\n- \u63d0\u4f9b\u81ea\u5b9a\u7fa9 callback function\uff0c\u53ef\u5728\u63a5\u6536\u56de\u50b3\u503c\u524d\u505a\u8655\u7406 (filter / reduce\r\n data)\u3002\r\n\r\n.. code:: python\r\n\r\n\r\n # In `dcard.forums().get_metas()`\r\n\r\n def collect_ids(metas):\r\n return [meta['id'] for meta in metas]\r\n\r\n\r\n def likes_count_greater(metas):\r\n return [meta['id'] for meta in metas if meta['likeCount'] >= 20]\r\n\r\n\r\n def \u6a19\u984c\u542b\u6709\u5716\u7247\u95dc\u9375\u5b57(metas):\r\n return [meta['id'] for meta in metas if '#\u5716' in meta['title']]\r\n\r\n\r\n ids = dcard.forums('funny').get_metas(num=50, callback=collect_ids)\r\n ids = dcard.forums('funny').get_metas(num=50, callback=\u6a19\u984c\u542b\u6709\u5716\u7247\u95dc\u9375\u5b57)\r\n\r\n\r\n\r\n # In `dcard.posts().get()`, take `MongoDB` as backend database for example\r\n\r\n def store_to_db(posts):\r\n result = db[forum_name].insert_many([p for p in posts])\r\n print('#Forum {}: insert {} items'.format(forum_name, len(result.inserted_ids)))\r\n\r\n none_return_value = dcard.posts(metas).get(callback=store_to_db)\r\n\r\n\r\n- \u722c\u53d6\u6587\u7ae0\u6642\u63d0\u4f9b content, links, comments\r\n \u4e09\u500b\u53c3\u6578\uff0c\u80fd\u9078\u64c7\u7565\u904e\u4e0d\u9700\u8981\u7684\u8cc7\u8a0a\u4ee5\u52a0\u5feb\u722c\u87f2\u901f\u5ea6\u3002\r\n\r\n.. code:: python\r\n\r\n posts = dcard.posts(ids).get(comments=False, links=False)\r\n\r\n- class ``Posts`` \u4e0b\u7684 ``downloader`` \u63d0\u4f9b hacking \u9078\u9805\r\n\r\n - ``subfolder_pattern`` \u53ef\u81ea\u5b9a\u7fa9\u5b50\u8cc7\u6599\u593e\u547d\u540d\u898f\u5247\r\n - ``flatten`` \u9078\u9805\u53ef\u9078\u64c7\u5c07\u6240\u6709\u8cc7\u6e90(\u5716\u7247)\u653e\u5728\u4e00\u5c64\u8cc7\u6599\u593e\u4e0b\uff0c\u800c\u4e0d\u8981\u6309\u7167\u6587\u7ae0\u5206\u5b50\u8cc7\u6599\u593e\r\n\r\n.. code:: python\r\n\r\n articles.downloader.subfolder_pattern = '[{likeCount}\u63a8] {id}-{folder_name}'\r\n articles.downloader.flatten = True\r\n\r\n\r\nWhat's next\r\n-----------\r\nThis will be a library project for dcard continously crawling spider. And also provides end-user friendly features.\r\n\r\n\r\nLicence\r\n-------\r\n\r\n**MIT**\r\n\r\n\r\nInspirations\r\n------------\r\n`SLMT's `_\r\n`dcard-crawler `_\r\n\r\n`Aragorn's `_ downloader funtional request\r\n\r\n\r\n.. |PyPI| image:: https://img.shields.io/pypi/v/dcard-spider.svg?style=flat-square\r\n :target: https://pypi.python.org/pypi/dcard-spider\r\n.. |Build Status| image:: https://img.shields.io/travis/leVirve/dcard-spider/master.svg?style=flat-square\r\n :target: https://travis-ci.org/leVirve/dcard-spider\r\n.. |Coverage Status| image:: https://img.shields.io/coveralls/leVirve/dcard-spider/master.svg?style=flat-square\r\n :target: https://coveralls.io/github/leVirve/dcard-spider\r\n.. |Land Health| image:: https://landscape.io/github/leVirve/dcard-spider/master/landscape.svg?style=flat-square\r\n :target: https://landscape.io/github/leVirve/dcard-spider/master\r\n :alt: Code Health\r\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/leVirve/dcard-spider", "keywords": "Dcard crawler spider", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "dcard-spider", "package_url": "https://pypi.org/project/dcard-spider/", "platform": "any", "project_url": "https://pypi.org/project/dcard-spider/", "project_urls": { "Homepage": "http://github.com/leVirve/dcard-spider" }, "release_url": "https://pypi.org/project/dcard-spider/0.3.0/", "requires_dist": null, "requires_python": "", "summary": "A spider for Dcard through its newest API.", "version": "0.3.0" }, "last_serial": 2900855, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "57d392e351dbf111c6fee62dcc493a62", "sha256": "fcd40a409348d5bb206758facaccb83b84af54074134f96dc0788fdb1e68ea98" }, "downloads": -1, "filename": "dcard-spider-0.1.0.zip", "has_sig": false, "md5_digest": "57d392e351dbf111c6fee62dcc493a62", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13698, "upload_time": "2016-07-20T07:48:24", "url": "https://files.pythonhosted.org/packages/c6/ba/24c96826da2625a5d9a8346d1d3fd4f2d347bd97ff5e303c611940850cb8/dcard-spider-0.1.0.zip" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "98904e92d5b7f1578972939c247cfe2c", "sha256": "2a4668e5d106b50ee4940dfdd881bcda1ce53168961938e6cf7f8805ae045498" }, "downloads": -1, "filename": "dcard-spider-0.2.0.zip", "has_sig": false, "md5_digest": "98904e92d5b7f1578972939c247cfe2c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15630, "upload_time": "2016-07-25T03:57:27", "url": "https://files.pythonhosted.org/packages/40/d1/d995f6470ea1b30b67328c2a9d93048a41dda017f8a5d86147cb20276f69/dcard-spider-0.2.0.zip" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "2dba57a5515f21c223f10f20511ecba1", "sha256": "24927504a03bdc065469487127ee4a25ac089257d48929569513c1c78eafa85b" }, "downloads": -1, "filename": "dcard-spider-0.2.1.zip", "has_sig": false, "md5_digest": "2dba57a5515f21c223f10f20511ecba1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15676, "upload_time": "2016-07-25T04:11:18", "url": "https://files.pythonhosted.org/packages/65/f3/a5ff8cf213a240bec24310ae29b80dd3b5a426dd7085030b1e26a15a0d1b/dcard-spider-0.2.1.zip" } ], "0.2.11": [ { "comment_text": "", "digests": { "md5": "764113eec13c344689441de9e414aaec", "sha256": "14ba3576dd97e4879a4c20566e3de93a9e97780091c4f9f042837eefb51958be" }, "downloads": -1, "filename": "dcard-spider-0.2.11.zip", "has_sig": false, "md5_digest": "764113eec13c344689441de9e414aaec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19642, "upload_time": "2016-08-07T13:18:38", "url": "https://files.pythonhosted.org/packages/76/02/d889bf6d0add6320bf5851eb7be22929d2e7e9a0341b909c5598a0ff1571/dcard-spider-0.2.11.zip" } ], "0.2.12": [ { "comment_text": "", "digests": { "md5": "290d05917c010d4b5a47aaad1b9aa0e9", "sha256": "74f1b26068cc83d293604ce493bcef0f5212b0142838b35cdddf906a9d431a57" }, "downloads": -1, "filename": "dcard-spider-0.2.12.zip", "has_sig": false, "md5_digest": "290d05917c010d4b5a47aaad1b9aa0e9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20552, "upload_time": "2016-12-09T09:16:26", "url": "https://files.pythonhosted.org/packages/84/60/881624ccb278e86fb747feb5ad28519a8ae1ad93b1e84dde83e8ce11c339/dcard-spider-0.2.12.zip" } ], "0.2.13": [ { "comment_text": "", "digests": { "md5": "957cca0a01514f7efb55123c826358c5", "sha256": "3b71f1871e0ebb5356e2a938c1fd3bebf090d1e27e35fa1f3c71a968d5ec137e" }, "downloads": -1, "filename": "dcard-spider-0.2.13.tar.gz", "has_sig": false, "md5_digest": "957cca0a01514f7efb55123c826358c5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10782, "upload_time": "2016-12-09T09:43:25", "url": "https://files.pythonhosted.org/packages/11/7f/14698a574c9d2e56b952ef041be911ac8fd40531cf374d1308d2c635e155/dcard-spider-0.2.13.tar.gz" } ], "0.2.14a0": [ { "comment_text": "", "digests": { "md5": "22d77ff5b9cf65e328a3fe9882b8a1f9", "sha256": "62ed3fddf98c464fd6f03e679152061cffee8979f8a12f54c993ae1e6a06af2a" }, "downloads": -1, "filename": "dcard-spider-0.2.14a0.tar.gz", "has_sig": false, "md5_digest": "22d77ff5b9cf65e328a3fe9882b8a1f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12271, "upload_time": "2017-04-17T18:28:12", "url": "https://files.pythonhosted.org/packages/35/5a/5205795af096a4e74a6396a2d61f292961e4b5890c58216623360875af9d/dcard-spider-0.2.14a0.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "12727c5f189c8579ccf700ecf3728b6b", "sha256": "982e61b9c10afb2102a397851af9d222a6cb53d71ba13f1556aacb907e422fb2" }, "downloads": -1, "filename": "dcard-spider-0.2.2.tar.gz", "has_sig": false, "md5_digest": "12727c5f189c8579ccf700ecf3728b6b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12457, "upload_time": "2017-05-06T07:55:26", "url": "https://files.pythonhosted.org/packages/7a/3a/51386399375e766dfabd11a8193d0677514ae19ad441b3e5f7f0a710e9c7/dcard-spider-0.2.2.tar.gz" } ], "0.2.5": [ { "comment_text": "", "digests": { "md5": "1f9eff8ea99aedd621f987bf4cd20e4a", "sha256": "c0a6ad2b316b2549827a3d1270628f43ed4f1454550436f538ed6dd8b3978df8" }, "downloads": -1, "filename": "dcard-spider-0.2.5.zip", "has_sig": false, "md5_digest": "1f9eff8ea99aedd621f987bf4cd20e4a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18555, "upload_time": "2016-07-26T12:54:16", "url": "https://files.pythonhosted.org/packages/6c/12/31865411a5a9b5965b3f462c67b13193a906ccab3693a1c48e341f92cc51/dcard-spider-0.2.5.zip" } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "757e84acb23db2f771a0c644a5018172", "sha256": "93d764fdced49bb3fe2135565ff22fa984e60c469d811266805e5e13edc47dd3" }, "downloads": -1, "filename": "dcard-spider-0.2.6.zip", "has_sig": false, "md5_digest": "757e84acb23db2f771a0c644a5018172", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 18547, "upload_time": "2016-07-26T13:19:39", "url": "https://files.pythonhosted.org/packages/c8/51/36afb9a6787f33e7e0fde497b0cc87036d1f578cb98b90f9ca06dd0f2833/dcard-spider-0.2.6.zip" } ], "0.2.9": [ { "comment_text": "", "digests": { "md5": "bbd294e5099258a481de07a1e5804d36", "sha256": "25b546fc71aa0ebfce1d67838864de3c50c3caf8abb104241260fa9f9dfa5dc0" }, "downloads": -1, "filename": "dcard-spider-0.2.9.zip", "has_sig": false, "md5_digest": "bbd294e5099258a481de07a1e5804d36", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19153, "upload_time": "2016-08-01T17:59:50", "url": "https://files.pythonhosted.org/packages/9d/90/87e560ccaecc50ff0dc6eca36acc2bd8395d9a79badef37bae7e25775458/dcard-spider-0.2.9.zip" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "4622cf164821b04fd2667422393d2116", "sha256": "2598cc56cc0d279478e736bf41c8a54583dc3430fbf86efbb23905129f7b0969" }, "downloads": -1, "filename": "dcard-spider-0.3.0.tar.gz", "has_sig": false, "md5_digest": "4622cf164821b04fd2667422393d2116", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12845, "upload_time": "2017-05-26T12:46:01", "url": "https://files.pythonhosted.org/packages/ab/07/b7bd6234b92a0bacbf38a24b958115ca224caa25c2ec27628fb54e60c4ae/dcard-spider-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4622cf164821b04fd2667422393d2116", "sha256": "2598cc56cc0d279478e736bf41c8a54583dc3430fbf86efbb23905129f7b0969" }, "downloads": -1, "filename": "dcard-spider-0.3.0.tar.gz", "has_sig": false, "md5_digest": "4622cf164821b04fd2667422393d2116", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12845, "upload_time": "2017-05-26T12:46:01", "url": "https://files.pythonhosted.org/packages/ab/07/b7bd6234b92a0bacbf38a24b958115ca224caa25c2ec27628fb54e60c4ae/dcard-spider-0.3.0.tar.gz" } ] }