{ "info": { "author": "mymusise", "author_email": "mymusise1@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Web Environment", "Framework :: Django", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Topic :: Software Development" ], "description": "Ants\n===\n\n![Build Status](https://travis-ci.org/mymusise/Ants.svg?branch=master)\n\nAnts is a crawler framework for perfectionists, based on Django, that helps people build crawlers faster and manage them more easily.\n\nRequirements\n===\n\n- Python 2.7+\n- Django 1.8+\n- Gevent 1.1.1+\n\n\nInstall\n===\n\n\tpip install ants\n\nTo get the latest version, clone the source code and build it:\n\n\tgit clone git@github.com:mymusise/Ants.git\n\tcd Ants\n\tpython setup.py build\n\tsudo python setup.py install\n\n\nGetting started\n====\n\n## 1. 
Initialize project\n\nI suggest working in a virtualenv:\n\n\tvirtualenv my_env\n\tsource my_env/bin/activate\n\nThen install the requirements and start a Django project:\n\n\tpip install django gevent ants\n\tants-tools startproject my_ants\n\tcd my_ants\n\nNow you can use the ``tree`` command to see the file tree of your project:\n\n    \u251c\u2500\u2500 manage.py\n    \u251c\u2500\u2500 my_ants\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 settings.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 urls.py\n    \u2502\u00a0\u00a0 \u2514\u2500\u2500 wsgi.py\n    \u251c\u2500\u2500 parsers\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 admin.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 ants\n    \u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 __init__.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 apps.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 migrations\n    \u2502\u00a0\u00a0 \u2502\u00a0\u00a0 \u2514\u2500\u2500 __init__.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 models.py\n    \u2502\u00a0\u00a0 \u251c\u2500\u2500 tests.py\n    \u2502\u00a0\u00a0 \u2514\u2500\u2500 views.py\n    \u2514\u2500\u2500 spiders\n        \u251c\u2500\u2500 admin.py\n        \u251c\u2500\u2500 ants\n        \u2502\u00a0\u00a0 \u2514\u2500\u2500 __init__.py\n        \u251c\u2500\u2500 apps.py\n        \u251c\u2500\u2500 __init__.py\n        \u251c\u2500\u2500 migrations\n        \u2502\u00a0\u00a0 \u2514\u2500\u2500 __init__.py\n        \u251c\u2500\u2500 models.py\n        \u251c\u2500\u2500 tests.py\n        \u2514\u2500\u2500 views.py\n\n\n\n## 2. 
Write your first spider ant!\n\nAnts runs the ants that belong to a crawler when you run `python manage.py runclawer [spider_name]`.\n\nLet's add an ant in ``spiders/ants/first_blood.py``:\n\n    import requests\n\n\n    class FirstBloodClawerWhatNameYouWant(object):\n        NAME = 'first_blood'  # must be unique\n        url_head = \"\"\"https://movie.douban.com/j/search_subjects?type=movie&tag=%E7%83%AD%E9%97%A8&sort=recommend&page_limit=10&page_start={}\"\"\"\n        max_page = 4\n\n        def get_url(self, url):\n            # Fetch one page and print the raw response body\n            res = requests.get(url)\n            print(res.text)\n\n        def start(self):\n            # Build one URL per page, then fetch each in turn\n            need_urls = [self.url_head.format(i) for i in range(self.max_page)]\n            list(map(self.get_url, need_urls))\n\n\nThis is a simple spider that fetches popular movies from `Douban`.\nNote that the `NAME` property is required; it is the unique identifier\nthat distinguishes one crawler from another.\n\nNow we can run our first blood to get the popular movies!\n\n    python manage.py runclawer first_blood\n\n\n# About repeated URLs\n\nAnts provides a way to avoid requesting the same URL again if Ants crashed during the last run. You just need to define two models, a source (task) model and a goal model. For example:\n\nIn file clawers/models.py\n\n    from django.db import models\n\n\n    class BaseTask(models.Model):\n        source_url = models.CharField(max_length=255)\n\n\n    class MovieHtml(models.Model):\n        task_id = models.IntegerField()\n        html = models.TextField()\n\nWe define `BaseTask` to store the URLs we need to request, and `MovieHtml` to store the HTML source of each page. When we fetch a page by requesting a URL from `BaseTask`, we save that `BaseTask().id` to `MovieHtml().task_id`. 
We can redefine our 'first_blood' ant like this:\n\n    from ants.clawer import Clawer\n    import requests\n\n    # BaseTask and MovieHtml are the models defined above\n    class FirstBloodClawerWhatNameYouWant(Clawer):\n        NAME = 'first_blood'  # must be unique\n        thread = 2  # number of coroutines, default 1\n        task_model = BaseTask\n        goal_model = MovieHtml\n\n        def run(self, task):\n            res = requests.get(task.source_url)\n            MovieHtml.objects.create(task_id=task.id, html=res.text)\n\nBefore running 'first_blood', Ants fetches all `BaseTask` objects and passes each of them to the `run()` function. However, if a `BaseTask` object's id already appears in `MovieHtml.task_id`, that object is not passed to `run()`, so tasks that already finished are skipped.\n\n# About async\n\nAnts uses `gevent` as its event pool so that each ant runs asynchronously, and you can use the `thread` property to decide how many coroutines you want. For more information, see [the source code](https://github.com/mymusise/Ants/blob/master/ants/utils.py#L74).\n\n# Enjoy!", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mymusise/ants", "keywords": "spider django human", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "ants", "package_url": "https://pypi.org/project/ants/", "platform": "", "project_url": "https://pypi.org/project/ants/", "project_urls": { "Homepage": "https://github.com/mymusise/ants" }, "release_url": "https://pypi.org/project/ants/0.0.7/", "requires_dist": null, "requires_python": "", "summary": "A human spider framework base on django", "version": "0.0.7" }, "last_serial": 2618795, "releases": { "0.0.1": [], "0.0.2": [ { "comment_text": "", "digests": { "md5": "f5d600ce7e57c1171a2ae1d2f9e64eff", "sha256": "bf3932396869f28918a1c2c0d97f2c2408b9c93edbf068f8db94e23df9a993f9" }, "downloads": -1, "filename": "ants-0.0.2.tar.gz", "has_sig": false, "md5_digest": "f5d600ce7e57c1171a2ae1d2f9e64eff", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 
4638, "upload_time": "2016-12-04T03:39:52", "url": "https://files.pythonhosted.org/packages/37/0b/223bf9c59bab9412f884261916843bbc39b49a5787dcb9981f112bfc063a/ants-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "74d03d456de217c141d7dfd32fd6c0de", "sha256": "198325541b43e4810c9be7b8b607c356b727a8e340bbba926c39d1462102cbed" }, "downloads": -1, "filename": "ants-0.0.3.tar.gz", "has_sig": false, "md5_digest": "74d03d456de217c141d7dfd32fd6c0de", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7785, "upload_time": "2016-12-05T14:45:17", "url": "https://files.pythonhosted.org/packages/44/01/06c7d79e08b4a73d884a63c1ecaca96e490993921a1219c301bfc46b0285/ants-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "766cafb71573b3570d874e700281312f", "sha256": "8222fac81600859be29475d2cab6c12ea9f948b38a02549420e2146c926b8448" }, "downloads": -1, "filename": "ants-0.0.4.tar.gz", "has_sig": false, "md5_digest": "766cafb71573b3570d874e700281312f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5485, "upload_time": "2016-12-12T03:37:29", "url": "https://files.pythonhosted.org/packages/06/2b/6033b4abd462f02fb300e00328fdb3e8081d460d24f88098e7bd9c4b4c69/ants-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "14f67f44db028f983b391d08cb5c4293", "sha256": "c3bbf667326de32fd7b93b82dd9503391a7f1fa4cf06a8c9369a29cf1c7d803f" }, "downloads": -1, "filename": "ants-0.0.5.tar.gz", "has_sig": false, "md5_digest": "14f67f44db028f983b391d08cb5c4293", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8299, "upload_time": "2016-12-13T07:59:12", "url": "https://files.pythonhosted.org/packages/66/58/8a302e3232f14b716b72eae0fc983dd7b5829fcd2b3ffca9c024212e12c3/ants-0.0.5.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "43e819651ee5a8ee85681e63bae38819", "sha256": 
"21bc5a4ae78f17ef3f2be5d45173ce71670ff1f1b51988d5741259405b741670" }, "downloads": -1, "filename": "ants-0.0.7.tar.gz", "has_sig": false, "md5_digest": "43e819651ee5a8ee85681e63bae38819", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10339, "upload_time": "2017-02-04T10:50:15", "url": "https://files.pythonhosted.org/packages/97/1a/71d058169f353007d21afa242a4a1490604af31c3888f8beec5a82f007cd/ants-0.0.7.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "43e819651ee5a8ee85681e63bae38819", "sha256": "21bc5a4ae78f17ef3f2be5d45173ce71670ff1f1b51988d5741259405b741670" }, "downloads": -1, "filename": "ants-0.0.7.tar.gz", "has_sig": false, "md5_digest": "43e819651ee5a8ee85681e63bae38819", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10339, "upload_time": "2017-02-04T10:50:15", "url": "https://files.pythonhosted.org/packages/97/1a/71d058169f353007d21afa242a4a1490604af31c3888f8beec5a82f007cd/ants-0.0.7.tar.gz" } ] }