{ "info": { "author": "Nathan Zilora", "author_email": "zwork101@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Fodio Home Page\n===============\n\n**What is fodio?**\n\nFodio is a web scraping tool, used to easily traverse websites and collect data from them. The key\nconcepts Fodio was built on are simplicity and asynchronous-ity (no, I don't think that's a real word). Inspired by\nthe Demiurge_ library, which is no longer active, Fodio is here to save the day!\n\n**Why would I not use Scrapy?**\n\nGood question! There are some key differences that are important to note.\n\n1. Scrapy uses C extensions, which, depending on the environment, can be a pain!\n2. Scrapy is built on Twisted rather than asyncio!\n3. IMO, this is simpler.\n\nAlso, if you're on Linux, you can use uvloop, which should help speed things up!\n\nInstallation\n------------\n\nTo install from PyPI, simply run ``pip install fodio``.\n\nTo install manually, either download and unpack the source, or clone this repository. Then, from the root of the local\nrepository, run ``python setup.py install``.\n\nI'm not great at this documentation thing, so let's just get right into the quick-start.\n\nQuick-Start\n-----------\n\nTo start scraping, your browser's \"Inspect Element\" tool will be your friend. You're going to be copying the CSS selectors of the elements\nyou wish to harvest. In this case, we'll be walking through a GitHub example that you can also find in the demo file,\non the Fodio GitHub page.\n\nFirst, we need to make an item. The \"item\" will represent a section of the page.\n\nIt's good to note that this example won't have Items as ItemAttrs, but you can do that. I really suggest looking\nat the demos file, because it includes a bunch of great examples of what to do.\n\n.. code-block:: python\n\n    class GithubPage(Item):\n        ...\n        class Meta:\n            ...\n\nNotice the \"Meta\" class. This is used to hold your selector and root_url information. In this scenario, we want to get\na GitHub user's real name and description. 
First, let's narrow in on the profile card with the \".h-card\" selector. Then,\nwe'll supply the root_url \"https://github.com\".\n\n.. code-block:: python\n\n    class GithubPage(Item):\n        ...\n        class Meta:\n            selector = \".h-card\"\n            root_url = \"https://github.com\"\n\nNot that bad, eh? Now that we have the profile card, we need to set some ItemAttributes to extract the information.\nWe'll use a TextAttr object to extract the text from the node with the real name, and another TextAttr for the description.\n\n.. code-block:: python\n\n    class GithubPage(Item):\n        name = TextAttr(\".p-name\")\n        desc = TextAttr(\".p-note > div:nth-child(1)\", raise_not_found=False)\n\n        class Meta:\n            selector = \".h-card\"\n            root_url = \"https://github.com\"\n\n..\n\n.. note:: Notice the ``raise_not_found`` kwarg. Some GitHub profiles don't have a description, and we don't want it to raise an\n    error if they don't. The same is true for the real name; however, that element exists whether or not they have a real\n    name, so we'll have to check manually.\n\nNow it's time to run it! We'll create an asynchronous function called runner to do this. GithubPage has a classmethod called\nsearch, which appends a path to the root_url and fetches the page. So let's call it once for each account.\n\n.. code-block:: python\n\n    accounts = ['Zwork101', 'bukson', 'torvalds', 'User1m', 'random-robbie']\n\n    async def runner():\n        # Fetch and print each profile in turn.\n        for account in accounts:\n            result = await GithubPage.search(account)\n            print(\"{account_name}'s real name is {real_name}\".format(\n                account_name=account, real_name=result['name'] if result['name'] else \"not known\"))\n            print('{description}\\n'.format(\n                description=('\"' + result['desc'] + '\"') if result['desc'] else \"No Description\"))\n\nAnd we're done! After adding asyncio to run it, this is our final product:\n\n.. 
code-block:: python\n\n    import asyncio\n\n    from fodio import Item, TextAttr\n\n\n    class GithubPage(Item):\n        name = TextAttr(\".p-name\")\n        desc = TextAttr(\".p-note > div:nth-child(1)\", raise_not_found=False)\n\n        class Meta:\n            selector = \".h-card\"\n            root_url = \"https://github.com\"\n\n    accounts = ['Zwork101', 'bukson', 'torvalds', 'User1m', 'random-robbie']\n\n    async def runner():\n        # Fetch and print each profile in turn.\n        for account in accounts:\n            result = await GithubPage.search(account)\n            print(\"{account_name}'s real name is {real_name}\".format(\n                account_name=account, real_name=result['name'] if result['name'] else \"not known\"))\n            print('{description}\\n'.format(\n                description=('\"' + result['desc'] + '\"') if result['desc'] else \"No Description\"))\n\n    loop = asyncio.get_event_loop()\n    loop.run_until_complete(runner())\n\nThat wasn't that bad now, was it? For more examples, see the demo folder. Otherwise, get scraping!\n\n**Wait, but why is it called \"Fodio\"?**\n\nWell, apparently developers like to name their projects in Latin, or some other language no one uses. Fodio (at least\nthis is what some sites said) can mean \"delve\", which is an English word, for all you non-English majors / non-Magic: The\nGathering players. Delve means to research and dig, two things this library does!\n\n.. 
_Demiurge: http://demiurge.readthedocs.io/en/v0.2/", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/Zwork101/fodio", "keywords": "", "license": "Apache2", "maintainer": "", "maintainer_email": "", "name": "fodio", "package_url": "https://pypi.org/project/fodio/", "platform": "", "project_url": "https://pypi.org/project/fodio/", "project_urls": { "Homepage": "https://github.com/Zwork101/fodio" }, "release_url": "https://pypi.org/project/fodio/1.0.1/", "requires_dist": null, "requires_python": "", "summary": "A scraping library made to be simple and asynchronous", "version": "1.0.1" }, "last_serial": 4075760, "releases": { "1.0.1": [ { "comment_text": "", "digests": { "md5": "19c8978ab3519b247ad40f03571feb51", "sha256": "a7f3cb1ae4bc8a3be01e91c5935905915ca6c419672ff619ab422460b7b8ab51" }, "downloads": -1, "filename": "fodio-1.0.1.tar.gz", "has_sig": false, "md5_digest": "19c8978ab3519b247ad40f03571feb51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7005, "upload_time": "2018-07-17T22:53:39", "url": "https://files.pythonhosted.org/packages/ab/80/2a55b0b14b0e9e69d8516571f73245048b6b3ea4591e142ef4d43172dae7/fodio-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "19c8978ab3519b247ad40f03571feb51", "sha256": "a7f3cb1ae4bc8a3be01e91c5935905915ca6c419672ff619ab422460b7b8ab51" }, "downloads": -1, "filename": "fodio-1.0.1.tar.gz", "has_sig": false, "md5_digest": "19c8978ab3519b247ad40f03571feb51", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7005, "upload_time": "2018-07-17T22:53:39", "url": "https://files.pythonhosted.org/packages/ab/80/2a55b0b14b0e9e69d8516571f73245048b6b3ea4591e142ef4d43172dae7/fodio-1.0.1.tar.gz" } ] }