{ "info": { "author": "Sacha Jungerman", "author_email": "jungerm2@illinois.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3.6", "Topic :: Scientific/Engineering", "Topic :: Software Development", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "Twper - an asynchronous twitter scraper\n=======================================\n\nTwitter provides a powerful REST and Streaming API. With the REST API,\nyou can only send about 720 requests per hour and only get tweets that\nare less than a week old. This project tries to overcome these\nlimitations by scraping the website instead. This directly translates\nto: \\* No rate limits, you can easily scrape 50 thousand tweets in an\nhour (without even using threads!) \\* No authentication is needed (the\nrequest header is random so you should'nt get blacklisted) \\* And most\nimportantly, you are free to query tweets that are more than 7 days old\n\nThis project is similar in nature to `taspinar's\ntwitterscraper `__ and was\nheavily inspired by it. In fact, a small portion of the code was\nshamelessly borrowed.\n\nThe main difference between our two libraries is that this one is fully\nasynchronous so requests are non-blocking. This allows multiple requests\nto be processed in a shorter period of time, making the scraper much\nfaster.\n\nGetting Started\n---------------\n\nUnfortunately, at the moment this package is only for python 3.6+ as it\nrelies on newer features.\n\nInstalling\n~~~~~~~~~~\n\nTo install this package simply run\n\n::\n\n (sudo) pip install Twper\n\nOr you can clone the repository and in the folder containing setup.py\nrun\n\n::\n\n python setup.py install\n\nExamples\n~~~~~~~~\n\nEach tweet is represented as a Tweet object and contains the following\nattributes: \\* user - the sender's username \\* fullname - the sender's\nfull name \\* tweet\\_id - a unique id (provided by twitter) \\* url - a\nurl to that specific tweet \\* timestamp - a datetime object of when the\ntweet was sent \\* text - the tweet's message \\* replies - number of\nreplies the tweet got \\* retweets - number of retweets it got \\* likes -\nnumber of likes the tweet received \\* hashtags - what hashtags are\nmentioned in the tweet\n\n*Note:* Tweet has a from\\_id constructor that returns a tweet object\nfrom a tweet\\_id.\n\n*Warning:* This feature uses requests and is blocking.\n\nTo get additional info about a specific user you can use the\nTwitterAccount class. Specifically TwitterAccount.from\\_username(user)\ncan be used. A TwitterAccount has the following attributes: \\* user -\nthe sender's username \\* fullname - the sender's full name \\* tweets -\nnumber of tweets the user tweeted \\* following - number of people the\nuser is following \\* followers - number of people following the user \\*\nlikes - number of likes issued by the user \\* lists - number of lists\nissued by the user \\* moments - number of moments the user has \\* bio -\nthe user's biography (short description) \\* location - the user's\ngeographical location \\* location\\_id - the corresponding location id\ntwitter uses \\* website - the user's website \\* birthday - datetime of\nthe user's birthday if publicly available \\* joined - datetime of when\nthe user joined\n\n*Note:* Some of the above info might be missing/not publicly available.\nIn this case the default value for dates is None, from strings it's an\nempty string and for ints its zero.\n\n*Warning:* The from\\_username feature uses requests and is blocking.\n\nA search is described by a query string (q\\_str) and these have the\nfollowing properties:\n\n+----------------------+---------------------------------------------+\n| q\\_str | What it will query for |\n+======================+=============================================+\n| A B C | tweets containing A and B and C |\n+----------------------+---------------------------------------------+\n| \"ABC\" | tweets containing the exact match ABC |\n+----------------------+---------------------------------------------+\n| A OR C | tweets containing either A or C |\n+----------------------+---------------------------------------------+\n| -A -B | tweets NOT containing A and NOT B |\n+----------------------+---------------------------------------------+\n| #ABC | tweets containing the hashtag #ABC |\n+----------------------+---------------------------------------------+\n| from:A | tweets that are from account A |\n+----------------------+---------------------------------------------+\n| to:B | tweets that are to account B |\n+----------------------+---------------------------------------------+\n| @C | tweets that mention account C |\n+----------------------+---------------------------------------------+\n| since:2017-12-01 | tweets after date |\n+----------------------+---------------------------------------------+\n| until:2017-12-02 | tweets before date |\n+----------------------+---------------------------------------------+\n| place:LOCATION\\_ID | tweets from location with id LOCATION\\_ID |\n+----------------------+---------------------------------------------+\n\n*Note:* Ordering does not matter, and a search is case-insensitive\nexcept for keywords OR, from:, to: since:, until: and place: which\nshould be written exactly as above. Also there should NOT be a space\nbetween the colon and search value (ex: from: A is wrong and will search\nfor tweets containing 'from:' and 'A' instead of the intended behavior).\n\nIn this package there's two classes used to search tweets (Query,\nQueries). Both these classes have a get\\_tweets method which returns an\nasync generator. And therefore they need to be ran in an event loop. Do\nnot worry if you haven't used these before, let's jump right into it!\n\n::\n\n async def main():\n # Example 1: A simple search using Query\n q = Query('Some Query Goes Here', limit=20)\n async for tw in q.get_tweets():\n # Process data\n print(tw)\n\n\n # This actually runs the main function\n loop = asyncio.get_event_loop()\n try:\n loop.run_until_complete(main())\n loop.run_until_complete(loop.shutdown_asyncgens())\n finally:\n loop.close()\n\nThis will print the most recent 20 tweets (from newest to oldest)\ncontaining the words 'some' and 'query' and 'goes' and 'here'.\n\nSometimes, the q\\_str you want to query for is too long and it needs to\nbe broken up into smaller pieces and ORed together. For this you can use\nthe Queries class. The Queries class executes many different queries\ntogether and then joins them. In the following example four separate\nqueries are executed at once and tweets are printed reverse\nchronological order (newest first). Please note that in the following\nQueries example is not faster than running those four queries\nsequentially, rather it merges them chronologically.\n\n::\n\n async def main():\n # Example 2: Multiple searches using Queries.\n qs = Queries(['Some', 'Query', 'Goes', 'Here'], limit=5)\n async for tw in qs.get_tweets():\n # Process data\n print(tw)\n\n\n # This actually runs the main function\n loop = asyncio.get_event_loop()\n try:\n loop.run_until_complete(main())\n loop.run_until_complete(loop.shutdown_asyncgens())\n finally:\n loop.close()\n\nThe limit key word argument simply limits the maximum number of results\nany generator can yield. In the second example the limit is applied to\nevery query individually so the maximum number of tweets it can yield is\n5 x 4 = 20.\n\nFor further question I encourage you to look at the source code as it is\nnot long and well commented before asking.\n\nContributing\n------------\n\nThis is my first open source project, so please feel free to contribute\nin any way and/or point out what I should improve (as well as any bugs\nof course). Pull requests and issues are welcomed.\n\nTodo\n----\n\nIf you are looking to contribute or just curious about what I plan to\nadd/fix here is the todo list:\n\n- Remove the requests dependency. This is a blocking library that\n should be replaced by aiohttp. It is only used in Tweet.from\\_id and\n TwitterAccount.from\\_username and therefore doesn't affect the\n performance of Querying.\n\n- Improve the TwitterAccount class. Hopefully it's possible to scrape\n what accounts a user is following and what accounts are following the\n user if we add authentication. Currently we can only retrieve stats\n about a user account.\n\n- Possibly add support for other languages. Currently, only english is\n fully supported even though you can set language to something other\n than 'en' in the Query constructor. Setting it to None searches\n everything regardless of the language.\n\nAuthors\n-------\n\n- Sacha Jungerman - Initial Work -\n `Twper `__\n\nLicense\n-------\n\nThis project is licensed under the MIT License - see the\n`LICENSE `__ for details\n\nAcknowledgments\n---------------\n\n- Credit's to `Taspinar `__ for his great\n library that inspired the creation of this one.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jungerm2/Twper", "keywords": "twitter scraper async asynchronous fast non-blocking asyncio", "license": "", "maintainer": "", "maintainer_email": "", "name": "Twper", "package_url": "https://pypi.org/project/Twper/", "platform": "", "project_url": "https://pypi.org/project/Twper/", "project_urls": { "Homepage": "https://github.com/jungerm2/Twper" }, "release_url": "https://pypi.org/project/Twper/0.1.3/", "requires_dist": [ "coala-utils (~=0.5.0)", "bs4", "lxml", "fake-useragent", "requests", "asyncio", "aiohttp", "aiostream" ], "requires_python": "", "summary": "An asynchronous twitter scraper", "version": "0.1.3" }, "last_serial": 3855885, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "0c659e39452feca631a1b555d18ea8be", "sha256": "2ce1f1d976131d164b2ca495177bd8fdca4ca8929b17142cd06ba006ba25a410" }, "downloads": -1, "filename": "Twper-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "0c659e39452feca631a1b555d18ea8be", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12786, "upload_time": "2018-01-05T21:48:11", "url": "https://files.pythonhosted.org/packages/6a/36/b928201c78e9a0ae473d9c4e7a5a3d21b4ff2947686c0d8a45520164a030/Twper-0.1.0-py3-none-any.whl" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "e65b93d318a463eca2e065cde814c277", "sha256": "b8d29def9efe0947c1808193c47540377793314ecab8af73e6ee41307122c94f" }, "downloads": -1, "filename": "Twper-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "e65b93d318a463eca2e065cde814c277", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12997, "upload_time": "2018-01-05T22:21:26", "url": "https://files.pythonhosted.org/packages/5e/66/6c8b6b6e215134bc118c19f432873d34f1106454cbfccda56bc68f2761d0/Twper-0.1.1-py3-none-any.whl" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "2a4b096f3d7fc7fe5ff00d6f92f69901", "sha256": "9c33f1a116292d7692364fb0431a61e1df41ff7598b6fcdc8c1b31a7c63f21f1" }, "downloads": -1, "filename": "Twper-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "2a4b096f3d7fc7fe5ff00d6f92f69901", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13076, "upload_time": "2018-01-06T04:56:29", "url": "https://files.pythonhosted.org/packages/58/76/3e13d8cd9312a5429e03f13c3b4538a425de16312f5195f34fc1db0f943d/Twper-0.1.2-py3-none-any.whl" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "ab7093fc59579fc5aa9382f8a424d155", "sha256": "f542e3006f97524d3a884104c05cd423d2136c10aa849ec368da37ef80d0c541" }, "downloads": -1, "filename": "Twper-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "ab7093fc59579fc5aa9382f8a424d155", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12722, "upload_time": "2018-05-12T02:41:23", "url": "https://files.pythonhosted.org/packages/42/37/5a49dfe94ea594b85273dd9ad26761f28c045470d6da7df97737ede6d681/Twper-0.1.3-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ab7093fc59579fc5aa9382f8a424d155", "sha256": "f542e3006f97524d3a884104c05cd423d2136c10aa849ec368da37ef80d0c541" }, "downloads": -1, "filename": "Twper-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "ab7093fc59579fc5aa9382f8a424d155", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 12722, "upload_time": "2018-05-12T02:41:23", "url": "https://files.pythonhosted.org/packages/42/37/5a49dfe94ea594b85273dd9ad26761f28c045470d6da7df97737ede6d681/Twper-0.1.3-py3-none-any.whl" } ] }