{ "info": { "author": "Christopher Brown", "author_email": "io@henrian.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Topic :: Text Processing :: General" ], "description": "twilight\n========\n\nTools for accessing the `Twitter API\nv1.1 `__ with paranoid\ntimeouts and de-pagination.\n\nNode.js\n\n- ``twitter-curl`` for querying the streaming API (``/sample.json`` and\n ``/filter.json``).\n- ``rtcount`` for pulling out the retweets from a stream of JSON\n tweets, and counting them.\n\nPython\n\n- ``json2ttv2`` converts directories full of Twitter ``.json`` files\n into ``.ttv2`` files, bzip2'ing them, and ensuring that the result is\n within a reasonable size of the source (greater than 2%, but less\n than 6%) before deleting the original json.\n- ``twitter-user`` pulls down the ~3,200 (max) tweets that are\n accessible for a given user (also depends on the ``~/.twitter`` auth\n file).\n\nCrawling quickstart\n===================\n\nInstall from ``npm``:\n\n::\n\n npm install -g twilight\n\nOr github (to make sure you're getting the most up-to-date version):\n\n::\n\n npm install -g git://github.com/chbrown/twilight\n\nAuthenticate\n------------\n\nThis app uses only OAuth 1.0A, which is mandatory. As of June 11, 2013,\n`basic HTTP authentication is\ndisabled `__ in the Twitter\nStreaming API. So get some OAuth credentials together `real\nquick `__ and make a csv file that\nlooks like this:\n\n+-----------------+--------------------+-----------------+-------------------------+\n| consumer\\_key | consumer\\_secret | access\\_token | access\\_token\\_secret |\n+=================+====================+=================+=========================+\n| ziurk0An7... | VKmTsGrk2JjH... | 91505165... | VcLOIzA0mkiCSbU... |\n+-----------------+--------------------+-----------------+-------------------------+\n| 63Yp9EG4t... | DhrlIQBMUaoL... | 91401882... | XJa4HQKMgqfd7ee... |\n+-----------------+--------------------+-----------------+-------------------------+\n| ... | ... | ... | ... |\n+-----------------+--------------------+-----------------+-------------------------+\n\nThere **must** be a header line with *exactly* the following values:\n\n- consumer\\_key\n- consumer\\_secret\n- access\\_token\n- access\\_token\\_secret\n\nTab / space seperated is fine, and any other columns will simply be\nignored, e.g., if you want to record the ``screen_name`` of each\naccount. Also, order doesn't matter -- your headers just have to line up\nwith their values.\n\nThe ``twitter-curl`` script expects to find this file at ``~/.twitter``,\nbut you can specify a different path with the ``--accounts`` command\nline argument.\n\nFrom my ``/etc/supervisor/conf.d/*``\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n(See http://supervisord.org/ to get that all set up.)\n\n.. code:: bash\n\n [program:justinnnnnn]\n user=chbrown\n command=twitter-curl\n --filter \"track=loveyabiebs,belieber,bietastrophe\"\n --file /data/twitter/justin_TIMESTAMP.json\n --timeout 86400\n --interval 3600\n --ttv2\n\n``twitter-curl`` options\n------------------------\n\n- ``--accounts`` should point to a file with OAuth Twitter account\n credentials. Currently, the script will simply use a random row from\n this file.\n- ``--filter`` can be any ``track=whatever`` or\n ``locations=-18,14,68,44`` etc. A ``querystring``-parsable string. If\n no filter is specified, it will use the spritzer at ``/sample.json``\n- ``--file`` shouldn't require creating any directions, and the\n TIMESTAMP bit will be replaced by a filesystem-friendly iso\n representation of whenever the program is started.\n\n.. code:: javascript\n\n // Specifically:\n var stamp = new Date().toISOString().replace(/:/g, '-').replace(/\\..+/, '');\n // stamp == '2013-06-07T15-47-49'\n\n- ``--timeout`` (seconds) the program will die with error code 1 after\n this amount of time. Don't specify a timeout if you don't want this.\n- ``--interval`` (seconds) the amount of time to allow for silence from\n Twitter before dying. Also exits with code 1. Defaults to 600 (10\n minutes).\n- ``--ttv2`` (boolean) output TTV2 normalized flat tweets instead of\n full JSON.\n\nBecause in most cases of error the script simply dies, this approach\nonly really makes sense if you're putting it behind some process\nmonitor. (By the way, I've tried most of them: monitd, daemontools's\nsvc, god, bluepill, node-forever---and supervisord is by far the best.)\n\nThe script does not abide by any Twitter backoff requirement, but I've\nnever had any trouble with backoff being enforced by Twitter. It's more\nthan curl, though, because it checks that it's receiving data. Often,\nwith ``curl`` and ``PycURL``, my connection would be dropped by Twitter,\nbut no end signal would be sent. My crawler would simply hang, expecting\ndata, but would not try to reconnect.\n\nBut beyond that, without ``--ttv2``, it doesn't provide anything more\nthan ``curl``.\n\nTTV2\n----\n\nTTV2 is the Tweet tab-separated format version 2, the specification is\nbelow. Fields are 1-indexed for easy AWKing (see Markdown source for\n0-indexing).\n\n0. tweet\\_id\n1. created\\_at parsed into YYYYMMDDTHHMMSS, implicitly UTC\n2. text, newlines and tabs converted to spaces, html entities replaced,\n t.co urls resolved\n3. lon,lat\n4. place\\_id\n5. place\\_str\n6. in\\_reply\\_to\\_status\\_id\n7. in\\_reply\\_to\\_screen\\_name\n8. retweet\\_id id of the original tweet\n9. retweet\\_count\n10. user.screen\\_name\n11. user.id\n12. user.created\\_at parsed into YYYYMMDDTHHMMSS\n13. user.name\n14. user.description\n15. user.location\n16. user.url\n17. user.statuses\\_count\n18. user.followers\\_count\n19. user.friends\\_count\n20. user.favourites\\_count\n21. user.geo\\_enabled\n22. user.default\\_profile\n23. user.time\\_zone\n24. user.lang\n25. user.utc\\_offset\n\nThis format is not the default, and will be the output only when you use\nthe ``--ttv2`` option.\n\nExamples\n--------\n\nInstall `json `__ first:\n``npm install json``. It's awesome.\n\n::\n\n twitter-curl --filter 'track=bootstrap' | json -C text\n twitter-curl --filter 'track=bootstrap' | json -C user.screen_name text\n twitter-curl --filter 'track=\u0627\u0646\u062a\u062e\u0627\u0628\u0627\u062a' | json -C text\n twitter-curl --filter 'track=sarcmark,%F0%9F%91%8F' | json -C text\n\nIt supports unicode: \u0627\u0646\u062a\u062e\u0627\u0628\u0627\u062a is Arabic for \"elections,\" and\n``decodeURIComponent('%F0%9F%91%8F')`` is the `\"CLAPPING HANDS\"\n(U+1F44F) `__\ncharacter.\n\nIf you use a filter with url-escaped characters in supervisord, note\nthat supervisord Python-interpolates strings, so you'll need to escape\nthe percent signs, e.g.:\n\n::\n\n [program:slowclap]\n command=twitter-curl --filter \"track=%%F0%%9F%%91%%8F\" --file /tmp/slowclap.json\n\nTTV2 Example\n~~~~~~~~~~~~\n\nInstead of JSON, you can use AWK to look at the TTV2:\n\n::\n\n twitter-curl --filter 'track=data,science' --ttv2 | awk 'BEGIN{FS=\"\\t\"}{print $4,$3}'\n\nStats\n-----\n\n- RSS usage per-process is between 20-40MB.\n- VSZ on a machine running six of these crawlers is 80-90MB.\n\nPython contents vs. Javascript contents\n---------------------------------------\n\n::\n\n easy_install -U twilight\n\nThe Python and Javascript components are mostly complementary. The\nJavascript offers crawlers, Python provides post-processing.\n\nTesting with Travis CI\n----------------------\n\nThe tested CLI commands now check for OAuth in specific environment\nvariables before reading the given ``--accounts`` file or the default\none (``~/.twitter``).\n\nTo get tests to run on Travis CI, we can use ``travis`` command line\ntool to encrypt a quad of valid Twitter OAuth credentials so that only\nTravis CI can see them.\n\nPut together a file that looks like this (call it ``twilight.env``):\n\n::\n\n consumer_key=bepLTQD5ftZCjqhXgkuJW\n consumer_secret=jZ4HEYgNRKwJykbh5ptmcqV7v0o2WODdiMTF1fl6B9X\n access_token=167246169-e1XTUxZqLnRaEyBF8KwOJtbID26gifMpAjukN5vz\n access_token_secret=OVm7fJt8oY0C9kBsvych6Duq5pNIUxwagG143HdR\n\nAnd then, from within the root directory of this git repository, run the\nfollowing sequence:\n\n::\n\n gem install travis\n travis encrypt -s -a < twilight.env\n\n``.travis.yml`` should now have those variables, but encrypted with\nTravis CI's public key.\n\nLicense\n-------\n\nCopyright \u00a9 2011\u20132013 Christopher Brown. `MIT\nLicensed `__.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/chbrown/twilight", "keywords": "twilight oauth crawling bot", "license": "Copyright (c) 2011-2013 Christopher Brown\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.", "maintainer": null, "maintainer_email": null, "name": "twilight", "package_url": "https://pypi.org/project/twilight/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/twilight/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/chbrown/twilight" }, "release_url": "https://pypi.org/project/twilight/0.3.10/", "requires_dist": null, "requires_python": null, "summary": "Twitter (crawling) tools.", "version": "0.3.10" }, "last_serial": 1081458, "releases": { "0.2.6": [ { "comment_text": "", "digests": { "md5": "02db2a788462f152df90750d451049b7", "sha256": "ba1094fd20f84ce9b15b48e94d4cc829409a484670103009b4fa477b47c4a5e5" }, "downloads": -1, "filename": "twilight-0.2.6.tar.gz", "has_sig": false, "md5_digest": "02db2a788462f152df90750d451049b7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9751, "upload_time": "2013-06-14T06:05:11", "url": "https://files.pythonhosted.org/packages/82/6e/7e244016ad957eea765c5a9ce6d5320510313ca93dc27f78043d7b209105/twilight-0.2.6.tar.gz" } ], "0.3.10": [ { "comment_text": "", "digests": { "md5": "427246beb946545654a899627b1f7c3e", "sha256": "334738263b19d022233d3ee58caeecdefffc3587fdc25d1247228853c04225ee" }, "downloads": -1, "filename": "twilight-0.3.10.tar.gz", "has_sig": false, "md5_digest": "427246beb946545654a899627b1f7c3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19839, "upload_time": "2014-05-05T14:59:31", "url": "https://files.pythonhosted.org/packages/ea/3f/7a338c4ed1594e0e60cefbc9cbd77270e98e5bb7c042e03672e363fafa6f/twilight-0.3.10.tar.gz" } ], "0.3.5": [ { "comment_text": "", "digests": { "md5": "3f4a776583d3981c710e85d1b5cfe391", "sha256": "fe01e08c54b47a025391f4f52ff7c59284da09b47314504d0943ff314b03f955" }, "downloads": -1, "filename": "twilight-0.3.5.tar.gz", "has_sig": false, "md5_digest": "3f4a776583d3981c710e85d1b5cfe391", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11147, "upload_time": "2013-11-27T04:36:24", "url": "https://files.pythonhosted.org/packages/5a/45/a337a2ad2f53a11ca510efebf9c563d5426d5dc30a2b7ba2061764182f01/twilight-0.3.5.tar.gz" } ], "0.3.6": [ { "comment_text": "", "digests": { "md5": "e8a032493aa52de91a3d45a2a5f4092e", "sha256": "08fda772f3baeab3c64844e7cef6363104e3a433b832daa468386804d3abf686" }, "downloads": -1, "filename": "twilight-0.3.6.tar.gz", "has_sig": false, "md5_digest": "e8a032493aa52de91a3d45a2a5f4092e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20867, "upload_time": "2013-11-27T07:32:45", "url": "https://files.pythonhosted.org/packages/40/ef/f9e3c69d72f2d9247c27bec68c175fca8d5a4bf418f5641abdd91ecc5aa2/twilight-0.3.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "427246beb946545654a899627b1f7c3e", "sha256": "334738263b19d022233d3ee58caeecdefffc3587fdc25d1247228853c04225ee" }, "downloads": -1, "filename": "twilight-0.3.10.tar.gz", "has_sig": false, "md5_digest": "427246beb946545654a899627b1f7c3e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 19839, "upload_time": "2014-05-05T14:59:31", "url": "https://files.pythonhosted.org/packages/ea/3f/7a338c4ed1594e0e60cefbc9cbd77270e98e5bb7c042e03672e363fafa6f/twilight-0.3.10.tar.gz" } ] }