{ "info": { "author": "Scott Hendrickson, Brian Lehman, Josh Montague", "author_email": "scott@drskippy.net", "bugtrack_url": null, "classifiers": [], "description": "Python Library\n and\n Command Line Utilities\n for Gnip Historical PowerTrack API\n\n\nThe process for launching and retrieveing data for an historical historical job \nrequires only a few steps:\n 1) create job\n 2) retrieve and review job quote\n 3) accept or reject job\n 4) download data files list\n 5) download data\n\nUntilities are included to assist with each step.\n\nSETUP UTILITY\n=============\nFirst, set up your Gnip credentials. There is a simple utility to create the local credential \nfile named \".gnip\".\n\n$ ./setup_gnip_creds.py \nUsername: shendrickson@gnip.com \nPassword: \nPassword again: \nEndpoint URL. Enter your Account Name (eg https://historical.gnip.com:443/accounts//): shendrickson\nDone creating file ./.gnip\nBe sure to run:\nchmod og-w .gnip\n \n$ chmod og-w .gnip\n\nIf you use the example JSON job description, be sure to change the \"serviceUserNameField\"\nto your own, i.e., for Twitter, use your Twitter handle.\n\nYou will likely wish to run these utilities from other directory locations so be sure the export an\nupdated PYTHONPATH,\n\n$ export PYTHONPATH=${PYTHONPATH}:path-to-gnip-python-historical-utilities\n\nCREATE JOB\n==========\nCreate a job description by editing the example JSON file provided (\"bieber_job1.json\").\n\nYou will end up with a single JSON record like this (see GNIP documentation for option \ndetails). the fromDate and toDate are in the format YYYYmmddHHMM:\n\n{\n \"dataFormat\" : \"activity-streams\",\n \"fromDate\" : \"201201010000\",\n \"publisher\" : \"twitter\",\n \"rules\" : \n [\n {\n \"tag\" : \"bestRuleEver\",\n \"value\" : \"bieber\"\n }\n ],\n \"serviceUsername\" : \"PUT_YOUR_TWITTER_HANDLE_HERE\",\n \"streamType\" : \"track\",\n \"title\" : \"BieberJob1\",\n \"toDate\" : \"201201010001\"\n}\n\nTo create the job,\n\n$ ./create_job.py -f./bieber_job1.json -t \"Social Data Phenoms - Bieber\"\n\nThe response is the JSON record returned by the server. It will describe the job (including\nJobID and the JobURL, or any error messages.\n\nTo get help,\n\n$ ./create_job.py -h\nUsage: create_job.py [options]\n\nOptions:\n -h, --help show this help message and exit\n -u URL, --url=URL Job url.\n -l, --prev-url Use previous Job URL (only from this configuration\n file.).\n -v, --verbose Detailed output.\n -f FILENAME, --filename=FILENAME\n File defining job (JSON)\n -t TITLE, --title=TITLE\n Title of project, this title supercedes title in file.\n\n\nLIST JOBS, get JOB QUOTES and get JOB STATUS:\n=============================================\n$ ./list_jobs.py -h\nUsage: list_jobs.py [options]\n\nOptions:\n -h, --help show this help message and exit\n -u URL, --url=URL Job url.\n -l, --prev-url Use previous Job URL (only from this configuration\n file.).\n -v, --verbose Detailed output.\n -d SINCEDATESTRING, --since-date=SINCEDATESTRING\n Only list jobs after date, (default\n 2012-01-01T00:00:00)\n\nFor example, I have three completed jobs, a Gnip job, a Bieber job and a SXSW \njob for which data is avaiable.\n\n$ ./list_jobs.py \n#########################\nTITLE: GNIP2012\nSTATUS: finished\nPROGRESS: 100.0 %\nJOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/eeh2vte64.json\n#########################\nTITLE: Justin Bieber 2009\nSTATUS: finished\nPROGRESS: 100.0 %\nJOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/j5epx4e5c3.json\n#########################\nTITLE: SXSW2010-2012\nSTATUS: finished\nPROGRESS: 100.0 %\nJOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json\n\n\nTo see detailed information or download data filelist, \nspecify URL with -u or add -v flag (data_files.txt contains \nonly URLs from last job in list)\n\nDOWNLOAD URLS OF FILES CONTAINING DATA\n======================================\nTo retrieve the file locations for the data files this job created on S3, pass \nthe job URL with the -u flag (or if you used -u for this job previously, just use -l--see help),\n\n$ ./list_jobs.py -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json\n#########################\nTITLE: SXSW2010-2012\nSTATUS: finished\nPROGRESS: 100.0 %\nJOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json\n\nRESULT:\n Job completed at ........ 2012-09-01 04:35:23\n No. of Activities ....... -1\n No. of Files ............ -1\n Files size (MB) ......... -1\n Data URL ................ https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d/results.json\nDATA SET:\n No. of URLs ............. 131,211\n File size (bytes)........ 2,151,308,466\n Files (URLs) ............ https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/00_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=hDSc0a%2BRQeG%2BknaSAWpzSUoM1F0%3D\nhttps://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/10_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=DOZlXKuMByv5uKgmw4QrCOpmEVw%3D\nhttps://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/20_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=X4SFTxwM2X9Y7qwgKCwG6fH8h7w%3D\nhttps://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/30_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=WVubKurX%2BAzYeZLX9UnBamSCrHg%3D\nhttps://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/40_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=OG9ygKlXNxFvJLlAEWi3hes5yyw%3D\n...\n\nWriting files to data_files.txt...\n\nFilenames for the 131K files created on S3 by the job have been downloaded to a file in \nthe local directory, ./data_files.txt.\n\nDOWNLOAD DATA\n=============\n\nTo retrieve this data use the utility,\n\n$ ./get_data_files.bash\n...\n\nThis will lauch up to 8 simultaneousl cUrl connections to S3 to download the files \ninto a local ./data/year/month/day/hour... directory tree (see name_mangle.py for details).\n\nACCEPT/REJECT JOB\n=================\nAfter a job is quoted, you can accept or reject the job. The job will not start until it is accepted.\n\n$ ./accept_job -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historicals/track/jobs/c9pe0day6h.json\n\nor \n\n$ ./reject_job -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historicals/track/jobs/c9pe0day6h.json\n\nThe module gnip_historical.py provides additional functionality you can access programatically.\n\n==\nGnip-Python-Historical-Utilities by Scott Hendrickson is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/gnip-historical/", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "gnip-historical", "package_url": "https://pypi.org/project/gnip-historical/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/gnip-historical/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/gnip-historical/" }, "release_url": "https://pypi.org/project/gnip-historical/0.4.0/", "requires_dist": null, "requires_python": null, "summary": "Gnip Historical libarary and command scripts.", "version": "0.4.0" }, "last_serial": 3452413, "releases": { "0.3.2": [ { "comment_text": "", "digests": { "md5": "2cf586be1f09b3323c84a8a700810f1c", "sha256": "885c2d7a52bb2f0752c50fd926f4597582ce2fa38fe6b28847827ada78851315" }, "downloads": -1, "filename": "gnip-historical-0.3.2.tar.gz", "has_sig": false, "md5_digest": "2cf586be1f09b3323c84a8a700810f1c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11004, "upload_time": "2014-06-16T23:20:04", "url": "https://files.pythonhosted.org/packages/5d/7a/462a1db73b02cac12311296631adff3300c7da752c255f06ae3e4cff0363/gnip-historical-0.3.2.tar.gz" } ], "0.4.0": [ { "comment_text": "", "digests": { "md5": "f26501edfbb2342e2d981234c4c5506c", "sha256": "869c279f6f90920a7679a6a9a0d93a39bf784562898ad95e0588ee46b6bcf263" }, "downloads": -1, "filename": "gnip-historical-0.4.0.tar.gz", "has_sig": false, "md5_digest": "f26501edfbb2342e2d981234c4c5506c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23751, "upload_time": "2014-09-07T14:13:20", "url": "https://files.pythonhosted.org/packages/4c/f0/27867529f2866e2fe15fa31cbbf4852fc60f2426e61347b4ce429a7f63c2/gnip-historical-0.4.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "f26501edfbb2342e2d981234c4c5506c", "sha256": "869c279f6f90920a7679a6a9a0d93a39bf784562898ad95e0588ee46b6bcf263" }, "downloads": -1, "filename": "gnip-historical-0.4.0.tar.gz", "has_sig": false, "md5_digest": "f26501edfbb2342e2d981234c4c5506c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23751, "upload_time": "2014-09-07T14:13:20", "url": "https://files.pythonhosted.org/packages/4c/f0/27867529f2866e2fe15fa31cbbf4852fc60f2426e61347b4ce429a7f63c2/gnip-historical-0.4.0.tar.gz" } ] }