{ "info": { "author": "Pew Research Center", "author_email": "info@pewresearch.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2.7", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# Search Sampler\n\nThis is a package for collecting and analyzing Google Health API data using a large, rolling sample, which can be beneficial when making precise calculations. Instead of taking just one sample of all data points, this package gives users the option of retrieving several samples for each data point, which can later be computed as a single data point. It is a modified version of the script researchers used to collect data for Pew Research Center's [report](http://www.journalism.org/essay/searching-for-news/) on the Flint water crisis, published on April 27, 2017. For more information on using this tool, see [this post](https://medium.com/pew-research-center-decoded/sharing-the-code-we-used-to-study-the-publics-interest-in-the-flint-water-crisis-66215382b194).\n\n## About the Report\n\nThis repository contains a generalized version of code used for collecting and analyzing data from the Google Health API for Pew Research Center's project, \"[Searching for News: The Flint Water Crisis](http://www.journalism.org/essay/searching-for-news/)\", published on April 27, 2017.\n\nThe project explored what aggregated search behavior can tell us about how news spreads and how public attention shifts in today's fractured information environment, using the water crisis in Flint, Michigan, as a case study.\n\nThe study delves into the kinds of searches that were most prevalent as a proxy for public interest, concerns and intentions about the crisis, and tracks the way search activity ebbed and flowed alongside real world events and their associated news coverage.\n\nResearchers collected the data via Google's Health API, to which the Center requested and gained special access for this project. For more information, read our [Medium post](https://medium.com/@pewresearch/using-google-trends-data-for-research-here-are-6-questions-to-ask-a7097f5fb526) on how we used Google Trends data to conduct our research. Note that this requires access to the Health API; to apply, click [here](https://docs.google.com/forms/d/e/1FAIpQLSdZbYbCeULxWAFHsMRgKQ6Q1aFvOwLauVF8kuk5W_HOTrSq2A/viewform?visit_id=1-636281495024829628-2992692443&rd=1).\n\n## Requirements\n\n- Python 2.7.x\n- See [requirements.txt](https://github.com/pewresearch/search_sampler/blob/master/requirements.txt) for required pip packages.\n\n## Installation\n\nInstall via pip:\n\n pip install search_sampler\n\n## Instructions\n\n**NOTE:** Use of this tool requires an API key from Google, with special access for the Health API. To request access, please contact the Google News Lab via this [form](https://docs.google.com/forms/d/e/1FAIpQLSdZbYbCeULxWAFHsMRgKQ6Q1aFvOwLauVF8kuk5W_HOTrSq2A/viewform?visit_id=1-636281495024829628-2992692443&rd=1).\n\n### Initialization\n\nTo use this tool, initialize the class with the API Key and a set of search parameters, which include the search term, region, start and end of the search period, and the unit of time to search for (day, week, month). Every search also requires a name (search_name), which is used as a suffix to output files. Using the same search_name multiple times can let you concatenate new results to existing output when you call the save function.\n\nSearch parameters should be passed as a dictionary. For example:\n\n apikey = ''\n\n output_path = '' # Folder name in your current directory to save results. This will be created.\n\n # search params\n params = {\n # Can be any number of search terms, using boolean logic. See report methodology for more info.\n 'search_term':['cough'],\n\n # Can be country, state, or DMA. States are US-CA. DMA are a 3 digit code; see Nielsen for info.\n 'region':'US-DC',\n\n # Must be in format YYYY-MM-DD\n 'period_start':'2014-01-01',\n 'period_end':'2014-02-15',\n\n # Options are day, week, month. WARNING: This has been extensively tested with week only.\n 'period_length':'week'\n }\n\n sample = SearchSampler(apikey, search_name, params, output_path=output_path)\n\n### Getting Data\n\nThis package provides either a single sample of data or a set of rolling window samples (see [Medium post](https://medium.com/@pewresearch/using-google-trends-data-for-research-here-are-6-questions-to-ask-a7097f5fb526) for details).\n\nTo retrieve a single sample:\n\n df_results = sample.pull_data_from_api()\n\nTo retrieve a rolling set of samples:\n\n df_results = sample.pull_rolling_window(num_samples=num_samples)\n\n### Saving Results\n\nTo save results, run the built-in save command:\n\n sample.save_file(df_results)\n\nSearchSampler also allows you to run the same search multiple times. When done on different days, the Health API returns a slightly different sample, giving you more observations and increasing your analytical power (see this [Medium post](https://medium.com/@pewresearch/using-google-trends-data-for-research-here-are-6-questions-to-ask-a7097f5fb526) for more information). These new results can then be appended to any previously saved results by adding the append parameter to save\\_file. If append is not set to True, existing results will be overwritten.\n\n sample.save_file(df_results, append=True)\n\n### Output\n\nThe results are saved in a CSV format in the folder in the output path/region specified. The file name reflects the region and the specified search name. For example, if the output path is 'data', the region is 'US-CA', and the search name is 'flu', the file will be found in 'data/US-CA/US-CA-flu.csv.' This file can be opened by spreadsheet programs like Microsoft Excel and a range of statistical and computational tools. Note that if opened in Excel, the date fields may not be recognized, but this should not be a problem in statistical or computational tools, such as R or Python's pandas. Fields in the output file are:\n\n- **query_time**: time query was run\n- **sample**: the number of this individual sample. Zero-indexed.\n- **term**: the list of terms searched on\n- **timestamp**: the specific period being searched\n- **value**: the value from the Health API\n\n## Methodological Note\n\nThis project, the first foray by the Center into the Google Health API, was as much an exploration of how analyses of search data can shed light on the public's response to news and events as it was a study of the Flint water crisis. The detailed [methodology](http://www.journalism.org/2017/04/27/google-flint-methodology/) is an effort to openly share what we learned through this process.\n\n## Acknowledgments\n\nThis report was made possible by The Pew Charitable Trusts. Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder. This report is a collaborative effort based on the input and analysis of [a number of individuals and experts at Pew Research Center](http://www.journalism.org/2017/04/27/google-flint-acknowledgments/). Google's data experts provided valuable input during the course of the project, from assistance in understanding the structure of the data to consultation on methodological decisions. While the analysis was guided by our consultations with the advisers, Pew Research Center is solely responsible for the interpretation and reporting of the data.\n\n## Use Policy\n\nIn addition to the [license](https://github.com/pewresearch/search_sampler/blob/master/LICENSE), Users must abide by the following conditions:\n\n- User may not use the Center's logo\n- User may not use the Center's name in any advertising, marketing or promotional materials.\n- User may not use the licensed materials in any manner that implies, suggests, or could otherwise be perceived as attributing a particular policy or lobbying objective or opinion to the Center, or as a Center endorsement of a cause, candidate, issue, party, product, business, organization, religion or viewpoint.\n\n### Recommended Report Citation\n\nPew Research Center, April, 2017, \"Searching for News: The Flint Water Crisis\"\n\n### Recommended Package Citation\n\nPew Research Center, September 2018, \"Search Sampler\" Available at: github.com/pewresearch/search_sampler\n\n### Related Pew Research Center Publications\n\n- September 13, 2018 \"[Sharing the code we used to study the public's interest in the Flint water\u00a0crisis](https://medium.com/pew-research-center-decoded/sharing-the-code-we-used-to-study-the-publics-interest-in-the-flint-water-crisis-66215382b194)\"\n\n- April 27, 2017 \"[Searching for News: The Flint Water Crisis](http://www.journalism.org/essay/searching-for-news/)\"\n\n- April 27, 2017 \"[Using Google Trends data for research? Here are 6 questions to ask](https://medium.com/@pewresearch/using-google-trends-data-for-research-here-are-6-questions-to-ask-a7097f5fb526)\"\n\n- April 27, 2017 \"[Q&A: Using Google search data to study public interest in the Flint water crisis](http://www.pewresearch.org/fact-tank/2017/04/27/flint-water-crisis-study-qa/)\"\n\n## Issues and Pull Requests\n\nThis code is provided as-is for use in your own projects. You are free to submit issues and pull requests with any questions or suggestions you may have. We will do our best to respond within a 30-day time period.\n\n# About Pew Research Center\n\nPew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes and trends shaping the world. It does not take policy positions. The Center conducts public opinion polling, demographic research, content analysis and other data-driven social science research. It studies U.S. politics and policy; journalism and media; internet, science and technology; religion and public life; Hispanic trends; global attitudes and trends; and U.S. social and demographic trends. All of the Center's reports are available at [www.pewresearch.org](http://www.pewresearch.org). Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder.\n\n## Contact\n\nFor all inquiries, please email info@pewresearch.org. Please be sure to specify your deadline, and we will get back to you as soon as possible. This email account is monitored regularly by Pew Research Center Communications staff.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pewresearch/search_sampler", "keywords": "google,sampling,trends", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "search-sampler", "package_url": "https://pypi.org/project/search-sampler/", "platform": "", "project_url": "https://pypi.org/project/search-sampler/", "project_urls": { "Homepage": "https://github.com/pewresearch/search_sampler" }, "release_url": "https://pypi.org/project/search-sampler/1.0.1/", "requires_dist": [ "google-api-python-client (==1.6.5)", "pandas (>=0.2.0)" ], "requires_python": "", "summary": "", "version": "1.0.1" }, "last_serial": 4269103, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "52e30b2d9e83cb3a571e3ca11438e063", "sha256": "b9799dfb1bd62ddc39409a0f4e9f64c7df579601de33022e09801a75f4ca94bd" }, "downloads": -1, "filename": "search_sampler-1.0.tar.gz", "has_sig": false, "md5_digest": "52e30b2d9e83cb3a571e3ca11438e063", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14526, "upload_time": "2018-09-12T19:13:40", "url": "https://files.pythonhosted.org/packages/fa/8e/7fd6eb9203e214b24918ff36b929099f12d07ec9bfd73e88c8b04beb7eb3/search_sampler-1.0.tar.gz" } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "a395c9a2f31166cb63c9bde9df437fa7", "sha256": "9e329611309bc760d6b69ff0bb43e66945d71ba14755ecefbf52510390fc8df7" }, "downloads": -1, "filename": "search_sampler-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "a395c9a2f31166cb63c9bde9df437fa7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11569, "upload_time": "2018-09-13T15:12:05", "url": "https://files.pythonhosted.org/packages/a6/26/b216895a753c6b45e55c45f0df1117cceb415e46259cce56f5f2b8d5a4d7/search_sampler-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7e9d023fb578927ac6d5ca4bbea0f118", "sha256": "4e7c6bf1f986b2ec4042f0427cb027c9c51ff966415f420f016ac72ca464bd8f" }, "downloads": -1, "filename": "search_sampler-1.0.1.tar.gz", "has_sig": false, "md5_digest": "7e9d023fb578927ac6d5ca4bbea0f118", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14675, "upload_time": "2018-09-13T15:12:06", "url": "https://files.pythonhosted.org/packages/61/85/a5c1ed0e7e3183054ab51ad4576802a12febc2e1514a609ab1bf71116169/search_sampler-1.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a395c9a2f31166cb63c9bde9df437fa7", "sha256": "9e329611309bc760d6b69ff0bb43e66945d71ba14755ecefbf52510390fc8df7" }, "downloads": -1, "filename": "search_sampler-1.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "a395c9a2f31166cb63c9bde9df437fa7", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 11569, "upload_time": "2018-09-13T15:12:05", "url": "https://files.pythonhosted.org/packages/a6/26/b216895a753c6b45e55c45f0df1117cceb415e46259cce56f5f2b8d5a4d7/search_sampler-1.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "7e9d023fb578927ac6d5ca4bbea0f118", "sha256": "4e7c6bf1f986b2ec4042f0427cb027c9c51ff966415f420f016ac72ca464bd8f" }, "downloads": -1, "filename": "search_sampler-1.0.1.tar.gz", "has_sig": false, "md5_digest": "7e9d023fb578927ac6d5ca4bbea0f118", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 14675, "upload_time": "2018-09-13T15:12:06", "url": "https://files.pythonhosted.org/packages/61/85/a5c1ed0e7e3183054ab51ad4576802a12febc2e1514a609ab1bf71116169/search_sampler-1.0.1.tar.gz" } ] }