{ "info": { "author": "David Marx", "author_email": "david.marx84@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Utilities" ], "description": "Python Pushshift.io API Wrapper (for comment/submission search)\n===============================================================\n\n.. _installation:\n\nInstallation\n------------\n\n.. code-block:: bash\n\n pip install psaw\n\nAt present, only python 3 is supported.\n\nDescription\n-----------\n\nA minimalist wrapper for searching public reddit comments/submissions via the pushshift.io API.\n\nPushshift is an extremely useful resource, but the API is poorly documented. As such, this API wrapper\nis currently designed to make it easy to pass pretty much any search parameter the user wants to try.\n\nAlthough it is not necessarily reflective of the current status of the API, you should\nattempt to familiarize yourself with the Pushshift API documentation to better understand\nwhat search arguments are likely to work.\n\n* `API Documentation on github `_\n* `Endpoints and parameter descriptions `_\n* `/r/pushshift `_\n\n\nFeatures\n--------\n\n* Handles rate limiting and exponential backoff subject to maximum retries and\n maximum backoff limits. A minimum rate limit of 1 request per second is used\n as a default per consultation with Pushshift's maintainer,\n `/u/Stuck_in_the_matrix `_.\n* Handles paging of results. 
Returns all historical results for a given query by default.\n* Optionally handles incorporation of ``praw`` to fetch objects after getting ids from pushshift\n* If not using ``praw``, returns results in ``comment`` and ``submission`` objects whose\n API is similar to the corresponding ``praw`` objects. Additionally, result objects have\n an additional ``.d_`` attribute that offers dict access to the associated data attributes.\n* Optionally adds a ``created`` attribute which converts a comment/submission's ``created_utc``\n timestamp to the user's local time. (may raise exceptions for users with certain timezone\n settings).\n* Simple interface to pass query arguments to the API. The API is sparsely documented,\n so it's often fruitful to just try an argument and see if it works.\n* A ``stop_condition`` argument to make it simple to stop yielding results given arbitrary user-defined criteria\n\nWARNINGS\n--------\n\n* Using non-default sort may result in unexpected behavior.\n* Default behavior is to continuously hit the pushshift api. If a query is taking\n longer than expected to return results, it's possible that psaw is pulling more data\n than you may want or is caught in some kind of loop.\n* I strongly recommend prototyping queries by printing to stdout to ensure you're getting the\n desired behavior.\n\nDemo usage\n----------\n\n.. code-block:: python\n\n from psaw import PushshiftAPI\n\n api = PushshiftAPI()\n\nOr to use pushshift search to fetch ids and then use praw to fetch objects:\n\n.. code-block:: python\n\n import praw\n from psaw import PushshiftAPI\n\n r = praw.Reddit(...)\n api = PushshiftAPI(r)\n\n\n100 most recent submissions\n^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. 
code-block:: python\n\n # The `search_comments` and `search_submissions` methods return generator objects\n gen = api.search_submissions(limit=100)\n results = list(gen)\n\nFirst 10 submissions to /r/politics in 2017, filtering results to url/author/title/subreddit fields.\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nThe ``created_utc`` field will be added automatically (it's used for paging).\n\n.. code-block:: python\n\n import datetime as dt\n\n start_epoch = int(dt.datetime(2017, 1, 1).timestamp())\n\n list(api.search_submissions(after=start_epoch,\n subreddit='politics',\n filter=['url','author', 'title', 'subreddit'],\n limit=10))\n\nTrying a search argument that doesn't actually work\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nAccording to the pushshift.io API documentation, we should be able to search submissions by url,\nbut (at the time of this writing) this doesn't actually work in practice.\nThe API should still respect the ``limit`` argument and possibly other supported arguments,\nbut there are no guarantees. If you find that an argument you have passed is not supported by the API,\nthe best approach is to remove it from the query and restrict your API call to\nsupported arguments, which reduces the risk of unexpected behavior.\n\n.. code-block:: python\n\n url = 'http://www.politico.com/story/2017/02/mike-flynn-russia-ties-investigation-235272'\n url_results = list(api.search_submissions(url=url, limit=500))\n\n len(url_results), any(r.url == url for r in url_results)\n # 500, False\n\nAll AskReddit comments containing the text \"OP\"\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nUse the ``q`` parameter to search text. Omitting the ``limit`` parameter does a full\nhistorical search. Requests are performed in batches of size specified by the\n``max_results_per_request`` parameter (default=500). Omitting the ``max_response_cache``\ntest in the demo below will return all results. 
Otherwise, this demo will perform two\nAPI requests returning 500 comments each. Alternatively, the generator can be queried for additional results.\n\n.. code-block:: python\n\n gen = api.search_comments(q='OP', subreddit='askreddit')\n\n max_response_cache = 1000\n cache = []\n\n for c in gen:\n     cache.append(c)\n\n     # Omit this test to actually return all results. Wouldn't recommend it though: could take a while, but you do you.\n     if len(cache) >= max_response_cache:\n         break\n\n # If you really want to: pick up where we left off to get the rest of the results.\n if False:\n     for c in gen:\n         cache.append(c)\n\nUsing the ``aggs`` argument to summarize search results\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nWhen an ``aggs`` parameter is provided to a search method, the first result yielded by the generator\nwill contain the aggs result.\n\n.. code-block:: python\n\n api = PushshiftAPI()\n gen = api.search_comments(author='nasa', aggs='subreddit')\n next(gen)\n # {'subreddit': [\n # {'doc_count': 300, 'key': 'IAmA'},\n # {'doc_count': 6, 'key': 'space'},\n # {'doc_count': 1, 'key': 'ExposurePorn'},\n # {'doc_count': 1, 'key': 'Mars'},\n # {'doc_count': 1, 'key': 'OldSchoolCool'},\n # {'doc_count': 1, 'key': 'news'},\n # {'doc_count': 1, 'key': 'pics'},\n # {'doc_count': 1, 'key': 'reddit.com'}]}\n len(list(gen)) # 312\n\nUsing the ``redditor_subreddit_activity`` convenience method\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\nIf you want to profile a redditor's activity as in the ``aggs`` example, the\n``redditor_subreddit_activity`` method provides a simple shorthand for profiling a user by the subreddits\nin which they are active, counting comments and submissions separately in a single call,\nand returning Counter objects for commenting and posting activity, respectively.\n\n.. code-block:: python\n\n api = PushshiftAPI()\n result = api.redditor_subreddit_activity('nasa')\n result\n #{'comment':\n # Counter({\n 
# 'ExposurePorn': 1,\n # 'IAmA': 300,\n # 'Mars': 1,\n # 'OldSchoolCool': 1,\n # 'news': 1,\n # 'pics': 1,\n # 'reddit.com': 1,\n # 'space': 6}),\n # 'submission':\n # Counter({\n # 'IAmA': 3,\n # 'ISS': 1,\n # 'Mars': 1,\n # 'space': 3,\n # 'u_nasa': 86})}\n\nUsing the ``stop_condition`` argument to get the most recent submission by a bot account\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n.. code-block:: python\n\n gen = api.search_submissions(stop_condition=lambda x: 'bot' in x.author)\n\n for subm in gen:\n pass\n\n print(subm.author)\n\n\nLicense\n-------\n\nPSAW's source is provided under the `Simplified BSD License\n`_.\n\n* Copyright (c), 2018, David Marx\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/dmarx/psaw", "keywords": "reddit api wrapper pushshift", "license": "Simplified BSD License", "maintainer": "", "maintainer_email": "", "name": "psaw", "package_url": "https://pypi.org/project/psaw/", "platform": "", "project_url": "https://pypi.org/project/psaw/", "project_urls": { "Homepage": "http://github.com/dmarx/psaw" }, "release_url": "https://pypi.org/project/psaw/0.0.7/", "requires_dist": [ "requests" ], "requires_python": ">=3", "summary": "Pushshift.io API Wrapper for reddit.com public comment/submission search", "version": "0.0.7" }, "last_serial": 4167756, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "a459d8cc25dc8e885d4a6f29b72e04ee", "sha256": "4a1a07549b59f5ef88da1b2f4826bfe3cd91b0413fb6295744a9ab6006ab3166" }, "downloads": -1, "filename": "psaw-0.0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "a459d8cc25dc8e885d4a6f29b72e04ee", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 9351, "upload_time": "2018-04-15T03:14:53", "url": 
"https://files.pythonhosted.org/packages/c3/17/8cc9aca8e7b2cb5a9daf9d72de5209719e7f72b660a2872d3592e658981c/psaw-0.0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "15342c95c92bdad3356cb14e3a3aa6ae", "sha256": "f56e94ef009cf7e86319b18ffb5233f79ad6f302c0d31dce7495e560a5270a03" }, "downloads": -1, "filename": "psaw-0.0.1.tar.gz", "has_sig": false, "md5_digest": "15342c95c92bdad3356cb14e3a3aa6ae", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3", "size": 5949, "upload_time": "2018-04-15T03:14:55", "url": "https://files.pythonhosted.org/packages/59/07/db0980b92edd8a1d84259fa98848be7f3bf325cfcd8827ff6d96549686fd/psaw-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "afee2019191e7e280b45f212aedc077f", "sha256": "4a6ec6a854d773423895a839189937b538c875a37319a846677c871f8cb756f3" }, "downloads": -1, "filename": "psaw-0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "afee2019191e7e280b45f212aedc077f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 9439, "upload_time": "2018-05-13T04:21:19", "url": "https://files.pythonhosted.org/packages/59/9e/927149c3e59a8d0aaa73d3aa0c3b68f5ebbb792dd6936dc62c75dd99d792/psaw-0.0.2-py3-none-any.whl" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "a7d48c37e3158967d97ac5ef59d0370e", "sha256": "923d5c7f0ea3846f67f53a4864cfb7d20bb891c9c3afc710fadefc8587ee1c79" }, "downloads": -1, "filename": "psaw-0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "a7d48c37e3158967d97ac5ef59d0370e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 10317, "upload_time": "2018-05-18T08:08:28", "url": "https://files.pythonhosted.org/packages/e9/81/665728901aa1cc0b3dc98c325074b3ffcefc7622a89676a9c3000d937904/psaw-0.0.3-py3-none-any.whl" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "d87fb1b9014959d09b58ae6052ea82bd", "sha256": 
"7116815b6c5b40af85cb12b7038425ea6925a5b2ad30f05427848e6f8ed2a90f" }, "downloads": -1, "filename": "psaw-0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "d87fb1b9014959d09b58ae6052ea82bd", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 10753, "upload_time": "2018-05-18T08:34:27", "url": "https://files.pythonhosted.org/packages/09/21/0507e22ddaba307ef363b5d69b54c9da2320ca46c5cb3e12a3d815c7477f/psaw-0.0.4-py3-none-any.whl" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "966c836e7e18ea091bda259215aeaa94", "sha256": "ae878073a1fed3304397bce6d242bacdae67eea2060f125169039cf3d03535b2" }, "downloads": -1, "filename": "psaw-0.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": "966c836e7e18ea091bda259215aeaa94", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 10836, "upload_time": "2018-08-06T07:06:20", "url": "https://files.pythonhosted.org/packages/ec/62/0e20f81d7199d5e418a2efe637e39174256adc79a8b11a36bee6956f464b/psaw-0.0.5-py3-none-any.whl" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "34d7244e658263b179b5d3310da1a31a", "sha256": "63a2cf916fbe036069906c983495c42dbbf0d8e941c9af8a208be8f7c8523684" }, "downloads": -1, "filename": "psaw-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "34d7244e658263b179b5d3310da1a31a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 10800, "upload_time": "2018-08-06T09:23:59", "url": "https://files.pythonhosted.org/packages/56/5e/6620335cffda65387d6ec8dee091216565735f439c88d56e6a6490d02362/psaw-0.0.6-py3-none-any.whl" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "85b8127701e8552dee7727d406a39b19", "sha256": "4c8c45a6f80e0a1f3436a1fd6685497f401248b8a119fe4a8531ad5fdb75244e" }, "downloads": -1, "filename": "psaw-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "85b8127701e8552dee7727d406a39b19", "packagetype": "bdist_wheel", "python_version": "py3", 
"requires_python": ">=3", "size": 11551, "upload_time": "2018-08-14T04:44:52", "url": "https://files.pythonhosted.org/packages/60/b7/6724defc12bdcc45470e2b1fc1b978367f3d183ec6c6baa2770a0b083fc7/psaw-0.0.7-py3-none-any.whl" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "85b8127701e8552dee7727d406a39b19", "sha256": "4c8c45a6f80e0a1f3436a1fd6685497f401248b8a119fe4a8531ad5fdb75244e" }, "downloads": -1, "filename": "psaw-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "85b8127701e8552dee7727d406a39b19", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3", "size": 11551, "upload_time": "2018-08-14T04:44:52", "url": "https://files.pythonhosted.org/packages/60/b7/6724defc12bdcc45470e2b1fc1b978367f3d183ec6c6baa2770a0b083fc7/psaw-0.0.7-py3-none-any.whl" } ] }