{ "info": { "author": "Jai Juneja", "author_email": "jai.juneja@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "Reddit NLP Package |Build Status|\n=================================\n\nA lightweight Python module that performs tokenization and processing of\ntext on Reddit. It allows you to analyze users, titles, comments and\nsubreddits to understand their vocabulary. The module comes packaged\nwith its own inverted index builder for storing vocabularies and word\nfrequencies, such that you can generate and manipulate large corpora of\ntf-idf weighted words without worrying about implementation. This is\nespecially useful if you're running scripts over long periods and wish\nto save intermediary results.\n\nLicense\n-------\n\nCopyright 2014 Jai Juneja.\n\nThis program is free software: you can redistribute it and/or modify it\nunder the terms of the GNU General Public License as published by the\nFree Software Foundation, either version 3 of the License, or (at your\noption) any later version.\n\nThis program is distributed in the hope that it will be useful, but\nWITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General\nPublic License for more details.\n\nYou should have received a copy of the GNU General Public License along\nwith this program. If not, see http://www.gnu.org/licenses/.\n\nInstallation\n------------\n\nUsing pip or easy\\_install\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can download the latest release version using ``pip`` or\n``easy_install``:\n\n::\n\n pip install redditnlp\n\nLatest development version\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nYou can alternatively download the latest development version directly\nfrom GitHub:\n\n::\n\n git clone https://github.com/jaijuneja/reddit-nlp.git\n\nChange into the root directory:\n\n::\n\n cd reddit-nlp\n\nThen install the package:\n\n::\n\n python setup.py install\n\nError: the required version of setuptools is not available\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nUpon running ``pip install`` or the ``setup.py`` script you might get a\nmessage like this:\n\n::\n\n The required version of setuptools (>=0.7) is not available, and can't be installed while this script is running. Please install a more recent version first, using 'easy_install -U setuptools'.\n\nThis is appearing because you have a very outdated version of the\nsetuptools package. The redditnlp package typically bootstraps a newer\nversion of setuptools during install, but it isn't working in this case.\nYou need to update setuptools using ``easy_install -U setuptools`` (you\nmay need to apply ``sudo`` to this command).\n\nIf the above command doesn't do anything then it is likely that your\nversion of setuptools was installed using a package manager such yum,\napt or pip. Check your package manager for a package called\npython-setuptools or try ``pip install setuptools --upgrade`` and then\nre-run the install.\n\nUsage\n-----\n\nA more complex sample program using the redditnlp module can be found at\n``https://github.com/jaijuneja/reddit-nlp/blob/master/example.py``. Here\nwe outline a basic word counter application.\n\nThe module consists of three classes:\n\n- A basic word counter class, ``WordCounter``, which performs\n tokenization and counting on input strings\n- A Reddit word counter, ``RedditWordCounter``, which extends the\n ``WordCounter`` class to allow interaction with the Reddit API\n- A tf-idf corpus builder, which allows storing of large word corpora\n in an inverted index\n\nThese three classes can be instantiated as follows:\n\n.. code:: python\n\n from redditnlp import WordCounter, RedditWordCounter, TfidfCorpus\n\n word_counter = WordCounter()\n reddit_counter = RedditWordCounter('your_username')\n corpus = TfidfCorpus()\n\nTo adhere to the Reddit API rules, it is asked that you use your actual\nReddit username in place of ``'your_username'`` above.\n\nFor further information on the attributes and methods of these classes\nyou can run:\n\n.. code:: python\n\n help(WordCounter)\n help(RedditWordCounter)\n help(TfidfCorpus)\n\nNext, we can tokenize 1000 comments from a selection of subreddits,\nextract the most common words and save all of our data to disk:\n\n.. code:: python\n\n for subreddit in ['funny', 'aww', 'pics']:\n # Tokenize and count words for 1000 comments\n word_counts = counter.subreddit_comments(subreddit, limit=1000)\n \n # Add the word counts to our corpus\n corpus.add_document(word_counts, subreddit)\n\n # Save the corpus to a specified path (must be JSON)\n corpus.save(path='word_counts.json')\n\n # Save the top 50 words (by tf-idf score) from each subreddit to a text file\n for subreddit in corpus.get_document_list():\n top_words = corpus.get_top_terms(document, num_terms=50)\n with open('top_words.txt', 'ab') as f:\n f.write(document + '\\n' + '\\n'.join(top_words.keys()))\n\nMachine learning\n~~~~~~~~~~~~~~~~\n\n``redditnlp`` now supports some of scikit-learn's machine learning\ncapability. Several in-built functions enable the user to:\n\n- Convert a TfidfCorpus object into a scipy sparse feature matrix\n (using ``build_feature_matrix()``)\n- Train a classifier using the documents contained in a TfidfCorpus\n (with ``train_classifier()``) and thereafter classify new documents\n (with ``classify_document()``)\n\nBelow is an example of a simple machine learning application that loads\na corpus of subreddit comment data, uses it to train a classifier and\ndetermines which subreddit a user's comments most closely match:\n\n.. code:: python\n\n # Load the corpus of subreddit comment data and use it to train a classifier\n corpus = TfidfCorpus('path/to/subreddit_corpus.json')\n corpus.train_classifier(classifier_type='LinearSVC', tfidf=True)\n\n # Tokenize all of your comments\n counter = RedditWordCounter('your_username')\n user_comments = counter.user_comments('your_username')\n\n # Classify your comments against the documents in the corpus\n print corpus.classify_document(user_comments)\n\nMultiprocessing\n~~~~~~~~~~~~~~~\n\n``redditnlp`` uses the `PRAW `__\nReddit API wrapper. It supports multiprocessing, such that you can run\nmultiple instances of ``RedditWordCounter`` without exceeding Reddit's\nrate limit. There is more information about this in the `PRAW\ndocumentation `__\nbut for the sake of completeness an example is included below.\n\nFirst, you must initialise a request handling server on your local\nmachine. This is done using the terminal/command line:\n\n::\n\n praw-multiprocess\n\nNext, you can instantiate multiple ``RedditWordCounter`` objects and set\nthe parameter ``multiprocess=True`` so that outgoing API calls are\nhandled:\n\n::\n\n counter = RedditWordCounter('your_username', multiprocess=True)\n\nContact\n-------\n\nIf you have any questions or have encountered an error, feel free to\ncontact me at ``jai -dot- juneja -at- gmail -dot- com``.\n\n.. |Build Status| image:: https://travis-ci.org/jaijuneja/reddit-nlp.svg?branch=master\n :target: https://travis-ci.org/jaijuneja/reddit-nlp", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jaijuneja/reddit-nlp", "keywords": "reddit,natural language processing,machine learning", "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "redditnlp", "package_url": "https://pypi.org/project/redditnlp/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/redditnlp/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/jaijuneja/reddit-nlp" }, "release_url": "https://pypi.org/project/redditnlp/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "A tool to perform natural language processing of reddit content.", "version": "0.1.3" }, "last_serial": 1335176, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "c5822a639984a59a48eebbb27141a4bd", "sha256": "da4c5e6a80f30b980a1e897776a99d9220982a135130b3e3ab47ef999d4b11fa" }, "downloads": -1, "filename": "redditnlp-0.1-py2.7.egg", "has_sig": false, "md5_digest": "c5822a639984a59a48eebbb27141a4bd", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 18700, "upload_time": "2014-11-28T14:28:43", "url": "https://files.pythonhosted.org/packages/50/b4/3fea7f6bfabff2e7902be5366d71cf7809c8b561f29a0879add18fa41259/redditnlp-0.1-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "fd94a19de59b61839b5246d7c5652996", "sha256": "500d1d8836d68b541e599fd7ccf02a9e2abc2eaeb33f67bee5cac6d5246e8734" }, "downloads": -1, "filename": "redditnlp-0.1.tar.gz", "has_sig": false, "md5_digest": "fd94a19de59b61839b5246d7c5652996", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9580, "upload_time": "2014-11-28T14:28:41", "url": "https://files.pythonhosted.org/packages/21/5e/ab91cf7c46b4fc61dd8ef47e8f71347a82650c3589dc973f84b13f3adb65/redditnlp-0.1.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "0a6826b6a5fa6242dd0de7518810398a", "sha256": "3180fed4d84fdcaa91717f9c479de95305c3468953631ddc34dea3fde795efd8" }, "downloads": -1, "filename": "redditnlp-0.1.1-py2.7.egg", "has_sig": false, "md5_digest": "0a6826b6a5fa6242dd0de7518810398a", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 31382, "upload_time": "2014-11-30T23:35:06", "url": "https://files.pythonhosted.org/packages/42/f1/866bb01399c6eee9caeb08a1e45f0caa02d6323e6ef3b71b27dad45b5915/redditnlp-0.1.1-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "e3f36ae0106b00fd3e8a3b02071940f7", "sha256": "67ab852bc76c23afb62d8bd16c69074ad3d8452b69d863181f3b37f4b558062e" }, "downloads": -1, "filename": "redditnlp-0.1.1.tar.gz", "has_sig": false, "md5_digest": "e3f36ae0106b00fd3e8a3b02071940f7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16847, "upload_time": "2014-11-30T23:34:59", "url": "https://files.pythonhosted.org/packages/b9/bc/c36ff21dadffd4cabf8ab754f7a266a2af3b7b92d01db2a1b12f01b2f834/redditnlp-0.1.1.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "68e97b9ec8e0cf70490566030ee3033f", "sha256": "00c47cc02168530719f2254fb0004195d095102f28cf70344b5edcae19e69717" }, "downloads": -1, "filename": "redditnlp-0.1.3-py2.7.egg", "has_sig": false, "md5_digest": "68e97b9ec8e0cf70490566030ee3033f", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 34757, "upload_time": "2014-12-08T09:58:20", "url": "https://files.pythonhosted.org/packages/bc/7a/32c006debfe0b2d5789e3c9377b5c8b57497a07f67ad8e58168ff0f5e75a/redditnlp-0.1.3-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "75a788bfe61b94cdba142c75a04a8945", "sha256": "87f7aec22eb6f5ccf1c9b31f0b5a1d6103d7cf4551e457f8f6a7feed3f47d1e1" }, "downloads": -1, "filename": "redditnlp-0.1.3.tar.gz", "has_sig": false, "md5_digest": "75a788bfe61b94cdba142c75a04a8945", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20968, "upload_time": "2014-12-08T09:58:17", "url": "https://files.pythonhosted.org/packages/e3/33/da7cc1d7eda4296e4925dfe78c9582ad5b8e7ac84d2d141fcec605c99fab/redditnlp-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "68e97b9ec8e0cf70490566030ee3033f", "sha256": "00c47cc02168530719f2254fb0004195d095102f28cf70344b5edcae19e69717" }, "downloads": -1, "filename": "redditnlp-0.1.3-py2.7.egg", "has_sig": false, "md5_digest": "68e97b9ec8e0cf70490566030ee3033f", "packagetype": "bdist_egg", "python_version": "2.7", "requires_python": null, "size": 34757, "upload_time": "2014-12-08T09:58:20", "url": "https://files.pythonhosted.org/packages/bc/7a/32c006debfe0b2d5789e3c9377b5c8b57497a07f67ad8e58168ff0f5e75a/redditnlp-0.1.3-py2.7.egg" }, { "comment_text": "", "digests": { "md5": "75a788bfe61b94cdba142c75a04a8945", "sha256": "87f7aec22eb6f5ccf1c9b31f0b5a1d6103d7cf4551e457f8f6a7feed3f47d1e1" }, "downloads": -1, "filename": "redditnlp-0.1.3.tar.gz", "has_sig": false, "md5_digest": "75a788bfe61b94cdba142c75a04a8945", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20968, "upload_time": "2014-12-08T09:58:17", "url": "https://files.pythonhosted.org/packages/e3/33/da7cc1d7eda4296e4925dfe78c9582ad5b8e7ac84d2d141fcec605c99fab/redditnlp-0.1.3.tar.gz" } ] }