{ "info": { "author": "Chris A. Lindgren", "author_email": "chris.a.lindgren@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# [Narrator](https://pypi.org/project/narrator/)\nby Chris Lindgren \nDistributed under the BSD 3-clause license. See LICENSE.txt or http://opensource.org/licenses/BSD-3-Clause for details.\n\n## Overview\n\nA set of functions that process and create descriptive summary visualizations to help develop a broader narrative through-line of one's tweet data.\n\nIt functions only with Python 3.x and is not backwards-compatible (although one could probably branch off a 2.x port with minimal effort).\n\n**Warning**: ```narrator``` performs very little custom error-handling, so make sure your inputs are formatted properly! If you have questions, please let me know via email.\n\n## System requirements\n\n* ast\n* matplot\n* pandas\n* numpy\n* emoji\n* re\n\n## Installation\n```pip install narrator```\n\n## Objects\n\n```narrator``` will initialize and use the following objects in future versions. It is currently not implemented yet. More to come here.\n\n* ```topperObject```: Object class with attributes that store desired top X samples from the corpus Object properties as follows:\n - ```.top_x_hashtags```:\n - ```.top_x_tweeters```:\n - ```.top_x_tweets```:\n - ```.top_x_topics```:\n - ```.top_x_urls```:\n - ```.top_x_rts```:\n - ```.period_dates```:\n\n## General Functions\n\n```narrator``` contains the following general functions:\n\n* ```initializeTO```: Initializes a topperObject().\n* ```date_range_writer```: Takes beginning date and end date to write a range of those dates per Day as a List\n - Args:\n - bd= String. Beginning date in YYYY-MM-DD format\n - ed= String. Ending date in YYYY-MM-DD format\n - Returns List of arrow date objects for whatever needs.\n* ```period_writer```: Accepts list of lists of period date information and returns a Dict of per Period dates for temporal analyses.\n - Args:\n - periodObj: Optional first argument periodObject, Default is None\n - 'ranges': Hierarchical list in following structure:
\n                ranges = [\n                    ['p1', ['2018-01-01', '2018-03-30']],\n                    ['p2', ['2018-04-01', '2018-06-12']],\n                    ...\n                ]
\n - Returns Dict of period dates per Day as Lists: { 'p1': ['2018-01-01', '2018-01-02', ...] } \n\n## Summarizer Functions\n\n* ```summarizer```: Counts a column variable of interest and returns a sample data set based on set parameters. There are 5 search options from which to choose. See the the 'main_sum_option' list below.\n - Args:\n - **Required Options**:\n - main_sum_option= String. Current options for sampling include the following:\n - 'sum_all_col': Sum of all the passed variable across entire corpus\n - 'sum_group_col': Sum of a group of the passed variables (List) across entire corpus\n - 'sum_single_col': Sum of a single isolated variables value (String) across entire corpus\n - 'single_term_per_day': Sum of single variable per Day in provided range\n - 'grouped_terms_perday': Sum of group of a type of variable per Day in provided range\n - column_type= String. Provides the type of summary to conduct.\n - 'hashtags': Searches for hashtags\n - 'urls': Searches for URLs\n - 'other': Searches for another type of content\n - df_corpus= DataFrame of tweet corpus\n - primary_col= String. Name of the primary targeted DataFrame column of interest, \n e.g., hashtags, urls, etc.\n - sort_check= Boolean. If True, sort sums per day.\n - sort_date_check= Boolean. If True, sort by dates.\n - sort_type= Boolean. If True, descending order. If False, ascending order.\n - **Conditional Options**: Based on the 'main_sum_option', these will vary in use and assignment.\n - group_search_option= String. Use to choose what search options to use for 'group_col_per_day'. \n - 'single_col': Searches for search terms in the single pertinent column\n - 'keywords_and_col': Searches for a column variable and accompanying\n keywords in another content column, such as 'tweets'. For example, you search for someone's \n name in the corpus that isn't always represented as a hashtag.\n - simple_list= List of terms to isolate.\n - keyed_list= List of Dicts. A keyed list of keywords of which you search within the secondary_col.\n - secondary_col= String. Name of the secondary targeted DataFrame column of interest, \n if needed, e.g., tweets, usernames, etc.\n - single_term= String of single term to isolate.\n - time_agg_type= If sum by group temporally, define its temporal aggregation:\n - 'day': Aggregate time per Day\n - 'period': Aggregate time per period\n - date_col= String value of the DataFrame column name for the dates in xx-xx-xxxx format.\n - id_col= String value of the DataFrame column name for the unique ID.\n - grouped_output_type= String. Options for particular Dataframe output\n - consolidated= Each listed value in group is a column with its period values\n - spread= One column for each listed group value\n - Return: Depending on option, a sample as a List of Tuples or Dict of grouped samples\n* ```get_sample_size```: Helper function for summarizer functions. If sample=True,\n then sample sent here and returned to the summarizer for output.\n - Args:\n - sort_check= Boolean. If True, sort the corpus.\n - sort_date_check= Boolean. If True, sort corpus based on dates.\n - counted_list= List. Tallies from corpus.\n - ss= Integer of sample size to output.\n - sample_check= Boolean. If True, use ss value. If False, use full corpus.\n - Returns DataFrame to summarizer function.\n* ```grouper```: Takes default values in 'skeleton' Dict and hydrates them with sample List of Tuples\n - Args:\n - group_type= String. Current options include 'day' or 'period'\n - listed_tuples= List of Tuples from get_sample_size(). \n - Example structure is the following: ```[(('keyword', '01-27-2019'), 100), (...), ...]```\n - skeleton= Dict. Fully hydrated skeleton dict, wherein grouper() updates its default 0 Int values.\n - Returns Dict of updated values per keyword\n* ```skeletor```: Takes desired date range and list of keys to create a skeleton Dict before hydrating it with the sample values. Overall, this provides default 0 Int values for every keyword in the sample.\n - Args:\n - aggregate_level= String. Current options include:\n - 'day': per Day\n - 'period_day': Days per Period\n - 'period': per Period\n - date_range= \n - If 'day' aggregate level, a List of per Day dates ```['2018-01-01', '2018-01-02', ...]```\n - If 'period' aggregate level, a Dict of periods with respective date Lists: ```{{'1': ['2018-01-01', '2018-01-02', ...]}}```\n - keys= List of keys for hydrating the Dict\n - Returns full Dict 'skeleton' with default 0 Integer values for the grouper() function\n* ```whichPeriod```: Helper function for grouper(). Isolates what period a date is in for use.\n - Args: \n - period_dates= Dict of Lists per period\n - date= String. Date to lookup.\n - Returns String of period to grouper().\n* ```find_term```: Helper function for accumulator(). Searches for hashtag in tweet. If there, return True. If not, return False.\n - Args:\n - search= String. Term to search for.\n - text= String. Text to search.\n - Returns Boolean\n* ```grouped_dict_to_df```: Takes grouped Dict and outputs a DataFrame.\n - Args:\n - main_sum_option= String. Options for grouping into a Dataframe.\n - group_hash_temporal= Multiple groups of hashtags\n - grouped_output_type= Sring. oPtions for DF outputs\n - spread= Good for small multiples in D3.js \n - consolidated= Good for small multiples in matplot\n - time_agg_type= String. Options for type of temporal grouping.\n - period= Grouped by periods\n - group_dict= Hydrated Dict to convert to a DataFrame for visualization or output\n - Returns DataFrame for use with a plotter function or output as CSV\n* ```accumulator```: Helper function for summarizer function. Accumulates by simple lists and keyed lists.\n - Args:\n - checker= String. Options for accumulation:\n - simple: Takes values from simple_list and conducts a search on primary_col.\n - keyed: Takes values from keyed_list and conducts a search on secondary_col.\n - df_list= List. DataFrame passed as a list for traversing\n - check_list= List. List of terms to accrue and append\n - If simple, converted to List of each listed term.\n - If keyed, List of dicts, where each key is its accompanying primary_col term.\n - Returns a hydrated list of Tuples with each primary term and its accompanying date.\n\n## Plotter Functions\n\n* ```bar_plotter```: Plot the desired sum of your column sums as a bar chart\n - Args:\n - ax=None # Resets the chart\n - counter = List of tuples returned from match_maker(),\n - path = String of desired path to directory,\n - output = String value of desired file name (.png)\n - Returns: Nothing, but outputs a matplot figure in your Jupyter Notebook and .png file.\n* ```multiline_plotter```: Plots and saves a small-multiples line chart from a returned DataFrame from the summarizer function that used the 'spread' output option\n - Modified src: https://python-graph-gallery.com/125-small-multiples-for-line-chart/\n - Args:\n - style= String. See matplot docs for options available, e.g. 'seaborn-darkgrid' \n - pallette= String. See matplot docs for options available, e.g. 'Set1'\n - graph_option= String. Options for sampling will include all of the the following, but for now only 'group_var_per_period':\n - 'single_var_per_day': Sum of single variable per Day in provided range\n - 'group_var_per_day': Sum of group of variable per Day in provided range\n - 'single_var_per_period': Sum of single variable per Period\n - 'group_var_per_period': Sum of group of variable per Period\n - df= DataFrame of data set to be visualized\n - x_col= DataFrame column for x-axis\n - multi_x= Integer for number of graphs along x/rows\n - multi_y= Integer for number of graphs along y/columns\n - NOTE: Only supports 3x3 right now.\n - linewidth= Float. Line width level.\n - alpha= Float (0-1). Opacity level of lines\n - chart_title= String. Title for the overall chart\n - x_title= String. Label for x axis\n - y_title= String. Label for y axis\n - path= String. Path to save figure\n - output= String. Filename for figure.\n - Returns nothing, but plots a 'small multiples' series of charts\n\n## Example Uses\n\n### Create a Dictionary of period dates\n\n```python\nranges = [\n ('1', ['2018-01-01', '2018-03-30']),\n ('2', ['2018-04-01', '2018-06-12']),\n ('3', ['2018-06-13', '2018-07-28']),\n ('4', ['2018-07-29', '2018-10-17']),\n ('5', ['2018-10-18', '2018-11-24']),\n ('6', ['2018-11-25', '2018-12-10']),\n ('7', ['2018-12-11', '2018-12-19']),\n ('8', ['2018-12-20', '2018-12-25']),\n ('9', ['2018-12-26', '2019-02-13']),\n ('10', ['2019-02-14', '2019-02-28'])\n]\n\nperiod_dates = narrator.period_dates_writer(ranges=ranges)\nperiod_dates['1'][:5]\n\n## Output ##\n['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05']\n```\n\n### Use the ```hashtag_summarizer``` to generate multiple types of summary outputs\n\nThe below examples takes a group of hashtags, searches for them based on the period dates, then outputsthese groupings in descending order. In this case, it can also use a keyword list and hashtag list as 2 forms of input to inform the search across the corpus.\n\n```python\n# 1. Create and assign listed values. If a search term has\n# multiple variations, create a list of dictionaries and pass\n# it to the summarizer() function as a \"keyword_list\".\nliberal_keyword_list = [ \n {\n '#felipegomez': ['felipe alonzo-gomez', 'felipe gomez']\n },\n {\n '#maquin': ['jakelin caal', 'maqu\u00edn', 'maquin' ]\n }\n]\nliberal_hashtag_list = [\n '#familyseparation', '#familiesbelongtogether',\n '#felipegomez', '#keepfamiliestogether',\n '#maquin', '#noborderwall', '#shutdownstories',\n '#trumpshutdown', '#wherearethechildren'\n]\n\n# 2. Create Dict \"skeleton\" with above listed search values\n# This dict is passed as the \"skeleton\" parameter in the \n# summarizer function\ndict_group_skel = narrator.skeletor(\n aggregate_level='period',\n date_range=period_dates,\n keys=liberal_hashtag_list\n)\n\n# 3. Fill out the search parameters to return a hydrated\n# pandas DataFrame.\ndf_sum = summarizer(\n # Required options\n column_type='hashtags',\n primary_col='hashtags',\n main_sum_option='grouped_terms_perday',\n df_corpus=df_all,\n sort_check=True, # Sort per day\n sort_date_check=False, #Do not sort by date\n sort_type=True, # Ascending (F) or descending (T)?\n # Conditional options\n group_search_option='keywords_and_col',\n simple_list=liberal_hashtag_list, # List of terms\n keyed_list=liberal_keyword_list, # List of alternative terms\n secondary_col='tweet',\n date_col='date',\n id_col='id',\n sample_check=False, # Use custom sample size (True or False)\n sample_size=None, # Custom sample size (Int or None)\n skeleton=dict_group_skel,\n time_agg_type='period',\n period_dates=period_dates,\n grouped_output_type='spread' #spread or consolidated\n)\n```\n\nOutput from above code:\n\n\n### Plot a \"Small Multiples\" Line Chart\n\n```python\nimport colorcet as cc\n\nnarrator.multiline_plotter(\n style='tableau-colorblind10',\n palette=cc.cm.glasbey_dark,\n graph_option='group_var_per_period',\n df=ht_df_sum,\n x_col='period',\n multi_x=3,\n multi_y=3,\n linewidth=1.9,\n alpha=0.9,\n chart_title='Liberal hashtag sums per period',\n x_title='Periods',\n y_title='# of Hashtags',\n path='figures',\n output='test_multi.png'\n)\n```\nOutput:\n\n\n\n## Distribution update terminal commands\n\n
\n# Create new distribution of code for archiving\nsudo python3 setup.py sdist bdist_wheel\n\n# Distribute to Python Package Index\npython3 -m twine upload --repository-url https://upload.pypi.org/legacy/ dist/*\n
\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "https://github.com/lingeringcode/narrator/", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/lingeringcode/narrator/", "keywords": "data processing,descriptive statistics,data narratives,temporal charts", "license": "", "maintainer": "", "maintainer_email": "", "name": "narrator", "package_url": "https://pypi.org/project/narrator/", "platform": "", "project_url": "https://pypi.org/project/narrator/", "project_urls": { "Download": "https://github.com/lingeringcode/narrator/", "Homepage": "https://github.com/lingeringcode/narrator/" }, "release_url": "https://pypi.org/project/narrator/0.0.0.4/", "requires_dist": [ "pandas", "numpy", "emoji", "nltk", "matplot" ], "requires_python": "", "summary": "A set of functions that process and create descriptive summary visualizations to help develop a broader narrative through-line of tweet data.", "version": "0.0.0.4", "yanked": false, "yanked_reason": null }, "last_serial": 6049415, "releases": { "0.0.0.2": [ { "comment_text": "", "digests": { "md5": "a0331aa44aaf7de5bb8a328c3ce4e9f0", "sha256": "9c17714d261debd358db5de661b67525e106e437a10a8fb85db51b1744ad3aed" }, "downloads": -1, "filename": "narrator-0.0.0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "a0331aa44aaf7de5bb8a328c3ce4e9f0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 15191, "upload_time": "2019-10-25T00:32:21", "upload_time_iso_8601": "2019-10-25T00:32:21.712730Z", "url": "https://files.pythonhosted.org/packages/b3/ec/b8cf539e870c11aa6ef665e535c70abecb29092b770b31f23372550de8fb/narrator-0.0.0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "6288f41c88e2aef3656a8a309092d65a", "sha256": "21e492dcf167cad21bc1f5c21e63b1d11a3c44be7f4ebac00cff02bcfa62594e" }, "downloads": -1, "filename": "narrator-0.0.0.2.tar.gz", "has_sig": false, "md5_digest": "6288f41c88e2aef3656a8a309092d65a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16220, "upload_time": "2019-10-25T00:32:23", "upload_time_iso_8601": "2019-10-25T00:32:23.756045Z", "url": "https://files.pythonhosted.org/packages/d5/3a/0e1f19dd3826d117a61dacaac54a46b2c653c5d4f7112b0ab766e8106928/narrator-0.0.0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.0.3": [ { "comment_text": "", "digests": { "md5": "8be743dc60a94e4b7a05ad54faf4ea30", "sha256": "65b09932d0901db221c4fd9bbb4429e202db7579438216cfe3164241f3940421" }, "downloads": -1, "filename": "narrator-0.0.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "8be743dc60a94e4b7a05ad54faf4ea30", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16330, "upload_time": "2019-10-29T20:28:04", "upload_time_iso_8601": "2019-10-29T20:28:04.681995Z", "url": "https://files.pythonhosted.org/packages/33/f3/9f0eb1a2a039a05e1c918e6e2ddcd3d33cfc5b531449c0e4617da1a0ac75/narrator-0.0.0.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e3a50ca1b41dfe61436ac06dd4d7a8a1", "sha256": "9b61eb4d3483f34709aff856c924fe20c992978b0c24a19a9e320e5cce2dd3a2" }, "downloads": -1, "filename": "narrator-0.0.0.3.tar.gz", "has_sig": false, "md5_digest": "e3a50ca1b41dfe61436ac06dd4d7a8a1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17600, "upload_time": "2019-10-29T20:28:06", "upload_time_iso_8601": "2019-10-29T20:28:06.707616Z", "url": "https://files.pythonhosted.org/packages/08/53/dc08331c49a878478c3de81915b82b166f667ac8887b72b91993139e7f70/narrator-0.0.0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.0.4": [ { "comment_text": "", "digests": { "md5": "2fbe2c2bbfb30f499648a013de062bd2", "sha256": "6685b7950811f66543ff5982f4729b5ecde8d7f353575cd6764879dac34e35d4" }, "downloads": -1, "filename": "narrator-0.0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "2fbe2c2bbfb30f499648a013de062bd2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16335, "upload_time": "2019-10-29T21:33:01", "upload_time_iso_8601": "2019-10-29T21:33:01.956437Z", "url": "https://files.pythonhosted.org/packages/63/91/fe4d6161e7fb42a8fe8f4ee101696efd366620fe5c422447f52a57b123eb/narrator-0.0.0.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "7323f9f569e2a1d2ff1ae50bab902aca", "sha256": "0f6a695ee2cee7f2f8061e7df9e252c6e3accec27fabad8b0348237be25f5282" }, "downloads": -1, "filename": "narrator-0.0.0.4.tar.gz", "has_sig": false, "md5_digest": "7323f9f569e2a1d2ff1ae50bab902aca", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17600, "upload_time": "2019-10-29T21:33:03", "upload_time_iso_8601": "2019-10-29T21:33:03.591668Z", "url": "https://files.pythonhosted.org/packages/6b/25/ab1f8db6e964c2c34aba2457351a50e1c0b26d3ced3437d3c894e8e01bbf/narrator-0.0.0.4.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2fbe2c2bbfb30f499648a013de062bd2", "sha256": "6685b7950811f66543ff5982f4729b5ecde8d7f353575cd6764879dac34e35d4" }, "downloads": -1, "filename": "narrator-0.0.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "2fbe2c2bbfb30f499648a013de062bd2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 16335, "upload_time": "2019-10-29T21:33:01", "upload_time_iso_8601": "2019-10-29T21:33:01.956437Z", "url": "https://files.pythonhosted.org/packages/63/91/fe4d6161e7fb42a8fe8f4ee101696efd366620fe5c422447f52a57b123eb/narrator-0.0.0.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "7323f9f569e2a1d2ff1ae50bab902aca", "sha256": "0f6a695ee2cee7f2f8061e7df9e252c6e3accec27fabad8b0348237be25f5282" }, "downloads": -1, "filename": "narrator-0.0.0.4.tar.gz", "has_sig": false, "md5_digest": "7323f9f569e2a1d2ff1ae50bab902aca", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17600, "upload_time": "2019-10-29T21:33:03", "upload_time_iso_8601": "2019-10-29T21:33:03.591668Z", "url": "https://files.pythonhosted.org/packages/6b/25/ab1f8db6e964c2c34aba2457351a50e1c0b26d3ced3437d3c894e8e01bbf/narrator-0.0.0.4.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }