{ "info": { "author": "The University of Iowa Internet Research Lab", "author_email": "john-cook@uiowa.edu", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "\n# IRL Utilities\n# Installation\n## git\n\n```\ngit clone https://github.com/uiowa-irl/uiowa-irl-utils.git\ncd uiowa-irl-utils\npip install --user .\n```\n\n## pypi\n\n```\npip install irlutils\n``` \n\nimport the module: \n\n```\nimport irlutils\n```\n\n\n# Documentation\n\n\n## file_utils\n```\nFUNCTIONS\n gen_find_files(**kwargs)\n returns filenames that matches the given pattern under() a given dir\n\n\n Kwargs:\n file_pattern (str): a regex style string . \n root (str): top level folder to begin search from. \n\n Yields:\n path (generator): matching path str\n\n Examples:\n gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).\n\n >>> gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).__next__()\n /mnt/data/first_folder/last_folder/file.sqlite\n\n Reference: \n [1] http://www.dabeaz.com/generators/\n\n json_flatten(y)\n flattens nested structures within a json file\n\n\n Kwargs:\n\n data (dict): data from nested dictionary\n kv (dict): dictionary containing key,value pairs. \n\n returns:\n\n kv (dict): a dictionary object containing flattened structures\n\n Examples:\n data = {'k1':{'kv1':['v1', 'v2'], 'kv2': 'v3'}}\n\n >>> json_flatten(data)\n {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}\n\n rmsubtree(**kwargs)\n Clears all subfolders and files in location\n kwargs:\n location (str): target directory to remove\n Examples:\n\n >>> rmsubtree(location=\"/path/to/target_dir\").\n\n tar_unpacker(**kwargs)\n unpacks tar to a tmp directory. 
\n\n\n Kwargs:\n\n tar_path (str): tar file path\n versbose (bool): True enables verbose\n\n returns:\n\n tmp_path (generator): extracted contents path\n\n Examples:\n\n tar_unpacker(file_pattern=\"/mnt/data/tarfile.tar.gz\").\n\n >>> tar_unpacker(file_pattern=\"/mnt/data/tarfile.tar.gz\").\n /tmp/FZ4245_Zb/\n\nAUTHOR\n senorchow\n\nFILE\n irlutils/file/file_utils.py\n```\n## database_utils\n```\nFUNCTIONS\n build_index(cursor, column, tables)\n Build an index on `column` for each table in `tables`\n\n drop_tables(sqlite_con, tables=[])\n\n fetchiter(cursor, arraysize=10000)\n Generator for cursor results\n\n get_channel_content(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_channel_content_with_hash(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return 
javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n irlutils/url/crawl/database_utils.py\n```\n\n## domain_utils\n\n```\nFUNCTIONS\n get_hostname(url)\n strips out the hostname from a url\n\n get_ps_plus_1(url, **kwargs)\n Returns the PS+1 of the url. 
This will also return\n an IP address if the hostname of the url is a valid\n IP address.\n\n An (optional) PublicSuffixList object can be passed with keyword arg 'psl',\n otherwise a version cached in the system temp directory is used.\n\n get_psl(location='public_suffix_list.dat')\n Grabs an updated public suffix list.\n\n get_stripped_query_string(url)\n\n get_stripped_url(url, scheme=False)\n Returns a url stripped to (scheme)?+hostname+path\n\n get_stripped_urls(urls, scheme=False)\n Returns a set (or list) of urls stripped to (scheme)?+hostname+path\n\n hostname_subparts(url, include_ps=False, **kwargs)\n Returns a list of slices of a url's hostname down to the PS+1\n\n If `include_ps` is set, the hostname slices will include the public suffix\n\n For example: http://a.b.c.d.com/path?query#frag would yield:\n [a.b.c.d.com, b.c.d.com, c.d.com, d.com] if include_ps == False\n [a.b.c.d.com, b.c.d.com, c.d.com, d.com, com] if include_ps == True\n\n An (optional) PublicSuffixList object can be passed with keyword arg 'psl'.\n otherwise a version cached in the system temp directory is used.\n\n is_ip_address(hostname)\n Check if the given string is a valid IP address\n\n load_psl(function)\n\nDATA\n PSL_CACHE_LOC = 'public_suffix_list.dat'\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n irlutils/url/crawl/domain_utils.py\n```\n\n## blocklist_utils\n\n```\nFUNCTIONS\n get_option_dict(request)\n Build an options dict for BlockListParser\n\n Parameters\n ----------\n request : sqlite3.Row\n A single HTTP request record pulled from OpenWPM's http_requests table\n public_suffix_list : PublicSuffixList\n An instance of PublicSuffixList()\n\n BINARY_OPTIONS = [\n \"script\",\n \"image\",\n \"stylesheet\",\n \"object\",\n \"xmlhttprequest\",\n \"object-subrequest\",\n \"subdocument\",\n \"document\",\n \"elemhide\",\n \"other\",\n \"background\",\n \"xbl\",\n 
\"ping\",\n \"dtd\",\n \"media\",\n \"third-party\",\n \"match-case\",\n \"collapse\",\n \"donottrack\",\n ]\n\n Returns\n -------\n dict\n An \"options\" dictionary for use with BlockListParser\n refs: [1] https://github.com/MoonchildProductions/UXP/blob/master/dom/base/nsIContentPolicyBase.idl\n [2] https://adblockplus.org/en/filters#options\n [3]\n\nFILE\n irlutils/url/crawl/blocklist_utils.py\n```\n\n## analysis_utils\n\n```\nFUNCTIONS\n add_col_bare_script_url(js_df)\n Add a col for script URL without scheme, www and query.\n\n add_col_set_of_script_ps1s_from_call_stack(js_df)\n map psls to call stack in scripts\n\n Args: \n js_df (pandas dataFrame): javascript table\n\n add_col_set_of_script_urls_from_call_stack(js_df)\n\n add_col_unix_timestamp(df)\n\n datetime_from_iso(iso_date)\n Convert from ISO.\n\n get_cookie(headers)\n A special case of parse headers that extracts only the cookie. \n\n Args: \n headers (list): http request headers\n\n Returns:\n\n item(string): name value pairs of a cookie\n\n get_func_and_script_url_from_initiator(initiator)\n Remove line number and column number from the initiator.\n\n get_host_from_url(url)\n\n get_initiator_from_call_stack(call_stack)\n Return the bottom element of the call stack.\n\n get_initiator_from_req_call_stack(req_call_stack)\n Return the bottom element of a request call stack.\n Request call stacks have an extra field (async_cause) at the end.\n\n get_requests_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http requests\n\n get_responses_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): 
A table containing visits that conincide with http responses\n\n get_script_url_from_initiator(initiator)\n Remove the scheme and query section of a URL.\n\n get_script_urls_from_call_stack_as_set(call_stack)\n Return the urls of the scripts involved in the call stack as a set.\n\n get_set_cookie(header)\n A special case of parse headers that returns 'Set-Cookies'\n\n Args: \n headers (string): http request headers\n\n Returns:\n item(string): name value pairs of Set Cookie field\n\n get_set_of_script_hosts_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack.\n\n get_set_of_script_ps1s_from_call_stack(script_urls, du)\n extract a unique set of urls from a list of urls detected in scripts\n\n Args: \n script_urls (list): A list of urls extracted from javascripts\n du (list): A domain utilities instance\n\n Returns:\n psls(set): a set of tld+1(string)\n\n get_set_of_script_urls_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack as a\n string.\n\n parse_headers(header)\n parses http header into kv pairs\n\n Args: \n headers (string): http request headers\n\n Returns:\n kv(dict): name value pairs of http headers\n\n strip_scheme_www_and_query(url)\n Remove the scheme and query section of a URL.\n\nDATA\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n irlutils/url/crawl/analysis_utils.py\n```\n\n## chi2_proportions\n\n```\nFUNCTIONS\n chi2Proportions(count, nobs)\n A wrapper for the chi2 testing proportions based upon the chi-square test\n\n Args:\n count (:obj `list` of :obj`int` or a single `int`): the number of successes in nobs trials. If this is \n array_like, then the assumption is that this represents the number of successes \n for each independent sample \n\n\n nobs (:obj `list` of :obj`int` or a single `int`): The number of trials or observations, with the same length as count. 
\n\n Returns: \n chi2 (:obj `float`): The test statistic.\n\n p (:obj `float`): The p-value of the test\n\n dof (int) : Degrees of freedom\n\n expected (:obj `list`): list same shape as observed. The expected frequencies, based on the marginal sums of the table\n\n\n References: \n [1] \"scipy.stats.chi2_contingency\" https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html\n [2] \"statsmodels.stats.proportion.proportions_chisquare\" https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html\n [3] (1, 2) \u201cContingency table\u201d, https://en.wikipedia.org/wiki/Contingency_table\n [4] (1, 2) \u201cPearson\u2019s chi-squared test\u201d, https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test\n [5] (1, 2) Cressie, N. and Read, T. R. C., \u201cMultinomial Goodness-of-Fit Tests\u201d, J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464.\n\n Sample use: \n input: \n [10,10,20] - number of successes in trial \n [20,20,20] - number of trials \n chi2Proportions([10,10,20], [20,20,20])\n\n output: \n (2.7777777777777777,\n 0.24935220877729619,\n 2,\n array([[ 12., 12., 16.],\n [ 18., 18., 24.]]))\n\nFILE\n irlutils/stats/tests/proportions/chi2_proportions.py\n```\nHelp on module file_utils:\n\nNAME\n file_utils\n\nFUNCTIONS\n chmod(path, mode=777, recursive=False)\n\n chownUser(path, recursive=False, owner='user', group='user')\n\n compress_path(path)\n\n cp(s, d)\n\n file_ext(path, **kwargs)\n file extension finder\n kwargs:\n path (str): path or file name\n Returns:\n dotted file extension of a file\n Examples:\n\n >>> file_ext('/path/to_file/with_ext/test.py')\n .py\n\n gen_find_files(**kwargs)\n returns filenames that matches the given pattern under() a given dir\n\n\n Kwargs:\n file_pattern (str): a regex style string . \n root (str): top level folder to begin search from. 
\n\n Yields:\n path (generator): matching path str\n\n Examples:\n gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).\n\n >>> gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).__next__()\n /mnt/data/first_folder/last_folder/file.sqlite\n\n Reference: \n [1] http://www.dabeaz.com/generators/\n\n json_flatten(y)\n flattens nested structures within a json file\n\n\n Kwargs:\n\n data (dict): data from nested dictionary\n kv (dict): dictionary containing key,value pairs. \n\n returns:\n\n kv (dict): a dictionary object containing flattened structures\n\n Examples:\n data = {'k1':{'kv1':['v1', 'v2'], 'kv2': 'v3'}}\n\n >>> json_flatten(data)\n {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}\n\n mkdir(d, mode=511, exist_ok=True)\n\n mv(s, d)\n\n rm(self, d)\n\n rmsubtree(**kwargs)\n Clears all subfolders and files in location\n kwargs:\n location (str): target directory to remove\n Examples:\n\n >>> rmsubtree(location=\"/path/to/target_dir\").\n\n tar_packer(tar_dir, **kwargs)\n tars up directory \n\n\n Kwargs:\n\n dir (str): top level dir\n compression (bool): compression type. gz, xz supported now\n versbose (bool): True enables verbose\n\n returns:\n\n tar_path (generator): path to tar file\n\n Examples:\n\n tar_packer(dir=\"/path/to/top_level_dir\", [compression=gz|xz]\n\n >>> \n /tmp/FZ4245_Zb/top_level_dir.tar\n\n tar_unpacker(tar_path, **kwargs)\n unpacks tar to a tmp directory. 
\n\n\n Kwargs:\n\n tar_path (str): tar file path\n versbose (bool): True enables verbose\n\n returns:\n\n tmp_path (generator): extracted contents path\n\n Examples:\n\n tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\").\n\n >>> tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\").\n /tmp/FZ4245_Zb/\n\n touch(self, d)\n\nDATA\n DBG = \n\nAUTHOR\n johncook\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/file/file_utils.py\n\n\nHelp on module database_utils:\n\nNAME\n database_utils\n\nFUNCTIONS\n build_index(cursor, column, tables)\n Build an index on `column` for each table in `tables`\n\n drop_tables(sqlite_con, tables=[])\n\n fetchiter(cursor, arraysize=10000)\n Generator for cursor results\n\n get_channel_content(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_channel_content_with_hash(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration 
settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/database_utils.py\n\n\nproblem in irlutils/url/crawl/domain_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nproblem in irlutils/url/crawl/blocklist_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nHelp on module analysis_utils:\n\nNAME\n analysis_utils\n\nFUNCTIONS\n add_col_bare_script_url(js_df)\n Add a col for script URL without scheme, www and query.\n\n add_col_set_of_script_ps1s_from_call_stack(js_df)\n map psls to call stack in scripts\n\n Args: \n js_df (pandas dataFrame): javascript table\n\n add_col_set_of_script_urls_from_call_stack(js_df)\n\n add_col_unix_timestamp(df)\n\n datetime_from_iso(iso_date)\n Convert from ISO.\n\n get_cookie(headers)\n A special case of parse headers that extracts only the cookie. 
\n\n Args: \n headers (list): http request headers\n\n Returns:\n\n item(string): name value pairs of a cookie\n\n get_func_and_script_url_from_initiator(initiator)\n Remove line number and column number from the initiator.\n\n get_host_from_url(url)\n\n get_initiator_from_call_stack(call_stack)\n Return the bottom element of the call stack.\n\n get_initiator_from_req_call_stack(req_call_stack)\n Return the bottom element of a request call stack.\n Request call stacks have an extra field (async_cause) at the end.\n\n get_requests_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http requests\n\n get_responses_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http responses\n\n get_script_url_from_initiator(initiator)\n Remove the scheme and query section of a URL.\n\n get_script_urls_from_call_stack_as_set(call_stack)\n Return the urls of the scripts involved in the call stack as a set.\n\n get_set_cookie(header)\n A special case of parse headers that returns 'Set-Cookies'\n\n Args: \n headers (string): http request headers\n\n Returns:\n item(string): name value pairs of Set Cookie field\n\n get_set_of_script_hosts_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack.\n\n get_set_of_script_ps1s_from_call_stack(script_urls, du)\n extract a unique set of urls from a list of urls detected in scripts\n\n Args: \n script_urls (list): A list of urls extracted from javascripts\n du (list): A domain utilities instance\n\n Returns:\n psls(set): a set of tld+1(string)\n\n 
get_set_of_script_urls_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack as a\n string.\n\n parse_headers(header)\n parses http header into kv pairs\n\n Args: \n headers (string): http request headers\n\n Returns:\n kv(dict): name value pairs of http headers\n\n strip_scheme_www_and_query(url)\n Remove the scheme and query section of a URL.\n\nDATA\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/analysis_utils.py\n\n\nHelp on module chi2_proportions:\n\nNAME\n chi2_proportions\n\nFUNCTIONS\n chi2Proportions(count, nobs)\n A wrapper for the chi2 testing proportions based upon the chi-square test\n\n Args:\n count (:obj `list` of :obj`int` or a single `int`): the number of successes in nobs trials. If this is \n array_like, then the assumption is that this represents the number of successes \n for each independent sample \n\n\n nobs (:obj `list` of :obj`int` or a single `int`): The number of trials or observations, with the same length as count. \n\n Returns: \n chi2 (:obj `float`): The test statistic.\n\n p (:obj `float`): The p-value of the test\n\n dof (int) : Degrees of freedom\n\n expected (:obj `list`): list same shape as observed. The expected frequencies, based on the marginal sums of the table\n\n\n References: \n [1] \"scipy.stats.chi2_contingency\" https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html\n [2] \"statsmodels.stats.proportion.proportions_chisquare\" https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html\n [3] (1, 2) \u201cContingency table\u201d, https://en.wikipedia.org/wiki/Contingency_table\n [4] (1, 2) \u201cPearson\u2019s chi-squared test\u201d, https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test\n [5] (1, 2) Cressie, N. and Read, T. R. 
C., \u201cMultinomial Goodness-of-Fit Tests\u201d, J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464.\n\n Sample use: \n input: \n [10,10,20] - number of successes in trial \n [20,20,20] - number of trials \n chi2Proportions([10,10,20], [20,20,20])\n\n output: \n (2.7777777777777777,\n 0.24935220877729619,\n 2,\n array([[ 12., 12., 16.],\n [ 18., 18., 24.]]))\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/stats/tests/proportions/chi2_proportions.py\n\n\nHelp on module file_utils:\n\nNAME\n file_utils\n\nFUNCTIONS\n chmod(path, mode=777, recursive=False)\n\n chownUser(path, recursive=False, owner='user', group='user')\n\n compress_path(path)\n\n cp(s, d)\n\n file_ext(path, **kwargs)\n file extension finder\n kwargs:\n path (str): path or file name\n Returns:\n dotted file extension of a file\n Examples:\n\n >>> file_ext('/path/to_file/with_ext/test.py')\n .py\n\n gen_find_files(**kwargs)\n returns filenames that matches the given pattern under() a given dir\n\n\n Kwargs:\n file_pattern (str): a regex style string . \n root (str): top level folder to begin search from. \n\n Yields:\n path (generator): matching path str\n\n Examples:\n gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).\n\n >>> gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/).__next__()\n /mnt/data/first_folder/last_folder/file.sqlite\n\n Reference: \n [1] http://www.dabeaz.com/generators/\n\n json_flatten(y)\n flattens nested structures within a json file\n\n\n Kwargs:\n\n data (dict): data from nested dictionary\n kv (dict): dictionary containing key,value pairs. 
\n\n returns:\n\n kv (dict): a dictionary object containing flattened structures\n\n Examples:\n data = {'k1':{'kv1':['v1', 'v2'], 'kv2': 'v3'}}\n\n >>> json_flatten(data)\n {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}\n\n mkdir(d, mode=511, exist_ok=True)\n\n mv(s, d)\n\n rm(d)\n\n rmsubtree(**kwargs)\n Clears all subfolders and files in location\n kwargs:\n location (str): target directory to remove\n Examples:\n\n >>> rmsubtree(location=\"/path/to/target_dir\").\n\n tar_packer(tar_dir, **kwargs)\n tars up directory \n\n\n Kwargs:\n\n dir (str): top level dir\n compression (bool): compression type. gz, xz supported now\n versbose (bool): True enables verbose\n\n returns:\n\n tar_path (generator): path to tar file\n\n Examples:\n\n tar_packer(dir=\"/path/to/top_level_dir\", [compression=gz|xz]\n\n >>> \n /tmp/FZ4245_Zb/top_level_dir.tar\n\n tar_unpacker(tar_path, **kwargs)\n unpacks tar to a tmp directory. \n\n\n Kwargs:\n\n tar_path (str): tar file path\n versbose (bool): True enables verbose\n\n returns:\n\n tmp_path (generator): extracted contents path\n\n Examples:\n\n tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\").\n\n >>> tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\").\n /tmp/FZ4245_Zb/\n\n touch(d)\n\nDATA\n DBG = \n\nAUTHOR\n johncook\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/file/file_utils.py\n\n\nHelp on module database_utils:\n\nNAME\n database_utils\n\nFUNCTIONS\n build_index(cursor, column, tables)\n Build an index on `column` for each table in `tables`\n\n drop_tables(sqlite_con, tables=[])\n\n fetchiter(cursor, arraysize=10000)\n Generator for cursor results\n\n get_channel_content(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : 
plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_channel_content_with_hash(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) 
string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/database_utils.py\n\n\nproblem in irlutils/url/crawl/domain_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nproblem in irlutils/url/crawl/blocklist_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nHelp on module analysis_utils:\n\nNAME\n analysis_utils\n\nFUNCTIONS\n add_col_bare_script_url(js_df)\n Add a col for script URL without scheme, www and query.\n\n add_col_set_of_script_ps1s_from_call_stack(js_df)\n map psls to call stack in scripts\n\n Args: \n js_df (pandas dataFrame): javascript table\n\n add_col_set_of_script_urls_from_call_stack(js_df)\n\n add_col_unix_timestamp(df)\n\n datetime_from_iso(iso_date)\n Convert from ISO.\n\n get_cookie(headers)\n A special case of parse headers that extracts only the cookie. \n\n Args: \n headers (list): http request headers\n\n Returns:\n\n item(string): name value pairs of a cookie\n\n get_func_and_script_url_from_initiator(initiator)\n Remove line number and column number from the initiator.\n\n get_host_from_url(url)\n\n get_initiator_from_call_stack(call_stack)\n Return the bottom element of the call stack.\n\n get_initiator_from_req_call_stack(req_call_stack)\n Return the bottom element of a request call stack.\n Request call stacks have an extra field (async_cause) at the end.\n\n get_requests_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http requests\n\n get_responses_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web 
visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http responses\n\n get_script_url_from_initiator(initiator)\n Remove the scheme and query section of a URL.\n\n get_script_urls_from_call_stack_as_set(call_stack)\n Return the urls of the scripts involved in the call stack as a set.\n\n get_set_cookie(header)\n A special case of parse headers that returns 'Set-Cookies'\n\n Args: \n headers (string): http request headers\n\n Returns:\n item(string): name value pairs of Set Cookie field\n\n get_set_of_script_hosts_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack.\n\n get_set_of_script_ps1s_from_call_stack(script_urls, du)\n extract a unique set of urls from a list of urls detected in scripts\n\n Args: \n script_urls (list): A list of urls extracted from javascripts\n du (list): A domain utilities instance\n\n Returns:\n psls(set): a set of tld+1(string)\n\n get_set_of_script_urls_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack as a\n string.\n\n parse_headers(header)\n parses http header into kv pairs\n\n Args: \n headers (string): http request headers\n\n Returns:\n kv(dict): name value pairs of http headers\n\n strip_scheme_www_and_query(url)\n Remove the scheme and query section of a URL.\n\nDATA\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/analysis_utils.py\n\n\nHelp on module chi2_proportions:\n\nNAME\n chi2_proportions\n\nFUNCTIONS\n chi2Proportions(count, nobs)\n A wrapper for the chi2 testing proportions based upon the chi-square test\n\n Args:\n count (:obj `list` of :obj`int` or a single `int`): the number of successes in nobs trials. 
If this is \n array_like, then the assumption is that this represents the number of successes \n for each independent sample \n\n\n nobs (:obj `list` of :obj`int` or a single `int`): The number of trials or observations, with the same length as count. \n\n Returns: \n chi2 (:obj `float`): The test statistic.\n\n p (:obj `float`): The p-value of the test\n\n dof (int) : Degrees of freedom\n\n expected (:obj `list`): list same shape as observed. The expected frequencies, based on the marginal sums of the table\n\n\n References: \n [1] \"scipy.stats.chi2_contingency\" https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html\n [2] \"statsmodels.stats.proportion.proportions_chisquare\" https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html\n [3] (1, 2) \u201cContingency table\u201d, https://en.wikipedia.org/wiki/Contingency_table\n [4] (1, 2) \u201cPearson\u2019s chi-squared test\u201d, https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test\n [5] (1, 2) Cressie, N. and Read, T. R. C., \u201cMultinomial Goodness-of-Fit Tests\u201d, J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 
440-464.\n\n Sample use: \n input: \n [10,10,20] - number of successes in each sample \n [20,20,20] - number of trials \n chi2Proportions([10,10,20], [20,20,20])\n\n output: \n (2.7777777777777777,\n 0.24935220877729619,\n 2,\n array([[ 12., 12., 16.],\n [ 18., 18., 24.]]))\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/stats/tests/proportions/chi2_proportions.py\n\n\nHelp on module file_utils:\n\nNAME\n file_utils\n\nFUNCTIONS\n chmod(path, mode=777, recursive=False)\n\n chownUser(path, recursive=False, owner='user', group='user')\n\n compress_path(path)\n\n cp(s, d)\n\n file_ext(path, **kwargs)\n file extension finder\n kwargs:\n path (str): path or file name\n Returns:\n dotted file extension of a file\n Examples:\n\n >>> file_ext('/path/to_file/with_ext/test.py')\n .py\n\n gen_find_files(**kwargs)\n yields filenames that match the given pattern under a given directory\n\n\n Kwargs:\n file_pattern (str): a glob-style pattern string. \n root (str): top level folder to begin the search from. \n\n Yields:\n path (generator): matching path str\n\n Examples:\n gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/\")\n\n >>> gen_find_files(file_pattern=\"*.sql\", root=\"/mnt/data/\").__next__()\n /mnt/data/first_folder/last_folder/file.sql\n\n Reference: \n [1] http://www.dabeaz.com/generators/\n\n json_flatten(y)\n flattens nested structures within a json-style dictionary\n\n\n Kwargs:\n\n data (dict): data from a nested dictionary\n kv (dict): dictionary containing key,value pairs. 
\n\n returns:\n\n kv (dict): a dictionary object containing the flattened structures\n\n Examples:\n data = {'k1':{'kv1':['v1', 'v2'], 'kv2': 'v3'}}\n\n >>> json_flatten(data)\n {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}\n\n mkdir(d, mode=511, exist_ok=True)\n\n mv(s, d)\n\n rm(d)\n\n rmsubtree(**kwargs)\n Clears all subfolders and files in location\n kwargs:\n location (str): target directory to remove\n Examples:\n\n >>> rmsubtree(location=\"/path/to/target_dir\")\n\n tar_packer(tar_dir, **kwargs)\n tars up a directory \n\n\n Kwargs:\n\n dir (str): top level dir\n compression (str): compression type. gz and xz are supported now\n verbose (bool): True enables verbose output\n\n returns:\n\n tar_path (str): path to the tar file\n\n Examples:\n\n tar_packer(dir=\"/path/to/top_level_dir\", [compression=gz|xz])\n\n >>> \n /tmp/FZ4245_Zb/top_level_dir.tar\n\n tar_unpacker(tar_path, **kwargs)\n unpacks a tar into a tmp directory. \n\n\n Kwargs:\n\n tar_path (str): tar file path\n verbose (bool): True enables verbose output\n\n returns:\n\n tmp_path (str): extracted contents path\n\n Examples:\n\n tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\")\n\n >>> tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\")\n /tmp/FZ4245_Zb/\n\n touch(d)\n\nDATA\n DBG = \n\nAUTHOR\n johncook\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/file/file_utils.py\n\n\nHelp on module database_utils:\n\nNAME\n database_utils\n\nFUNCTIONS\n build_index(cursor, column, tables)\n Build an index on `column` for each table in `tables`\n\n drop_tables(sqlite_con, tables=[])\n\n fetchiter(cursor, arraysize=10000)\n Generator for cursor results\n\n get_channel_content(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : 
plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n\n get_channel_content_with_hash(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n\n get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from the javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) 
string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/database_utils.py\n
storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n\n get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control weather or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) 
string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/database_utils.py\n\n\nproblem in irlutils/url/crawl/domain_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nproblem in irlutils/url/crawl/blocklist_utils.py - ModuleNotFoundError: No module named 'publicsuffix2'\nHelp on module analysis_utils:\n\nNAME\n analysis_utils\n\nFUNCTIONS\n add_col_bare_script_url(js_df)\n Add a col for script URL without scheme, www and query.\n\n add_col_set_of_script_ps1s_from_call_stack(js_df)\n map psls to call stack in scripts\n\n Args: \n js_df (pandas dataFrame): javascript table\n\n add_col_set_of_script_urls_from_call_stack(js_df)\n\n add_col_unix_timestamp(df)\n\n datetime_from_iso(iso_date)\n Convert from ISO.\n\n get_cookie(headers)\n A special case of parse headers that extracts only the cookie. \n\n Args: \n headers (list): http request headers\n\n Returns:\n\n item(string): name value pairs of a cookie\n\n get_func_and_script_url_from_initiator(initiator)\n Remove line number and column number from the initiator.\n\n get_host_from_url(url)\n\n get_initiator_from_call_stack(call_stack)\n Return the bottom element of the call stack.\n\n get_initiator_from_req_call_stack(req_call_stack)\n Return the bottom element of a request call stack.\n Request call stacks have an extra field (async_cause) at the end.\n\n get_requests_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http requests\n\n get_responses_from_visits(con, visit_ids)\n Extact http requests matching visit_ids\n\n Args: \n con (sqlite3.connection): A connection to a sqlite data base\n visit_ids (list): A list of ids for from each web 
visit\n\n Returns:\n df(pandas DataFrame): A table containing visits that conincide with http responses\n\n get_script_url_from_initiator(initiator)\n Remove the scheme and query section of a URL.\n\n get_script_urls_from_call_stack_as_set(call_stack)\n Return the urls of the scripts involved in the call stack as a set.\n\n get_set_cookie(header)\n A special case of parse headers that returns 'Set-Cookies'\n\n Args: \n headers (string): http request headers\n\n Returns:\n item(string): name value pairs of Set Cookie field\n\n get_set_of_script_hosts_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack.\n\n get_set_of_script_ps1s_from_call_stack(script_urls, du)\n extract a unique set of urls from a list of urls detected in scripts\n\n Args: \n script_urls (list): A list of urls extracted from javascripts\n du (list): A domain utilities instance\n\n Returns:\n psls(set): a set of tld+1(string)\n\n get_set_of_script_urls_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack as a\n string.\n\n parse_headers(header)\n parses http header into kv pairs\n\n Args: \n headers (string): http request headers\n\n Returns:\n kv(dict): name value pairs of http headers\n\n strip_scheme_www_and_query(url)\n Remove the scheme and query section of a URL.\n\nDATA\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/analysis_utils.py\n\n\nHelp on module chi2_proportions:\n\nNAME\n chi2_proportions\n\nFUNCTIONS\n chi2Proportions(count, nobs)\n A wrapper for the chi2 testing proportions based upon the chi-square test\n\n Args:\n count (:obj `list` of :obj`int` or a single `int`): the number of successes in nobs trials. 
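To make the helpers above concrete, here is a worked example of the flattening behavior documented for `json_flatten`. This is an illustrative sketch written for this README with an assumed recursive signature, not the package's actual implementation:

```python
def json_flatten(data, prefix="", kv=None):
    """Flatten nested dicts/lists into one dict, joining key
    segments (and list indices) with underscores.

    Illustrative stand-in for irlutils' json_flatten, not its real code.
    """
    if kv is None:
        kv = {}
    if isinstance(data, dict):
        for key, value in data.items():
            json_flatten(value, "{}{}_".format(prefix, key), kv)
    elif isinstance(data, list):
        for index, value in enumerate(data):
            json_flatten(value, "{}{}_".format(prefix, index), kv)
    else:
        kv[prefix[:-1]] = data  # drop the trailing underscore
    return kv

data = {'k1': {'kv1': ['v1', 'v2'], 'kv2': 'v3'}}
print(json_flatten(data))
# {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}
```

Because underscores both join key segments and may appear inside keys themselves, this flattening is not reversible in general.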
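The `fetchiter` generator in database_utils is useful when a crawl table is too large to `fetchall`. A minimal sketch consistent with the documented signature (an assumed implementation, batching through `Cursor.fetchmany`):

```python
import sqlite3

def fetchiter(cursor, arraysize=10000):
    """Yield rows from a cursor in fixed-size batches, keeping memory flat.

    Sketch of database_utils.fetchiter; the real implementation may differ.
    """
    while True:
        rows = cursor.fetchmany(arraysize)
        if not rows:
            break
        yield from rows

# Usage against a throwaway in-memory database:
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE http_requests (visit_id INTEGER, url TEXT)")
con.executemany("INSERT INTO http_requests VALUES (?, ?)",
                [(1, "https://example.com/%d" % i) for i in range(5)])
cur = con.execute("SELECT visit_id, url FROM http_requests")
print(sum(1 for _ in fetchiter(cur, arraysize=2)))  # 5
```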
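The header helpers in analysis_utils can be sketched the same way. Below is a hypothetical `parse_headers` matching the documented contract (raw header text in, name/value dict out); the package's real function may handle duplicates and list-shaped headers differently:

```python
def parse_headers(header):
    """Parse a raw HTTP header blob into a {name: value} dict.

    Hypothetical sketch: assumes one "Name: value" pair per line,
    splits on the first colon only, and lets later duplicates win.
    """
    kv = {}
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            kv[name.strip()] = value.strip()
    return kv

raw = "Host: example.com\r\nCookie: a=1; b=2\r\nAccept: */*"
print(parse_headers(raw)["Cookie"])  # a=1; b=2
```

Splitting on the first colon only keeps values such as URLs (which contain colons) intact.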
string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /home/john/git/uiowa-irl/uiowa-irl-utils/irlutils/irlutils/url/crawl/database_utils.py\n\n\nHelp on module blocklist_utils:\n\nNAME\n blocklist_utils\n\nFUNCTIONS\n get_option_dict(request)\n Build an options dict for BlockListParser\n\n Parameters\n ----------\n request : sqlite3.Row\n A single HTTP request record pulled from OpenWPM's http_requests table\n public_suffix_list : PublicSuffixList\n An instance of PublicSuffixList()\n\n BINARY_OPTIONS = [\n \"script\",\n \"image\",\n \"stylesheet\",\n \"object\",\n \"xmlhttprequest\",\n \"object-subrequest\",\n \"subdocument\",\n \"document\",\n \"elemhide\",\n \"other\",\n \"background\",\n \"xbl\",\n \"ping\",\n \"dtd\",\n \"media\",\n \"third-party\",\n \"match-case\",\n \"collapse\",\n \"donottrack\",\n ]\n\n Returns\n -------\n dict\n An \"options\" dictionary for use with BlockListParser\n refs: [1] https://github.com/MoonchildProductions/UXP/blob/master/dom/base/nsIContentPolicyBase.idl\n [2] https://adblockplus.org/en/filters#options\n [3]\n\nFILE\n /home/john/git/uiowa-irl/uiowa-irl-utils/irlutils/irlutils/url/crawl/blocklist_utils.py\n\n\nHelp on module domain_utils:\n\nNAME\n domain_utils\n\nFUNCTIONS\n get_hostname(url)\n strips out the hostname from a url\n\n get_ps_plus_1(url, **kwargs)\n Returns the PS+1 of the url. 
This will also return\n an IP address if the hostname of the url is a valid\n IP address.\n\n An (optional) PublicSuffixList object can be passed with keyword arg 'psl',\n otherwise a version cached in the system temp directory is used.\n\n get_psl(location='public_suffix_list.dat')\n Grabs an updated public suffix list.\n\n get_stripped_query_string(url)\n\n get_stripped_url(url, scheme=False)\n Returns a url stripped to (scheme)?+hostname+path\n\n get_stripped_urls(urls, scheme=False)\n Returns a set (or list) of urls stripped to (scheme)?+hostname+path\n\n hostname_subparts(url, include_ps=False, **kwargs)\n Returns a list of slices of a url's hostname down to the PS+1\n\n If `include_ps` is set, the hostname slices will include the public suffix\n\n For example: http://a.b.c.d.com/path?query#frag would yield:\n [a.b.c.d.com, b.c.d.com, c.d.com, d.com] if include_ps == False\n [a.b.c.d.com, b.c.d.com, c.d.com, d.com, com] if include_ps == True\n\n An (optional) PublicSuffixList object can be passed with keyword arg 'psl'.\n otherwise a version cached in the system temp directory is used.\n\n is_ip_address(hostname)\n Check if the given string is a valid IP address\n\n load_psl(function)\n\nDATA\n PSL_CACHE_LOC = 'public_suffix_list.dat'\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n /home/john/git/uiowa-irl/uiowa-irl-utils/irlutils/irlutils/url/crawl/domain_utils.py\n\n\nHelp on module file_utils:\n\nNAME\n file_utils\n\nFUNCTIONS\n chmod(path, mode=777, recursive=False)\n\n chown(path, recursive=False, owner='user', group='user')\n\n compress_path(path)\n\n cp(s, d)\n\n file_ext(path, **kwargs)\n file extension finder\n kwargs:\n path (str): path or file name\n Returns:\n dotted file extension of a file\n Examples:\n\n >>> file_ext('/path/to_file/with_ext/test.py')\n .py\n\n gen_find_files(**kwargs)\n returns filenames that matches the given 
pattern under a given dir\n\n\n Kwargs:\n file_pattern (str): a regex-style pattern string.\n root (str): top level folder to begin search from.\n\n Yields:\n path (generator): matching path str\n\n Examples:\n gen_find_files(file_pattern=\"*.sqlite\", root=\"/mnt/data/\")\n\n >>> gen_find_files(file_pattern=\"*.sqlite\", root=\"/mnt/data/\").__next__()\n /mnt/data/first_folder/last_folder/file.sqlite\n\n Reference:\n [1] http://www.dabeaz.com/generators/\n\n json_flatten(y)\n flattens nested structures within a json file\n\n\n Kwargs:\n\n data (dict): data from nested dictionary\n kv (dict): dictionary containing key,value pairs.\n\n returns:\n\n kv (dict): a dictionary object containing flattened structures\n\n Examples:\n data = {'k1':{'kv1':['v1', 'v2'], 'kv2': 'v3'}}\n\n >>> json_flatten(data)\n {'k1_kv1_0': 'v1', 'k1_kv1_1': 'v2', 'k1_kv2': 'v3'}\n\n mkdir(d, mode=511, exist_ok=True)\n\n mv(s, d)\n\n rm(d)\n\n rmsubtree(**kwargs)\n Clears all subfolders and files in location\n kwargs:\n location (str): target directory to remove\n Examples:\n\n >>> rmsubtree(location=\"/path/to/target_dir\")\n\n tar_packer(tar_dir, **kwargs)\n tars up a directory\n\n\n Kwargs:\n\n dir (str): top level dir\n compression (str): compression type. 
gz and xz supported\n verbose (bool): True enables verbose\n\n returns:\n\n tar_path (generator): path to tar file\n\n Examples:\n\n tar_packer(dir=\"/path/to/top_level_dir\"[, compression=\"gz\"|\"xz\"])\n\n >>>\n /tmp/FZ4245_Zb/top_level_dir.tar\n\n tar_unpacker(tar_path, **kwargs)\n unpacks tar to a tmp directory.\n\n\n Kwargs:\n\n tar_path (str): tar file path\n verbose (bool): True enables verbose\n\n returns:\n\n tmp_path (generator): extracted contents path\n\n Examples:\n\n tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\")\n\n >>> tar_unpacker(tar_path=\"/mnt/data/tarfile.tar.gz\")\n /tmp/FZ4245_Zb/\n\n touch(d)\n\nAUTHOR\n johncook\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/file/file_utils.py\n\n\nHelp on module database_utils:\n\nNAME\n database_utils\n\nFUNCTIONS\n build_index(cursor, column, tables)\n Build an index on `column` for each table in `tables`\n\n drop_tables(sqlite_con, tables=[])\n\n fetchiter(cursor, arraysize=10000)\n Generator for cursor results\n\n get_channel_content(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for the given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n\n get_channel_content_with_hash(visit_id, channel_id, sqlite_cur, ldb_con, beautify=True)\n Return javascript content for the given channel_id.\n Parameters\n ----------\n visit_id : int\n `visit_id` of the page visit where this URL was loaded\n channel_id : string\n `channel_id` to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n\n 
get_content(db, content_hash, compression='snappy', beautify=True)\n Returns decompressed content from the javascript leveldb database\n\n get_ldb_content(ldb_addr, content_hash)\n\n get_leveldb(db_path, compression='snappy')\n Returns an open handle for a leveldb database\n with proper configuration settings.\n\n get_url_content(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for the given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n get_url_content_with_hash(url, sqlite_cur, ldb_con, beautify=True, visit_id=None)\n Return javascript content for the given url.\n Parameters\n ----------\n url : string\n url to search content hash for\n sqlite_cur : sqlite3.Cursor\n cursor for crawl database\n ldb_con : plyvel.DB\n leveldb database storing javascript content\n beautify : boolean\n Control whether or not to beautify output\n visit_id : int\n (optional) `visit_id` of the page visit where this URL was loaded\n\n list_placeholder(length, is_pg=False)\n Returns a (?,?,?,?...) 
string of the desired length\n\n optimize_db(cursor)\n Set options to make sqlite more efficient on a high memory machine\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/database_utils.py\n\n\nHelp on module analysis_utils:\n\nNAME\n analysis_utils\n\nFUNCTIONS\n add_col_bare_script_url(js_df)\n Add a column for the script URL without scheme, www and query.\n\n add_col_set_of_script_ps1s_from_call_stack(js_df)\n map PS+1s to the call stacks in scripts\n\n Args: \n js_df (pandas DataFrame): javascript table\n\n add_col_set_of_script_urls_from_call_stack(js_df)\n\n add_col_unix_timestamp(df)\n\n datetime_from_iso(iso_date)\n Convert from ISO.\n\n get_cookie(headers)\n A special case of parse_headers that extracts only the cookie. \n\n Args: \n headers (list): http request headers\n\n Returns:\n\n item (string): name-value pairs of a cookie\n\n get_func_and_script_url_from_initiator(initiator)\n Remove line number and column number from the initiator.\n\n get_host_from_url(url)\n\n get_initiator_from_call_stack(call_stack)\n Return the bottom element of the call stack.\n\n get_initiator_from_req_call_stack(req_call_stack)\n Return the bottom element of a request call stack.\n Request call stacks have an extra field (async_cause) at the end.\n\n get_requests_from_visits(con, visit_ids)\n Extract http requests matching visit_ids\n\n Args: \n con (sqlite3.Connection): A connection to a sqlite database\n visit_ids (list): A list of ids from each web visit\n\n Returns:\n df (pandas DataFrame): A table containing visits that coincide with http requests\n\n get_responses_from_visits(con, visit_ids)\n Extract http responses matching visit_ids\n\n Args: \n con (sqlite3.Connection): A connection to a sqlite database\n visit_ids (list): A list of ids from each web 
visit\n\n Returns:\n df (pandas DataFrame): A table containing visits that coincide with http responses\n\n get_script_url_from_initiator(initiator)\n Remove the scheme and query section of a URL.\n\n get_script_urls_from_call_stack_as_set(call_stack)\n Return the urls of the scripts involved in the call stack as a set.\n\n get_set_cookie(header)\n A special case of parse_headers that returns the 'Set-Cookie' field\n\n Args: \n headers (string): http request headers\n\n Returns:\n item (string): name-value pairs of the Set-Cookie field\n\n get_set_of_script_hosts_from_call_stack(call_stack)\n Return the hosts of the scripts involved in the call stack.\n\n get_set_of_script_ps1s_from_call_stack(script_urls, du)\n extract a unique set of PS+1s from a list of urls detected in scripts\n\n Args: \n script_urls (list): A list of urls extracted from javascripts\n du (object): A domain utilities instance\n\n Returns:\n psls (set): a set of tld+1 strings\n\n get_set_of_script_urls_from_call_stack(call_stack)\n Return the urls of the scripts involved in the call stack as a\n string.\n\n parse_headers(header)\n parses http headers into key-value pairs\n\n Args: \n headers (string): http request headers\n\n Returns:\n kv (dict): name-value pairs of http headers\n\n strip_scheme_www_and_query(url)\n Remove the scheme and query section of a URL.\n\nDATA\n absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0...\n print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0)...\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/url/crawl/analysis_utils.py\n\n\nHelp on module chi2_proportions:\n\nNAME\n chi2_proportions\n\nFUNCTIONS\n chi2Proportions(count, nobs)\n A wrapper for testing proportions based upon the chi-square test\n\n Args:\n count (:obj:`list` of :obj:`int` or a single `int`): the number of successes in nobs trials. 
If this is \n array_like, then the assumption is that this represents the number of successes \n for each independent sample \n\n\n nobs (:obj:`list` of :obj:`int` or a single `int`): The number of trials or observations, with the same length as count. \n\n Returns: \n chi2 (:obj:`float`): The test statistic.\n\n p (:obj:`float`): The p-value of the test\n\n dof (int): Degrees of freedom\n\n expected (:obj:`list`): list of the same shape as observed. The expected frequencies, based on the marginal sums of the table\n\n\n References: \n [1] \"scipy.stats.chi2_contingency\" https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html\n [2] \"statsmodels.stats.proportion.proportions_chisquare\" https://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportions_chisquare.html\n [3] \u201cContingency table\u201d, https://en.wikipedia.org/wiki/Contingency_table\n [4] \u201cPearson\u2019s chi-squared test\u201d, https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test\n [5] Cressie, N. and Read, T. R. C., \u201cMultinomial Goodness-of-Fit Tests\u201d, J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 
440-464.\n\n Sample use: \n input: \n [10,10,20] - number of successes in trial \n [20,20,20] - number of trials \n chi2Proportions([10,10,20], [20,20,20])\n\n output: \n (2.7777777777777777,\n 0.24935220877729619,\n 2,\n array([[ 12., 12., 16.],\n [ 18., 18., 24.]]))\n\nFILE\n /Users/johncook/git/uiowa-irl-utils/irlutils/stats/tests/proportions/chi2_proportions.py\n\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/uiowa-irl/uiowa-irl-utils.git", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "irlutils", "package_url": "https://pypi.org/project/irlutils/", "platform": "", "project_url": "https://pypi.org/project/irlutils/", "project_urls": { "Homepage": "https://github.com/uiowa-irl/uiowa-irl-utils.git" }, "release_url": "https://pypi.org/project/irlutils/0.1.4/", "requires_dist": null, "requires_python": "", "summary": "IRL Utilities", "version": "0.1.4", "yanked": false, "yanked_reason": null }, "last_serial": 6187099, "releases": { "0.0.6": [ { "comment_text": "", "digests": { "md5": "e68c76f490925bee8f3f99619c4f88b6", "sha256": "9369eb9929c5c4bed53d620fb4d5aab774997d5dcbede3da9723da233f346515" }, "downloads": -1, "filename": "irlutils-0.0.6-py3.7.egg", "has_sig": false, "md5_digest": "e68c76f490925bee8f3f99619c4f88b6", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 32224, "upload_time": "2019-11-01T01:25:53", "upload_time_iso_8601": "2019-11-01T01:25:53.172892Z", "url": "https://files.pythonhosted.org/packages/63/f9/916d903446a2984d6b3177cff9c04e0415c7cdc50d95c779ee791f9f9e4a/irlutils-0.0.6-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "3ed31a898a1edb1638c6a1f41e61df50", "sha256": "8b4dc6005cf2e22ca0d8cddc9609e590361c49a34751d1ade9a8121bbdb60664" }, "downloads": -1, "filename": 
"irlutils-0.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "3ed31a898a1edb1638c6a1f41e61df50", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 17144, "upload_time": "2019-11-01T01:25:51", "upload_time_iso_8601": "2019-11-01T01:25:51.121298Z", "url": "https://files.pythonhosted.org/packages/2e/63/d0891a74c1780e10b59b91161794221e36283b400337b5c0b833d1f73be8/irlutils-0.0.6-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "2da889d947413c1073a9a5b14ca4c21d", "sha256": "75f2c26d02af1e5694aba96d46cc15396e5c55bbe644900b39ae283d838703fd" }, "downloads": -1, "filename": "irlutils-0.0.6.tar.gz", "has_sig": false, "md5_digest": "2da889d947413c1073a9a5b14ca4c21d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 15937, "upload_time": "2019-11-01T01:25:54", "upload_time_iso_8601": "2019-11-01T01:25:54.800245Z", "url": "https://files.pythonhosted.org/packages/c2/12/7e51f589529cbf1b9dc09643b1725dee068c228b315584aad4ba74a68746/irlutils-0.0.6.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "0d903893f36a23dd1e82b322207991a8", "sha256": "e650538e6d71ef4ab8ec1ea77102effa86ff6a037e2b37c9cd6bd7e46b48c14c" }, "downloads": -1, "filename": "irlutils-0.0.7-py3.7.egg", "has_sig": false, "md5_digest": "0d903893f36a23dd1e82b322207991a8", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 59665, "upload_time": "2019-11-16T23:36:06", "upload_time_iso_8601": "2019-11-16T23:36:06.178531Z", "url": "https://files.pythonhosted.org/packages/01/38/de5fc4b6e12d566f494110a21b4b8e3651484103bb446851a067c1e630fc/irlutils-0.0.7-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "86c2d5624f92ad9c49723600f5208ecb", "sha256": "53a4196c77e32ee0ea4f95c6a364b2149dae0d96fa9fe5e805988776926fefa1" }, "downloads": -1, "filename": 
"irlutils-0.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "86c2d5624f92ad9c49723600f5208ecb", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 28557, "upload_time": "2019-11-16T23:36:02", "upload_time_iso_8601": "2019-11-16T23:36:02.962112Z", "url": "https://files.pythonhosted.org/packages/1d/3b/d0bb6d827d05637204638ef2c7c2eea80a0ee205144ff2fa3a215cd5b575/irlutils-0.0.7-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "5cb882bb9ff8a26cf37affa0cd415143", "sha256": "36ca731b505b895ef32d49e2a886c12480cbde9c3e1f1ec763e9bcdd14a0ddc3" }, "downloads": -1, "filename": "irlutils-0.0.7.tar.gz", "has_sig": false, "md5_digest": "5cb882bb9ff8a26cf37affa0cd415143", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25509, "upload_time": "2019-11-16T23:36:07", "upload_time_iso_8601": "2019-11-16T23:36:07.485828Z", "url": "https://files.pythonhosted.org/packages/5e/19/f2397709d8dc2bae7bc9b6c29eae3001044a80cb33cad8456ce0aaa525c5/irlutils-0.0.7.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "f09e7321f2bd685c034b73e216ff3166", "sha256": "6537440ab2d7894b10ccc2872354f3dc797bd5eddb6634ce7fb0f3dd252a89d9" }, "downloads": -1, "filename": "irlutils-0.0.8-py3.7.egg", "has_sig": false, "md5_digest": "f09e7321f2bd685c034b73e216ff3166", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 60097, "upload_time": "2019-11-16T23:51:51", "upload_time_iso_8601": "2019-11-16T23:51:51.774789Z", "url": "https://files.pythonhosted.org/packages/2a/ea/74f506be2ee9b837b5dc245f305b4d9d4463975f5feec78672fafca14b21/irlutils-0.0.8-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "ef7d6ff0b76298e28921d6f445a27516", "sha256": "16ea38b73ffd7444aec81c5e9bdf598dfbb83afe0aca61362967cb0e9da59faf" }, "downloads": -1, "filename": 
"irlutils-0.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "ef7d6ff0b76298e28921d6f445a27516", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 28867, "upload_time": "2019-11-16T23:51:48", "upload_time_iso_8601": "2019-11-16T23:51:48.733677Z", "url": "https://files.pythonhosted.org/packages/ca/90/92ccf44183d4503c28494122e6a5697406bbbab32e0b34fe181cd601bb90/irlutils-0.0.8-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "445835a9d24a720071bdafbf32d44301", "sha256": "e2025534299db8ea9d91f4e1f86a37fdfb65d7bdbcd07bed375295d29a1198e6" }, "downloads": -1, "filename": "irlutils-0.0.8.tar.gz", "has_sig": false, "md5_digest": "445835a9d24a720071bdafbf32d44301", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26948, "upload_time": "2019-11-16T23:51:53", "upload_time_iso_8601": "2019-11-16T23:51:53.281361Z", "url": "https://files.pythonhosted.org/packages/be/94/850692edeec86850615d3a1636b6bec9b936755a82fff2d6880743a9140f/irlutils-0.0.8.tar.gz", "yanked": false, "yanked_reason": null } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "71d6820a994967f1161d0842aec66024", "sha256": "e39590b243ea0fa5f7ea5142ec442a407095e471765739b9337d6794f86abae5" }, "downloads": -1, "filename": "irlutils-0.0.9-py3.7.egg", "has_sig": false, "md5_digest": "71d6820a994967f1161d0842aec66024", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 59910, "upload_time": "2019-11-16T23:51:54", "upload_time_iso_8601": "2019-11-16T23:51:54.568576Z", "url": "https://files.pythonhosted.org/packages/26/97/dc34aa500851ef4f90727bb7d1784ac97b1cd4a27242e9bf6491b880b353/irlutils-0.0.9-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d90324b57e9b279d7d92d8fb340d3b44", "sha256": "0940c079124e8474e55749d9fa3189782c2aa16bd55876ff10bb05acac8687b2" }, "downloads": -1, "filename": 
"irlutils-0.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "d90324b57e9b279d7d92d8fb340d3b44", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 28727, "upload_time": "2019-11-16T23:51:50", "upload_time_iso_8601": "2019-11-16T23:51:50.292304Z", "url": "https://files.pythonhosted.org/packages/4c/66/8ab8a795218a7d9a9aebc3c9c149decf461950dc8ca6f4d261afdf56b626/irlutils-0.0.9-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "cf26ed5613419e726dc2cab946194736", "sha256": "dc95e9b75dd272d5f2b476cff22c8f2a30d7b894dc537ea4ddb47d407079fea5" }, "downloads": -1, "filename": "irlutils-0.0.9.tar.gz", "has_sig": false, "md5_digest": "cf26ed5613419e726dc2cab946194736", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26418, "upload_time": "2019-11-16T23:51:55", "upload_time_iso_8601": "2019-11-16T23:51:55.959930Z", "url": "https://files.pythonhosted.org/packages/5e/49/2542fd9801e67f08f50b85330fcff751954861e6a873083fdf00c7573abc/irlutils-0.0.9.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.0": [ { "comment_text": "", "digests": { "md5": "db177306ac536d9c08594627227582c2", "sha256": "fd7770d20068028f0cb8e4f9d607bf6e8091ef5f4e3e262470ea5669e6af8db2" }, "downloads": -1, "filename": "irlutils-0.1.0-py3.7.egg", "has_sig": false, "md5_digest": "db177306ac536d9c08594627227582c2", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 60452, "upload_time": "2019-11-17T00:35:45", "upload_time_iso_8601": "2019-11-17T00:35:45.781609Z", "url": "https://files.pythonhosted.org/packages/ce/58/cbacfe9e7942a3af0dd809b90bd850fbc02e7989b2fc5fd83afc7d98b135/irlutils-0.1.0-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "2cfb3f96e8c84ed070c41e2126ada75b", "sha256": "a41012b51c9413f9f7cdbcd8e11fdc6422a370d424613ce06d7b1462dcf3bad6" }, "downloads": -1, "filename": 
"irlutils-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "2cfb3f96e8c84ed070c41e2126ada75b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29113, "upload_time": "2019-11-17T00:35:44", "upload_time_iso_8601": "2019-11-17T00:35:44.256458Z", "url": "https://files.pythonhosted.org/packages/c1/fa/956b8c81189cccfa204261b86ac6bffb1d83e298380be436a13fd57dc08f/irlutils-0.1.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "fdf011c24541a317093e7754d5442510", "sha256": "9090b4c4d010ec36a58fde85f729aed3f57693c24c007dbf8c905534696ef7fd" }, "downloads": -1, "filename": "irlutils-0.1.0.tar.gz", "has_sig": false, "md5_digest": "fdf011c24541a317093e7754d5442510", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28081, "upload_time": "2019-11-17T00:35:47", "upload_time_iso_8601": "2019-11-17T00:35:47.260110Z", "url": "https://files.pythonhosted.org/packages/65/e0/16e5ac1973995c8ab10f562a0e91b0c4cc208471050adfba049ee2a47ee0/irlutils-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "b9ec17a004ad97bf57e45317da25da43", "sha256": "4da04af9d719fc5b8862ba9183c43b600cc56e4632c652424bec4d745665390e" }, "downloads": -1, "filename": "irlutils-0.1.1-py3.7.egg", "has_sig": false, "md5_digest": "b9ec17a004ad97bf57e45317da25da43", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 60633, "upload_time": "2019-11-17T11:29:29", "upload_time_iso_8601": "2019-11-17T11:29:29.869366Z", "url": "https://files.pythonhosted.org/packages/6b/49/5ffb15f8f20430b44733e6f2c064cb992e13284cb565e360ef268cd95a12/irlutils-0.1.1-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "de7ea5531234e0bc0e1fee4c739df05d", "sha256": "9bbce28e3c6a702256bbffec7c0d45d8703d3cfe9fe98bad26a23ddd8d810c46" }, "downloads": -1, "filename": 
"irlutils-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "de7ea5531234e0bc0e1fee4c739df05d", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29235, "upload_time": "2019-11-17T11:29:27", "upload_time_iso_8601": "2019-11-17T11:29:27.350298Z", "url": "https://files.pythonhosted.org/packages/5c/a0/81389f0fdee5af3d082f3155fcc92e6ce5b5f849c3780fc6067b173a15ce/irlutils-0.1.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "385cca763f99d73faf7c80a1ad3d275f", "sha256": "739d69d647ce8c21ea82ba64d24999a32e666afef9036d73d11ba33dd049bdeb" }, "downloads": -1, "filename": "irlutils-0.1.1.tar.gz", "has_sig": false, "md5_digest": "385cca763f99d73faf7c80a1ad3d275f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 28684, "upload_time": "2019-11-17T11:29:31", "upload_time_iso_8601": "2019-11-17T11:29:31.405952Z", "url": "https://files.pythonhosted.org/packages/ec/0f/353fd67356cb1ccc615f8efc4d3db65d573b16eeb9b0fe58276fdb6c602e/irlutils-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "b1ab0d4cf0c397f2e8206eb2a171f911", "sha256": "35489f837e9eb25355cd78923d4a0feef764a235b15b7c5be314f2b61a26d3f4" }, "downloads": -1, "filename": "irlutils-0.1.2-py3.7.egg", "has_sig": false, "md5_digest": "b1ab0d4cf0c397f2e8206eb2a171f911", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 60753, "upload_time": "2019-11-17T11:40:33", "upload_time_iso_8601": "2019-11-17T11:40:33.627755Z", "url": "https://files.pythonhosted.org/packages/4e/39/bf9edd5e152b78a4ac43cb52de6876de71da1ea573934f07ee853a6dd564/irlutils-0.1.2-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8e061c67e05e4cca887cd94ab606d6ff", "sha256": "3a8b6bac66c2e430923258005884a1864250def7f8308010db7edb1e4d2becb2" }, "downloads": -1, "filename": 
"irlutils-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "8e061c67e05e4cca887cd94ab606d6ff", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29342, "upload_time": "2019-11-17T11:40:32", "upload_time_iso_8601": "2019-11-17T11:40:32.229680Z", "url": "https://files.pythonhosted.org/packages/43/84/169714a72b34ba972bf7778550ebe5e58ddb5bb7d36d523485d278877d0c/irlutils-0.1.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8a7707c8ede11abebd0a2f5288cf7c74", "sha256": "8f00a39830f0c8b56840819e33a64afbb32278af8a4bfa224c656fbc7be548e3" }, "downloads": -1, "filename": "irlutils-0.1.2.tar.gz", "has_sig": false, "md5_digest": "8a7707c8ede11abebd0a2f5288cf7c74", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29081, "upload_time": "2019-11-17T11:40:35", "upload_time_iso_8601": "2019-11-17T11:40:35.029914Z", "url": "https://files.pythonhosted.org/packages/b3/ed/d0f9ddf012e2691fe275bfb75c015a8e38f925cc40510b2f4058378cc0c5/irlutils-0.1.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "35ceea85c41ed08bf1b3b07c6ee8087e", "sha256": "1b7d840b4698fbc511f6abfc59e2d46afb945d402886b70a06926557979751d0" }, "downloads": -1, "filename": "irlutils-0.1.3-py3.7.egg", "has_sig": false, "md5_digest": "35ceea85c41ed08bf1b3b07c6ee8087e", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 60767, "upload_time": "2019-11-23T16:23:52", "upload_time_iso_8601": "2019-11-23T16:23:52.274867Z", "url": "https://files.pythonhosted.org/packages/b4/5f/2c5696e644b33088ef6b64164f0ad34476d98b8e59aabf79928a3df3cc66/irlutils-0.1.3-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "9ecbc3a7eec6f321573445596ba473c8", "sha256": "9d04f879b69f218e8eca305abb72559030dd24fba72db73f1d526eb5f867a3f9" }, "downloads": -1, "filename": 
"irlutils-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "9ecbc3a7eec6f321573445596ba473c8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29351, "upload_time": "2019-11-23T16:23:49", "upload_time_iso_8601": "2019-11-23T16:23:49.528547Z", "url": "https://files.pythonhosted.org/packages/4e/8c/d3bdfc4bf5e715ec714c6174b03a30ea354cf5a6fa63b800cea816f77950/irlutils-0.1.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "47b476a782a48068ecf226924111fba8", "sha256": "3b00c0053a9e253c05dfdeb6152fd7ee44b5fcefd73728abb69d80945ca72abd" }, "downloads": -1, "filename": "irlutils-0.1.3.tar.gz", "has_sig": false, "md5_digest": "47b476a782a48068ecf226924111fba8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 29100, "upload_time": "2019-11-23T16:23:53", "upload_time_iso_8601": "2019-11-23T16:23:53.429119Z", "url": "https://files.pythonhosted.org/packages/64/dd/db09fbd5097b18f5d4f08ef6b34b63fca4c5a432e0a4adbe999ac8322c63/irlutils-0.1.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "0bfe0009e543f56abc64cc16d9c7120c", "sha256": "4fa43f6bda4d6e529809d324a9e4e8a62bb0222eba321cd9e2e220139bad383c" }, "downloads": -1, "filename": "irlutils-0.1.4-py3.7.egg", "has_sig": false, "md5_digest": "0bfe0009e543f56abc64cc16d9c7120c", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 62843, "upload_time": "2019-11-23T16:30:43", "upload_time_iso_8601": "2019-11-23T16:30:43.232315Z", "url": "https://files.pythonhosted.org/packages/c6/20/fc7bb0ac779b7d378695b8b5c4a2b8f14d5ca0edacfdb5639b8d45ee240b/irlutils-0.1.4-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8d86ecbde4dd58cfbef9fc28d9c14618", "sha256": "6a692cd0f209c41d086bb06d552e32eef8274d5355fb6046247abc0b201dc9a4" }, "downloads": -1, "filename": 
"irlutils-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8d86ecbde4dd58cfbef9fc28d9c14618", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 30561, "upload_time": "2019-11-23T16:30:38", "upload_time_iso_8601": "2019-11-23T16:30:38.918030Z", "url": "https://files.pythonhosted.org/packages/68/30/5e39c96b706f2d513389da622a2406f0ae937faec7ef2c7eecb824b06946/irlutils-0.1.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "a58a87f59ef31b0b68f0f5dae846e9ee", "sha256": "e87ea47ae36ba8f95afaa613d03e58c63258a0bcca8288bd28a45a4367a62517" }, "downloads": -1, "filename": "irlutils-0.1.4.tar.gz", "has_sig": false, "md5_digest": "a58a87f59ef31b0b68f0f5dae846e9ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33624, "upload_time": "2019-11-23T16:30:45", "upload_time_iso_8601": "2019-11-23T16:30:45.228466Z", "url": "https://files.pythonhosted.org/packages/2a/9d/d876ed375c32a8b84b02347a86da4c0dc9f7d0403ecbf1314977b09f7cf9/irlutils-0.1.4.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "0bfe0009e543f56abc64cc16d9c7120c", "sha256": "4fa43f6bda4d6e529809d324a9e4e8a62bb0222eba321cd9e2e220139bad383c" }, "downloads": -1, "filename": "irlutils-0.1.4-py3.7.egg", "has_sig": false, "md5_digest": "0bfe0009e543f56abc64cc16d9c7120c", "packagetype": "bdist_egg", "python_version": "3.7", "requires_python": null, "size": 62843, "upload_time": "2019-11-23T16:30:43", "upload_time_iso_8601": "2019-11-23T16:30:43.232315Z", "url": "https://files.pythonhosted.org/packages/c6/20/fc7bb0ac779b7d378695b8b5c4a2b8f14d5ca0edacfdb5639b8d45ee240b/irlutils-0.1.4-py3.7.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "8d86ecbde4dd58cfbef9fc28d9c14618", "sha256": "6a692cd0f209c41d086bb06d552e32eef8274d5355fb6046247abc0b201dc9a4" }, "downloads": -1, "filename": 
"irlutils-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8d86ecbde4dd58cfbef9fc28d9c14618", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 30561, "upload_time": "2019-11-23T16:30:38", "upload_time_iso_8601": "2019-11-23T16:30:38.918030Z", "url": "https://files.pythonhosted.org/packages/68/30/5e39c96b706f2d513389da622a2406f0ae937faec7ef2c7eecb824b06946/irlutils-0.1.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "a58a87f59ef31b0b68f0f5dae846e9ee", "sha256": "e87ea47ae36ba8f95afaa613d03e58c63258a0bcca8288bd28a45a4367a62517" }, "downloads": -1, "filename": "irlutils-0.1.4.tar.gz", "has_sig": false, "md5_digest": "a58a87f59ef31b0b68f0f5dae846e9ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33624, "upload_time": "2019-11-23T16:30:45", "upload_time_iso_8601": "2019-11-23T16:30:45.228466Z", "url": "https://files.pythonhosted.org/packages/2a/9d/d876ed375c32a8b84b02347a86da4c0dc9f7d0403ecbf1314977b09f7cf9/irlutils-0.1.4.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }