{ "info": { "author": "Ben Fulton", "author_email": "fulton.benjamin@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "# Input Set Generator\n\nThis is the input set generator for the R2C platform.\n\n## Installation\nTo install, simply `pip install r2c-inputset-generator`. Then run `r2c-isg` to load the shell.\n\n**Note:** This application caches HTTP requests to the various package registries in the terminal's current directory. Be sure to navigate to an appropriate directory before loading the shell, or use the command `set-api --nocache` inside the shell.\n\n## Quick Start\nTry the following command sequences:\n\n- Load the top 5,000 pypi projects by downloads in the last 365 days, sort by descending number of downloads, trim to the top 100 most downloaded, download project metadata and all versions, and generate an input set json.\n\n\t load pypi top5kyear\n\t sort \"desc download_count\"\n\t trim 100\n\t get -mv all\n\t set-meta -n test -v 1.0\n\t export inputset.json\n
\n\n- Load all npm projects, sample 100, download the latest versions, and generate an input set json.\n\n\t load npm allbydependents\n\t sample 100\n\t get -v latest\n\t set-meta -n test -v 1.0\n\t export inputset.json\n
\n\n- Load a csv containing github urls and commit hashes, get project metadata and the latest versions, generate an input set json of type GitRepoCommit, remove all versions, and generate an input set json of type GitRepo.\n\n\t load --columns \"url v.commit\" github list_of_github_urls_and_commits.csv\n\t get -mv latest\n\t set-meta -n test -v 1.0\n\t export inputset_1.json\n\t trim -v 0\n\t export inputset_2.json\n\n## Shell Usage\n\n#### Input/Output\n\n- **load** (OPTIONS) [noreg | github | npm | pypi] [WEBLIST_NAME | FILEPATH.csv]
\n\tGenerates a dataset from a weblist or a local file. The following weblists are available:\n - Github: top1kstarred, top1kforked; the top 1,000 most starred or forked repos
\n - NPM: allbydependents; **all** packages, sorted from most to fewest dependents (caution: 1M+ projects... handle with care)
\n - Pypi: top5kmonth and top5kyear; the top 5,000 most downloaded projects in the last 30/365 days\n\n\t**Options:**
\n **-c --columns** \"string of col names\": A space-separated list of column names in a csv. Overrides default columns (name and version), as well as any headers listed in the file (headers in files begin with a '!'). The CSV reader recognizes the following column keywords: name, url, org, v.commit, v.version. All other columns are read in as project or version attributes.
\n Example usage: --columns \"name url downloads v.commit v.date\".\n\n- **backup** (FILEPATH.p)
\n\tBacks up the dataset to a pickle file (defaults to ./dataset_name.p).\n\n- **restore** FILEPATH.p
\n\tRestores a dataset from a pickle file.\n\n- **import** [noreg | github | npm | pypi] FILEPATH.json
\n\tBuilds a dataset from an R2C input set.\n\n- **export** (FILEPATH.json)
\n\tExports a dataset to an R2C input set (defaults to ./dataset_name.json).\n\n#### Data Acquisition\n\n- **get** (OPTIONS)
\n\tDownloads project and version metadata from Github/NPM/Pypi.\n\n\t**Options:**
\n **-m --metadata**: Gets metadata for all projects.
\n **-v --versions** [all | latest]: Gets historical versions for all projects.\n\n#### Transformation\n\n- **trim** (OPTIONS) N
\n\tTrims the dataset to *n* projects or *n* versions per project.\n\n **Options:**
\n **-v --versions**: Binary flag; trims on versions instead of projects.\n\n- **sample** (OPTIONS) N
\n\tSamples *n* projects or *n* versions per project.\n\n **Options:**
\n **-v --versions**: Binary flag; samples on versions instead of projects.\n\n- **sort** \"[asc, desc] attributes [...]\"
\n\tSorts the projects and versions based on a space-separated string of keywords. Valid keywords are:\n - Any project attributes\n - Any version attributes (prepend \"v.\" to the attribute name)\n - Any uuids (prepend \"uuids.\" to the uuid name)\n - Any meta values (prepend \"meta.\" to the meta name)\n - The words \"asc\" and \"desc\"\n\n All values are sorted in ascending order by default. The first keyword in the string is the primary sort key, the next the secondary, and so on.\n\n Example: The string \"uuids.name meta.url downloads desc v.version_str v.date\" would sort the dataset by ascending project name, url, and download count; and descending version string and date (assuming those keys exist).\n\n\n#### Settings\n\n- **set-meta** (OPTIONS)
\n\tSets the dataset's metadata.\n\n\t**Options:**
\n\t**-n --name** NAME: Input set name. Must be set before the dataset can be exported.
\n **-v --version** VERSION: Input set version. Must be set before the dataset can be exported.
\n **-d --description** DESCRIPTION: Description string.
\n **-r --readme** README: Markdown-formatted readme string.
\n **-a --author** AUTHOR: Author name; defaults to git user.name.
\n **-e --email** EMAIL: Author email; defaults to git user.email.
\n\n- **set-api** (OPTIONS)
\n\t**--cache_dir** CACHE_DIR: The path to the requests cache; defaults to ./.requests_cache.
\n **--cache_timeout** DAYS: The number of days before a cached request goes stale.
\n **--nocache**: Binary flag; disables request caching for this dataset.
\n **--github_pat** GITHUB_PAT: A github personal access token, used to increase the max allowed hourly request rate from 60/hr to 5,000/hr. For instructions on how to obtain a token, see: [https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line](https://help.github.com/en/articles/creating-a-personal-access-token-for-the-command-line). \n\n#### Visualization\n\n- **show**
\n\tConverts the dataset to a json file and loads it in the system's native json viewer.\n\n## Python Project\n\nYou can also import the package into your own project. Just import the Dataset structure, initialize it, and you're good to go!\n\n```\nfrom r2c_isg.structures import Dataset\n\nds = Dataset.import_inputset(\n 'file.csv', # or 'weblist_name'\n registry='github', # or 'npm' or 'pypi'\n cache_dir='path/to/cache/dir', # optional; overrides ./.requests_cache\n cache_timeout=int(days_in_cache), # optional; overrides 1 week cache timeout\n nocache=True, # optional; disables caching\n github_pat=your_github_pat # optional; personal access token for github api\n)\n\nds.get_projects_meta()\n\nds.get_project_versions(historical='all') # or 'latest'\n\nds.trim(\n n,\n on_versions=True\t# optional; defaults to False\n)\n\nds.sample(\n n,\n on_versions=True\t# optional; defaults to False\n)\n\nds.sort('string of sort parameters')\n\nds.update(**{'name': 'your_dataset_name', 'version': 'your_dataset_version'})\n\nds.export_inputset('your_inputset.json')\n```\n\n## Troubleshooting\n\nIf you run into any issues, you can run the shell with the `--debug` flag enabled to get a full error message. 
Then reach out to `support@ret2.co` with the stack trace and the steps to reproduce the error.\n\n**Note:** If the issue is related to the \"sample\" command, be sure to seed the random number generator to ensure reproducibility.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://ret2.co/", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "r2c-inputset-generator", "package_url": "https://pypi.org/project/r2c-inputset-generator/", "platform": "", "project_url": "https://pypi.org/project/r2c-inputset-generator/", "project_urls": { "Homepage": "https://ret2.co/" }, "release_url": "https://pypi.org/project/r2c-inputset-generator/0.2.6/", "requires_dist": [ "click", "click-shell", "python-dotenv", "requests", "dill", "urllib3", "tqdm" ], "requires_python": ">=3.6", "summary": "An input set generator for R2C", "version": "0.2.6" }, "last_serial": 5950260, "releases": { "0.2.0": [ { "comment_text": "", "digests": { "md5": "8d9e17ea5e3a7b70c2c22e4daf1ce074", "sha256": "10e6bfc2094c7f87131b1ee17f284be862025a188b38322eb537488f2393197f" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8d9e17ea5e3a7b70c2c22e4daf1ce074", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 38842, "upload_time": "2019-07-31T15:13:43", "url": "https://files.pythonhosted.org/packages/fe/45/54e4b38948f6786069f3be6ac1756f3ced4131b9537727edb5df682aeb04/r2c_inputset_generator-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "69e7b305d43ea0e9cca87829702f4f03", "sha256": "ba2e783aa62d57cc38a6f5d1b67db49c50bc081d58f73a9933cb80e646166c59" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.0.tar.gz", "has_sig": false, "md5_digest": "69e7b305d43ea0e9cca87829702f4f03", "packagetype": "sdist", "python_version": "source", 
"requires_python": ">=3.6", "size": 26088, "upload_time": "2019-07-31T15:13:45", "url": "https://files.pythonhosted.org/packages/c2/2c/20357ac422fdf9eb6c2e28f80c5c8395ecd52608c9cd3254c14b24f90483/r2c-inputset-generator-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "fcda18c018009a3979799a14ebd1ccd4", "sha256": "801ae5d848c1adf33f52d5f7a1a99330b5349be8055e4a916c85d88a9f0896c9" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.1-py3-none-any.whl", "has_sig": false, "md5_digest": "fcda18c018009a3979799a14ebd1ccd4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39044, "upload_time": "2019-07-31T15:41:34", "url": "https://files.pythonhosted.org/packages/2c/ac/e8f9163e9c5893435b638e528063bb0c9707c3202de162b4f97ea1c5ef31/r2c_inputset_generator-0.2.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "1253c85e9b90a90fdb91c5d8720d7cb8", "sha256": "42b0e07743137037493c46858f48df5f394dd29a46e47ff3cd699fb6903f4cc7" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.1.tar.gz", "has_sig": false, "md5_digest": "1253c85e9b90a90fdb91c5d8720d7cb8", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26448, "upload_time": "2019-07-31T15:41:36", "url": "https://files.pythonhosted.org/packages/6b/f3/0aa9a8326a35c1737c3c00e4bb950822cbab335adf26d267b10014c76038/r2c-inputset-generator-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "b26a45797d150fba4a1bcb3dc48bbadf", "sha256": "e951beb07dfd4378a6d8da2b5762998561ac01f9613f0f7898c66bfd7380af67" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.2-py3-none-any.whl", "has_sig": false, "md5_digest": "b26a45797d150fba4a1bcb3dc48bbadf", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39043, "upload_time": "2019-07-31T17:46:12", "url": 
"https://files.pythonhosted.org/packages/2f/5f/969b2a5d0b2156830ac3d00f0ad2b7ee9e75da36adbb0f826466f0aeba20/r2c_inputset_generator-0.2.2-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "cc02f4573a6fe201f5990381d9df1708", "sha256": "64397fd881695da6aad3ef6044c6229edd4ff79ba15e572c9cc68c5e80b9dfef" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.2.tar.gz", "has_sig": false, "md5_digest": "cc02f4573a6fe201f5990381d9df1708", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26432, "upload_time": "2019-07-31T17:46:13", "url": "https://files.pythonhosted.org/packages/c7/ec/b6d8767b1951c03812ff4c470284f524ce25b78b7e9bc9c6fdb844325edf/r2c-inputset-generator-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "c6b02c51ee9f09b2a1ad823057534255", "sha256": "f7846bd6ec49517c4559a394d1c0b1bb21128958cc741a6b19a6f067d6d71f15" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.3-py3-none-any.whl", "has_sig": false, "md5_digest": "c6b02c51ee9f09b2a1ad823057534255", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39101, "upload_time": "2019-08-02T14:56:19", "url": "https://files.pythonhosted.org/packages/d4/c2/b19f46bed7c25b50f9c14ede73558b40a2c3f79d758226db50ddf2a34de1/r2c_inputset_generator-0.2.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "73025341092b7112f116b65bfbcbade3", "sha256": "24b8b4ffff7518ff46031ce031f700e428bdd74b7a31a2e898e3dd274b91ec2c" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.3.tar.gz", "has_sig": false, "md5_digest": "73025341092b7112f116b65bfbcbade3", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26500, "upload_time": "2019-08-02T14:56:20", "url": "https://files.pythonhosted.org/packages/74/2b/35128ea91597e4e1fb34de62bb04902632b9f1211daac80c24382ca7c458/r2c-inputset-generator-0.2.3.tar.gz" } ], "0.2.4": [ { "comment_text": "", "digests": { "md5": 
"18c320f78b8458153b5a1b6297d6f156", "sha256": "1e58e319cd40b791c751bfe85fe30e4eec80e6aac3f0be13dae06f06da4617ec" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.4-py3-none-any.whl", "has_sig": false, "md5_digest": "18c320f78b8458153b5a1b6297d6f156", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39209, "upload_time": "2019-08-02T15:55:50", "url": "https://files.pythonhosted.org/packages/aa/4a/92dcdd581ee49e11ed7c97922482939c301ad13e3c7974972935d3c75f88/r2c_inputset_generator-0.2.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8eaaad85a26114d9f6ccae1231e002bf", "sha256": "762383a8cba0c042653d44600c47e41e61d5490c2212abc6cf305ae89eb14e91" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.4.tar.gz", "has_sig": false, "md5_digest": "8eaaad85a26114d9f6ccae1231e002bf", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26605, "upload_time": "2019-08-02T15:55:52", "url": "https://files.pythonhosted.org/packages/e9/a0/857187eeaefb826008271863d44e67705b542a9f74d1e14ce27d7de0991e/r2c-inputset-generator-0.2.4.tar.gz" } ], "0.2.5": [ { "comment_text": "", "digests": { "md5": "7c1bbfde51bf95759100ac801c33fab3", "sha256": "9725b014ed5c0d8dc38b85650c2130dc859945bf5e3f4e8b4fd8953bc0ad01fa" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.5-py3-none-any.whl", "has_sig": false, "md5_digest": "7c1bbfde51bf95759100ac801c33fab3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39297, "upload_time": "2019-09-19T18:30:28", "url": "https://files.pythonhosted.org/packages/86/7f/f66867d3cf3de38a4d26b4f2afcf1213d49ccb9c4647a83199414793b91e/r2c_inputset_generator-0.2.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8cbed89d8f51dc233e711582f1d51d71", "sha256": "1836ff5d517fc031514e54b49b265cc17ce6192899f9539d819a84a292e6e2f4" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.5.tar.gz", "has_sig": 
false, "md5_digest": "8cbed89d8f51dc233e711582f1d51d71", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26717, "upload_time": "2019-09-19T18:30:29", "url": "https://files.pythonhosted.org/packages/90/0e/b9cb32e3106aa90f05f1acfce5ba7ff42a05b53c8062a3ee003878449e92/r2c-inputset-generator-0.2.5.tar.gz" } ], "0.2.6": [ { "comment_text": "", "digests": { "md5": "14f222a02d11bc0d2492299ac465fb1c", "sha256": "85cb0cd8c6145d194ee00c2e2041f768e79a56e25388ceb273e9f21f58ecdfa4" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.6-py3-none-any.whl", "has_sig": false, "md5_digest": "14f222a02d11bc0d2492299ac465fb1c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39563, "upload_time": "2019-10-09T14:41:13", "url": "https://files.pythonhosted.org/packages/35/aa/4017fde0e036961f95af2c12c6b76132322a998462310a01e37b715c384f/r2c_inputset_generator-0.2.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "300f8047029ecce690b8e821b11eadb1", "sha256": "acde046b914d425d796918de7ae587c1a430e641e6132298a06c8914fa11051b" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.6.tar.gz", "has_sig": false, "md5_digest": "300f8047029ecce690b8e821b11eadb1", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26873, "upload_time": "2019-10-09T14:41:16", "url": "https://files.pythonhosted.org/packages/c5/3a/6e3416978abed4aa42b78750f595d92b0a29871365dc9ca19f0827986fd5/r2c-inputset-generator-0.2.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "14f222a02d11bc0d2492299ac465fb1c", "sha256": "85cb0cd8c6145d194ee00c2e2041f768e79a56e25388ceb273e9f21f58ecdfa4" }, "downloads": -1, "filename": "r2c_inputset_generator-0.2.6-py3-none-any.whl", "has_sig": false, "md5_digest": "14f222a02d11bc0d2492299ac465fb1c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 39563, "upload_time": "2019-10-09T14:41:13", 
"url": "https://files.pythonhosted.org/packages/35/aa/4017fde0e036961f95af2c12c6b76132322a998462310a01e37b715c384f/r2c_inputset_generator-0.2.6-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "300f8047029ecce690b8e821b11eadb1", "sha256": "acde046b914d425d796918de7ae587c1a430e641e6132298a06c8914fa11051b" }, "downloads": -1, "filename": "r2c-inputset-generator-0.2.6.tar.gz", "has_sig": false, "md5_digest": "300f8047029ecce690b8e821b11eadb1", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 26873, "upload_time": "2019-10-09T14:41:16", "url": "https://files.pythonhosted.org/packages/c5/3a/6e3416978abed4aa42b78750f595d92b0a29871365dc9ca19f0827986fd5/r2c-inputset-generator-0.2.6.tar.gz" } ] }