{ "info": { "author": "Ryan Marren", "author_email": "rymarr@tuta.io", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha" ], "description": "# ioexplorer-dataloader\n\n_This repository contains the code for a command line tool to manage and ingest datasets into a Postgres database, for later use by the IOExplorer web application._\n\n## Pre-Requisites\nThis CLI has three main depencies: `docker` and `node`, and `python` (specifically Python 3).\n\nInstallation instructions for each can be found below:\n- `docker`: https://docs.docker.com/install/\n- `node`: https://nodejs.org/en/\n- `python`: https://www.python.org/downloads/\n\nOnce `node` is installed, you will also need to globally install some packages which are used to interact with the database.\n```\nnpm i -g sequelize sequelize-cli pg\n```\n\nWith some environments, you will get a permission error when you attempt to install these packages. There is a [good article](https://docs.npmjs.com/getting-started/fixing-npm-permissions) on how to fix your environment to avoid these errors.\n\nTo do a final check to make sure all software is installed, run the following:\n```\ndocker --version\nnode --version\nnpm --version\nsequelize --version\npython --version\n```\nNote that `python --version` should return something starting with `3`.\n## Installing the CLI\n Installing the CLI:\n simply,\n ```\n pip install ioexplorer-dataloader\n iodl --help\n ```\n\n Make sure you have the most up to date version!\n Run the following if you have downloaded in the past:\n ```\n pip install ioexplorer-dataloader --upgrade\n ```\n ## Example Workflows\n ### 1. Basics\n _A workflow to help you get familiar with the basics of the dataloader._\n\n##### Setting environment variables\nTo start, we need to set environment variables so that we can interact with a database.\nFor example, to set up a development database, we can create a file called `development.env` with the following contents:\n```\nNODE_ENV=development\nIOEXPLORER_MODE=development\nIOEXPLORER_DEVELOPMENT_DATABASE_NAME=ioexplorerdb\nIOEXPLORER_DEVELOPMENT_DATABASE_HOST=127.0.0.1\nIOEXPLORER_DEVELOPMENT_DATABASE_PORT=5432\nIOEXPLORER_DEVELOPMENT_DATABASE_USERNAME=root\nIOEXPLORER_DEVELOPMENT_DATABASE_PASSWORD=password\n\n```\nThen call `set -a && source development.env`.\n_Note: `set -a` will cause all bash variables which are modified to be exported. This is the equivalent of calling `export ` for each `` in `development.env`._\n\nIn order to use a production database, you would instead work with a file called `production.env` which would look slightly different\n```\nNODE_ENV=production\nIOEXPLORER_MODE=production\nIOEXPLORER_PRODUCTION_DATABASE_NAME=ioexplorerdb\nIOEXPLORER_PRODUCTION_DATABASE_HOST=127.0.0.1\nIOEXPLORER_PRODUCTION_DATABASE_PORT=5432\nIOEXPLORER_PRODUCTION_DATABASE_USERNAME=root\nIOEXPLORER_PRODUCTION_DATABASE_PASSWORD=password\nIOEXPLORER_GRAPHQL_URL=http://api:4000/graphql\n\n```\n_Note: The differences here are that `DEVELOPMENT` is replaced with `PRODUCTION` for many of the environment variables. This is so you can have production and development variables loaded at the same time and easily toggle between the two contexts by setting `IOEXPLORER_MODE` and `NODE_ENV`. Also, the environment variable `IOEXPLORER_GRAPHQL_URL` is added in production, since the processes will be interconnected via a docker network in production rather than via localhost in development. Be sure to `unset IOEXPLORER_GRAPHQL_URL` in development, or else your graphql client will attempt to connet to the graphql api at the wrong url._\n##### Starting a database.\n\nNote that the database is not live yet, we only set our environment properly to connect to the database. Start the database by running\n```\n$ iodl database start\n```\n\nThe database should now be started. The database just a docker container running the [postgres](https://hub.docker.com/_/postgres/) image, so you can see it being run with `docker ps`.\n\n##### Opening a `psql` shell into a database.\nNow lets open a `psql` shell connected to our newly created database:\n```\n$ iodl database shell\npsql (11.0 (Debian 11.0-1.pgdg90+2))\nType \"help\" for help.\n\nioexplorerdb=# \\dt\nDid not find any relations.\n```\n\nThe message `Did not find any relations.` lets us know that this database is completely empty and schemaless.\n\n##### Applying migrations to our database.\nThe `iodl` CLI has a copy of all migrations used to produce the current production version of the IOExplorer database schema. To apply all these migrations run:\n```\n$ iodl database migrate\n```\n\nNow if you open another `psql` shell and list the relations, you get the expected:\n```\n$ iodl database shell\npsql (11.0 (Debian 11.0-1.pgdg90+2))\nType \"help\" for help.\n\nioepxlorerdb=# \\dt\n List of relations\n Schema | Name | Type | Owner\n--------+---------------+-------+-------\n public | SequelizeMeta | table | root\n public | cnas | table | root\n public | datasets | table | root\n public | fusions | table | root\n public | mutations | table | root\n public | samples | table | root\n public | subjects | table | root\n public | svs | table | root\n public | timelines | table | root\n(9 rows)\n```\n\nNow our database is ready to get some data.\n\n\n##### Initializing a dataset\n\n`cd` into a dataset to upload. Ask Ryan for one if you do not have any.\n**TODO**: upload example dataset.\n\nThe dataset should have the following directory structure:\n```\n.\n\u251c\u2500\u2500 data_clinical_patient.txt (R)\n\u251c\u2500\u2500 data_clinical_sample.txt (R)\n\u251c\u2500\u2500 data_CNA.txt\n\u251c\u2500\u2500 data_fusions.txt\n\u251c\u2500\u2500 data_expression.fpkm.txt\n\u251c\u2500\u2500 data_expression.rld.txt\n\u251c\u2500\u2500 data_expression.raw.txt\n\u251c\u2500\u2500 data_mutations_extended.txt\n\u251c\u2500\u2500 data_SV.txt\n\u2514\u2500\u2500 data_timeline.txt\n```\nNote: _Only the files denoted with an (R) are actually required_.\n\nWe now want to _initialize_ the dataset. This step will\n1. Run some quick validations to make sure the data structure is correct.\n2. Collect some meta-information from the user.\n3. Write a `config.yaml` file which stores information about this dataset and helps with ingestion.\n\nRun:\n```\n(dataloader) ryan@galliumos:~/MSK/data/Hugo$ iodl dataset init\nINFO: Initializing new dataset!\n\n...\n Some success messages will appear here, or a prompt will ask you if you would like to continue with missing data.\n...\n\n? What is the dataset name? my-dataset\n? What is a description of the dataset? this is a test dataset...\n? Enter link to paper. http://google.com\n? Who are you (person uploading data)? Ryan\nSUCCESS: Thanks! I made a file called `config.yaml` in this directory! Check it out and make sure everything looks OK!\n```\n\n##### Ingesting a dataset\nWith the `config.yaml` file already formed, ingesting the database is very simple.\n```\n$ iodl dataset ingest\n```\n\nIf there are any problems during ingestion, an error will be thrown and the data that already made it into the database (before the error) will be deleted. This will let you diagnose any problems with the data ingestion and re-attempt ingestion without messing with the state of the database.\n\n### 2. Production on AWS\nThere are some subtle differences in the above when running on the production AWS server:\n\n1. Instead of `pip`, you should use `pip-3.6`.\n2. The environment variables are located in `~/ioexplorer/production.env`.\n3. `iodl database shell` will no longer work since the production database is running in a docker swarm. If you would like to shell into the datbase, you can find the id of the database container with\n`docker container ls`.\nLook for the line where the `IMAGE` is `postgres:latest` and the `NAMES` is something starting with `ioexplorer_database`. Copy the name string, which should look like `ioexplorer_database.1.`. Then, execute `docker container exec -it ioexplorer_database.1. psql -d $IOEXPLORER_PRODUCTION_DATABASE_NAME`", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "ioexplorer-dataloader", "package_url": "https://pypi.org/project/ioexplorer-dataloader/", "platform": "", "project_url": "https://pypi.org/project/ioexplorer-dataloader/", "project_urls": null, "release_url": "https://pypi.org/project/ioexplorer-dataloader/0.0.15/", "requires_dist": null, "requires_python": "", "summary": "A CLI for ingesting data into the IOExplorer database.", "version": "0.0.15" }, "last_serial": 5607908, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "13c5d49f3dfa520d0ca5d60faf1b4ea2", "sha256": "bc133bf9de3778aee1201c6dd3c787fc5329db56171544cafbcfeacd440c1e67" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.1.tar.gz", "has_sig": false, "md5_digest": "13c5d49f3dfa520d0ca5d60faf1b4ea2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11273, "upload_time": "2018-11-05T20:38:42", "url": "https://files.pythonhosted.org/packages/23/14/c06b88fc82bc7d93a6c9a7b461c4fcf033cee78e635c2be630c44081cbdb/ioexplorer-dataloader-0.0.1.tar.gz" } ], "0.0.10": [ { "comment_text": "", "digests": { "md5": "c77ba09de886cf1710bc262b97b1c914", "sha256": "869958ee6bf4cb8d57fab3112183208190b994455f07c3dfcd8233fc32fbc0d6" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.10.tar.gz", "has_sig": false, "md5_digest": "c77ba09de886cf1710bc262b97b1c914", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20562, "upload_time": "2018-12-17T22:28:40", "url": "https://files.pythonhosted.org/packages/4e/b4/2b96664d3a44ad921202690c6c31c2755e9b2b6cc725a5fb1b07dccc6845/ioexplorer-dataloader-0.0.10.tar.gz" } ], "0.0.11": [ { "comment_text": "", "digests": { "md5": "5576f721a88bf38a352b63156f79b39a", "sha256": "4e3f4b5d2ea583d127833d9a0db7570823896a24be581c7019b814f2e6880465" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.11.tar.gz", "has_sig": false, "md5_digest": "5576f721a88bf38a352b63156f79b39a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20595, "upload_time": "2018-12-18T20:11:35", "url": "https://files.pythonhosted.org/packages/5a/26/a8e360654a8c0aa5babeee72f12b78c1b45626810c69a519245b6d616dc1/ioexplorer-dataloader-0.0.11.tar.gz" } ], "0.0.12": [ { "comment_text": "", "digests": { "md5": "ba843efaf59ea254c42987ad4ae524e1", "sha256": "d0d8f157049d3c6b881cbf84f3631bdddf0931ec95df8afbae4066f5ab051aaf" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.12.tar.gz", "has_sig": false, "md5_digest": "ba843efaf59ea254c42987ad4ae524e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21126, "upload_time": "2019-01-11T15:09:58", "url": "https://files.pythonhosted.org/packages/d1/66/1b34d92aa64ec606c16559adc9f82f1ee8a9ef3845ff2a89e055877556ab/ioexplorer-dataloader-0.0.12.tar.gz" } ], "0.0.13": [ { "comment_text": "", "digests": { "md5": "0704d4ddd281142652b9ab732033105d", "sha256": "76f27f7b0026942804c75a4b539f25dedf46986403b8d9126436bd640e467b77" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.13.tar.gz", "has_sig": false, "md5_digest": "0704d4ddd281142652b9ab732033105d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21586, "upload_time": "2019-01-23T15:56:06", "url": "https://files.pythonhosted.org/packages/0e/4c/d23d3051e8f5e3f625ff14cc1675c67a95e7914e364a6bb638949e1d1df7/ioexplorer-dataloader-0.0.13.tar.gz" } ], "0.0.14": [ { "comment_text": "", "digests": { "md5": "4aa493ebbd1ad73aebaecdd8ea2a895f", "sha256": "7ef04bc88530fb22a3bbe55c53bdeee932218eb5f693201352f886f84d85f544" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.14.tar.gz", "has_sig": false, "md5_digest": "4aa493ebbd1ad73aebaecdd8ea2a895f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21754, "upload_time": "2019-02-04T18:13:55", "url": "https://files.pythonhosted.org/packages/25/87/7363b41e54a8b59775d1bf78250f44641cb59d88bda7fd524c40d6aa627a/ioexplorer-dataloader-0.0.14.tar.gz" } ], "0.0.15": [ { "comment_text": "", "digests": { "md5": "7d6c29c217ab74d0b0f06e985ddb3325", "sha256": "db22f2aeb457722244bcfc2eca9eca4188f15b7283247e31aa3ad3111fa01295" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.15.tar.gz", "has_sig": false, "md5_digest": "7d6c29c217ab74d0b0f06e985ddb3325", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21604, "upload_time": "2019-03-28T18:21:59", "url": "https://files.pythonhosted.org/packages/5e/71/a2d3b6fdd4af988cf469631497227f34cb4ba73b61da6dc6c42fb8927d1f/ioexplorer-dataloader-0.0.15.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "8382a8b57c15726d58de869800087e7f", "sha256": "cacca680a6a30a698d0a25d849256bab735d385dfddcd6e0d2031cd39888254c" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.2.tar.gz", "has_sig": false, "md5_digest": "8382a8b57c15726d58de869800087e7f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16632, "upload_time": "2018-11-13T21:45:09", "url": "https://files.pythonhosted.org/packages/36/54/e7bb411a995cab46836e173ccdb91dd375532cb87d440aa0fd422c3bc05c/ioexplorer-dataloader-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "25ea26f2ac13f29b7c37774846c05a45", "sha256": "e2e581fa033a0adbb4c649ede13a69b8aa704a5709e2bebd469c60105a36bf5c" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.3.tar.gz", "has_sig": false, "md5_digest": "25ea26f2ac13f29b7c37774846c05a45", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16949, "upload_time": "2018-11-15T12:23:03", "url": "https://files.pythonhosted.org/packages/e2/26/9c4534fa7920d1bd15ce565357311171cfdd26bc375f61a24e55226042ff/ioexplorer-dataloader-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "1299da536f512ae5448cf5835539854d", "sha256": "d1488b996647b1b532a026743af2157b03ceed612e46f151531b21d777756171" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.4.tar.gz", "has_sig": false, "md5_digest": "1299da536f512ae5448cf5835539854d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17103, "upload_time": "2018-11-16T21:48:37", "url": "https://files.pythonhosted.org/packages/75/31/2fac2d37bdc7595170dbf5451f8e4cabc93965a4947c8f8ba6841030c433/ioexplorer-dataloader-0.0.4.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "66f4c8ebda6fef62c3e6007120b10ab8", "sha256": "d05064ce6e94c5370add9c9d86593b69335aeb6bd625c4afea25a3cca935e292" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.5.tar.gz", "has_sig": false, "md5_digest": "66f4c8ebda6fef62c3e6007120b10ab8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17140, "upload_time": "2018-11-27T17:47:20", "url": "https://files.pythonhosted.org/packages/8a/31/95024da1bbd7ab2a0e658b9d7f254639889e9aa82606abebeaa4f4b8ba1e/ioexplorer-dataloader-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "c66e81a0b90d542b7fc4433cac23e76d", "sha256": "961c86c37fcc3f6082d75d8c963f125ab33a25f06acd555ca7fbe04c734c3779" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.6.tar.gz", "has_sig": false, "md5_digest": "c66e81a0b90d542b7fc4433cac23e76d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17883, "upload_time": "2018-12-03T19:58:05", "url": "https://files.pythonhosted.org/packages/5b/4f/f5eac4863aa246460bd107185e66ee77ebdb9a636207362fbf0154308502/ioexplorer-dataloader-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "8d4d03903a6e0711660ba0c17646da82", "sha256": "3e6f79b6f0590589f28947251cfc827dee37a4cf84515d22e9c9aec328a1311f" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.7.tar.gz", "has_sig": false, "md5_digest": "8d4d03903a6e0711660ba0c17646da82", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20435, "upload_time": "2018-12-17T01:08:32", "url": "https://files.pythonhosted.org/packages/d6/d2/3e3a0168c57e4a96986d5df2e4fd32a535d57a293fc37fc894544a67eae2/ioexplorer-dataloader-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "b10d890254aaa028873082becc9cc480", "sha256": "51b31fcf79a6cad1594d166f7f002adbbf53bc3c37ff8e9297b6a321b5a79972" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.8.tar.gz", "has_sig": false, "md5_digest": "b10d890254aaa028873082becc9cc480", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20491, "upload_time": "2018-12-17T19:48:13", "url": "https://files.pythonhosted.org/packages/25/6c/ee54eff1a81f503b9471bc3bb71b427e98cf359fe3c7e528b1de3a61f211/ioexplorer-dataloader-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "1c9fd68e5672b93b43b207630f8aff52", "sha256": "22a678c55fe59d20e97f09d2bd66820bf5a664ab7f0fd3dd2ca8c46a6f931a1f" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.9.tar.gz", "has_sig": false, "md5_digest": "1c9fd68e5672b93b43b207630f8aff52", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20522, "upload_time": "2018-12-17T21:51:18", "url": "https://files.pythonhosted.org/packages/99/36/a508137fd0b15702650ba2f5c8f90cd5e9008951bc800edf5d370989fc8d/ioexplorer-dataloader-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7d6c29c217ab74d0b0f06e985ddb3325", "sha256": "db22f2aeb457722244bcfc2eca9eca4188f15b7283247e31aa3ad3111fa01295" }, "downloads": -1, "filename": "ioexplorer-dataloader-0.0.15.tar.gz", "has_sig": false, "md5_digest": "7d6c29c217ab74d0b0f06e985ddb3325", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 21604, "upload_time": "2019-03-28T18:21:59", "url": "https://files.pythonhosted.org/packages/5e/71/a2d3b6fdd4af988cf469631497227f34cb4ba73b61da6dc6c42fb8927d1f/ioexplorer-dataloader-0.0.15.tar.gz" } ] }