{ "info": { "author": "Jasleen Grewal", "author_email": "grewalj23@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Healthcare Industry", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Scientific/Engineering :: Medical Science Apps." ], "description": "# Cancerscope for SCOPE\n[](https://pypi.python.org/pypi/cancerscope)\n[](https://coveralls.io/github/jasgrewal/cancerscope?branch=master)\n[](https://travis-ci.org/jasgrewal/cancerscope)\n[](http://cancerscope.readthedocs.io/?badge=latest)\n[](https://opensource.org/licenses/MIT) \n[](https://www.python.org/)\n[](https://pypi.python.org/pypi/cancerscope/) \n\nSCOPE, Supervised Cancer Origin Prediction using Expression, is a method for predicting the tumor type (or matching normal) of an RNA-Seq sample. \nSCOPE's Python package, **cancerscope**, allows users to pass in RPKM values with matching gene IDs and receive a set of probabilities, summing to 1, across 66 categories (40 tumor types and 26 healthy tissues). Users can optionally generate plots visualizing each sample's classification as well. \n\nSince SCOPE is an ensemble-based approach, it is possible to train additional models and include them in the ensemble that SCOPE uses (instructions forthcoming). \n\n# The current PyPI release does not support Python 3.x due to issues with plotting library support. \n\n## Installation \nBefore installing **cancerscope**, you will need to install the correct version of the packages [lasagne](https://lasagne.readthedocs.io/en/latest/) and [theano](https://pypi.org/project/Theano/). 
\n`pip install --upgrade https://github.com/Theano/Theano/archive/master.zip` \n`pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip` \n\n### Automated Install \nOnce you have the latest lasagne and theano Python packages installed, you can set up **cancerscope** using the command `pip install cancerscope`. \n\nAt initial install, cancerscope will attempt to download the models needed for prediction. This may take a while (3-10 minutes), depending on your internet connection. Please ensure you have a reliable internet connection and at least 5 GB of free disk space before proceeding with the install. \n\n## Setup and Usage \nTo get started with SCOPE, launch a Python session and run: \n`>>> import cancerscope` \n\nIf the download was unsuccessful at the time of package install, the package will attempt to set up a local download of the models needed for prediction the first time you import cancerscope. Please be patient, as this will take a while (3-10 minutes). \n\n### Prediction - Example \nPrediction can be performed from a pre-formatted input file, or by passing in the data matrix. Please refer to the [tutorial](tutorial/README.md) and [detailed documentation](DETAILED_EXPL.md.md) for more information. \n\nThe commands are as simple as follows: \n`>>> import cancerscope as cs` \n`>>> scope_obj = cs.scope()` \n\nThis will set up the references to the required SCOPE models. \n\nNext, you can generate predictions straight from the input file: \n`>>> predictions_from_file = scope_obj.get_predictions_from_file(filename)` \nHere, the input file should be prepared as follows. Columns should be tab-separated, with unique sample IDs. The first column is always the gene identifier (Official HUGO ID, Ensembl Gene ID, or Gencode). An example is shown with the first 2 rows of input. \n\n| ENSEMBL | Sample 1 | Sample 2 | ... |\n|---|---|---|---|\n|ENSG000XXXXX| 0.2341 | 9451.2 | ... 
|\n\n...or you can pass in the data matrix, the list of feature names, the type of gene names (ENSG, HUGO, etc.), and, optionally, the list of sample names. \n`>>> predictions = scope_obj.predict(` \n`\tX = numpy_array_X, ` \n`\tx_features = list_of_features, `\n`\tx_features_genecode = string_genecode, `\n`\tx_sample_names = list_of_sample_names)` \n\nThe output will look like this: \n\n|'ix'|`sample_ix`|`label`|`pred`|`freq`|`models`|`rank_pred`|`sample_name`|\n|---|---|---|---|---|---|---|---|\n|0|0|BLCA\\_TS|0.268193|2|v1\\_none17kdropout,v1\\_none17k|1|test1|\n|1|0|LUSC\\_TS|0.573807|1|v1\\_smotenone17k|2|test1|\n|2|0|PAAD\\_TS|0.203504|1|v1\\_rm500|3|test1|\n|3|0|TFRI\\_GBM\\_NCL\\_TS|0.552021|1|v1\\_rm500dropout|4|test1|\n|4|1|ESCA\\_EAC\\_TS|0.562124|2|v1\\_smotenone17k,v1\\_none17k|1|test2|\n|5|1|HSNC\\_TS|0.223115|1|v1\\_rm500|2|test2|\n|6|1|MB-Adult\\_TS|0.743373|1|v1\\_none17kdropout|3|test2|\n|7|1|TFRI\\_GBM\\_NCL\\_TS|0.777685|1|v1\\_rm500dropout|4|test2|\n\nHere, 2 samples, called *test1* and *test2*, were processed. The top prediction from each model in the ensemble was taken, and these top predictions were aggregated by class. \n- For instance, 2 models predicted that 'BLCA\\_TS' was the most likely class for *test1*. The column **freq** gives you the count of contributing models for a prediction, and the column **models** lists these models. The other 3 models predicted 'LUSC\\_TS', 'PAAD\\_TS', and 'TFRI\\_GBM\\_NCL\\_TS' respectively. \n- You can use the rank of the predictions, shown in the column **rank\\_pred**, to select the prediction you want to use for interpretation. \n- When SCOPE is highly confident in the prediction, you will see **freq** = 5, indicating that all 5 models chose the same class as their top prediction. \n\n### Visualizing or exporting results - Example \n**cancerscope** can also automatically generate plots for each sample, and save the prediction dataframe to file. 
This is done by passing the output directory to the prediction functions via the `outdir` argument: \n`>>> predictions_from_file = scope_obj.get_predictions_from_file(filename, outdir = output_folder)` \n`>>> predictions = scope_obj.predict(X = numpy_array_X, x_features = list_of_features, x_features_genecode = string_genecode, x_sample_names = list_of_sample_names, outdir = output_folder)` \n\nThis will automatically save the dataframe returned from the prediction functions as `output_folder/SCOPE_topPredictions.txt`, and the predictions from all models across all classes as `output_folder/SCOPE_allPredictions.txt`. \n\nSample-specific plots are also generated automatically in the same directory, and labelled `SCOPE_sample-SAMPLENAME_predictions.svg`. \n\n
\n \n