{ "info": { "author": "Peter Vegh", "author_email": "peter.vegh@newcastle.ac.uk", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# pyVDJ\n\nV(D)J sequencing data analysis\n\nThis package adds 10x Genomics V(D)J sequencing data to an [AnnData](https://anndata.readthedocs.io) object's `.uns` part, and also makes annotation columns in `.obs`.\nThis enables plotting various V(D)J properties and handling mRNA (GEX) and V(D)J sequencing data together.\n\n\n## Install\n\n`pip3 install pyvdj`\n\n\n## Usage\n\n import pyvdj\n adata = pyvdj.load_vdj(paths, samples, adata)\n adata = pyvdj.add_obs(adata, obs=['is_clone'])\n\nFor a detailed description, see the [tutorial](tutorials/pyVDJ_tutorial.html).\n\n\n## Details\n\nThe package has functions that\n* read `metrics_summary.csv` files into a pandas dataframe.\n* load `filtered_contig_annotations.csv` files into an AnnData object.\n* create various statistics and annotations in the AnnData object.\n\n\n### Read metrics\n\nThe `read10xsummary` function requires a list of paths to `metrics_summary.csv` files, and optionally a dictionary of path:samplename. It returns a dataframe of the metrics.\n\n\n### Load V(D)J data\n\nThe `load_vdj` function loads 10x V(D)J sequencing data (`filtered_contig_annotations.csv` files) into an AnnData object's `.uns['pyvdj']` slot, and returns the object. 
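
Loading several samples amounts to stacking their contig tables, and because 10x cell barcodes are only unique within one channel, each row also needs a sample-qualified label. The following is a minimal pandas sketch of that idea, not pyVDJ's actual implementation. It assumes the Cell Ranger columns `barcode` and `raw_clonotype_id`; the toy rows, the helper `combine_samples` and the output column names `unique_barcode` and `unique_clonotype` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for two samples' filtered_contig_annotations.csv tables
# (hypothetical rows; real files have many more columns, e.g. chain, cdr3, productive).
contigs = {
    'sampleA': pd.DataFrame({
        'barcode': ['AAACCTGAGATCCGAG-1', 'AAACCTGAGATCCGAG-1', 'AAACGGGCAACTGCTA-1'],
        'raw_clonotype_id': ['clonotype1', 'clonotype1', 'clonotype2'],
    }),
    'sampleB': pd.DataFrame({
        'barcode': ['AAACCTGAGATCCGAG-1'],  # same barcode as in sampleA, but a different cell
        'raw_clonotype_id': ['clonotype1'],
    }),
}


def combine_samples(tables):
    '''Concatenate per-sample contig tables and add sample-suffixed labels (hypothetical helper).'''
    frames = []
    for sample, df in tables.items():
        df = df.copy()
        df['sample'] = sample
        # cellbarcode + '_' + samplename and clonotype + '_' + samplename
        df['unique_barcode'] = df['barcode'] + '_' + sample
        df['unique_clonotype'] = df['raw_clonotype_id'] + '_' + sample
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


vdj = combine_samples(contigs)
# Raw barcodes collide across samples; the suffixed labels keep the cells apart.
print(vdj['barcode'].nunique(), vdj['unique_barcode'].nunique())  # prints: 2 3
```
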
The `adata.uns['pyvdj']` slot is a dictionary with the following elements:\n* `'df'`: a dataframe containing the V(D)J data\n* `'obs_col'`: the name of the `adata.obs` column that holds the matching cell names\n* `'samples'`: a dictionary of filename:samplename\n\nIf no AnnData object is supplied, the function returns this dictionary instead.\n\nArguments:\n* `paths`: list of paths to `filtered_contig_annotations.csv` files.\n* `samples`: a dictionary of path:samplename.\n* `adata`: the AnnData object.\n* `add_obs`: whether to add some default `.obs` metadata columns.\n\n\n### Add annotations\n\n`adata.uns['pyvdj']['df']` is a pandas dataframe of the V(D)J data with two additional columns containing unique cell barcode and clonotype labels. These are generated from the user-supplied sample names: `cellbarcode + '_' + samplename` and `clonotype + '_' + samplename`.\n\nThese unique cell names are used to match the V(D)J cells to the AnnData `.X` cells via `adata.obs['vdj_obs']`. The user has to prepare this column from the cell barcodes and the sample names, using the same `cellbarcode + '_' + samplename` format.\n\nThe `add_obs` function can add the following annotations:\n* `'has_vdjdata'`: does the cell have V(D)J sequencing data?\n* `'clonotype'`: adds the clonotype name\n* `'is_clone'`: does the cell have a clone?\n* `'is_productive'`: are all chains productive?\n* `'chains'`: adds an annotation (True, False, No_data) for each chain\n* `'genes'`: adds an annotation (True, False, No_data) for each constant gene\n* `'v_genes'`: adds an annotation (True, False, No_data) for each variable gene\n* `'j_genes'`: adds an annotation (True, False, No_data) for each joining gene\n* `'clone_count'`: adds the clone count annotation\n\n\n### Definitions\n\n* Clone: a cell whose TCR is identical to that of another cell within the same individual (donor, organism).*\n* Clonotype: the set of all cells with the same TCR in the same individual (donor). 
A clonotype can have 1 or more cells.**\n* Clone count (of a clonotype): number of clones in the clonotype.\n* Public TCR (or CDR3) sequence: a sequence that is shared by multiple (or all) donors.\n* Private TCR (or CDR3) sequence: a sequence that is unique to one donor.\n* Condition-specific TCR (or CDR3) sequence: a sequence that occurs in donors with a given condition (disease, treatment, etc.), i.e. it is private (unique) to the condition.\n\n_The above definitions are understood in the context of the sequenced cells._\n\n*As determined by Cell Ranger.\n**Note that Cell Ranger v2 does not assign a clonotype ID to clonotypes with only 1 clone, but uses \u2018None\u2019 instead. Cell Ranger v3 assigns a clonotype ID to all cells.\n\n\n### CDR3 specificity\n\nWe can retrieve the CDR3 amino acid sequences of given clonotypes using\n\n    pyvdj.get_spec(adata, clonotypes=['clonotype1_sampleA', 'clonotype3_sampleB'])\n\nwhich returns a dictionary. This can be used to look up specificity in CDR3 databases such as [VDJdb](http://vdjdb.cdr3.net) or [McPAS-TCR](http://friedmanlab.weizmann.ac.il/McPAS-TCR/).\n\n\n### Clonotype statistics\n\nWe can generate and [plot various statistics](tutorials/pyVDJ_tutorial.html) on clonotypes and diversity.\n\n    adata = pyvdj.stats(adata, meta)\n\nThis function adds a dictionary of statistics on the V(D)J data (`adata.uns['pyvdj']['stats'][meta]`),\ngrouped by categories in the `adata.obs[meta]` column. 
Keys:\n\n* `'meta'`: the `adata.obs` column name\n* `'cells'`: number of cells, and of cells with V(D)J data, per category\n* `'clonotype_counts'`: number of different clonotypes per category\n* `'clonotype_dist'`: clone count distribution\n* `'shared_cdr3'`: dictionary mapping CDR3 sequences to cells\n\n\n### Public and private CDR3 sequences\n\nWe can [find TCR specificity shared between samples](tutorials/pyVDJ_tutorial.html), donors, or any other annotation category.\n\n    adata = pyvdj.find_clones(adata, sample_dict)\n\nThis function returns the AnnData object with clonotype annotation, in which clonotypes shared between 10x samples of the same donor (organism, individual) are merged under the same clonotype ID.\n`sample_dict` is a dictionary of sample:donor that maps 10x samples (channels, as specified when the 10x V(D)J data was loaded) to donors.\n\n\n### CDR3-similarity graph\n\nA set of prototype functions builds CDR3-similarity graphs using [Levenshtein distances](https://en.wikipedia.org/wiki/Levenshtein_distance). The nodes are the CDR3 sequences, and edges connect pairs of sequences whose Levenshtein distance is 1.\n\n    cdr3_dict = pyvdj.get_cdr3(adata)  # get the CDR3s for each sample\n    dist = pyvdj.get_dist(cdr3_dict, sample)  # distances for one sample (adjacency matrix)\n    g = pyvdj.graph_cdr3(dist)  # returns an igraph graph object\n\nThis requires the python-Levenshtein and python-igraph packages.\n\n\n### Dependencies\n\nThe package was originally developed for data generated with Cell Ranger v2.1.1 (Chemistry: Single Cell V(D)J; V(D)J reference: GRCh38-alts-ensembl) and has been tested with Cell Ranger v3.1.0 data, using the following Python (v3.6.9) package versions:\n\n    pandas 0.25.1\n    anndata 0.6.21\n    scanpy 1.4.3\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/veghp/pyVDJ", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": 
"pyvdj", "package_url": "https://pypi.org/project/pyvdj/", "platform": "", "project_url": "https://pypi.org/project/pyvdj/", "project_urls": { "Homepage": "https://github.com/veghp/pyVDJ" }, "release_url": "https://pypi.org/project/pyvdj/0.1.1/", "requires_dist": [ "pandas", "anndata" ], "requires_python": "", "summary": "V(D)J sequencing data analysis", "version": "0.1.1" }, "last_serial": 6004656, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "48ca025b163d63b213dc8a423b5577ce", "sha256": "3c09214d862e25f8adc647d10e4a8f7a94864dd054ef760f0b7cc9bd52333d13" }, "downloads": -1, "filename": "pyvdj-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "48ca025b163d63b213dc8a423b5577ce", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 20129, "upload_time": "2019-09-01T18:00:04", "url": "https://files.pythonhosted.org/packages/7d/48/d3c352aff023528278452bd130adc3e82b3201ac8bf2081a06e81560c97d/pyvdj-0.1.0-py3-none-any.whl" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "60683598ccac8a9e5ece052bfb4ab798", "sha256": "080e5998f8ccd61d3f952bcb0ad151fbdf912becdec6f6f94d9a8eb667976900" }, "downloads": -1, "filename": "pyvdj-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "60683598ccac8a9e5ece052bfb4ab798", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 27848, "upload_time": "2019-10-20T21:02:14", "url": "https://files.pythonhosted.org/packages/fd/87/5df94cefb66dc7fb27c1c53598f4c1bc87dee4e1afc16ce12c354b98c5ca/pyvdj-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "776185270990ae1ff012313db128e300", "sha256": "a6f6eb53a603dbf56edf18ef3b4cbcd671282ffbb3946609549fc3fdba35f8d9" }, "downloads": -1, "filename": "pyvdj-0.1.1.tar.gz", "has_sig": false, "md5_digest": "776185270990ae1ff012313db128e300", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10897, "upload_time": "2019-10-20T21:02:16", "url": 
"https://files.pythonhosted.org/packages/62/18/94657e57491935dd945b13814a07e9957958534c5007626c5146d937474b/pyvdj-0.1.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "60683598ccac8a9e5ece052bfb4ab798", "sha256": "080e5998f8ccd61d3f952bcb0ad151fbdf912becdec6f6f94d9a8eb667976900" }, "downloads": -1, "filename": "pyvdj-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "60683598ccac8a9e5ece052bfb4ab798", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 27848, "upload_time": "2019-10-20T21:02:14", "url": "https://files.pythonhosted.org/packages/fd/87/5df94cefb66dc7fb27c1c53598f4c1bc87dee4e1afc16ce12c354b98c5ca/pyvdj-0.1.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "776185270990ae1ff012313db128e300", "sha256": "a6f6eb53a603dbf56edf18ef3b4cbcd671282ffbb3946609549fc3fdba35f8d9" }, "downloads": -1, "filename": "pyvdj-0.1.1.tar.gz", "has_sig": false, "md5_digest": "776185270990ae1ff012313db128e300", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10897, "upload_time": "2019-10-20T21:02:16", "url": "https://files.pythonhosted.org/packages/62/18/94657e57491935dd945b13814a07e9957958534c5007626c5146d937474b/pyvdj-0.1.1.tar.gz" } ] }