{ "info": { "author": "Xiannian Zhang", "author_email": "friedpine@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Build Tools" ], "description": "# baseqDrops\nA versatile pipeline for processing datasets from 10X, indrop and Drop-seq.\n\n## Install baseqDrops\nYou need Python 3 and the baseqDrops package, which can be installed with:\n\n    pip install baseqDrops\n\nAfter installation, the `baseqDrops` command will be available.\n\nFor efficient processing, a computer or server with memory >= 30 GB and CPU cores >= 8 is recommended.\n\n## Configuration file\n\nThe following software or resources are required:\n\n+ `star`: STAR software, for fast alignment of RNA-Seq data to the genome;\n+ `samtools`: For sorting the aligned BAM files (version >= 1.6);\n+ `whitelistDir`: The barcode whitelist files for indrop and 10X should be placed under whitelistDir. These files can be downloaded from https://github.com/beiseq/baseqDrops/tree/master/whitelist;\n+ `cellranger_ref_`: The key steps of read alignment and gene tagging are inspired by and borrowed from the open source cellranger pipeline (https://github.com/10XGenomics/cellranger). The genome index and transcriptome references can be downloaded from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest.\nIn the config file, the directory of cellranger references is named `cellranger_`.\n\nThe configuration used when running commands is stored in a file called `config_drops.ini`:\n\n    [Drops]\n    samtools = /path/to/samtools\n    star = /path/to/STAR\n    whitelistDir = /path/to/whitelist_file_directory\n    cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/\n\n## Help Information\n\n\tbaseqDrops run-pipe --help\n\n## Process Steps\n\n1. 
`Cell Barcode Counting`: Counts the barcodes present in the dataset. This generates a file named barcode_count_.csv;\n2. `Cell Barcode Correction, Aggregating and Filtering`: Corrects cell barcodes within a 1 bp mismatch, aggregates them, and filters barcodes by a minimum number of reads (default 5000). This generates a list of valid barcodes named barcode_stats_.csv;\n3. `Split the Reads of Valid Cell Barcodes`: The raw paired-end reads are split into 16 single-end files for multiprocessing, according to the 2 bp prefix of the barcode. The barcode_splits folder contains files like split...fq;\n4. `Alignment to Genome using STAR`: Several STAR processes (the number is set by --parallel) run at the same time, and the results are written to a folder named star_align. The BAM files are then sorted by sequence header;\n5. `Reads Tagging`: Tags each read's alignment position with the corresponding gene name;\n6. `Generating Expression Table`: Generates expression tables quantified both by UMI (Result.UMIs..txt) and by raw read count (Result.Reads..txt);\n\n## Run Pipeline\n\nThe following parameters should be provided (run `baseqDrops run-pipe --help` for details):\n\n+ `--outdir/-d`: Output path (default ./, the result will be stored in ./);\n+ `--config`: Path to the config file;\n+ `--genome/-g`: Genome version [hg38/mm38/hgmm];\n+ `--protocol/-p`: [10X|indrop|dropseq];\n+ `--minreads`: Minimum number of reads required for a barcode;\n+ `--name/-n` : Name of the sample; a folder of / will be created and used as the main directory; \n+ `--parallel` : The number of STAR and tagging processes run at the same time (default 4; larger values need more memory); \n+ `--fq1/-1`: Path to the paired-end read 1 sequencing file;\n+ `--fq2/-2`: Path to the paired-end read 2 sequencing file;\n+ `--top_million_reads`: For very large datasets, you can process only part of the data for a quick look; reads beyond the first N million are skipped;\n\nIf your data is of human origin and 
`cellranger_ref_hg38` is defined in the configuration file, you can run:\n\n    baseqDrops run-pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 1000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./\n\n## Run by Single Steps\n\nThe pipeline can also be run one step at a time. Provide all the parameters described above plus an extra \"--step\" option, for example:\n\n\tbaseqDrops run-pipe --config ./config.ini -g hg38 -p dropseq --minreads 1000 -n dropseq2 --top_million_reads 20 -1 dropseq_1.1.fq.gz -2 dropseq.2.fq.gz --step count -d ./\n\nThe available steps are:\n\n+ `Cell Barcode Counting`: --step count\n+ `Cell Barcode Correction, Aggregating and Filtering`: --step stats\n+ `Split the Reads of Valid Cell Barcodes`: --step split\n+ `Alignment to Genome using STAR`: --step star\n+ `Reads Tagging` : --step tagging\n+ `Generating Expression Table`: --step table\n\n## Contact\n\nFor any questions, please email friedpine@gmail.com\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://gene.pku.edu.cn", "keywords": "sample setuptools development", "license": "", "maintainer": "", "maintainer_email": "", "name": "baseqDrops", "package_url": "https://pypi.org/project/baseqDrops/", "platform": "", "project_url": "https://pypi.org/project/baseqDrops/", "project_urls": { "Homepage": "https://gene.pku.edu.cn" }, "release_url": "https://pypi.org/project/baseqDrops/2.0/", "requires_dist": [ "click", "configparser", "matplotlib", "numpy", "pandas", "pysam", "check-manifest; extra == 'dev'", "coverage; extra == 'test'" ], "requires_python": "", "summary": "Processing Drop-seq, 10X(3prime) and inDrop RNA-seq dataset", "version": "2.0" }, "last_serial": 4772269, "releases": { "1.5": [ { "comment_text": "", "digests": { "md5": "0a92f6e86db8bfbc6725dc1307580d00", "sha256": 
"ecc52145684bf62252a6fe5081a0d1042d90df589e5ba2d72ea7492697322cc2" }, "downloads": -1, "filename": "baseqDrops-1.5-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "0a92f6e86db8bfbc6725dc1307580d00", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 33301, "upload_time": "2018-11-21T14:44:28", "url": "https://files.pythonhosted.org/packages/d0/e3/01050dd66a3c3cedea9c0ae70e47ce67f71d535686766da0c0796c1461cf/baseqDrops-1.5-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "c82e17776a3e1d5ef11d66020d821ba2", "sha256": "c728391bbc81d579f909c144883bbd884725629df450b4bd846666cebfe18671" }, "downloads": -1, "filename": "baseqDrops-1.5.tar.gz", "has_sig": false, "md5_digest": "c82e17776a3e1d5ef11d66020d821ba2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20225, "upload_time": "2018-11-21T14:44:34", "url": "https://files.pythonhosted.org/packages/36/d5/f8eb93f082233dcdb8fbd15c12cb4438f2921051cf00eaff1ca0b69978f6/baseqDrops-1.5.tar.gz" } ], "2.0": [ { "comment_text": "", "digests": { "md5": "b701d3e4f3e02574f679e307d3ee2411", "sha256": "ccfbdb8f99f41fd898c09a40432c4605fc11f7fec95e32a16f5be1f11c7357ee" }, "downloads": -1, "filename": "baseqDrops-2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b701d3e4f3e02574f679e307d3ee2411", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 33397, "upload_time": "2019-02-02T13:46:29", "url": "https://files.pythonhosted.org/packages/4f/58/b668bad105d17ab900757568eeaaeb0f4a824a538cfd8a21994121a673c9/baseqDrops-2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "777836e05a54391ac0ffb5bd1b5b167b", "sha256": "775f40d1e4f394e3b48d44ed47cdbbff7aeed1117996f62900099f147cf6b82c" }, "downloads": -1, "filename": "baseqDrops-2.0.tar.gz", "has_sig": false, "md5_digest": "777836e05a54391ac0ffb5bd1b5b167b", "packagetype": "sdist", "python_version": "source", "requires_python": 
null, "size": 20354, "upload_time": "2019-02-02T13:46:33", "url": "https://files.pythonhosted.org/packages/c4/c2/5b1323bb5da55797053b7b533b5e864a38c8363d39df365718c8beddccf3/baseqDrops-2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "b701d3e4f3e02574f679e307d3ee2411", "sha256": "ccfbdb8f99f41fd898c09a40432c4605fc11f7fec95e32a16f5be1f11c7357ee" }, "downloads": -1, "filename": "baseqDrops-2.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "b701d3e4f3e02574f679e307d3ee2411", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 33397, "upload_time": "2019-02-02T13:46:29", "url": "https://files.pythonhosted.org/packages/4f/58/b668bad105d17ab900757568eeaaeb0f4a824a538cfd8a21994121a673c9/baseqDrops-2.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "777836e05a54391ac0ffb5bd1b5b167b", "sha256": "775f40d1e4f394e3b48d44ed47cdbbff7aeed1117996f62900099f147cf6b82c" }, "downloads": -1, "filename": "baseqDrops-2.0.tar.gz", "has_sig": false, "md5_digest": "777836e05a54391ac0ffb5bd1b5b167b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 20354, "upload_time": "2019-02-02T13:46:33", "url": "https://files.pythonhosted.org/packages/c4/c2/5b1323bb5da55797053b7b533b5e864a38c8363d39df365718c8beddccf3/baseqDrops-2.0.tar.gz" } ] }