{
    "info": {
        "author": "Xiannian Zhang",
        "author_email": "friedpine@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 3.6",
            "Topic :: Software Development :: Build Tools"
        ],
        "description": "# baseqDrops\nA versatile pipeline for processing dataset from 10X, indrop and Drop-seq.\n\n## Install baseqDrops\nWe need python3 and a package called: baseqDrops, which could be installed by:\n\n    pip install baseqDrops\n\nAfter install, you will have a runnable command `baseqDrops`\n\nIt is recommend for the computer or server to have memory >= 30Gb and CPU cores >=8 for efficient processing;\n\n## Configuration file\n\nThe following software or resources are required:\n\n+ `star`: STAR software, for fast alignment of RNA-Seq data to the genome;\n+ `samtools`: For sorting the aligned bam file (version >=1.6);\n+ `whitelistDir`: The barcode whitelist files for indrop and 10X should be placed under whitelistDir. These files could bed downloaded from https://github.com/beiseq/baseqDrops/tree/master/whitelist;\n+ `cellranger_ref_<genome>`: The key process of read alignment and tagging to genes are inspired and borrowed from the open source cellranger pipeline(https://github.com/10XGenomics/cellranger). The references of genome index and transcriptome can be downloaded from https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest.\nIn the config file, the directory of cellranger references is named as `cellranger_<genome>`.\n\nWhile running command, the configures are recorded in the file called `config_drops.ini`:\n\n    [Drops]\n    samtools = /path/to/samtools\n    star = /path/to/STAR\n    whitelistDir = /path/to/whitelist_file_directory\n    cellranger_ref_hg38 = /path/to/reference/refdata-cellranger-GRCh38-1.2.0/\n\n## For Help Informations\n\n\tbaseqDrops run-pipe --help\n\n## Process Steps\n\n1. `Cell Barcode Counting`: Counting the existed barcodes in dataset. This will generate a file named: barcode_count_<sample>.csv;\n2. `Cell Barcode Correction, Aggregating and Filtering`: Correcting the cell barcodes within 1bp mismatch and then aggregating, filtering the barcode by minimum number of reads (default 5000), this will generate a valid barcode list named: barcode_stats_<sample>.csv;\n3. `Split the Reads of Valid Cell Barcodes`: The raw pair-end raw reads are splitted to 16 single-end files for multiprocessing according to the 2bp prefix of the barcode; The folder of barcode_splits contains files like: split.<sample>.<AA|AT|AC|AG...|GG>.fq;\n4. `Alignment to Genome using STAR`: Several (defined by --parallel/-p) STAR programs run at the same time, the results will be at folder named as star_align; The bam files are further sorted by sequence header;\n5. `Reads Tagging`: Tagging the reads alignment position to the corresponding gene name;\n6. `Generating Expression Table`: Both the expression table quantified by UMI (Result.UMIs.<sample>.txt) and raw read count (Result.Reads.<sample>.txt) will be generated;\n\n## Run Pipeline\n\nThese parameters should be provided: (or run: baseqDrops run-pipe --help for information)\n\n+ `--outdir/-d`: Output path (default ./, the result will be stored in ./<name>);\n+ `--config`: Path to the config file;\n+ `--genome/-g`: Genome version [hg38/mm38/hgmm];\n+ `--protocol/-p`: [10X|indrop|dropseq];\n+ `--minreads`:  Minimum reads required for a barcode;\n+ `--name/-n` : Name of sample, a folder of <outdir>/<name> will be created and be the main directory; \n+ `--parallel` : The number of STAR and tagging processes runs at the same time (default is 4, need more memory for larger parallel number); \n+ `--fq1/-1`: Path of Pair-end 1 sequencing file;\n+ `--fq2/-2`: Path of Pair-end 2 sequencing file;\n+ `--top_million_reads`: For huge dataset, you can choose to use part of the data for a quick look, the reads exceeding N million of reads will be skipped;\n\nIf your data is human origin and `cellranger_ref_hg38` has been defined in configuration file, you can run:\n\n    baseqDrops run-pipe --config ./config_drops.ini -g hg38 -p 10X --minreads 1000 -n 10X_test -1 10x_1.1.fq.gz -2 10x.2.fq.gz -d ./\n\n## Run by Single Steps\n\nWe also provide step-wise ways for running the pipeline, all the parameters should be provided as described above, an extra \"--step\" should be provided, for example:\n\n\tbaseqDrops run-pipe --config ./config.ini -g hg38 -p dropseq --minreads 1000 -n dropseq2 --top_million_reads 20 -1 dropseq_1.1.fq.gz -2 dropseq.2.fq.gz --step count -d ./\n\nThe steps are listed:\n\n+ `Cell Barcode Counting`:  --step count\n+ `Cell Barcode Correction, Aggregating and Filtering`: --step stats\n+ `Split the Reads of Valid Cell Barcodes`: --step split\n+ `Alignment to Genome using STAR`: --step star\n+ `Reads Tagging` : --step tagging\n+ `Generating Expression Table`: --step table\n\n## Contact\n\nFor any questions, please email to: friedpine@gmail.com\n\n\n",
        "description_content_type": "",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://gene.pku.edu.cn",
        "keywords": "sample setuptools development",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "baseqDrops",
        "package_url": "https://pypi.org/project/baseqDrops/",
        "platform": "",
        "project_url": "https://pypi.org/project/baseqDrops/",
        "project_urls": {
            "Homepage": "https://gene.pku.edu.cn"
        },
        "release_url": "https://pypi.org/project/baseqDrops/2.0/",
        "requires_dist": [
            "click",
            "configparser",
            "matplotlib",
            "numpy",
            "pandas",
            "pysam",
            "check-manifest; extra == 'dev'",
            "coverage; extra == 'test'"
        ],
        "requires_python": "",
        "summary": "Processing Drop-seq, 10X(3prime) and inDrop RNA-seq dataset",
        "version": "2.0"
    },
    "last_serial": 4772269,
    "releases": {
        "1.5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "0a92f6e86db8bfbc6725dc1307580d00",
                    "sha256": "ecc52145684bf62252a6fe5081a0d1042d90df589e5ba2d72ea7492697322cc2"
                },
                "downloads": -1,
                "filename": "baseqDrops-1.5-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "0a92f6e86db8bfbc6725dc1307580d00",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 33301,
                "upload_time": "2018-11-21T14:44:28",
                "url": "https://files.pythonhosted.org/packages/d0/e3/01050dd66a3c3cedea9c0ae70e47ce67f71d535686766da0c0796c1461cf/baseqDrops-1.5-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "c82e17776a3e1d5ef11d66020d821ba2",
                    "sha256": "c728391bbc81d579f909c144883bbd884725629df450b4bd846666cebfe18671"
                },
                "downloads": -1,
                "filename": "baseqDrops-1.5.tar.gz",
                "has_sig": false,
                "md5_digest": "c82e17776a3e1d5ef11d66020d821ba2",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 20225,
                "upload_time": "2018-11-21T14:44:34",
                "url": "https://files.pythonhosted.org/packages/36/d5/f8eb93f082233dcdb8fbd15c12cb4438f2921051cf00eaff1ca0b69978f6/baseqDrops-1.5.tar.gz"
            }
        ],
        "2.0": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "b701d3e4f3e02574f679e307d3ee2411",
                    "sha256": "ccfbdb8f99f41fd898c09a40432c4605fc11f7fec95e32a16f5be1f11c7357ee"
                },
                "downloads": -1,
                "filename": "baseqDrops-2.0-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "b701d3e4f3e02574f679e307d3ee2411",
                "packagetype": "bdist_wheel",
                "python_version": "py2.py3",
                "requires_python": null,
                "size": 33397,
                "upload_time": "2019-02-02T13:46:29",
                "url": "https://files.pythonhosted.org/packages/4f/58/b668bad105d17ab900757568eeaaeb0f4a824a538cfd8a21994121a673c9/baseqDrops-2.0-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "777836e05a54391ac0ffb5bd1b5b167b",
                    "sha256": "775f40d1e4f394e3b48d44ed47cdbbff7aeed1117996f62900099f147cf6b82c"
                },
                "downloads": -1,
                "filename": "baseqDrops-2.0.tar.gz",
                "has_sig": false,
                "md5_digest": "777836e05a54391ac0ffb5bd1b5b167b",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 20354,
                "upload_time": "2019-02-02T13:46:33",
                "url": "https://files.pythonhosted.org/packages/c4/c2/5b1323bb5da55797053b7b533b5e864a38c8363d39df365718c8beddccf3/baseqDrops-2.0.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "b701d3e4f3e02574f679e307d3ee2411",
                "sha256": "ccfbdb8f99f41fd898c09a40432c4605fc11f7fec95e32a16f5be1f11c7357ee"
            },
            "downloads": -1,
            "filename": "baseqDrops-2.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "b701d3e4f3e02574f679e307d3ee2411",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": null,
            "size": 33397,
            "upload_time": "2019-02-02T13:46:29",
            "url": "https://files.pythonhosted.org/packages/4f/58/b668bad105d17ab900757568eeaaeb0f4a824a538cfd8a21994121a673c9/baseqDrops-2.0-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "777836e05a54391ac0ffb5bd1b5b167b",
                "sha256": "775f40d1e4f394e3b48d44ed47cdbbff7aeed1117996f62900099f147cf6b82c"
            },
            "downloads": -1,
            "filename": "baseqDrops-2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "777836e05a54391ac0ffb5bd1b5b167b",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 20354,
            "upload_time": "2019-02-02T13:46:33",
            "url": "https://files.pythonhosted.org/packages/c4/c2/5b1323bb5da55797053b7b533b5e864a38c8363d39df365718c8beddccf3/baseqDrops-2.0.tar.gz"
        }
    ]
}