{ "info": { "author": "Pranathi Vemuri", "author_email": "pranathi93.vemuri@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Environment :: Console", "Environment :: MacOS X", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: MacOS :: MacOS X", "Operating System :: POSIX :: Linux", "Programming Language :: C++", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "bam2fasta\n================================\n![Tests](https://travis-ci.com/czbiohub/bam2fasta.svg)\n[![codecov](https://codecov.io/gh/czbiohub/bam2fasta/branch/master/graph/badge.svg)](https://codecov.io/gh/czbiohub/bam2fasta)\n\nConvert a 10x bam file or fastq.gz files to individual FASTA files per cell barcode.\n\nFor large bam files, first convert the bam to fastq.gz format and then run the per-cell conversion on the fastq.gz file; this speeds up the conversion.\nSmall bam files can be converted directly to individual FASTA files per cell barcode with this package.\n\n\nTo convert bam to fastq.gz format, use samtools as shown below.\n\nTo get the aligned reads from a bam file into fastq.gz, use:\n\n```\nsamtools view -ub -F 4 ${bam} \\\n | samtools fastq --threads ${cpus} -T \"CB,XC,UB,XM,XB,RG\" \\\n | gzip -c - \\\n > ${output_fastq_gz}\n```\n\nTo get the unaligned reads from a bam file into fastq.gz, use:\n\n```\n samtools view -f4 ${bam} \\\n | grep -E '(CB|XC):Z:([ACGT]+)(-1)?' 
\\\n | samtools fastq --threads ${cpus} -T \"CB,XC,UB,XM,XB,RG\" - \\\n | gzip -c - \\\n > ${output_fastq_gz}\n```\n\nRunning samtools view from within Python is not recommended for large bam files, because samtools view streams its output.\n\n\nFree software: MIT license\n\n\n## Installation\nThe latest version can be installed via the pip package `bam2fasta`.\n\nQuick install, provided the ssl and zlib packages are already installed:\n\n\t\tpip install bam2fasta\n\t\tconda install -c bioconda bam2fasta\n\nPlease refer to .travis.yml to see which apt addon packages are required on linux.\n\nFor osx, install the homebrew packages below before running `pip install bam2fasta`:\n\n\t\tsudo pip install setuptools\n\t\tbrew update\n\t\tbrew install openssl\n\t\tbrew install zlib\n\nFor linux, install the apt packages below before running `pip install bam2fasta`:\n\n\t\tapt-get install libbz2-dev\n\t\tapt-get install libcurl4-openssl-dev\n\t\tapt-get install libssl-dev\n\n\n## Usage\n\nThe bam2fasta info command:\n\t\n\t\tbam2fasta info\n\t\tbam2fasta info -v\n\nThe bam2fasta percell command takes BAM and/or barcode files as input. Examples:\n\t\n\tbam2fasta percell --filename filename.bam \n\tbam2fasta percell --filename 10x-example/possorted_genome_bam.bam \\\n\t\t--save-fastas fastas --min-umi-per-barcode 10 \\\n\t\t--write-barcode-meta-csv all_barcodes_meta.csv \\\n\t\t--barcodes 10x-example/barcodes.tsv \\\n\t\t--rename-10x-barcodes 10x-example/barcodes_renamer.tsv \\\n\t\t--shard-size 150 \\\n\t\t--save-intermediate-files intermediate_files\n\nThe bam2fasta percell command also takes fastq.gz and/or barcode files as input. 
Examples:\n\t\n\tbam2fasta percell --filename 10x-example/possorted_genome_bam.fastq.gz \\\n\t\t--save-fastas fastas --min-umi-per-barcode 10 \\\n\t\t--write-barcode-meta-csv all_barcodes_meta.csv \\\n\t\t--barcodes 10x-example/barcodes.tsv\n\nThe bam2fasta count_umis_percell command takes a fastq.gz file of sequences whose read ids contain the barcodes and UMIs, and counts the UMIs per cell. Examples:\n\t\n\tbam2fasta count_umis_percell --filename filename.fastq.gz \n\tbam2fasta count_umis_percell --filename 10x-example/possorted_genome_bam.fastq.gz \\\n\t\t--write-barcode-meta-csv all_barcodes_meta.csv \\\n\t\t--min-umi-per-barcode 10 \\\n\t\t--barcodes-significant-umis-file good_barcodes.csv \\\n\t\t--cell-barcode-pattern 'CB:Z' \\\n\t\t--molecular-barcode-pattern 'UB:Z'\n\nThe bam2fasta make_fastqs_percell command takes a fastq.gz file with sequences and barcodes. Examples:\n\t\n\tbam2fasta make_fastqs_percell --filename filename.fastq.gz \n\tbam2fasta make_fastqs_percell --filename 10x-example/possorted_genome_bam.fastq.gz \\\n\t\t--save-fastas fastas \\\n\t\t--min-umi-per-barcode 10 \\\n\t\t--barcodes-significant-umis-file good_barcodes.csv \\\n\t\t--barcodes 10x-example/barcodes.tsv \\\n\t\t--cell-barcode-pattern 'CB:Z'\n\n## Table of Contents\n* [Main arguments](#main-arguments)\n * [`--filename`](#--filename)\n* [Bam optional parameters](#bam-optional-parameters)\n * [`--barcodes-file`](#--barcodes-file)\n * [`--rename-10x-barcodes`](#--rename-10x-barcodes)\n * [`--save-fastas`](#--save-fastas)\n * [`--save-intermediate-files`](#--save-intermediate-files)\n * [`--write-barcode-meta-csv`](#--write-barcode-meta-csv)\n * [`--min-umi-per-barcode`](#--min-umi-per-barcode)\n * [`--shard-size`](#--shard-size)\n * [`--processes`](#--processes)\n * [`--delimiter`](#--delimiter)\n * [`--cell-barcode-pattern`](#--cell-barcode-pattern)\n * [`--molecular-barcode-pattern`](#--molecular-barcode-pattern)\n * [`--channel-id`](#--channel-id)\n * 
[`--output-format`](#--output-format)\n\n\n## Main arguments\n\n### `--filename`\nFor bam/10x files, use this to specify the location of the bam file or fastq.gz file from which to get per-cell fastas. For example:\n\n```bash\n--filename /path/to/data/10x-example/possorted_genome_bam.bam\n--filename /path/to/data/10x-example/possorted_genome_bam.fastq.gz\n```\n\n## Bam optional parameters\n\n\n### `--barcodes-file`\nFor bam/10x files, use this to specify the location of the tsv (tab-separated) file containing cell barcodes. For example:\n\n```bash\n--barcodes-file /path/to/data/10x-example/barcodes.tsv\n```\n\nIf left unspecified, the barcodes derived from the bam are used.\n\n### `--rename-10x-barcodes`\nFor bam/10x files, use this to specify the location of your tsv (tab-separated) file containing a map of cell barcodes to their corresponding new names (e.g. a row in the tsv file: AAATGCCCAAACTGCT-1 lung_epithelial_cell|AAATGCCCAAACTGCT-1). \nFor example:\n\n```bash\n--rename-10x-barcodes /path/to/data/10x-example/barcodes_renamer.tsv\n```\nIf left unspecified, the barcodes in the bam, as given in barcodes_file, are not renamed.\n\n\n### `--save-fastas`\n\n1. The [save-fastas](#--save-fastas) parameter is used to save the sequences of each unique barcode in the bam file. By default, they are saved in the current working directory, one file per unique barcode, named {CELL_BARCODE}.fasta. Otherwise they are saved in the absolute path given in save_fastas. \n\n\n**Example parameters**\n\n* Default: Save fastas in current working directory:\n\t* `--save-fastas \"fastas\"`\n\n### `--save-intermediate-files`\n\n1. The [save-intermediate-files](#--save-intermediate-files) parameter is used to save the intermediate sharded bams and their corresponding fastas. By default, they are saved inside \"/tmp/\" and are deleted automatically at the end of the program. Otherwise they are saved in the absolute path given in save_intermediate_files. 
\n\n\n**Example parameters**\n\n* Default: Save temporary fastas and bam in `/tmp/` directory:\n\t* `--save-intermediate-files \"fastas\"`\n\n\n### `--write-barcode-meta-csv`\nThis creates a CSV containing each barcode and its number of UMIs, written to the path given by `write_barcode_meta_csv`. This csv file is empty when min-umi-per-barcode is zero, i.e. reads and UMIs per barcode are calculated only when the barcodes are filtered based on [min-umi-per-barcode](#--min-umi-per-barcode).\n\n**Example parameters**\n\n* Default: barcode metadata is not saved \n\t* `--write-barcode-meta-csv \"barcodes_counts.csv\"`\n\n\n### `--min-umi-per-barcode`\nThe parameter `--min-umi-per-barcode` ensures that a barcode is only considered valid, and its sketch written, if its number of unique molecular identifiers (UMIs, aka molecular barcodes) is greater than the specified value.\n\n**Example parameters**\n\n* Default: min-umi-per-barcode is 0\n* Set minimum UMI per cellular barcode as 10:\n\t* `--min-umi-per-barcode 10`\n\n\n### `--shard-size`\nThe parameter `--shard-size` specifies the number of alignments/lines in each bam shard.\n\n**Example parameters**\n\n* Default: shard-size is 1500\n\t* `--shard-size 400`\n\n\n### `--processes`\nThe parameter `--processes` specifies the number of cores/processes to parallelize on.\n\n**Example parameters**\n\n* Default: processes is 2\n\t* `--processes 400`\n\n\n### `--delimiter`\nThe parameter `--delimiter` specifies the delimiter between sequences of the same barcode.\n\n**Example parameters**\n\n* Default: delimiter is X\n\t* `--delimiter X`\n\n\n### `--cell-barcode-pattern`\nThe parameter `--cell-barcode-pattern` specifies the regular expression for cell barcodes.\n\n**Example parameters**\n\n* Default: cell-barcode-pattern is (CB|XC):Z:\n\t* `--cell-barcode-pattern 'CB:Z'`\n\n\n### `--molecular-barcode-pattern`\nThe parameter `--molecular-barcode-pattern` specifies the regular expression for molecular 
barcodes.\n\n**Example parameters**\n\n* Default: molecular-barcode-pattern is '(UB|XB):Z:([ACGT]+)'\n\t* `--molecular-barcode-pattern 'UB:Z'`\n\n\n### `--channel-id`\nThe parameter `--channel-id` specifies the prefix for the fastq or fastq.gz files saved by the default method.\n\n**Example parameters**\n\n* Default: channel-id is ''\n\t* `--channel-id 'possorted_aligned'`\n\n\n### `--output-format`\nThe parameter `--output-format` specifies the format of the per-cell output files. It can be either fasta (when the input format is bam), or fastq or fastq.gz (when the input format is fastq.gz). This parameter is only valid for the default method.\n\n**Example parameters**\n\n* Default: output-format is fastq\n\t* `--output-format fastq.gz`", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/czbiohub/bam2fasta", "keywords": "10x,bam,genomic,fastas,kmers,sequences", "license": "MIT License", "maintainer": "Pranathi Vemuri", "maintainer_email": "pranathi93.vemuri@gmail.com", "name": "bam2fasta", "package_url": "https://pypi.org/project/bam2fasta/", "platform": "", "project_url": "https://pypi.org/project/bam2fasta/", "project_urls": { "Homepage": "https://github.com/czbiohub/bam2fasta" }, "release_url": "https://pypi.org/project/bam2fasta/1.0.8/", "requires_dist": null, "requires_python": "", "summary": "tool for converting a bam file to fastas", "version": "1.0.8", "yanked": false, "yanked_reason": null }, "last_serial": 8384830, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "f648a499c934597172f02d96da5097d3", "sha256": "5724a03042999ffa9434fc2913413eb2caa17d321973ebb1263f8991de4b33cf" }, "downloads": -1, "filename": "bam2fasta-1.0.0.tar.gz", "has_sig": false, "md5_digest": "f648a499c934597172f02d96da5097d3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 266064, "upload_time": "2019-10-30T18:36:13", 
"upload_time_iso_8601": "2019-10-30T18:36:13.960794Z", "url": "https://files.pythonhosted.org/packages/37/06/036208fcd4475c23f830c27f7af85c0458599f98f28284ec95a61bacfc87/bam2fasta-1.0.0.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.1": [ { "comment_text": "", "digests": { "md5": "11115b06e3fb5dd21353b842f92d67b6", "sha256": "77e2063cf2ec2f9152945147c9efd37879bc81a935e552af69f1778536d728f7" }, "downloads": -1, "filename": "bam2fasta-1.0.1.tar.gz", "has_sig": false, "md5_digest": "11115b06e3fb5dd21353b842f92d67b6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 266186, "upload_time": "2019-10-31T01:26:32", "upload_time_iso_8601": "2019-10-31T01:26:32.414966Z", "url": "https://files.pythonhosted.org/packages/d2/78/de6c2e359a167b670b6f04f96e73a136159c14c5cb5be00aa1026e230562/bam2fasta-1.0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.3": [ { "comment_text": "", "digests": { "md5": "fe2092c8b1a78a76de604f33c6b637f5", "sha256": "040d6b15b087d95a19d51bcf048d514b789c32f6e5c5252f9ba0668b675d980a" }, "downloads": -1, "filename": "bam2fasta-1.0.3.tar.gz", "has_sig": false, "md5_digest": "fe2092c8b1a78a76de604f33c6b637f5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 361515, "upload_time": "2020-01-29T02:15:38", "upload_time_iso_8601": "2020-01-29T02:15:38.914571Z", "url": "https://files.pythonhosted.org/packages/47/5a/d9f1f7116a7dc339a6dec8e8c866e258faa85f2d22b785e538317458c08f/bam2fasta-1.0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.4": [ { "comment_text": "", "digests": { "md5": "53d156be74bc4137a853debaf5c8b377", "sha256": "3afc9843b21ebd764f005825b93899016313d8645ecde9a04e52dddabc4015af" }, "downloads": -1, "filename": "bam2fasta-1.0.4.tar.gz", "has_sig": false, "md5_digest": "53d156be74bc4137a853debaf5c8b377", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 699245, "upload_time": "2020-02-11T21:33:08", 
"upload_time_iso_8601": "2020-02-11T21:33:08.329735Z", "url": "https://files.pythonhosted.org/packages/f7/bc/3ac40f22b7b0c2f40a2363ebce24e32254383316568ea08084528ea319fe/bam2fasta-1.0.4.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "c2026f112db4b01d5030bf847d33d8ad", "sha256": "d24c02f98726eda8ffda303ea63c77b28966e46dfeecfa1f0da770832b27a92f" }, "downloads": -1, "filename": "bam2fasta-1.0.5.tar.gz", "has_sig": false, "md5_digest": "c2026f112db4b01d5030bf847d33d8ad", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 597456, "upload_time": "2020-07-23T03:12:19", "upload_time_iso_8601": "2020-07-23T03:12:19.614854Z", "url": "https://files.pythonhosted.org/packages/66/02/b2c3ab2f3623730f5d2f6ff19a80e42ded7fff738e326b5a50f772f8d849/bam2fasta-1.0.5.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "e30ef2eb6ba89cb8709b581bc4976648", "sha256": "a695fabff95eb893e77c8ec9b0ff932fd26c5abcdee036c7b4b71efd33e33146" }, "downloads": -1, "filename": "bam2fasta-1.0.6.tar.gz", "has_sig": false, "md5_digest": "e30ef2eb6ba89cb8709b581bc4976648", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 597974, "upload_time": "2020-07-28T15:58:55", "upload_time_iso_8601": "2020-07-28T15:58:55.802958Z", "url": "https://files.pythonhosted.org/packages/2c/04/567cd21b909d00e0e8e7939d9ad117b4ed44dafd9554cdc8c6cb614e2576/bam2fasta-1.0.6.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.7": [ { "comment_text": "", "digests": { "md5": "e4a780e4776f0defe8e4f859a64004a8", "sha256": "de132c7e4be3670cafb8b98efd7af9a9774c279f4d7a272f319c3564e3199293" }, "downloads": -1, "filename": "bam2fasta-1.0.7.tar.gz", "has_sig": false, "md5_digest": "e4a780e4776f0defe8e4f859a64004a8", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 597873, "upload_time": "2020-10-01T07:14:00", 
"upload_time_iso_8601": "2020-10-01T07:14:00.336044Z", "url": "https://files.pythonhosted.org/packages/b6/b7/4a94fddb1e9fbaa1aa58c2ae67a2581810ee73d348d8458a52b991daaafb/bam2fasta-1.0.7.tar.gz", "yanked": false, "yanked_reason": null } ], "1.0.8": [ { "comment_text": "", "digests": { "md5": "03af6b78c888b8f7b2e3376e38a1ba40", "sha256": "f807c2481af8208d82921879d03622a482e2f2de791cd38d0a810ddcb80cca28" }, "downloads": -1, "filename": "bam2fasta-1.0.8.tar.gz", "has_sig": false, "md5_digest": "03af6b78c888b8f7b2e3376e38a1ba40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 597886, "upload_time": "2020-10-11T00:35:30", "upload_time_iso_8601": "2020-10-11T00:35:30.667714Z", "url": "https://files.pythonhosted.org/packages/8f/17/e611653c32912f5c03c8528db33c7808413c22d9268ca96efac2611de6df/bam2fasta-1.0.8.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03af6b78c888b8f7b2e3376e38a1ba40", "sha256": "f807c2481af8208d82921879d03622a482e2f2de791cd38d0a810ddcb80cca28" }, "downloads": -1, "filename": "bam2fasta-1.0.8.tar.gz", "has_sig": false, "md5_digest": "03af6b78c888b8f7b2e3376e38a1ba40", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 597886, "upload_time": "2020-10-11T00:35:30", "upload_time_iso_8601": "2020-10-11T00:35:30.667714Z", "url": "https://files.pythonhosted.org/packages/8f/17/e611653c32912f5c03c8528db33c7808413c22d9268ca96efac2611de6df/bam2fasta-1.0.8.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }