{ "info": { "author": "Peter Kruczkiewicz", "author_email": "peter.kruczkiewicz@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: Apache Software License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "=======================\nfilter_classified_reads\n=======================\n\n\n.. image:: https://img.shields.io/pypi/v/filter_classified_reads.svg\n :target: https://pypi.python.org/pypi/filter_classified_reads\n\n.. image:: https://travis-ci.org/peterk87/filter_classified_reads.svg?branch=master\n :target: https://travis-ci.org/peterk87/filter_classified_reads\n\n.. image:: https://readthedocs.org/projects/filter-classified-reads/badge/?version=latest\n :target: https://filter-classified-reads.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\n\n\n\nFilter for reads from taxa of interest using Kraken2/Centrifuge classification results.\n\n\n* Free software: Apache Software License 2.0\n* Documentation: https://filter-classified-reads.readthedocs.io.\n\n\nFeatures\n--------\n\n* Filter for union of reads classified to taxa of interest Kraken2_ and Centrifuge_ (by default filter for Viral reads (taxid=10239))\n* Output unclassified reads along with reads from taxa of interest *or* exlude them with `--exclude-unclassified`\n* seqtk_ for quickly filtering reads and pbgzip_ for parallel block Gzip compression of output reads (recommended that these dependencies are installed with Conda_)\n\nUsage\n-----\n\nPaired-end reads with classification results by both Kraken2_ and Centrifuge_\n\n.. code-block::\n\n filter_classified_reads -i /path/to/reads/R1.fq \\\n -I /path/to/reads/R2.fq \\\n -o /path/to/reads/R1.filtered.fq.gz \\\n -O /path/to/reads/R2.filtered.fq.gz \\\n -k /path/to/kraken2/results.tsv \\\n -K /path/to/kraken2/kreport.tsv \\\n -c /path/to/centrifuge/results.tsv \\\n -C /path/to/centrifuge/kreport.tsv \\\n\n\nUsing test data in `tests/data/`:\n\n.. code-block::\n\n $ filter_classified_reads -i tests/data/SRR8207674_1.viral_unclassified.seqtk_seed42_n10000.fastq.gz \\\n -I tests/data/SRR8207674_2.viral_unclassified.seqtk_seed42_n10000.fastq.gz \\\n -o r1.fq.gz \\\n -O r2.fq.gz \\\n -k tests/data/SRR8207674-kraken2_results.tsv \\\n -K tests/data/SRR8207674-kraken2_report.tsv \\\n -c tests/data/SRR8207674-centrifuge_results.tsv \\\n -C tests/data/SRR8207674-centrifuge_kreport.tsv\n\nYou should see the following log information:\n\n.. code-block::\n\n 2019-04-16 13:40:34,114 INFO: Parsing centrifuge results into DataFrame [in target_classified_reads.py:49]\n 2019-04-16 13:40:34,168 INFO: Parsed n=12281 centrifuge result records into DataFrame from \"tests/data/SRR8207674-centrifuge_results.tsv\" [in target_classified_reads.py:57]\n 2019-04-16 13:40:34,172 INFO: Parsed n=298 centrifuge Kraken-style report records into DataFrame from \"tests/data/SRR8207674-centrifuge_kreport.tsv\" [in target_classified_reads.py:60]\n 2019-04-16 13:40:34,177 INFO: Found 7129 unclassified reads from Centrifuge results [in target_classified_reads.py:65]\n 2019-04-16 13:40:34,242 INFO: Found 231 unique viral Taxonomy IDs [in target_classified_reads.py:98]\n 2019-04-16 13:40:34,245 INFO: Found 2181 target reads from centrifuge results [in target_classified_reads.py:101]\n 2019-04-16 13:40:34,245 INFO: Parsing kraken2 results into DataFrame [in target_classified_reads.py:49]\n 2019-04-16 13:40:34,289 INFO: Parsed n=20000 kraken2 result records into DataFrame from \"tests/data/SRR8207674-kraken2_results.tsv\" [in target_classified_reads.py:57]\n 2019-04-16 13:40:34,293 INFO: Parsed n=139 kraken2 Kraken-style report records into DataFrame from \"tests/data/SRR8207674-kraken2_report.tsv\" [in target_classified_reads.py:60]\n 2019-04-16 13:40:34,295 INFO: Found 1737 unclassified reads from Centrifuge results [in target_classified_reads.py:65]\n 2019-04-16 13:40:34,325 INFO: Found 26 unique viral Taxonomy IDs [in target_classified_reads.py:98]\n 2019-04-16 13:40:34,331 INFO: Found 8345 target reads from kraken2 results [in target_classified_reads.py:101]\n 2019-04-16 13:40:34,332 INFO: Found N=1701 common unclassified reads by all classification methods. [in cli.py:110]\n 2019-04-16 13:40:34,333 INFO: Total viral reads=8357 [in util.py:37]\n 2019-04-16 13:40:34,333 INFO: Centrifuge found n=12 target reads not found with Kraken2 [in util.py:38]\n 2019-04-16 13:40:34,333 INFO: Kraken2 found n=6176 target reads not found with Centrifuge [in util.py:40]\n 2019-04-16 13:40:34,338 INFO: N=1701 reads unclassified by both Centrifuge and Kraken2. [in util.py:62]\n 2019-04-16 13:40:34,345 INFO: Writing n=9999 filtered reads from \"tests/data/SRR8207674_1.viral_unclassified.seqtk_seed42_n10000.fastq.gz\" to \"r1.fq.gz\" [in cli.py:129]\n 2019-04-16 13:40:34,957 INFO: Writing n=9999 filtered reads from \"tests/data/SRR8207674_2.viral_unclassified.seqtk_seed42_n10000.fastq.gz\" to \"r2.fq.gz\" [in cli.py:134]\n 2019-04-16 13:40:35,459 INFO: Done! [in cli.py:137]\n\n\n\nCredits\n-------\n\nThis package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.\n\n.. _Cookiecutter: https://github.com/audreyr/cookiecutter\n.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage\n.. _Kraken2: https://ccb.jhu.edu/software/kraken2/\n.. _Centrifuge: https://ccb.jhu.edu/software/centrifuge/manual.shtml\n.. _seqtk: https://github.com/lh3/seqtk\n.. _pbgzip: https://anaconda.org/bioconda/pbgzip\n.. _Conda: https://conda.io/en/latest/\n\n\n=======\nHistory\n=======\n\n0.2.0 (2019-09-23)\n------------------\n\n* Use ``seqtk subseq`` instead of screed for pulling reads of interest from files\n* Added external dependencies for ``seqtk`` for pulling reads from input FASTQs and ``pbgzip`` for parallel block Gzip for output of Gzipped FASTQs\n* Removed ``screed`` Python dependency\n\n\n0.1.0 (2019-04-15)\n------------------\n\n* First release on PyPI.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/peterk87/filter_classified_reads", "keywords": "filter_classified_reads", "license": "Apache Software License 2.0", "maintainer": "", "maintainer_email": "", "name": "filter-classified-reads", "package_url": "https://pypi.org/project/filter-classified-reads/", "platform": "", "project_url": "https://pypi.org/project/filter-classified-reads/", "project_urls": { "Homepage": "https://github.com/peterk87/filter_classified_reads" }, "release_url": "https://pypi.org/project/filter-classified-reads/0.2.0/", "requires_dist": null, "requires_python": "", "summary": "Filter for reads from taxa of interest using Kraken2/Centrifuge classification results", "version": "0.2.0" }, "last_serial": 5890715, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "2701bb6e102a0c0ca71e0207d6acba4d", "sha256": "99db66d8db04547ce97a57e34dfa9b8657730ad9902546e2bb8a56991c406631" }, "downloads": -1, "filename": "filter_classified_reads-0.1.0.tar.gz", "has_sig": false, "md5_digest": "2701bb6e102a0c0ca71e0207d6acba4d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1616818, "upload_time": "2019-04-17T16:11:45", "url": "https://files.pythonhosted.org/packages/95/77/02c5752eaa39b02ed2e06d0d9775c7c83956a062f7c6859e50bb17c10451/filter_classified_reads-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "874906de011be0dc8a34cda7ae89df12", "sha256": "34d386ac5ac8d53c24dad7e16056762f2ea6090505ea42d8e81a9e2af203519c" }, "downloads": -1, "filename": "filter_classified_reads-0.2.0.tar.gz", "has_sig": false, "md5_digest": "874906de011be0dc8a34cda7ae89df12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1614289, "upload_time": "2019-09-26T13:47:23", "url": "https://files.pythonhosted.org/packages/4f/ff/d708158d9a8d07c0b7fb0481e7d17643570a8836b2807ca4efe6639c1c8b/filter_classified_reads-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "874906de011be0dc8a34cda7ae89df12", "sha256": "34d386ac5ac8d53c24dad7e16056762f2ea6090505ea42d8e81a9e2af203519c" }, "downloads": -1, "filename": "filter_classified_reads-0.2.0.tar.gz", "has_sig": false, "md5_digest": "874906de011be0dc8a34cda7ae89df12", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1614289, "upload_time": "2019-09-26T13:47:23", "url": "https://files.pythonhosted.org/packages/4f/ff/d708158d9a8d07c0b7fb0481e7d17643570a8836b2807ca4efe6639c1c8b/filter_classified_reads-0.2.0.tar.gz" } ] }