{
"info": {
"author": "Chentao Yang, Guanliang Meng",
"author_email": "yangchentao@genomics.cn",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 3 - Alpha",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
"Operating System :: MacOS :: MacOS X",
"Operating System :: Microsoft :: Windows",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3",
"Topic :: Scientific/Engineering :: Bio-Informatics"
],
"description": "# HIFI-barcode-SE400\nThe BGISEQ-500 platform has launched a new test sequencing kits capable of single-end 400 bp sequencing (SE400), which offers a simple and reliable way to achieve DNA barcodes efficiently. In this study, we explored the potential of the BGISEQ-500 SE400 sequencing in DNA barcode reference construction, meanwhile provided an updated HIFI-Barcode software package that can generate COI barcode assemblies using HTS reads of length >= 400 bp.\n\n\n### Manual\n[manual book](https://github.com/comery/HIFI-barcode-SE400/raw/master/HIFI-SE_manual.pdf)\n\n### Versions\n\n#### version 1.0.5 Python\n- v1.0.5 2019-0409 add support to compressed fastq, fix a bug of taxonomy\n- v1.0.4 2019-04-02 Fix a bug of \"polish\", and update the bold_identification module\n- v1.0.3 2018-12-14 Fix a bug of \"trim\"\n- v1.0.2 2018-12-10 Add \"-trim\" function in filter;\n\taccept mismatches in tag or primer sequence,\n \twhen demultiplexing; accept uneven reads to\n \tassembly; add \"-ds\" to drop short reads before\n \tassembly.\n- v1.0.1 2018-12-2 Add \"polish\" function\n- v1.0.0\n\tHIFI-SE v1.0.0 2018/11/22. Changers form previous version:\n\t- Formatted python code writing style as PEP8.\n\t- Fixed several small bugs.\n- v0.0.3\n\tHIFI-SE v0.03 2018/11/15. Changes from previous version:\n\t- Modify the description of some arguments for better understanding.\n- v0.0.1\n\tHIFI-SE v0.0.1 2018/11/03 beat version, establish the framework and archive almost complete functions.\n\n#### original Perl version & Python, rough sources\n>0.expected_error.pl \n1.split_extract.pl \n2.hificonnect.pl\n\n>0.expected_error.py \n1.split_extract.py \n2.hificonnect.py \n\n### Installation\n\n#### System requirement and dependencies \nOperating system: HIFI-SE is designed to run on most platforms, including UNIX, Linux and MacOS/X. Microsoft Windows. We have tested on Linux and on MacOS/X, because these are the machines we develop on. HIFI-SE is written in python language, and a version 3.5 or higher is required.\n#### Dependencies: \n- biopython version 1.5 or higher (required). Please check https://biopython.org/ and https://pypi.org/project/biopython/#description for more details on installation of biopython.\n- Another python package - bold_identification is also required for getting complete function of HIFI-SE. See https://pypi.org/project/bold-identification/\n- HIFI-SE supposed you have installed the VSEARCH on your device, and its path in your $PATH. See https://github.com/torognes/vsearch\n\n#### Install\n\n1. I only deploy my latest version on github, so you can clone this repository to your local computer. However, it would not solve package dependencies, thus you need to install biopython and bold_identification before using HIFI-SE software.(NOTE: pip is a link from pip3)\n\n\t```shell\n\tgit clone https://github.com/comery/HIFI-barcode-SE400.git\n\tpip install biopython\n\tpip install bold_identification \n\t```\n\t\n2. Installation by pip is recommended because it will solve package dependencies automatically, including biopython and bold_identification packages. \n\n\t```pip install HIFI-SE``` \n\n\n### Usage (latest)\n\n```shell\npython3 HIFI-SE.py\n```\nor \n\n```shell\n./HIFI-SE.py\n```\n\n```text\nusage: HIFI-SE [-h] [-v]\n {all,filter,assign,assembly,polish,bold_identification} ...\n\nDescription\n\n An automatic pipeline for HIFI-SE400 project, including filtering\n raw reads, assigning reads to samples, assembly HIFI barcodes\n (COI sequences), polished assemblies, and do tax identification.\n See more: https://github.com/comery/HIFI-barcode-SE400\n\nVersions\n\n 1.0.4 (20190402)\n\nAuthors\n\n yangchentao at genomics.cn, BGI.\n mengguanliang at genomics.cn, BGI.\n\npositional arguments:\n {all,filter,assign,assembly,polish,bold_identification}\n all run filter, assign and assembly.\n filter remove or trim reads with low quality.\n assign assign reads to samples by tags.\n assembly do assembly from assigned reads,\n output raw HIFI barcodes.\n polish polish COI barcode assemblies,\n output confident barcodes.\n bold_identification\n do taxa identification on BOLD system\n\noptional arguments:\n -h, --help show this help message and exit\n -v, --version show program's version number and exit\n```\n\n#### run by steps [filter -> assign -> assembly]\n\n- ```python3 HIFI-SE.py filter ```\n\n```text\nusage: HIFI-SE filter [-h] -outpre -raw [-phred ] [-e ]\n [-q ] [-trim] [-n ]\n\noptional arguments:\n -h, --help show this help message and exit\n\ncommon arguments:\n -outpre prefix for output files\n\nfilter arguments:\n -raw input raw Single-End fastq file, and only\n adapters should be removed; supposed on\n Phred33 score system (BGISEQ-500)\n -phred Phred score system, 33 or 64, default=33\n -e expected error threshod, default=10\n see more: http://drive5.com/usearch/manual/exp_errs.html\n -q filter by base quality; for example: '20 5' means\n dropping read which contains more than 5 percent of\n quality score < 20 bases.\n -trim whether to trim 5' end of read, it adapts to -e mode\n or -q mode\n -n remove reads containing [INT] Ns, default=1\n```\n\n- ```python3 HIFI-SE.py assign```\n\n```text\nusage: HIFI-SE assign [-h] -outpre -index INT -fq -primer \n [-outdir ] [-tmis ] [-pmis ]\n\noptional arguments:\n -h, --help show this help message and exit\n\ncommon arguments:\n -outpre prefix for output files\n\nindex arguments:\n -index INT the length of tag sequence in the ends of primers\n\nwhen only run assign arguments:\n -fq cleaned fastq file\n\nassign arguments:\n -primer taged-primer list, on following format:\n Rev001 AAGCTAAACTTCAGGGTGACCAAAAAATCA\n For001 AAGCGGTCAACAAATCATAAAGATATTGG\n ...\n this format is necessary!\n -outdir output directory for assignment,default=\"assigned\"\n -tmis mismatch number in tag when demultiplexing, default=0\n -pmis mismatch number in primer when demultiplexing, default=1\n```\n- ```python3 HIFI-SE.py assembly```\n\n```\nusage: HIFI-SE assembly [-h] -outpre -index INT -list FILE\n [-vsearch ] [-threads ] [-cid FLOAT]\n [-min INT] [-max INT] [-oid FLOAT] [-tp INT] [-ab INT]\n [-seqs_lim INT] [-len INT] [-ds] [-mode INT] [-rc]\n [-codon INT] [-frame INT]\n\noptional arguments:\n -h, --help show this help message and exit\n\ncommon arguments:\n -outpre prefix for output files\n\nindex arguments:\n -index INT the length of tag sequence in the ends of primers\n\nonly run assembly arguments(not all):\n -list FILE input file, fastq file list. [required]\n\nsoftware path:\n -vsearch vsearch path(only needed if vsearch is not in $PATH)\n -threads threads for vsearch, default=2\n -cid FLOAT identity for clustering, default=0.98\n\nassembly arguments:\n -min INT minimun length of overlap, default=80\n -max INT maximum length of overlap, default=90\n -oid FLOAT minimun similarity of overlap region, default=0.95\n -tp INT how many clusters will be used inassembly, recommend 2\n -ab INT keep clusters to assembly if its abundance >=INT\n -seqs_lim INT reads number limitation. by default,\n no limitation for input reads\n -len INT standard read length, default=400\n -ds drop short reads away before assembly\n -mode INT 1 or 2; modle 1 is to cluster and keep\n most [-tp] abundance clusters, or clusters\n abundance more than [-ab], and then make a\n consensus sequence for each cluster.\n modle 2 is directly to make only one consensus\n sequence without clustering. default=1\n -rc whether to check amino acid\n translation for reads, default not\n\ntranslation arguments(when set -rc or -cc):\n -codon INT codon usage table used to checktranslation, default=5\n -frame INT start codon shift for amino acidtranslation, default=1\n```\n\n### quickstart\n#### Files used in tutorial\nAll related files could be found from here. The important files for tutorial are:\n\n- raw.fastq.gz, raw output fastq file generated from BGISEQ-500 SE400 module.\n- indexed_primer.list, tagged primer list\n\n\n#### Run in \"all\"\nExample:\n\n```shell\npython3 HIFI-SE.py all -outpre hifi -trim -e 5 -raw test.raw.fastq -index 5 -primer index_primer.list -mode 1 -cid 0.98 -oid 0.95 -seqs_lim 50000 -threads 4 -tp 2 \n```\n\n### Citation\nThis work is not be published, but coming soon! I will update this part after publication.",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/comery/HIFI-barcode-SE400",
"keywords": "",
"license": "",
"maintainer": "",
"maintainer_email": "",
"name": "HIFI-SE",
"package_url": "https://pypi.org/project/HIFI-SE/",
"platform": "",
"project_url": "https://pypi.org/project/HIFI-SE/",
"project_urls": {
"Homepage": "https://github.com/comery/HIFI-barcode-SE400"
},
"release_url": "https://pypi.org/project/HIFI-SE/1.0.5/",
"requires_dist": null,
"requires_python": ">=3.5",
"summary": "HIFI-SE",
"version": "1.0.5"
},
"last_serial": 5116378,
"releases": {
"0.0.1": [
{
"comment_text": "",
"digests": {
"md5": "f7cec17fde02a8a8d7fa91afebe51043",
"sha256": "5f36dcb0c2d758c85da9320f871d3d44014a8e1dc9064d4fb428fe0c6a86f42d"
},
"downloads": -1,
"filename": "HIFI-SE-0.0.1.tar.gz",
"has_sig": false,
"md5_digest": "f7cec17fde02a8a8d7fa91afebe51043",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 28242,
"upload_time": "2018-11-15T09:23:29",
"url": "https://files.pythonhosted.org/packages/29/c6/212a87647dadd51dce5a2841f0d894bee5ab29929c336b0913248c9c3e24/HIFI-SE-0.0.1.tar.gz"
}
],
"0.0.3": [
{
"comment_text": "",
"digests": {
"md5": "a5e1fd526671fa70a08bd69f9494aaae",
"sha256": "81ec8fba8353800c0718041df4d52214f4baf52c5044ed3cadbf6661d8c1bada"
},
"downloads": -1,
"filename": "HIFI-SE-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "a5e1fd526671fa70a08bd69f9494aaae",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 28433,
"upload_time": "2018-11-15T09:39:44",
"url": "https://files.pythonhosted.org/packages/32/09/be9796b5463c59369d8d48ce90bf61385a2ee5ecb78e5caeb5933fbeac5a/HIFI-SE-0.0.3.tar.gz"
}
],
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "6c2fd95e0b02726667362efaaa6ad4c4",
"sha256": "7f024ec463305cfcd2c98057f5deac2beca31a0dce6b1b49c67bd429e8bdb00e"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "6c2fd95e0b02726667362efaaa6ad4c4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 30828,
"upload_time": "2018-11-23T03:41:11",
"url": "https://files.pythonhosted.org/packages/3f/99/7fedc900c5451a7e226e2355b265321103bbcb5cb0002bea43fd019c91d5/HIFI-SE-1.0.0.tar.gz"
}
],
"1.0.1": [
{
"comment_text": "",
"digests": {
"md5": "f4adc5ed809134c8aa077aff196ec63f",
"sha256": "ee8540cd1fb9e340ad93618902821e5e1a7b88842a7482ae95695e1632e2bbce"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "f4adc5ed809134c8aa077aff196ec63f",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 19357,
"upload_time": "2018-12-02T15:17:27",
"url": "https://files.pythonhosted.org/packages/72/1a/5dbf0d6bdaa7856c885c9ec5261f4db9287b66c808fec5b1cb4ef03d3c68/HIFI-SE-1.0.1.tar.gz"
}
],
"1.0.2": [
{
"comment_text": "",
"digests": {
"md5": "73b8b864e845fd95f283e5726d0e8ed2",
"sha256": "acd7491a7879c513cbba0373aa8273452f520f50dda52d48af3380fae63fac02"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "73b8b864e845fd95f283e5726d0e8ed2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 21495,
"upload_time": "2018-12-11T03:06:58",
"url": "https://files.pythonhosted.org/packages/ff/c8/69221348b03ab7d1ae33dcb397f3a305a82756110b156715e99aa06ce096/HIFI-SE-1.0.2.tar.gz"
}
],
"1.0.3": [
{
"comment_text": "",
"digests": {
"md5": "19864bd9267c5d9aa469c68846ee2bd2",
"sha256": "75eefd56ac977be1ab01744b63297c4b204a669981615007cd5f1fad1a77435d"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "19864bd9267c5d9aa469c68846ee2bd2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 21948,
"upload_time": "2018-12-14T09:36:04",
"url": "https://files.pythonhosted.org/packages/ca/72/f9ba0d05edcdb141b5bd02c60cb97f59683719878be2b23b8464b5a98c61/HIFI-SE-1.0.3.tar.gz"
}
],
"1.0.4": [
{
"comment_text": "",
"digests": {
"md5": "112a072894d8afcccf1ad5036f63ef6b",
"sha256": "aa36b2a8b4edf8c0e07859fc1cf3c12981499b30579732c2fbc396a7ddabd7ea"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.4.tar.gz",
"has_sig": false,
"md5_digest": "112a072894d8afcccf1ad5036f63ef6b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 19317,
"upload_time": "2019-04-08T06:04:32",
"url": "https://files.pythonhosted.org/packages/35/05/b7222b1244dd5ca693b9af29fa14945dd6f7a08e3b29281ce78308fba556/HIFI-SE-1.0.4.tar.gz"
}
],
"1.0.5": [
{
"comment_text": "",
"digests": {
"md5": "08b81b4c8e4eeb30a6b1ceeea8a4bcfc",
"sha256": "8bb34a34ad1c1ff7ca114633f2998574140722c6af768deac586daeec1df6858"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "08b81b4c8e4eeb30a6b1ceeea8a4bcfc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 19313,
"upload_time": "2019-04-09T01:52:13",
"url": "https://files.pythonhosted.org/packages/69/c2/e289fd7480f0d94643762bf7d6b271bb452d6ad21b8552fb0abc057e7657/HIFI-SE-1.0.5.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "08b81b4c8e4eeb30a6b1ceeea8a4bcfc",
"sha256": "8bb34a34ad1c1ff7ca114633f2998574140722c6af768deac586daeec1df6858"
},
"downloads": -1,
"filename": "HIFI-SE-1.0.5.tar.gz",
"has_sig": false,
"md5_digest": "08b81b4c8e4eeb30a6b1ceeea8a4bcfc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5",
"size": 19313,
"upload_time": "2019-04-09T01:52:13",
"url": "https://files.pythonhosted.org/packages/69/c2/e289fd7480f0d94643762bf7d6b271bb452d6ad21b8552fb0abc057e7657/HIFI-SE-1.0.5.tar.gz"
}
]
}