{ "info": { "author": "Lars Sch\u00f6ning", "author_email": "lays@biosustain.dtu.dk", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: Apache Software License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "================\r\nGoodbye, GenBank\r\n================\r\n\r\n.. image:: https://img.shields.io/travis/biosustain/goodbye-genbank/master.svg?style=flat-square\r\n :target: https://travis-ci.org/biosustain/goodbye-genbank\r\n\r\n**Goodbye, GenBank** converts `SeqFeature `_ sequence\r\nannotations from NCBI GenBank records to a common and simplified format. GenBank feature annotations have a\r\nfeature key and reasonably well defined qualifiers, but non-standard and discontinued feature types and qualifiers are commonly\r\nused and often the feature key is something someone made up and not a valid GenBank feature key. And even when a valid GenBank feature key is used, it is often incomplete and useless without additional details in the qualifiers.\r\n\r\nThis package converts most feature keys to appropriate `Sequence Ontology `_ terms used by GFF3 and SBOL. Non-standard qualifiers are repaired or removed.\r\n\r\n**Goodbye, GenBank** is intended for those who wish to clean-up their GenBank files and then transition to a different format.\r\nThe philosophy of this project is to salvage what is salvageable and to discard what is not. GenBank feature types are translated\r\nto Sequence Ontology terms; qualifiers are converted into a reduced set that contains only the parts that are not broken. Qualifiers are also converted to their correct type: ``int`` for integers, ``list`` only for qualifiers that can appear multiple times, ``bool`` for flags.\r\n\r\nMoreover, different options are available to configure what is kept and what is thrown away.\r\n\r\nInstallation\r\n------------\r\n\r\n::\r\n\r\n pip install gbgb\r\n \r\n\r\nExample\r\n-------\r\n\r\n::\r\n\r\n >>> feature\r\n SeqFeature(FeatureLocation(ExactPosition(2931), ExactPosition(2936), strand=1), type='-10_signal')\r\n >>> feature.qualifiers\r\n {'ApEinfo_fwdcolor': ['pink'],\r\n 'ApEinfo_graphicformat': ['arrow_data {{0 1 2 0 0 -1} {} 0} width 5 offset 0'],\r\n 'ApEinfo_revcolor': ['pink'],\r\n 'label': ['RNAII Promoter (-10 signal)']}\r\n >>>\r\n >>> from gbgb import convert_feature\r\n >>> feature = convert_feature(feature)\r\n >>> feature\r\n SeqFeature(FeatureLocation(ExactPosition(2931), ExactPosition(2936), strand=1), type='minus_10_signal')\r\n >>> feature.qualifiers\r\n {'note': 'RNAII Promoter (-10 signal)'}\r\n >>>\r\n >>> from gbgb import genbank_feature_key\r\n >>> genbank_feature_key('minus_10_signal')\r\n 'regulatory'\r\n\r\n\r\nDesign considerations\r\n---------------------\r\n\r\nFor the most part, *Goodbye, GenBank* attempts to be idempotent, i.e. features and their types/keys and qualifiers can be safely\r\ntransformed any number times with the same settings. The apparent mismatch between the conversion to Sequence Ontology feature\r\nterms and valid/fixed GenBank qualifiers is to simplify downstream processing. It is up to the users which qualifiers they wish\r\nto keep, but at least the choices they are given are reasonable.\r\n\r\nContributing\r\n------------\r\n\r\nIf you have any questions or suggestions or if you have found a unique new specimen of GenBank files that you would like\r\nto convert, please open an issue.\r\n\r\n\r\nIssues\r\n------\r\n\r\n- SO Term: \"regulatory\" feature type with /regulatory_class=\"enhancer_blocking_element\"\r\n\r\n There is apparently no matching Sequence Ontology term. An enhancer blocking element behaves like an insulator, but\r\n is not an insulator. It is a transcriptional cis regulatory region, but that description is too broad.\r\n\r\n- SO Term: \"misc_structure\" feature type\r\n\r\n GenBank uses this feature type for secondary and tertiary nucleotide structures. There appears to be\r\n no matching Sequence Ontology term.\r\n\r\n- SO Term: \"assembly_gap\" feature type\r\n\r\n GenBank has both \"gap\" and \"assembly_gap\" feature types, which appear to have slightly different meanings. However,\r\n SO only has a \"gap\" term, which refers to assembly gaps.\r\n\r\n- GFF3 export\r\n\r\n There is no good GFF3 exporter out there, so why not write one?\r\n\r\n Skeleton code in gbgb.export.gff3\r\n\r\n- Reduction of SO terms\r\n\r\n Allow users to specify a set of Sequence Ontology terms (inheriting from \"sequence_feature\"). Feature types will be\r\n reduced to the nearest Ontology term. This is to simplify downstream analysis.\r\n\r\n- /pseudo qualifier without /pseudogene=\"\"\r\n\r\n There is no matching Sequence Ontology term for this. Several GenBank files contain /pseudo without /pseudogene=\"\"\r\n to mean pseudogene.\r\n\r\n- Mandatory qualifiers\r\n\r\n These should be filled in using a reasonable guess or errors should be thrown when trying to convert a feature without\r\n its mandatory qualifiers.\r\n\r\n\r\nMaterials\r\n---------\r\n\r\n- `GenBank Feature Table Definition `_\r\n\r\n- `GenBank Release Notes (since December 1992) `_\r\n\r\n- `NCBI Prokaryotic Genome Annotation Guide `_\r\n\r\n- `Sequence Ontology Wiki -- Discontinuous Features `_.\r\n\r\n On trans-splicing.\r\n\r\n- `GFF3 Specification `_", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/biosustain/goodbye-genbank", "keywords": "", "license": "Apache", "maintainer": "", "maintainer_email": "", "name": "gbgb", "package_url": "https://pypi.org/project/gbgb/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/gbgb/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/biosustain/goodbye-genbank" }, "release_url": "https://pypi.org/project/gbgb/0.1.0/", "requires_dist": null, "requires_python": null, "summary": "Goodbye, GenBank is a package for use with Biopython that gives feature annotations from GenBank records a new and better life.", "version": "0.1.0" }, "last_serial": 4735499, "releases": { "0.0.0": [], "0.1.0": [ { "comment_text": "", "digests": { "md5": "5302fbba6d5b1bd82f2080910e7e60b3", "sha256": "df684802ff81e9646e4344a6357978ff260de5450511cb33af5b74fddfcb6b19" }, "downloads": -1, "filename": "gbgb-0.1.0.tar.gz", "has_sig": false, "md5_digest": "5302fbba6d5b1bd82f2080910e7e60b3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11291, "upload_time": "2016-04-27T15:30:36", "url": "https://files.pythonhosted.org/packages/b0/b2/c27dedc9e2f71e5f64f4221e22b64af543fea77bcb5a7be6944b6342c80e/gbgb-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5302fbba6d5b1bd82f2080910e7e60b3", "sha256": "df684802ff81e9646e4344a6357978ff260de5450511cb33af5b74fddfcb6b19" }, "downloads": -1, "filename": "gbgb-0.1.0.tar.gz", "has_sig": false, "md5_digest": "5302fbba6d5b1bd82f2080910e7e60b3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11291, "upload_time": "2016-04-27T15:30:36", "url": "https://files.pythonhosted.org/packages/b0/b2/c27dedc9e2f71e5f64f4221e22b64af543fea77bcb5a7be6944b6342c80e/gbgb-0.1.0.tar.gz" } ] }