{ "info": { "author": "Nick Waters, Marcus Fedarko", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3 :: Only", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Software Development :: Libraries" ], "description": "# pyfastg: a minimal Python library for parsing networks from SPAdes FASTG files\n[![Build Status](https://travis-ci.org/fedarko/pyfastg.svg?branch=master)](https://travis-ci.org/fedarko/pyfastg) [![Code Coverage](https://codecov.io/gh/fedarko/pyfastg/branch/master/graph/badge.svg)](https://codecov.io/gh/fedarko/pyfastg)\n\n## The FASTG file format\nFASTG is a format to describe genome assemblies, geared toward accurately representing the ambiguity resulting from sequencing limitations, ploidy, or other factors that complicate representation of a sequence as a simple string. The official spec for the FASTG format can be found [here](http://fastg.sourceforge.net/).\n\npyfastg parses graphs that follow **a subset of this specification**: in\nparticular, it is designed to work with files output by the\n[SPAdes](http://cab.spbu.ru/software/spades/) family of assemblers.\n\n## pyfastg\npyfastg contains `parse_fastg()`, a function that accepts as input a path\nto a SPAdes FASTG file. This function parses the structure of the specified\nfile, returning a [NetworkX](https://networkx.github.io) `DiGraph` object representing\nthe structure of the graph.\n\npyfastg is very much in its infancy, so it may be most useful as a starting point.\nPull requests welcome!\n\n### Quick Example\n\n```python\n>>> import pyfastg\n>>> g = pyfastg.parse_fastg(\"pyfastg/tests/input/assembly_graph.fastg\")\n>>> # g is now a NetworkX DiGraph! 
We can do whatever we want with this object.\n>>> # Example: List the nodes in g\n>>> g.nodes()\nNodeView(('1+', '29-', '1-', '6-', '2+', '26+', '27+', '2-', '3+', '4+', '6+', '7+', '3-', '33-', '9-', '4-', '5+', '5-', '28+', '7-', '8+', '28-', '9+', '8-', '12-', '10+', '12+', '10-', '24-', '32-', '11+', '30-', '11-', '27-', '19-', '13+', '25+', '31-', '13-', '14+', '14-', '26-', '15+', '15-', '23-', '16+', '16-', '17+', '17-', '19+', '18+', '33+', '18-', '20+', '20-', '22+', '21+', '21-', '22-', '23+', '24+', '25-', '29+', '30+', '31+', '32+'))\n>>> # Example: Get details for a single node (length, coverage, and GC-content)\n>>> g.nodes[\"15+\"]\n{'length': 193, 'cov': 6.93966, 'gc': 0.5492227979274611}\n>>> # Example: Get information about the graph's connectivity\n>>> import networkx as nx\n>>> components = list(nx.weakly_connected_components(g))\n>>> for c in components:\n... print(len(c), \"nodes\")\n... print(c)\n...\n33 nodes\n{'8-', '17-', '15+', '30+', '16+', '26-', '25+', '19+', '7+', '23+', '14-', '18-', '10-', '29-', '20-', '27-', '11-', '5-', '3+', '2-', '12-', '13+', '31-', '6+', '1+', '21-', '24-', '32-', '22+', '28+', '4+', '33-', '9-'}\n33 nodes\n{'26+', '29+', '18+', '3-', '2+', '8+', '15-', '24+', '9+', '17+', '27+', '28-', '11+', '6-', '20+', '14+', '19-', '13-', '4-', '21+', '5+', '31+', '22-', '12+', '25-', '30-', '10+', '1-', '7-', '32+', '23-', '33+', '16-'}\n```\n\n### Required File Format (tl;dr: SPAdes-dialect FASTG files only)\nCurrently, pyfastg is hardcoded to parse FASTG files created by the SPAdes assembler. 
Other valid FASTG files that don't follow the pattern used by SPAdes for node names are not supported.\n\nIn particular, each node in the file must be declared as\n\n```bash\n>EDGE_1_length_9909_cov_6.94721\n```\n\nThe node ID (here, `1`) can contain the characters `a-z`, `A-Z`, and `0-9`.\n\nThe node length (here, `9909`) can contain the characters `0-9`.\n\nThe node coverage (here, `6.94721`) can contain the characters `0-9` and `.`.\n\nWe assume that each node sequence (the line(s) between node declarations)\nconsists only of valid DNA characters, as determined by\n[`skbio.DNA`](http://scikit-bio.org/docs/latest/generated/skbio.sequence.DNA.html).\nLeading and trailing whitespace in sequence lines will be ignored, so something\nlike\n```bash\n ATC\n\n G \n```\nis perfectly valid (however, `ATC G` is not since the inner space, ` `, will be\nconsidered part of the sequence).\n\nIt is also worth noting that pyfastg **only creates nodes/edges based on those\nobserved in the graph**: if your graph only contains nodes 1+, 2+, and 3+, then\nthis won't automatically create reverse complement nodes 1-, 2-, 3-, etc.\n\n### Identified node attributes\nNodes in the returned `DiGraph` (represented in the FASTG file as `EDGE_`s)\ncontain three attribute fields:\n\n1. `length`: the length of the node (represented as a Python `int`)\n2. `cov`: the coverage of the node (represented as a Python `float`)\n3. `gc`: the GC-content of the node's sequence (represented as a Python `float`)\n\nFurthermore, every node's name will end in `-` if the node is a \"reverse\ncomplement\" (i.e. 
if its declaration in the FASTG file ends in a `'` character) and `+` otherwise.\n\n### Dependencies\n\n- [NetworkX](https://networkx.github.io)\n- [scikit-bio](http://scikit-bio.org/)\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/fedarko/pyfastg", "keywords": "", "license": "MIT", "maintainer": "Marcus Fedarko", "maintainer_email": "", "name": "pyfastg", "package_url": "https://pypi.org/project/pyfastg/", "platform": "", "project_url": "https://pypi.org/project/pyfastg/", "project_urls": { "Homepage": "https://github.com/fedarko/pyfastg" }, "release_url": "https://pypi.org/project/pyfastg/0.0.0/", "requires_dist": [ "networkx", "scikit-bio", "pytest ; extra == 'dev'", "pytest-cov ; extra == 'dev'", "flake8 ; extra == 'dev'", "black ; extra == 'dev'" ], "requires_python": "", "summary": "Minimal Python library for parsing SPAdes FASTG files", "version": "0.0.0" }, "last_serial": 5942416, "releases": { "0.0.0": [ { "comment_text": "", "digests": { "md5": "8dabccde4164aa9860f65b3e76fac38a", "sha256": "ba319d931b8295530ffa8cd2a6ac94e9462a9f43c5052ebbe2ac424d312b3179" }, "downloads": -1, "filename": "pyfastg-0.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8dabccde4164aa9860f65b3e76fac38a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9930, "upload_time": "2019-10-08T01:48:09", "url": "https://files.pythonhosted.org/packages/cc/d8/23592b4dd716833d131351254cae2a85bfc1291a446ef801097296e483fa/pyfastg-0.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ede09c21243a82cd0cee6f1737b01fd9", "sha256": "201b3b6b39b020d89e3daea8ade96fdd93587290edfa64274bc670ecd22eaf30" }, "downloads": -1, "filename": "pyfastg-0.0.0.tar.gz", "has_sig": false, "md5_digest": "ede09c21243a82cd0cee6f1737b01fd9", "packagetype": "sdist", "python_version": "source", "requires_python": null, 
"size": 7857, "upload_time": "2019-10-08T01:48:11", "url": "https://files.pythonhosted.org/packages/c6/e8/a6d7644f3e91fc100d8bf08de4c7e9dbf8fe09eb9b31f5e1cd83b7ba1387/pyfastg-0.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8dabccde4164aa9860f65b3e76fac38a", "sha256": "ba319d931b8295530ffa8cd2a6ac94e9462a9f43c5052ebbe2ac424d312b3179" }, "downloads": -1, "filename": "pyfastg-0.0.0-py3-none-any.whl", "has_sig": false, "md5_digest": "8dabccde4164aa9860f65b3e76fac38a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 9930, "upload_time": "2019-10-08T01:48:09", "url": "https://files.pythonhosted.org/packages/cc/d8/23592b4dd716833d131351254cae2a85bfc1291a446ef801097296e483fa/pyfastg-0.0.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ede09c21243a82cd0cee6f1737b01fd9", "sha256": "201b3b6b39b020d89e3daea8ade96fdd93587290edfa64274bc670ecd22eaf30" }, "downloads": -1, "filename": "pyfastg-0.0.0.tar.gz", "has_sig": false, "md5_digest": "ede09c21243a82cd0cee6f1737b01fd9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7857, "upload_time": "2019-10-08T01:48:11", "url": "https://files.pythonhosted.org/packages/c6/e8/a6d7644f3e91fc100d8bf08de4c7e9dbf8fe09eb9b31f5e1cd83b7ba1387/pyfastg-0.0.0.tar.gz" } ] }