{ "info": { "author": "Andrei Kucharavy", "author_email": "andrei.chiffa136@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Healthcare Industry", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Operating System :: POSIX :: Linux", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Scientific/Engineering :: Medical Science Apps.", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "[![License\nType](https://img.shields.io/badge/license-BSD3-blue.svg)](https://github.com/chiffa/BioFlow/blob/master/License-new_BSD.txt)\n[![Python\nversion](https://img.shields.io/badge/python-2.7-blue.svg)](https://www.python.org/downloads/release/python-2715/)\n[![Branch\nStatus](https://img.shields.io/badge/status-alpha-red.svg)](https://www.python.org/downloads/release/python-2715/)\n\nBioFlow Project\n===============\n\nInformation Flow Analysis in biological networks\n\n[![Build\nStatus](https://travis-ci.org/chiffa/BioFlow.svg?branch=master)](https://travis-ci.org/chiffa/BioFlow)\n[![Documentation Status](https://readthedocs.org/projects/bioflow/badge/?version=latest)](https://bioflow.readthedocs.io/en/latest/?badge=latest)\n[![Coverage\nStatus](https://coveralls.io/repos/chiffa/BioFlow/badge.svg?branch=master&service=github)](https://coveralls.io/github/chiffa/BioFlow?branch=master)\n[![Duplicate\nLines](https://img.shields.io/badge/duplicate%20lines-11.45%25-yellowgreen.svg)](http://clonedigger.sourceforge.net/)\n[![Code\nHealth](https://landscape.io/github/chiffa/BioFlow/master/landscape.svg?style=flat)](https://landscape.io/github/chiffa/BioFlow/master)\n\nDescription:\n------------\n\nThis project's goal is to predict a systemic effect of 
massive gene\nperturbation, whether triggered by a drug, causative mutation or a\ndisease (such as cancer or a disease with a complex genetic background).\nIts main intended uses are the reduction of high-throughput experiment\nhit lists, in-silico prediction of de-novo drug toxicity based on their\nprotein binding profile and retrieval of the most likely pathways explaining\na phenotype of interest from a complex genotype.\n\nIts main advantage is the integration of quantitative computational\npredictions with prior biological knowledge and the ability to integrate\nsuch diverse sources of knowledge as databases, simulation, publication\ndata and expert knowledge.\n\nUnlike similar solutions, it provides several levels of access to the\nunderlying data (integrated database instance with graph visualization,\ncourtesy of [neo4j graph platform](https://neo4j.com/), as well as\npython [numpy](http://www.numpy.org/)/[scikits](https://www.scipy.org/)\nsparse adjacency and laplacian graphs).\n\nThe application is currently under development (alpha), hence the API is\nunstable and can be changed at any point without notice. If you are\nusing it, please pin the version/commit number. If you run into issues,\nplease file a GitHub ticket.\n\nThe license is BSD 3-clause. In case of academic usage, please cite the\n*url* of this repository (publication is in preparation). The full API\ndocumentation is available at\n[readthedocs.org](http://bioflow.readthedocs.org/en/latest/).\n\nInstallation walk-through:\n--------------------------\n\n### Ubuntu desktop:\n\n1) Install the Anaconda python 2.7 and make it your default python. 
The\n full process is explained\n [here](https://docs.anaconda.com/anaconda/install/linux/)\n2) Install libsuitesparse:\n\n > sudo apt-get -y install libsuitesparse-dev\n\n3) Install neo4j:\n\n > wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -\n > echo 'deb https://debian.neo4j.org/repo stable/' | sudo tee /etc/apt/sources.list.d/neo4j.list\n > sudo apt-get update\n > sudo apt-get install neo4j\n\n4) Install MongoDB (assuming Ubuntu 18.04 - if not, see\n [here](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/)):\n\n > sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4\n > echo \"deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.0 multiverse\" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list\n > sudo apt-get update\n > sudo apt-get install -y mongodb-org\n\nFor more information, refer to the [installation\nguide](http://bioflow.readthedocs.org/en/latest/guide.html#installation-and-requirements)\n\n5) Finally, install BioFlow:\n\n > pip install BioFlow\n\nOr, alternatively, in case the command line interface is not desired:\n\n > git clone https://github.com/chiffa/BioFlow.git\n > cd \n > pip install -r requirements.txt\n\n### Docker:\n\nIf you want to build locally (note that you need to issue docker commands\nas the actual docker-enabled user, usually by prepending sudo to the\ncommands):\n\n > cd \n > docker build -t\n > docker run bioflow\n > docker-compose build\n > docker-compose up -d\n\nIf you want to pull from dockerhub or don't have access to the BioFlow\ninstallation directory:\n\n > wget https://github.com/chiffa/BioFlow/blob/master/docker-compose.yml\n > docker-compose build\n > docker-compose up -d\n\nUsage walk-through:\n-------------------\n\n> **warning**\n>\n> While BioFlow provides an interface to download the databases\n> programmatically, the databases are subject to Licenses and Terms that\n> it's up to the end users to 
respect.\n\nFor more information about data and config files, refer to the [data and\ndatabase\nguide](http://bioflow.readthedocs.org/en/latest/guide.html#data-and-databases-setup)\n\n### Python scripts:\n\nThis is the recommended method for using BioFlow.\n\nImport the minimal dependencies:\n\n > from bioflow.annotation_network.knowledge_access_analysis import auto_analyze as knowledge_analysis\n > from bioflow.molecular_network.interactome_analysis import auto_analyze as interactome_analysis\n > from bioflow.utils.io_routines import get_source_bulbs_ids\n > from bioflow.utils.top_level import map_and_save_gene_ids, rebuild_the_laplacians\n\nSet static folders and URLs for the databases & pull the online\ndatabases:\n\n > set_folders('~/support') # script restart here is required to properly update all the folders\n > pull_online_dbs()\n\nSet the organism (human, S. cerevisiae):\n\n > build_source_config('human') # script restart here is required to properly update all the folders\n\nMap the hits and the background genes (available through an experimental\ntechnique) to internal IDs:\n\n > map_and_save_gene_ids('path_to_hits.csv', 'path_to_background.csv')\n\nBioFlow expects the CSV files to contain one gene per line. 
It is\ncapable of mapping genes based on the following ID types:\n\n- Gene names\n- HGNC symbols\n- PDB IDs\n- ENSEMBL IDs\n- RefSeq IDs\n- Uniprot IDs\n- Uniprot accession numbers\n\nThe preferred mapping approach is through HGNC symbols and Gene names.\n\nRebuild the laplacians (not required unless the background IDs list has been\nchanged):\n\n > rebuild_the_laplacians(all_detectable_genes=background_bulbs_ids)\n\nLaunch the analysis itself for the information flow in the interactome:\n\n > interactome_analysis([hits_ids],\n desired_depth=9,\n processors=3,\n background_list=background_bulbs_ids,\n skip_sampling=False,\n from_memoization=False)\n\nLaunch the analysis itself for the information flow in the annotation\nnetwork (experimental):\n\n > knowledge_analysis([hits_ids],\n desired_depth=20,\n processors=3,\n skip_sampling=False)\n\nWhere:\n\nhits\\_ids\n: list of hits\n\ndesired\\_depth\n: how many samples we would like to generate to compare against\n\nprocessors\n: how many threads we would like to launch in parallel (in general 3/4 works best)\n\nbackground\\_list\n: list of background IDs\n\nskip\\_sampling\n: if true, skips the sampling of the background set and retrieves stored samples instead\n\nfrom\\_memoization\n: if true, assumes the information flow for the hits sample has already been computed\n\nBioFlow will print progress to stderr from then on and will output\nto the user's home directory, in a folder called 'outputs\\_YYYY-MM\\_DD\n\\':\n\n- .gdf file with the flow network and relevance statistics\n (Interactome\\_Analysis\\_output.gdf)\n- visualisation of information flow through nodes in the null vs hits\n sets based on the node degree\n- list of strongest hits (interactome\\_stats.tsv)\n\nThe .gdf file can be further analysed with more appropriate tools.\n\n### Command line:\n\n> **warning**\n>\n> The command line interface is currently unstable and is liable to\n> throw opaque errors.\n\nSet up the environment (likely to take a while to pull all the 
online\ndatabases):\n\n > bioflow initialize --~/data_store\n > bioflow downloaddbs\n > bioflow setorg human\n > bioflow loadneo4j\n\nSet the set of perturbed proteins on which we want to base our\nanalysis:\n\n > bioflow setsource /home/ank/source_data/perturbed_proteins_ids.csv\n\nBuild network interfaces:\n\n > bioflow extractmatrix --interactome\n > bioflow extractmatrix --annotome\n\nPerform the analysis:\n\n > bioflow analyze --matrix interactome --depth 24 --processors 4\n > bioflow analyze --matrix annotome --depth 24 --processors 4\n\nThe results of the analysis will be available in the output folder, and\nprinted out to the standard output.\n\n### Post-processing:\n\nThe .gdf file format is one of the standard formats for graph exchange.\nIt contains the following columns for the nodes:\n\n- node ID\n- information current passing through the node\n- node type\n- legacy\\_id (most likely Uniprot ID)\n- degree of the node\n- whether it is present or not in the hits list (source)\n- p-value, comparing the information flow through the node to the flow\n expected for a random set of genes\n- -log10(p\\_value) (p\\_p-value)\n- rel\\_value (information flow relative to the flow expected for a\n random set of genes)\n- std\\_diff (how many standard deviations above the flow for a random\n set of genes the flow from a hits list is)\n\nThe most common pipeline involves using [Gephi open graph visualization\nplatform](https://gephi.org/):\n\n- Load the gdf file into Gephi\n- Filter out all the nodes with information flow below 0.05 (Filters\n \\> Attributes \\> Range \\> current)\n- Perform clustering (Statistics \\> Modularity \\> Randomize & use\n weights)\n- Filter out all the nodes below a significance threshold (Filters \\>\n Attributes \\> Range \\> p-value)\n- Color nodes based on the Modularity Class (Nodes \\> Colors \\>\n Partition \\> Modularity Class)\n- Set node size based on p\\_p-value (Nodes \\> Size \\> Ranking \\>\n p\\_p-value)\n- Set 
text color based on whether the node is in the hits list (Nodes\n \\> Text Color \\> Partition \\> source)\n- Set text size based on p\\_p-value (Nodes \\> Text Size \\> Ranking \\>\n p\\_p-value)\n- Show the labels (T on the bottom left)\n- Set labels to the legacy IDs (Notepad on the bottom)\n- Perform a ForceAtlas Node Separation (Layout \\> Force Atlas 2 \\>\n Dissuade Hubs & Prevent Overlap)\n- Adjust label size\n- Adjust label positions (Layout \\> LabelAdjust)\n\nFor more details or usage as a library, refer to the [usage\nguide](http://bioflow.readthedocs.org/en/latest/guide.html#basic-usage).\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chiffa/BioFlow", "keywords": "network analysis,systems biology,interactome,computational biology", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "BioFlow", "package_url": "https://pypi.org/project/BioFlow/", "platform": "", "project_url": "https://pypi.org/project/BioFlow/", "project_urls": { "Homepage": "https://github.com/chiffa/BioFlow" }, "release_url": "https://pypi.org/project/BioFlow/0.2.3/", "requires_dist": [ "numpy (<2.0)", "scipy (<1.0)", "matplotlib (<2.0)", "scikit-learn (<0.17)", "cython (<0.23)", "pymongo (<4.0)", "requests (<3.0)", "click (<6.0)", "scikits.sparse (<0.3)", "mock (<2.0)", "requests-ftp (<0.4)", "neo4j-driver (<2.0)", "coverage; extra == 'ci'", "pylint; extra == 'ci'", "coveralls; extra == 'ci'", "pyflakes; extra == 'ci'", "pep-8-naming; extra == 'ci'", "maccabe; extra == 'ci'", "flake-8; extra == 'ci'" ], "requires_python": "", "summary": "Information Flow Analysis in biological networks", "version": "0.2.3" }, "last_serial": 4403842, "releases": { "0.2.2": [ { "comment_text": "", "digests": { "md5": "31eed7b72d6b452243bc46bba4c7e79e", "sha256": "60dea338cd3403ce3c89bfb6a35c694c5fc48d2ebfbfe716034649fadd4d5271" }, "downloads": -1, "filename": 
"BioFlow-0.2.2-py2-none-any.whl", "has_sig": false, "md5_digest": "31eed7b72d6b452243bc46bba4c7e79e", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 131614, "upload_time": "2018-10-22T21:00:39", "url": "https://files.pythonhosted.org/packages/50/78/f9ac58efc0dcb16024f018163af5ef86a4ee9fd14a43f4b5d9c45e95b834/BioFlow-0.2.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "74a28a4eb687e111832ca6d944a546f7", "sha256": "80b690397acc404f22fbe491ed70fdd437a337741b33d6f818b100438d958c81" }, "downloads": -1, "filename": "BioFlow-0.2.2.tar.gz", "has_sig": false, "md5_digest": "74a28a4eb687e111832ca6d944a546f7", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 104615, "upload_time": "2018-10-22T21:00:40", "url": "https://files.pythonhosted.org/packages/09/d5/333ec960932bb192f34f7156eaf9771ca6eb0838f94bf7b218ab57ee3dc6/BioFlow-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "8725cdbf3f1e3274107e597a2e0a8dba", "sha256": "2d9a1ee9af4d0c363c191522e675121020e6a02e2d30e3fbff6e9e3f6e65b6dc" }, "downloads": -1, "filename": "BioFlow-0.2.3-py2-none-any.whl", "has_sig": false, "md5_digest": "8725cdbf3f1e3274107e597a2e0a8dba", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 131638, "upload_time": "2018-10-22T21:10:43", "url": "https://files.pythonhosted.org/packages/7c/22/cf182ee29c9166812b56e5019e0e4377e908376ee9284e30e33b53e426ec/BioFlow-0.2.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "877a7fbedd1083632c8ad1cd213f91bd", "sha256": "1e35774ce34de251468e2c7673f5cc8a344467334d646f8310d39ee2508cd5eb" }, "downloads": -1, "filename": "BioFlow-0.2.3.tar.gz", "has_sig": false, "md5_digest": "877a7fbedd1083632c8ad1cd213f91bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 104624, "upload_time": "2018-10-22T21:10:44", "url": 
"https://files.pythonhosted.org/packages/11/b8/ff85686c4c751c955c9746ed4454d6a5d6dc4e416a00903d13682e6f80c4/BioFlow-0.2.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8725cdbf3f1e3274107e597a2e0a8dba", "sha256": "2d9a1ee9af4d0c363c191522e675121020e6a02e2d30e3fbff6e9e3f6e65b6dc" }, "downloads": -1, "filename": "BioFlow-0.2.3-py2-none-any.whl", "has_sig": false, "md5_digest": "8725cdbf3f1e3274107e597a2e0a8dba", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 131638, "upload_time": "2018-10-22T21:10:43", "url": "https://files.pythonhosted.org/packages/7c/22/cf182ee29c9166812b56e5019e0e4377e908376ee9284e30e33b53e426ec/BioFlow-0.2.3-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "877a7fbedd1083632c8ad1cd213f91bd", "sha256": "1e35774ce34de251468e2c7673f5cc8a344467334d646f8310d39ee2508cd5eb" }, "downloads": -1, "filename": "BioFlow-0.2.3.tar.gz", "has_sig": false, "md5_digest": "877a7fbedd1083632c8ad1cd213f91bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 104624, "upload_time": "2018-10-22T21:10:44", "url": "https://files.pythonhosted.org/packages/11/b8/ff85686c4c751c955c9746ed4454d6a5d6dc4e416a00903d13682e6f80c4/BioFlow-0.2.3.tar.gz" } ] }