{ "info": { "author": "Daniel Hermansson", "author_email": "hedani@chalmers.se", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "Menace\n======\n\nThis bundle of software is a basic implementation of the algorithm for\nextracting Peak-to-Trough Ratios from Metagenomic data, as first\ndescribed in `(Korem et al., Science,\n2015) `__.\n\nInstallation:\n-------------\n\nPip\n~~~\n\nMake sure that \"pip\" is the PyPI command of your *python2* installation,\nthen:\n\n.. code:: bash\n\n pip install menace\n\nGit\n^^^\n\n.. code:: bash\n\n git clone git@github.com:zertan/Menace.git\n cd Menace\n python setup.py install\n\nThis should install the below *python* dependencies. The other\ndependencies have to be installed manually (if you have questions about\nthis I suggest you consult your cluster IT help desk).\n\nThe software has been tested on the \"hebbe\" cluster at\n`C3SE `__ which uses the \"slurm\" system for resource\nmanagement (thus slurm is the only queueing system currently supported).\n\nDependencies:\n~~~~~~~~~~~~~\n\n::\n\n Python2:\n numpy\n scipy\n pandas\n biopython\n matplotlib\n xmltodict\n configparser\n lmfit\n newick\n Jinja2\n doric\n -e git+https://github.com/PathoScope/PathoScope.git#egg=pathoscope\n\n`samtools `__\n\n`bamtools `__\n\n`bowtie2 `__\n\n`Pathoscope\n2.0 `__\n(should be installed by the above pip command but make sure 'pathoscope\nID' is accessible in the shell, ie. is on the system path)\n\n`parallel `__\n\n`DoriC `__ is a database of\nchromosome origin locations (OriCs) which is a (recommended) optional\ndependency for the pipeline. 
Please visit the link and enter your e-mail\nto download.\n\nUsage\n-----\n\nYou can get an overview of the menace functionality by running\n``menace -h``.\n\n1. Initialize a project in current directory by running ``menace init``.\n Identify a set of NCBI genome reference accession numbers and put\n them in \"./searchStrings\" (or use the default one which includes a\n *minimal* set of references to bacteria common in the human gut).\n\n2. Identify a metagenomic cohort of interest (download manually or add\n URLs as described below) and add to the Data folder. Supported input:\n raw/gzipped/bzipped \".fastq\" files.\n\n3. Add information to the ``project.conf`` file.\n\n4. Edit ``loadmodules.sh`` to include the **python2** module of the\n cluster (or comment out the lines if python2 is accessible by\n default).\n\n5. Run ``menace full`` (use \"nohup {cmd} &\" to keep alive after logout\n if on a cluster login node).\n\n6. Wait for job to complete. Run ``menace collect`` in project\n directory.\n\nNotes\n^^^^^\n\nThe menace script is a common utility for all parts of the pipeline\nincluding downloading of references and metagenomic data, building a\nreference index, setting up the necessary file structure and submitting\nto slurm. Hence, all configuration is intended to be set up in\nproject.conf (please see ``bin/project.conf.example`` for an example).\n\nThe default 'searchStrings' will most probably not fit your purposes but\nis only an example. A more comprehensive Reference library will yield\nhigher coverage and more accurate values. A more comprehensive list of\nhuman gut bacteria is available at 'extra/referenceACClong.txt'.\n\nDirectory structure (*example*)\n-------------------------------\n\nWith the above usage example the path structure(s) will look something\nlike below.\n\n::\n\n $DATA_PATH\n \u251c \"Sample01\" (eg. ERR525688)\n . \u251c {sample01_1.fastq.gz}\n . 
\u2514 {sample01_2.fastq.gz} paired metagenomic reads\n .\n\n $REF_PATH\n \u251c Index\n | \u2514 {REF_NAME.*.bt2l} bowtie2 index files\n \u251c Fasta\n | \u2514 {accession.fasta}\n \u251c Headers\n | \u2514 {accession.xml} xml files containing extra genome references info\n \u2514 taxIDs.txt\n\n $DORIC_PATH\n \u251c bacteria_record.dat\n \u2514 bacteria_seq.fas\n\n $OUTPUT_PATH\n \u251c \"Sample01\"\n . \u251c depth\n . | \u2514 {accession.depth} coverage files for each reference\n . \u251c log\n | \u2514 {accession.log} output logs from piecewiseFit \n \u251c npy\n | \u2514 {accession_OriC_TerC.npy} numpy files with origin/terminus locations and relative C periods\n \u251c png\n | \u2514 {accession_fit.png} images of piecewise fit of the smoothed coverage\n \u2514 accession-sam-report.tsv Pathoscope2 reassignment report\n\nContents\n--------\n\nBelow follows a description of the main scripts in the package.\n\njobscript\n^^^^^^^^^\n\nA submit script for sending a batch job to slurm for parallel processing\non a computing cluster.\n\n**input:** none\n\n**output:** directory structure as specified in \"project.conf\"\n\nmainBuild.sh\n^^^^^^^^^^^^\n\nThe main build script with commands intended to be executed on the\ncluster.\n\n**input:** none\n\n**output:** temporary paths and files on compute nodes\n\nPTRMatrix.py\n^^^^^^^^^^^^\n\nTraverses the specified directory generated by mainBuild.sh and\nassembles information from each sample into tabular form (eg. averages\norigin locations from many samples for a better estimate).\n\n**input:** $OUTPUT\\_PATH, $DORIC\\_PATH, $REF\\_PATH, bin/accLoc.csv\n\n**output:** Abundance.csv, PTR.csv, DoublingTime.csv, Header.csv\n\npiecewiseFit.py\n^^^^^^^^^^^^^^^\n\nImplements the piecewise linear fit and prior checks on the generated\ndepth files to filter out those instances in which enough data was\ngenerated to produce a reliable coverage signal for estimating\nreplication origins. 
This data can be used further on, once the origins have\nbeen estimated using the full cohort, to produce PTR-values for each\nsample.\n\n**input:** {reference.depth}\n\n**output:** {reference\\_OriC.npy}, {reference\\_TerC.npy},\n{reference\\_coverage.png}, {reference\\_fit.log}\n\nfetchSeq.py\n^^^^^^^^^^^\n\nThis utility can be used to download '.fasta' reference files from the\nNCBI servers.\n\n**input:** searchStrings.txt\n\n**output:** {reference.fasta}, {reference.xml}, taxIDs.txt\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://www.github.com/SysBio/menace/", "keywords": "systems biology research", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "menace", "package_url": "https://pypi.org/project/menace/", "platform": "", "project_url": "https://pypi.org/project/menace/", "project_urls": { "Homepage": "https://www.github.com/SysBio/menace/" }, "release_url": "https://pypi.org/project/menace/0.1.3/", "requires_dist": null, "requires_python": "", "summary": "A metagenomics pipeline to estimate relative cell periods.", "version": "0.1.3" }, "last_serial": 2530719, "releases": { "0.1.3": [ { "comment_text": "", "digests": { "md5": "faa6987c2f73a1bfc9aa9d5a90359c63", "sha256": "017fe228c33b849486dab7efc823580ca1fd0881d6473c10fb5586f39b5557a2" }, "downloads": -1, "filename": "menace-0.1.3.tar.gz", "has_sig": false, "md5_digest": "faa6987c2f73a1bfc9aa9d5a90359c63", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3577931, "upload_time": "2016-12-20T16:34:59", "url": "https://files.pythonhosted.org/packages/ce/77/9f89b9fb4931f157a91d38d0895e40c2bd97555d37694345c74fd5fbdd85/menace-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "faa6987c2f73a1bfc9aa9d5a90359c63", "sha256": "017fe228c33b849486dab7efc823580ca1fd0881d6473c10fb5586f39b5557a2" }, "downloads": -1, "filename": 
"menace-0.1.3.tar.gz", "has_sig": false, "md5_digest": "faa6987c2f73a1bfc9aa9d5a90359c63", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3577931, "upload_time": "2016-12-20T16:34:59", "url": "https://files.pythonhosted.org/packages/ce/77/9f89b9fb4931f157a91d38d0895e40c2bd97555d37694345c74fd5fbdd85/menace-0.1.3.tar.gz" } ] }