{ "info": { "author": "Darcy Jones", "author_email": "darcy.a.jones@postgrad.curtin.edu.au", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# CATAStrophy\n\nCATAStrophy is a classification method for describing lifestyles/trophic characteristics\nof filamentous plant pathogens using carbohydrate-active enzymes (CAZymes).\nThe name CATAStrophy is a backronym portmanteau hybrid where \"CATAS\" means\nCAZyme Assisted Training And Sorting.\n\nCATAStrophy takes HMMER3 files from searches against [dbCAN](http://csbl.bmb.uga.edu/dbCAN/)\nas input and returns pseudo-probabilities (see Details) of trophic class memberships for each file.\n\n\n## Installing\n\nCATAStrophy is a python program which can be used as a module or via a\ncommand-line interface.\n\n\nNOTE: Because the repository is currently private, the following `pip` command won't work.\nUse the methods to install from git instead for now.\n\nYou can install from PyPI using pip:\n\n```bash\npip3 install --user catastrophy\n```\n\nYou can also install directly from the git repository.\n\n```bash\npip3 install --user git+git@bitbucket.org:ccdm-curtin/catastrophy.git\n```\n\n```bash\ngit clone https://@bitbucket.org/ccdm-curtin/catastrophy.git ./catastrophy\ncd catastrophy\npip install --user .\n# Or use pip install -e . 
if you want to edit the modules.\n```\n\nCATAStrophy is tested to work with python 3.5+, and it depends on\n[numpy](http://www.numpy.org/) and [biopython](https://biopython.org/).\nThe pip commands above should install these for you automatically, but if you\nuse any of these packages yourself it's a good idea to install CATAStrophy in\na python [virtual environment](https://virtualenv.pypa.io/en/stable/)\n(you should probably use these when installing most python packages).\n\nUsing `virtualenv` is pretty easy; here's a basic rundown of the workflow.\n\n```bash\n# If it isn't installed already run one of these\n# Try to use the system package managers if possible to avoid mixing up system dependencies.\nsudo pip3 install virtualenv\nsudo apt install python3-virtualenv # Ubuntu and probably Debian\nsudo dnf install python3-virtualenv # Fedora 24+\n\n# Change dir to where you want the env to live (usually a project dir).\ncd my_project\n\n# Create a virtualenv in a folder ./env\n# python3.7 can be substituted with your version of python.\npython3.7 -m venv env\n```\n\nNow that the virtualenv is set up, you can load it and install CATAStrophy.\n\n```bash\n# Loads the virtualenv (essentially changes PATH and some other env variables).\nsource env/bin/activate\n\npip3 install catastrophy\n# or\npip3 install git+https://@bitbucket.org/ccdm-curtin/catastrophy.git\n# or\ngit clone https://@bitbucket.org/ccdm-curtin/catastrophy.git ./\npip install .\n```\n\n## Using CATAStrophy\n\nThe command line interface is pretty simple: you just need to supply the input\nfiles and where to put the output. 
The input files should be the output\nfrom [HMMER3](http://hmmer.org/) `hmmscan` as either the raw HMMER3 text\noutput or the \"domain table\" output provided by the `--domtblout` flag.\nParsing the domain table output is about twice as fast as the regular text\noutput, so if you have lots of files to run it might be worth saving those files.\n\nThe easiest way to get a file like this is to annotate your proteome using\nthe dbCAN online tool at , and\nsave the HMMER3 raw text results locally.\nAssuming that you have this file locally you can run CATAStrophy like so:\n\n```bash\ncatastrophy -i my_dbcan_results.txt -f hmmer -o my_catastrophy_results.csv\n```\n\nThe output will be a tab-delimited file (which you can open in Excel) with\nthe first row containing column headers and subsequent rows containing a\nlabel and the pseudo-probabilities of membership to each trophic class.\nThe `-f/--format` flag is optional and defaults to `hmmer`, but if you want to\nuse domain table output, you should include the flag `-f domtab` (run\n`catastrophy --help` for more options).\n\nNOTE: In this document I use the `.csv` extension to mean any plain text tabular\nformat because Excel doesn't recognise alternate extensions like `.tsv`.\nThe domain table output is actually space delimited and the catastrophy\noutput is a tab delimited file.\n\nBy default the filenames are used as the label but you can explicitly specify\na label using the `-l/--label` flag. 
The output from the command above will\nhave two lines, one containing the column headers and the other containing\nresults for the file `my_dbcan_results.txt` which will have the label\n\"my_dbcan_results.txt\".\n\nTo give it a nicer label you can run this.\n\n```bash\ncatastrophy -i my_dbcan_results.txt -l prettier_label -o my_catastrophy_results.csv\n```\n\nWhich would give the output line for `my_dbcan_results.txt` the label \"prettier_label\".\nUnfortunately, labels cannot contain spaces unless you explicitly escape them (quotes won't work).\n\nIf you want to run multiple files at the same time you just need to separate the files by spaces, like this:\n\n```bash\ncatastrophy -i dbcan_1.txt dbcan_2.txt -o my_catastrophy_results.csv\n\n# Or equivalently\ncatastrophy -i dbcan_*.txt -o my_catastrophy_results.csv\n```\n\nThe output from this will contain three rows, one containing the headers and\nthe other two containing the results for the files `dbcan_1.txt` and `dbcan_2.txt`\nwhich will be labelled by the filenames.\nNote that standard bash \"globbing\" patterns expand into a space delimited array,\nso you can easily use \"*\" or subshells if you like (eg. `$(find . 
-type f -name '*.txt')` etc).\nTo explicitly label these files you can again supply the label flag with the space separated labels.\n\n```bash\ncatastrophy -i dbcan_1.txt dbcan_2.txt -l label1 label2 -o my_catastrophy_results.csv\n```\n\nNote that if you do use the label flag, the number of labels **must** be the same as the number of input files.\n\nBoth the input and output flags support standard input/output (they are actually the default values).\nSo you could change the single file commands from above to:\n\n```bash\ncat my_dbcan_results.txt | catastrophy -l prettier_label > my_catastrophy_results.csv\n\n# or using the convention for \"-\" representing stdin/stdout\n\ncat my_dbcan_results.txt | catastrophy -i - -l prettier_label -o - > my_catastrophy_results.csv\n```\n\nIf you don't specify a label for stdin input the label will be \"\".\n\n\nIf you _really_ want to you could also mix and match stdin and filepaths using \"-\" to specify stdin.\n\n```bash\ncat dbcan_2.txt | catastrophy -i dbcan_1.txt - -o my_catastrophy_results.csv\n```\n\nSo the second result row in the output would come from stdin.\nOf course, if you cat multiple files into catastrophy they will all be treated\nas a single file so it doesn't usually make sense to use stdin with multiple inputs.\n\n\nFinally, because dbCAN is updated as new CAZyme classes are created, merged,\nor split, catastrophy has a final parameter that allows you to select the\nmodel trained on a specific dbCAN version (starting from version 5).\n\nTo specify the version of the model to use, just include the `-m/--model`\nflag with one of the valid options (see `catastrophy -h` for the options).\n\n```bash\ncatastrophy -m v5 -i my_dbcan_results.txt -o my_catastrophy_results.csv\n```\n\nThe model versions just reflect the version of dbCAN that the model was trained against.\n\n\n## Running dbCAN locally\n\nIf you have lots of proteomes to run (or you're a command-line snob like me)\nthen you probably don't want to use the 
web interface.\nIn that case you can run the dbCAN pipeline locally using [HMMER](http://hmmer.org/).\n\nThe instructions for running HMMER and the dbCAN parser can be found here\n in the readme.txt file.\nIt isn't the most friendly documentation though, so I'll repeat it here\n(assuming that you've installed [HMMER](http://hmmer.org/) and are using a unix-like OS).\n\nFirst download the HMMs and the parser script.\n\n```bash\ncd \n\nmkdir -p ./data\nwget -qc -P ./data http://csbl.bmb.uga.edu/dbCAN/download/dbCAN-fam-HMMs.txt.v5\n\n# Optional, useful for summarising your dbCAN \n# results but not necessary for CATAStrophy.\nwget -qc -P ./data http://csbl.bmb.uga.edu/dbCAN/download/hmmscan-parser.sh\n```\n\nNote that I'm downloading a specific version of the database rather than just the latest one.\nNow we can convert the file containing HMM definitions into a HMMER database.\n\n```bash\nhmmpress ./data/dbCAN-fam-HMMs.txt.v5\n```\n\nNow we can run HMMER to find matches to the dbCAN HMMs.\nFor demonstration, we'll save both outputs.\n\n```bash\nhmmscan --domtblout my_fasta_hmmer.csv ./data/dbCAN-fam-HMMs.txt.v5 my_fasta.fasta > my_fasta_hmmer.txt\n```\n\nThe domain table is now in the file `my_fasta_hmmer.csv` and the plain hmmer\ntext output is in `my_fasta_hmmer.txt`.\nEither one of these files is appropriate for use with CATAStrophy (just\nremember to specify the `--format` flag).\nIn practice, you'll probably only need the domain table output, in which case you\ncould just redirect the standard output to `/dev/null` to discard it.\n\nIf you want to look at the dbCAN matches, you can use the summary script from\ndbCAN.\nThis script takes the domain table output from hmmscan as input and returns a new tabular file.\n\n```bash\nbash ./data/hmmscan-parser.sh my_fasta_hmmer.csv > my_fasta_dbcan.csv\n```\n\nAnd that's it!\n\n\n# Details\n\nSome extra details about the CATAStrophy method, including the classes used and the calculation of the RCD.\n\n\n", 
"description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://bitbucket.org/ccdm-curtin/catastrophy", "keywords": "fungi machine-learning bioinformatics", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "catastrophy", "package_url": "https://pypi.org/project/catastrophy/", "platform": "", "project_url": "https://pypi.org/project/catastrophy/", "project_urls": { "Homepage": "https://bitbucket.org/ccdm-curtin/catastrophy" }, "release_url": "https://pypi.org/project/catastrophy/0.0.1/", "requires_dist": [ "numpy (>=1.15.0)", "biopython (>=1.70)", "check-manifest ; extra == 'dev'", "scipy ; extra == 'dev'", "scikit-learn ; extra == 'dev'", "jupyter ; extra == 'dev'", "matplotlib ; extra == 'dev'", "seaborn ; extra == 'dev'", "coverage ; extra == 'test'", "pytest ; extra == 'test'" ], "requires_python": "", "summary": "A fungal trophy classifier based on CAZymes", "version": "0.0.1" }, "last_serial": 5185979, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "05f6efda277e24fcd9d7a159e911f3c8", "sha256": "2673a306c416066de49a57c72afeba2a9f924917b5c811e1ac8b721efc7d8ab6" }, "downloads": -1, "filename": "catastrophy-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "05f6efda277e24fcd9d7a159e911f3c8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 214960, "upload_time": "2019-04-25T04:06:47", "url": "https://files.pythonhosted.org/packages/35/b6/6f0e72d28235f71b68f628fdf86e09de1aa303e57dc2acdc0d76c70b247c/catastrophy-0.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a2ae29ef9d7c76119683862e2d7e6051", "sha256": "820ba8b60ef6027f11da6c63fe94ca54c7a2b5fbed8c7be1fe9a897086798353" }, "downloads": -1, "filename": "catastrophy-0.0.1.tar.gz", "has_sig": false, "md5_digest": "a2ae29ef9d7c76119683862e2d7e6051", "packagetype": "sdist", "python_version": 
"source", "requires_python": null, "size": 23876, "upload_time": "2019-04-25T04:06:49", "url": "https://files.pythonhosted.org/packages/16/5e/d5990373a52e715e8c758bbd3d00a2accebf9d05c82db85702691374fd56/catastrophy-0.0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "05f6efda277e24fcd9d7a159e911f3c8", "sha256": "2673a306c416066de49a57c72afeba2a9f924917b5c811e1ac8b721efc7d8ab6" }, "downloads": -1, "filename": "catastrophy-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "05f6efda277e24fcd9d7a159e911f3c8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 214960, "upload_time": "2019-04-25T04:06:47", "url": "https://files.pythonhosted.org/packages/35/b6/6f0e72d28235f71b68f628fdf86e09de1aa303e57dc2acdc0d76c70b247c/catastrophy-0.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "a2ae29ef9d7c76119683862e2d7e6051", "sha256": "820ba8b60ef6027f11da6c63fe94ca54c7a2b5fbed8c7be1fe9a897086798353" }, "downloads": -1, "filename": "catastrophy-0.0.1.tar.gz", "has_sig": false, "md5_digest": "a2ae29ef9d7c76119683862e2d7e6051", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23876, "upload_time": "2019-04-25T04:06:49", "url": "https://files.pythonhosted.org/packages/16/5e/d5990373a52e715e8c758bbd3d00a2accebf9d05c82db85702691374fd56/catastrophy-0.0.1.tar.gz" } ] }