{
"info": {
"author": "Stephen Bond",
"author_email": "steve.bond@nih.gov",
"bugtrack_url": null,
"classifiers": [
"Development Status :: 5 - Production/Stable",
"Environment :: Console",
"Intended Audience :: Science/Research",
"License :: Public Domain",
"Natural Language :: English",
"Operating System :: MacOS :: MacOS X",
"Operating System :: POSIX :: Linux",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.4",
"Programming Language :: Python :: 3.5",
"Programming Language :: Python :: 3 :: Only",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Bio-Informatics",
"Topic :: Software Development :: Libraries :: Python Modules"
],
"description": "|Build_Status| |Coverage_Status| |PyPi_version|\n\n|RDMCL|\n\n--------------\n\nRecursive Dynamic Markov Clustering\n===================================\n\nA method for identifying hierarchical orthogroups among homologous sequences\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n'Orthology' is a term that was coined to describe 'homology via speciation'\u00b9, which is now a concept broadly used as a predictor of shared gene function among species\u00b2\u207b\u00b3. From a systematics perspective, orthology also represents a natural schema for classifying/naming genes coherently. As we move into the foreseeable future, when the genomes of all species on the planet have been sequences, it will be important to catalog the evolutionary history of all genes and label them in a rational way. Considerable effort has been made to programmatically identify orthologs, leading to excellent software solutions and several large public databases for genome-scale predictions. What is currently missing, however, is a convenient method for fine-grained analysis of specific gene families.\n\nIn essence, RD-MCL is an extension of conventional `Markov clustering `_-based orthogroup prediction algorithms like `OrthoMCL `_, with three key differences:\n\n1) The similarity metric used to describe the relatedness of sequences is based on multiple sequence alignments, not pair-wise sequence alignments or BLAST. This significantly improves the quality of the information available to the clustering algorithm.\n2) The appropriate granularity of the Markov clustering algorithm, as is controlled by the 'inflation factor' and 'edge similarity threshold', is determined on the fly. 
This is in contrast to almost all other methods, where default parameters are selected at the outset and imposed indiscriminately on all datasets.\n3) Differences in evolutionary rates among orthologous groups of sequences are accounted for by recursive rounds of clustering.\n\n\nGetting started\n~~~~~~~~~~~~~~~\n\n`Click here for a full use-case tutorial `_\n\nRD-MCL is hosted on the Python Package Index, so the easiest way to get the software and most dependencies is via `pip`:\n\n.. code:: text\n\n $: pip install rdmcl\n $: rdmcl -setup\n\n\nThe program will complain if you don't run '-setup' before the first time you use it, so make sure you do that.\n\nThe input for RD-MCL is a sequence file in `any of the many supported formats `_, where the name of each sequence is prefixed with an organism identifier. For example:\n\n.. code:: text\n\n >ath-At4g02970\n MNVYIDTETGSSFSITIDFGETVLEIKEKIEKSQGIPVSKQILYLDGKALEDDLHKIDYM\n ILFESRLLLRISPDADPNQSNEQTEQSKQIDDKKQEFCGIQDSSESKKITRVMARRVHNI\n YSSLPAYSLDELLGPKYSATVAVGGRTNQVVQPTEQASTSGTAKEVLRDSDSPVEKKIKT\n NPMKFTVHVKPYQEDTRMIHVEVNADDNVEELRKELVKMQERGELNLPHEAFHLLGLGSS\n ETCPHQNRSEEPNQCPTILMSPHGLQAIVT\n >cel-CE08215_2\n QIFVKVLGVSYAFKIHREDTVFDIKNDIEHRHDIPQHSYWLSFSGKRLEDHCSIGDYNIQ\n KSSTITMYFRSG\n >cel-CE16986\n MKATTVKENEVKDDRKLSLNEMLRKRCLQVKNTKMKNSSMPKFQYFVRLNGKTRTLNVNA\n SDTVEQGKMQLCHNARSTRMSYGGKPLSDQITFGEYNISNNSTMDLHFRI\n >hsa-Hs20473312\n MQIFVKTLTGKTITLEVEPSDTIENVKAKIQGKEGIPPDQQRLIFAGKQLEDGRTLSDYN\n IQKESTLHLVLRLLVVLRKGRRSLTPLPRRISTRERRLSWLS\n >sce-YDR139c\n MIVKVKTLTGKEISVELKESDLVYHIKELLEEKEGIPPSQQRLIFQGKQIDDKLTVTDAH\n LVEGMQLHLVLTLRGGN\n\n\nThe above are a few sequences from `KOG0001 `_, coming from Arabidopsis (ath), C. elegans (cel), human (hsa), and yeast (sce). Note the hyphen (-) separating each identifier from the gene name. This is important! 
Make sure there are no spurious hyphens in any of the gene names, and if you can't use a hyphen for some reason, set the delimiting character with the `-ts` flag.\n\nOnce you have your sequences named correctly, pass the file to rdmcl:\n\n.. code:: text\n\n $: rdmcl your_seq_file.fa\n\n\nA new directory will be created which will contain all of the accoutrement associated with the run, including a 'final_clusters.txt' file, which is the result you'll probably be most interested in.\n\nThere are several parameters you can modify; use `$: rdmcl -h` to get a listing of them. They are also individually described in the `wiki `_.\n\nVideo of Evolution 2017 talk\n----------------------------\n\nI discuss the rationale and high-level implementation details of RD-MCL\n\n|EvolutionVid|\n\nDistributing RD-MCL on a cluster\n--------------------------------\n\nRD-MCL will parallelize creation of all-by-all graphs while searching MCL parameter space. Once a graph has been created it is saved in a database, thus preventing repetition of the 'hard' work if/when the same cluster is identified again at a later time. This means that the computational burden of a given run will tend to be high at the beginning of that run and decrease with time.\n\nTo spread the work out across multiple nodes during the 'hard' part, launch workers with the `launch_worker `_ script bundled with RD-MCL:\n\n.. code:: bash\n\n $: launch_worker --workdb \n\nBy default, `launch_worker` will use all of the cores it can find, so either sequester the entire node or pass in the `--max_cpus` flag to restrict it. I have run as many as 100 workers at a time, but be aware that this sort of pressure can lead to some instability (i.e., lost jobs from the queue and frozen master threads). Twenty workers is usually safe.\n\nNext, launch RD-MCL with the `--workdb` flag set to the same path you specified for `launch_worker`:\n\n.. 
code:: bash\n\n $: rdmcl --workdb \n\nRD-MCL will now send its expensive all-by-all work to a queue and wait around for one of the workers to do the calculations. You can keep track of how busy the workers are by running the `monitor script `_ in the same directory as the workers:\n\n.. code:: bash\n\n $: monitor_dbs\n\n Press return to terminate.\n #Master AveMhb #Worker AveWhb #queue #subq #proc #subp #comp #HashWait #IdWait ConnectTime\n 29 19.0 16 51.0 1 362 22 12 29 25 25 0.01\n\n\nAlso, you can send an arbitrary number of RD-MCL jobs to the same worker pool, no problem.\n\nReferences\n----------\n\n\u00b9 Fitch, W. M. `Distinguishing homologous from analogous proteins `_. *Systemat. Zool.* **19**, 99\u2013106 (1970).\n\n\u00b2 Gabald\u00f3n, T. and Koonin, E. V. `Functional and evolutionary implications of gene orthology `_. *Nature reviews. Genetics.* **14**, 360-366 (2013).\n\n\u00b3 Koonin, E. V. `Orthologs, paralogs, and evolutionary genomics `_. *Annual review of genetics.* **39**, 309-338 (2005).\n\n\nContact\n-------\n\nIf you have any comments, suggestions, or concerns, feel free to create an issue in the issue tracker or to get in touch with me directly at steve.bond@nih.gov\n\n.. |Build_Status| image:: https://travis-ci.org/biologyguy/RD-MCL.svg?branch=master\n :target: https://travis-ci.org/biologyguy/RD-MCL\n.. |Coverage_Status| image:: https://img.shields.io/coveralls/biologyguy/RD-MCL/master.svg\n :target: https://coveralls.io/github/biologyguy/RD-MCL?branch=master\n.. |PyPi_version| image:: https://img.shields.io/pypi/v/rdmcl.svg\n :target: https://pypi.python.org/pypi/rdmcl\n.. |RDMCL| image:: https://raw.githubusercontent.com/biologyguy/RD-MCL/master/rdmcl/images/rdmcl-logo.png\n :target: https://github.com/biologyguy/RD-MCL/wiki\n :height: 200 px\n.. |EvolutionVid| image:: https://img.youtube.com/vi/52STQpKv8j4/0.jpg\n :target: https://www.youtube.com/watch?v=52STQpKv8j4",
"description_content_type": null,
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/biologyguy/RD-MCL",
"keywords": "computational biology,bioinformatics,orthogroup,ortholog,biology,Markov clustering,MCL",
"license": "Public Domain",
"maintainer": "",
"maintainer_email": "",
"name": "rdmcl",
"package_url": "https://pypi.org/project/rdmcl/",
"platform": "",
"project_url": "https://pypi.org/project/rdmcl/",
"project_urls": {
"Homepage": "https://github.com/biologyguy/RD-MCL"
},
"release_url": "https://pypi.org/project/rdmcl/1.1.0/",
"requires_dist": null,
"requires_python": "",
"summary": "RDMCL recursively clusters groups of homologous sequences into orthogroups.",
"version": "1.1.0"
},
"last_serial": 3459692,
"releases": {
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "c97c965b6cf3a466e543547ce3ccab69",
"sha256": "9e7057fc491aa74abf76b0890d9be65565ab78d3f5d0bf35b1b3a761dc82c0be"
},
"downloads": -1,
"filename": "rdmcl-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "c97c965b6cf3a466e543547ce3ccab69",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 39828,
"upload_time": "2017-03-14T18:41:57",
"url": "https://files.pythonhosted.org/packages/a0/15/fd3ea3bfb06222884cf366d6fd1ae9ae3e13d7baca3baea18add7cdbc3cc/rdmcl-1.0.0.tar.gz"
}
],
"1.0.1": [
{
"comment_text": "",
"digests": {
"md5": "71f52916069a1b86d24791f7c5688b10",
"sha256": "d44fba1873ab3e1e6449a2ab5e29727628c4f0c5353cc20c512e3dd2e9004960"
},
"downloads": -1,
"filename": "rdmcl-1.0.1.tar.gz",
"has_sig": false,
"md5_digest": "71f52916069a1b86d24791f7c5688b10",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 39856,
"upload_time": "2017-03-14T19:59:45",
"url": "https://files.pythonhosted.org/packages/41/2b/9525c334a8752be888963ebd8e01fca07a503222442ff9351230cecc14c1/rdmcl-1.0.1.tar.gz"
}
],
"1.0.2": [
{
"comment_text": "",
"digests": {
"md5": "011eb0793aa1852f89cc2143d53b17ea",
"sha256": "a3ec9df59074cfc087dff2ab82d4a1cbb66e648ada57258b8153bd8ae9f57a2b"
},
"downloads": -1,
"filename": "rdmcl-1.0.2.tar.gz",
"has_sig": false,
"md5_digest": "011eb0793aa1852f89cc2143d53b17ea",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 43616,
"upload_time": "2017-05-12T20:26:34",
"url": "https://files.pythonhosted.org/packages/10/ab/f684470c1bb7b2bdfd88b4cd28f41881ea28e2ffe091a0e669a3725881f1/rdmcl-1.0.2.tar.gz"
}
],
"1.0.3": [
{
"comment_text": "",
"digests": {
"md5": "ec7c698e7f586f6d98730c1340e33f85",
"sha256": "fb5ee3c07c87947eda244229c50b250102ad4ac31db2f4b65780f4fb1051d826"
},
"downloads": -1,
"filename": "rdmcl-1.0.3.tar.gz",
"has_sig": false,
"md5_digest": "ec7c698e7f586f6d98730c1340e33f85",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 44962,
"upload_time": "2017-05-16T18:36:28",
"url": "https://files.pythonhosted.org/packages/e6/d7/903d13bb1d030f83fdb61774159bee5d231a38935068a88113dd5d92dd9b/rdmcl-1.0.3.tar.gz"
}
],
"1.1.0": [
{
"comment_text": "",
"digests": {
"md5": "6192ebe7b6a995f43199bf08d82d2c0b",
"sha256": "80c8a94dda93ace73199d6573744c5ab534ce609ab25597465f17486aaa0f263"
},
"downloads": -1,
"filename": "rdmcl-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "6192ebe7b6a995f43199bf08d82d2c0b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 71129,
"upload_time": "2018-01-03T19:04:22",
"url": "https://files.pythonhosted.org/packages/88/c8/b9ff1749bdf1d1792403cb7e4719ddfeedbca68f3cfa06b78f15b6327343/rdmcl-1.1.0.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "6192ebe7b6a995f43199bf08d82d2c0b",
"sha256": "80c8a94dda93ace73199d6573744c5ab534ce609ab25597465f17486aaa0f263"
},
"downloads": -1,
"filename": "rdmcl-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "6192ebe7b6a995f43199bf08d82d2c0b",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 71129,
"upload_time": "2018-01-03T19:04:22",
"url": "https://files.pythonhosted.org/packages/88/c8/b9ff1749bdf1d1792403cb7e4719ddfeedbca68f3cfa06b78f15b6327343/rdmcl-1.1.0.tar.gz"
}
]
}