{ "info": { "author": "Alexander Lashkov, Sergey Rubinsky, Polina Eistrikh-Heller", "author_email": "alashkov83@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 1 - Planning", "Environment :: Console", "Environment :: Win32 (MS Windows)", "Environment :: X11 Applications", "Intended Audience :: Education", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: Implementation :: CPython", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "Hydrocluster - Bioimformatics Tool\n==================================\n\nShort description\n-----------------\nThe program Hydrocluster is designed to determine the position, size and content of hydrophobic,\nhydrophilic and charged clusters in protein molecules. The program is based on the DBSCAN algorithm.\n\n**Keywords:** molecular modeling, bioinformatic, protein structure,\nhydrophobic core, hydrophobic cluster, DBSCAN\n\nInstallation\n------------\n\n```shell\npip install --upgrade hydrocluster\n```\n(or pip3 in distributive with default python 2 version)\n\nUser Interface\n--------------\n\n### Command line\n\nThe program is called with the command \u2019hydrocluster\u2019 and following\nparameters:\n```shell\nhydrocluster [-h][-i INPUT][-emin EMIN][-emax EMAX][-es ESTEP]\n[-smin SMIN][-smax SMAX][-g {tkgui,cli,testlist}][-o OUTPUT][-c CHAINS]\n[-rl RESLIST][-pt{hydropathy,menv,fuzzyoildrop,nanodroplet,aliphatic_core,hydrophilic,positive,negative}]\n[-pH PH][-sc {si_score,calinski,dbcv}][-nf][-na][-eps EPS][-min_samples MIN_SAMPLES]\n```\n#### Arguments:\n\n**-h, --help** \nshow help message and exit\n\n**-i INPUT, --input INPUT** \nInput file name (pdb,cif, ent, .hjson) - pdb file name, cif file name,\nindividual id pdb or hjson configuration file name for testlist\n\n**-emin EMIN, --emin EMIN** \nMinimum EPS value (\u212b). Default=3.0\n\n**-emax EMAX, --emax EMAX** \nMaximum EPS value (\u212b). Default=15.0\n\n**-es ESTEP, --estep ESTEP** \nStep of EPS (\u212b). Default=0.1\n\n**-smin SMIN, --smin SMIN** \nMinimum MIN SAMPLES. Default=3\n\n**-smax SMAX, --smax SMAX** \nMaximum MIN SAMPLES. Default=50\n\n**-g {tkgui,cli,testlist}, --gui** \nUI modes. Default=\u2019tkgui\u2019 (tkgui - graphic interface, cli - command\nline, testlist - using testlist module for data processing (see -i\nfilename.txt and -o filename of data base).\n\n**-o OUTPUT, --output OUTPUT** \nOutput directory name/file name or db name\n\n**-c CHAINS, --chains CHAINS** \nSelected chains. Default=None\n\n**-rl RESLIST, --reslist RESLIST** \nSelected amino acid residues. Default=None\n\n**-pt{hydropathy,menv,fuzzyoildrop,nanodroplet,aliphatic\\_core,hydrophilic,positive,negative}, --ptable** \nProperty table for weighting. Default=\u2019hydropathy\u2019\n\n**-pH PH** \npH value for calculatation of partial charges (positive or negative). Default=7.0\n\n**-sc {si\\_score,calinski,dbcv}, --score {si\\_score,calinski,dbcv}** \nScore coefficient. Default=\u2019calinski\u2019\n\n**-nf, --noise\\_filter** \nActivate filter of noise for scoring function (**Not recommended!!!**).\n\n**-na, --noauto** \nNo automatic mode.\n\n**-eps EPS** \nEPS value (\u212b). Default=3.0\n\n**-min\\_samples MIN\\_SAMPLES** \nMIN SAMPLES value. Default=3\n\n**At startup of hydrocluster without any parameters the program opens\nwith graphics interface.**\n\n### Examples:\n```shell\nhydrocluster -i 1atg.pdb -g cli -o 1atg\n```\nProcessing of file\\_name.pdb by command line interface and file\\_name\nfolder on return\n\nFile\\_name folder consists of file\\_name.py file for processing by\npymol, binary file (.dat) with saved session state, file\\_name.log file\nwith saved log-data and two png files with pictures.\n\n```shell\nhydrocluster -g testlist -i defaultt.hjson\n```\n\nReading of configuration file default.json and processing it by testlis. An example of a configuration file (with parameter comments) can be found at https://github.com/alashkov83/hydrocluster/blob/master/PDB_LISTS/default.hjson. Project\\_name.db file and project\\_name\\_data folder consisting of tree structure with data files will be returned.\n\nGraphical User Interface\n------------------------\n\nGUI was realized using Tkinter. It consists of a panel for selecting the\noperation mode, a window for graphical representation of clustering\nresults Cluster analysis, and a window for displaying log file.\n\n\nAt the beginning of working with the graphical interface, it is\nnecessary to select the desired hydrophobicity/hydrophilicity table in\nthe sub-window of the mode selection window, select the method for\nscoring of clustering in the metrics window and run on Manual (Manual mode ->\nStart) or automatic mode of operation (Auto mode -> Start) in one of\nthe underlying windows. In the automatic mode, the optimal parameters\neps and min\\_samples are selected by enumeration within the given\nboundaries and with the given step. Upon completion of the work in the\nautomatic mode, when you click Options -> Solution analysis -> Autotune colormap, you can\nget a graphical interpretation of the process of selecting the optimal\nvalues namely dependencies min\\_samples (eps) and min\\_samples (eps\u00b3).\nThe point corresponding to the optimal parameters is marked in\ncolor.\n\n\nThe Cluster analysis window presents a three-dimensional image of\nclusters selected by the program in a protein molecule. Appropriate mtnu\nsections allow you to make a coordinate grid in the image and get a brief comment\non the picture.\n\nThe Log window shows the numerical results of clustering, namely the\nnumber of chains and clusters, the percentage of noise and the optimal\nvalues of the hyperparameters (eps,min\\_samples) and the metric used.\nFurther study of the macromolecule can be carried out using the PyMol\nprogram (Options-> OpenPyMol).\n\n\n### Menu options:\n\n**File->**\n\nOpen File - opens PDB or mmCIF file on a disk \nOpen ID PDB - opens file from RSCB PDB data bank with ID PDB \nLoad State - loads program state, saved in file \nSave PyMOL script - saves script (.py) for further processing with PyMOL \nSave State - saves the current state of program in file \nSave Picture - saves the clustering result in png format file \nSave LOG - saves log file of the current session \nQuit - quit from the program\n\n**Options->** \nSelect clustering solution -> By local max (min) - shows other solutions of cluster analysis by local extrema of scoring for make choice its \nSelect clustering solution -> By max (min) values - shows other solutions of cluster analysis by values of scoring for make choice its \nSolution analysis -> Autotune colormap - shows graphs obtained as a result of clustering\nparameters selection. Marked point corresponds optimal values of eps and min\\_samples \nSolution analysis -> Autotune 3D-map - shows 3D-graph obtained as a result of clustering\nparameters selection \nSolution analysis -> Scan by parameter - scans some values of clustering solutions by one of the parameter (eps or min\\_samples) when second parameter are fixed \nOpen PyMol - opens PyMol for further data display \nAbout Protein - displays information about protein \nPlot settings -> Plot grid - makes coordinate grid in the Cluster analysis window \nPlot settings -> Plot legend - displays the brief description of the picture \nDmod (experimental, checkbox) - modification interpoint distances, instead clusterization points weights. moddist(u, w) = dist(u, w)/(w u)/2)), where w and u - weighting coefficients of points \nClear log - clears log information in the appropriate window \n\n**Help->** \nAbout - displays information about program, its license and version, and version of scikit-learn installed on the computer \nReadme - opens system web-browser and shows this paper \n\n\nTheory\n------\n\nHydrophobic cores and hydrophobic clusters play an important role in the\nfolding of the protein, being the skeleton for functionally important\namino acid residues of enzyme proteins. In the cases of ligands of\namphiphilic nature, the hydrophobic clusters themselves are included in\nthe functionally important regions of the molecules. The interaction\nwith them should be taken into account, for example, when evaluating\nmolecular docking solutions. Hydrocluster programm is based on\nensity-Based Spatial Clustering of Applications with Noise (DBSCAN) \\[1].\nAtomic coordinates, their type and description of amino acid residues\n(a. r.) and chemical groups \\[2] are loaded from a file of the PDB, mmCIF formats, or directly from\nthe Protein Data Bank. For each a.r. (or chemical group) from the table of relative\nhydrophobicity center of mass of non-H atoms is calculated. As weights\nin the cluster analysis, various tables of a.r. \\[3-7] (group \\[2]) hydrophobicity known in\nthe literature are used (see Table1 or Table2). Separately, for clustering\nelectrically charged amino acid residues, the function of calculating\nweighting coefficients as modules of partial charges of side groups\naccording to the formulas, which are derived from the\nHenderson-Hasselbach equation, is implemented \\[8]. Alternative: modification interpoint distances, \ninstead clusterization points weights. moddist(u, w) = dist(u, w)/(w u)/2)), \nwhere w and u - weighting coefficients of points. As hyperparameters DBSCAN uses t\nhe epsilon neighborhood radius (eps) and the minimum number of neighbors (min\\_samples). Eps is defined as the\nmaximum distance (in Angstrom (\u212b)) between the centers of mass of\nhydrophobic a.r. (or chemical groups) which are adjacent in one cluster. The\nmin\\_samples/eps\u00b3 ratio is proportional to the maximum distribution\ndensity of the centers of mass of the hydrophobic a.r. (or chemical groups). \nInternal clustering validation measures (descibed in Table 3) \nare used as the quality criteria for cluster analysis. For\nclusters of complex shape, it is better to use the silhouette\ncoefficient. At the same time, Calinski and Harabasz score, which uses\nthe distance between the element and the center of the cluster,\ncorrectly estimates the areas of clusters with the highest density. This\nareas are of interest from the point of view of the structural\norganization of proteins. A feature of the DBSCAN algorithm is the\nstrong dependence of clustering results on the parameters - eps and\nmin\\_samples. Hydrocluster implemented the selection of these parameters\nby simply iterating over their values at user-defined boundaries,\nfollowed by sorting the results according to the criterion of maximizing (minimizing)\nthe value of the corresponding estimated coefficient.\n\nTable 1. Normalised (by Alanine) hydrophobic weights of amino acid residues\n---------------------------------------------------------------------------\n\n| a.r. | Hydropathy \\[3] | Fuzzyoildrop \\[4] | MENV \\[5] | Nanodroplet \\[6] | Aliphatic \\[7] |\n|------|-----------------|-------------------|-----------|------------------|----------------|\n| ALA | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |\n| VAL | 2.333 | 1.418 | 2.52 | 0.867 | 2.9 |\n| LEU | 2.111 | 1.369 | 2.64 | 0.904 | 3.9 |\n| ILE | 2.5 | 1.544 | 2.94 | 1.016 | 3.9 |\n| PHE | 1.556 | 1.583 | 2.58 | 0.963 | - |\n| TRP | - | 1.497 | 2.03 | 0.900 | - |\n| MET | 1.056 | 1.448 | 1.64 | 0.799 | - |\n| CYS | 1.389 | 1.748 | 3.48 | 0.588 | - |\n| THR | - | 0.538 | 1.82 | 0.424 | - |\n| SER | - | - | - | 0.372 | - |\n| GLY | - | - | - | 0.477 | - |\n\nTable 2. Hydrophobic weights of chemical (Rekker's) groups \\[2]\n-------------------------------------------------------------\n\n| Chemical radical | Hydrophobic weight |\n|------------------|--------------------|\n| C\u2086H\u2085 (phenyl) | 1.903 |\n| CH | 0.315 |\n| CH\u2082 | 0.519 |\n| CH\u2083 | 0.724 |\n| Indolyl | 1.903 |\n\nTable 3. Internal clustering validation measures\n------------------------------------------------\n\n| Scoring function | Range of values | Optimal value | Realisation | Paper |\n|-------------------------|-----------------|---------------|--------------|-----------|\n| Calinski-Harabasz score | 0 -> | maximum | scikit-learn | \\[9] |\n| Silhouette score | -1 ... 1 | maximum | scikit-learn | \\[10] |\n| S_Dbw | 0 -> | minimum | internal | \\[11, 12] |\n| | | | | |\n\nRequirements\n------------\n\n* Python 3.4 or higher (CPython only support)\n* psutil\n* progressbar2\n* matplotlib>=1.5.1\n* numpy>=1.14.2\n* scikit_learn>=0.19.1\n* biopython>=1.71\n* mmtf-python>=1.1.0\n* msgpack>=0.5.6\n\nTo easily browse through db files you will need a DB Browser for SQLite\n(). It is recommended to install Pymol\nmolecular viewer (version: 1.7+).\n\n**For MS Windows:** Use Anaconda () for Windows -\nit includes majority of the dependencies required. But mmtf-python and\nmsgpack not available on Anaconda - need to use pip. Define environment\nvariable PYTHONIOENCODING to UTF-8. For correct display of the Angstrom\nsymbol use console fonts including this symbol (for example, SimSun font\nfamily).\n\n\nReferences\n----------\n1. Ester, M., H. P. Kriegel, J. Sander, and X. Xu, In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press,226-231. 1996\n2. R. Mannhold, R. F. Rekker Perspectives in Drug Discovery and Design, 18: 1\u201318, 2000.\n3. J. Kyte, R. F. Doolittle. J Mol Biol. 1982. 157, 105-132.\n4. Brylinski M, Konieczny L, Roterman I. Int J Bioinform Res Appl. 2007;3(2):234-60.\n5. D. Bandyopadhyay .E. L. Mehler.Proteins 2008.72.646-659\n6. Zhu C. Q., Gao Y. R. , Li H. et.al.// Proc. NAS. 2016.113.12946.\n7. Ikai, A.J. 1980. J. Biochem. 88, 1895-1898.\n8. Dexter S. Moore BIOCHEMICAL EDUCATION 13(1) 1985.\n9. Calinski T., Harabasz J. // Communications in Statistics. 1974. 3 . 1.\n10. Rousseeuw P. Comput. Appl. Math. 1987. 20. 53.\n11. M. Halkidi and M. Vazirgiannis, in ICDM, Washington, DC, USA, 2001, pp. 187\u2013194.\n12. Tong, J. & Tan, H. J. Electron.(China) (2009) 26: 258. https://doi.org/10.1007/s11767-007-0151-8\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/alashkov83/hydrocluster", "keywords": "molecular modeling,bioinformatic,protein structure,hydrophobic core,hydrophobic cluster,DBSCAN", "license": "GPL v.3", "maintainer": "Alexander Lashkov", "maintainer_email": "", "name": "hydrocluster", "package_url": "https://pypi.org/project/hydrocluster/", "platform": "any", "project_url": "https://pypi.org/project/hydrocluster/", "project_urls": { "Homepage": "https://github.com/alashkov83/hydrocluster" }, "release_url": "https://pypi.org/project/hydrocluster/0.2.0/", "requires_dist": [ "hjson", "psutil", "scipy", "progressbar2", "matplotlib (>=1.5.1)", "numpy (>=1.14.2)", "scikit-learn (>=0.18.0)", "biopython (>=1.71)", "mmtf-python (>=1.1.0)", "msgpack (>=0.5.6)" ], "requires_python": ">=3.4", "summary": "Cluster analysis of hydrophobic or charged regions of macromolecules. The program is based on the DBSCAN algorithm.", "version": "0.2.0" }, "last_serial": 4782617, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "4d7ce48af5e6a78a71e0ce835445f1c6", "sha256": "e07c1f85aef06c975f51fc79c36ccf6eb8a8f8c644174a9ce4f51176efd57013" }, "downloads": -1, "filename": "hydrocluster-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "4d7ce48af5e6a78a71e0ce835445f1c6", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 45748, "upload_time": "2018-10-15T08:02:16", "url": "https://files.pythonhosted.org/packages/0a/ae/47e23604385202303930ae5cb223156ca8f368d856890faf74afbf3f9064/hydrocluster-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5fc5baabd0bf79e54c4d543d51de04b6", "sha256": "af1f11f847a8117f8ce636cba22629abc4041c43c9d7244411416979cdd20b86" }, "downloads": -1, "filename": "hydrocluster-0.1.0.tar.gz", "has_sig": false, "md5_digest": "5fc5baabd0bf79e54c4d543d51de04b6", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 30511, "upload_time": "2018-10-15T08:02:18", "url": "https://files.pythonhosted.org/packages/3b/c2/938b005401e2da0d23045023c96aa14e92249476c61cc90a9ed126be56b3/hydrocluster-0.1.0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "5f623a02386dbf57d0763afa74f32e7b", "sha256": "6824a8d06f1df3fde91122a8460f70874664e7898c034bbd91ce6cf0b4b67b22" }, "downloads": -1, "filename": "hydrocluster-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5f623a02386dbf57d0763afa74f32e7b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 56093, "upload_time": "2019-02-05T15:30:21", "url": "https://files.pythonhosted.org/packages/00/0d/786b1d0bb4ad598da09f601a4c29fb9f48b454a1febbb35d552dcf358963/hydrocluster-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf1040ebf4b6fbd49fe6c9bab94162ee", "sha256": "560e1dba88ad35cf7bb7ba0d86f68215ef0c0781f2e68981408ef9bef5902c22" }, "downloads": -1, "filename": "hydrocluster-0.2.0.tar.gz", "has_sig": false, "md5_digest": "bf1040ebf4b6fbd49fe6c9bab94162ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 40388, "upload_time": "2019-02-05T15:30:23", "url": "https://files.pythonhosted.org/packages/76/cb/bb2cf400bee5a37e2017a9ce1330c4f8665c15f1159967b09b40b437780f/hydrocluster-0.2.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "5f623a02386dbf57d0763afa74f32e7b", "sha256": "6824a8d06f1df3fde91122a8460f70874664e7898c034bbd91ce6cf0b4b67b22" }, "downloads": -1, "filename": "hydrocluster-0.2.0-py3-none-any.whl", "has_sig": false, "md5_digest": "5f623a02386dbf57d0763afa74f32e7b", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.4", "size": 56093, "upload_time": "2019-02-05T15:30:21", "url": "https://files.pythonhosted.org/packages/00/0d/786b1d0bb4ad598da09f601a4c29fb9f48b454a1febbb35d552dcf358963/hydrocluster-0.2.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "bf1040ebf4b6fbd49fe6c9bab94162ee", "sha256": "560e1dba88ad35cf7bb7ba0d86f68215ef0c0781f2e68981408ef9bef5902c22" }, "downloads": -1, "filename": "hydrocluster-0.2.0.tar.gz", "has_sig": false, "md5_digest": "bf1040ebf4b6fbd49fe6c9bab94162ee", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4", "size": 40388, "upload_time": "2019-02-05T15:30:23", "url": "https://files.pythonhosted.org/packages/76/cb/bb2cf400bee5a37e2017a9ce1330c4f8665c15f1159967b09b40b437780f/hydrocluster-0.2.0.tar.gz" } ] }