{
    "info": {
        "author": "Tim D.",
        "author_email": "",
        "bugtrack_url": null,
        "classifiers": [
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
            "Programming Language :: Python :: 3.7"
        ],
        "description": "\n# hhsearch-python\n\n## Author: Tim D.\n\n##### Current version: 1.12 - Python 3.7\n\nThis small package was made to handle data output by the software suite HHSearch. It was tested with output of HHSearch version 1.5. The project's idea and initial draft originate from Dr. Schmidt, and the package was written as a final assignment for one of his university modules.\n\n>HHsearch is a software suite for detecting remote homologues of proteins and generating high-quality alignments for homology modeling and function prediction.\n\n[HH-Suite Github](https://github.com/soedinglab/hh-suite) | [Quick Guide to HHSearch](http://ftp.tuebingen.mpg.de/pub/protevo/HHsearch/HHsearch1.5.01/HHsearch-guide.pdf) \n\n## Installation\n\nYou can install this package with pip.\n```sh\npip install hhsearch-python\n```\n[Find this package on PyPI!](https://pypi.org/project/hhsearch-python/)\n\n------\n### Requirements\nFor full functionality, you also need the following packages.\n```\npandas==0.23.4\nmatplotlib==3.0.2\nnumpy==1.15.4\nPillow==6.0.0\npymol==0.1.0\n```\n\nEverything except PyMol can be installed with `pip install`. PyMol needs to be installed separately and, in addition, through `pip` so that it can be used in your regular Python environment.\n\n##### ```pip``` / ```conda``` installation:\n```pip install pymol``` | ```conda install -c schrodinger pymol```\n\n| Platform | Documentation |\n| ------ | ------ |\n| macOS | https://pymolwiki.org/index.php/MAC_Install | \n| Windows | https://pymolwiki.org/index.php/Windows_Install | \n| Linux | https://pymolwiki.org/index.php/Linux_Install |\n\n------\n# Wrapper - Jupyter Notebook\nFor this whole module, a wrapper with a UI has been created as a [Jupyter Notebook](https://jupyter.org/). 
\nYou just need to open the Jupyter Notebook in this repo with Jupyter, have your `.hhm` and `.hhs` files in subfolders somewhere within the same folder, and install the module with `pip install hhsearch-python`. The notebook is largely self-explanatory and exposes almost all options of this module's functions through a UI. \n### Recommended if you like automation and simplicity through UI usage.\n\n------\n# Functionalities \n## Broad information about Query & Hit\n\nThis package contains a small handful of functions that can be used to generate well-organized (and visualized) output. For all of this to work properly, however, all required `.hhm` and `.hhs` files need to be located somewhere in your current working directory. \n\n```python\n# Let's first import all functions from the module.\nfrom hhsearch_python import *\n\nhhs_file = \"data/hhs/d1e0ta1.hhs\"  # path to your .hhs file.\n\n# First, we can use extract_HHSearch_data() to extract the whole HHSearch statistics into a pandas.DataFrame.\nhhs_hits_statistics = extract_HHSearch_data(hhs_file)\n```\n----\nHowever, we also want general information about the query itself, as well as about selected hits. For that we can use two separate functions: `extract_HHSearch_main` for the query `.hhs` file, and `get_alignment_term` for a selected hit of the previously created `pandas.DataFrame`.\n```python\nquery_dict = extract_HHSearch_main(hhs_file)\n\n# An example of what this dict() output looks like: \nprint(query_dict)\n# >> {'Query': 'Query d1e0ta1 b.58.1.1 (A:70-167) Pyruvate kinase (PK) {Escherichia coli [TaxId: 562]}',\n#      'pdb_id': '1e0t',\n#      'alignment_term': '/1e0t//A/70-167/CA', \n#      'full_term': '/1e0t//A//CA', \n#      'file_name': 'd1e0ta1'\n#      }\n# alignment_term is needed for a proper PyMol alignment later down the road, \n# as well as full_term, which ignores the specific residues. 
\n\n# Let's get information about the second hit of the statistics from the .hhs file. \nhit_dict = get_alignment_term(hhs_hits_statistics, 2)\nprint(hit_dict)\n# >> {'pdb_id': '2vgb', \n#     'alignment_term': '/2vgb//A/160-261/CA', \n#     'full_term': '/2vgb//A//CA', \n#     'file_name': 'd2vgba1'}\n# Except for the key \"Query\", get_alignment_term() returns a dict() with the same structure as extract_HHSearch_main().\n```\n------\n## Colorized Alignments - HTML formatted\n\nHaving selected the second alignment as our target of choice, we now want more information about the alignment itself, so we extract the actual alignment with `get_full_alignment`. It takes two arguments: the `.hhs` file of the query and the number of the hit within the `.hhs` file, the same hit number that `get_alignment_term` takes. Preferably, one looks at the previously created pandas.DataFrame `hhs_hits_statistics` and chooses a hit of interest from it. \n\n```python\n# This also creates an HTML-formatted file in a separate folder - /alignments_highlighted/<query>/<NoX-name>.html\n# and the same file as alignment.html in a folder called /lastrun/, all for your convenience. \nalignment_of_interest = get_full_alignment(hhs_file, 2)\n```\nThe HTML-formatted output looks like the example below. As you can see, **h**elices and sh**e**ets are colorized. \n> <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/example_alignment.jpg\" width=\"800\">\n\n\n\nIf you want this formatting applied to the whole `.hhs` file, you can use the function `highlight_hhs_full(hhs_file)` with the path of the desired `.hhs` file as its argument. It returns the given `.hhs` file as a colorized, HTML-formatted string and also stores it in a separate folder as `/alignments_highlighted/<query-name>_full.html`, as well as in the `/lastrun` folder under the filename `hhs_full_colorized.html`.\n\n```python\n# Outputs the whole .hhs file colorized in the pattern shown above. 
\nfull_hhs_colorized = highlight_hhs_full(hhs_file)\n```\n------\n## PyMol Alignments - Visualization | Animation\nHaving alignments organized and colorized is useful, but we also want a more visual representation of the chosen alignment. For that, we can take the previously created dictionaries `query_dict` and `hit_dict` and pass their information as arguments to the function `pymol_alignment()`. This function also returns the [RMSD value of atomic positions](https://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions) in [\u00e5ngstr\u00f6m](https://en.wikipedia.org/wiki/Angstrom). \n\n```python\n# Building up the information from the query. \npdb_1 = query_dict.get(\"pdb_id\")\naln_term_1 = query_dict.get(\"alignment_term\")\nfull_term_1 = query_dict.get(\"full_term\")\n\n# Building up the needed information from the chosen hit. \npdb_2 = hit_dict.get(\"pdb_id\")\naln_term_2 = hit_dict.get(\"alignment_term\")\nfull_term_2 = hit_dict.get(\"full_term\")\n\n# Also returns the RMSD values for the alignment. \nrmsd = pymol_alignment(pdb_1, \n                       pdb_2, \n                       aln_term_1, \n                       aln_term_2,\n                       full_term_1,\n                       full_term_2)\n\nprint(rmsd)\n# >> (0.8026888370513916, 85, 5, 1.2078778743743896, 98, 160.0, 98)\n# In this example, the RMSD value is about 0.803 \u00c5 over 85 C-alpha atoms. \n```\n\nThis creates two images in a different folder, as well as a `no_zoom.pse` file, which can be opened with PyMol, and places the necessary `.cif` files of the PDB entries in a separate folder called `/cif/`. 
\nAbout the pictures: one is zoomed in on the area of `aln_term_1`, which in our example is `/1e0t//A/70-167/CA`, showing the area of interest; the other is a non-zoomed-in picture of `/1e0t//A//CA`.\nThese images are stored in the `/lastrun/` folder, as well as in the folder `/PyMol_img/<pdb_1>/<pdb_1>-<pdb_2>/`.\n\n\n| Zoom | No-Zoom |\n| ------ | ------ |\n| <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/main_zoom.png\" width=\"500\"> | <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/no_zoom.png\" width=\"500\"> |\n\nHowever, `pymol_alignment` can also output an animated picture instead of just static pictures, with an optional frame multiplier, which needs to be an integer up to 4. This option takes much more time to process but gives a _nicer_ output. Each increase of the frame multiplier basically doubles the time necessary to create the 360\u00b0 view of the model. The frames are stored in a subdirectory `/animation` of the `lastrun/` folder, alongside the animated GIF, as well as in the separate folder `PyMol_img/<pdb_1>/<pdb_1>-<pdb_2>/animation/<framemultiplier>`, while the animated GIF is stored one level up, in the `/animation` folder.\n```python\n# As an example, we will create an animated GIF with a frame multiplier of 4.\npymol_alignment(pdb_1,\n                pdb_2,\n                aln_term_1,\n                aln_term_2,\n                full_term_1,\n                full_term_2,\n                animation = True,\n                framemultiplier = 4)\n```\n\nBe aware that with each run, the `lastrun` folder's animation subfolder is cleared, so there is no confusion if you run once with the animation feature and the next time without it. 
\n\n>  ```# Example for animation = True, framemultiplier = 4 of our example```\n> <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/animation_zoom.gif\" width=\"500\">\n\n## Barplots of chosen spans\nFinally, we want to create a barplot of the amino acid frequencies within our query, based on the [HMMs](https://en.wikipedia.org/wiki/Hidden_Markov_model), as well as within our chosen hit. For that, we first need to extract the frequencies from the `.hhm` file. This gives us a pandas.DataFrame with all frequencies normalized to one, calculated according to the [HH-suite wiki](https://github.com/soedinglab/hh-suite/wiki).\n```\nFrequency calculation:  \nentry = -1000 * log_2(frequency) \nfrequency = 2^(-entry/1000)\n```\n```python\n# First we need to set the paths of the two .hhm files. Luckily, we stored the file names before.\nquery_filename = f'data/hhm/{query_dict.get(\"file_name\")}.hhm'\nhit_filename = f'data/hhm/{hit_dict.get(\"file_name\")}.hhm'\n\nquery_frequencies = read_in_frequencies(query_filename)\nhit_frequencies = read_in_frequencies(hit_filename)\n```\n\nThe resulting DataFrame of frequencies looks like this:\n\n| Pos | AS | A |  C | D | E | F | G | (...) |\n|-|-|-|-|-|-|-|-|-|\n|1|M1|0.030019|0.000000|0.004325|0.014670|0.037111|0.012379|(...) |\n|(...)|(...)|(...)|(...)|(...)|(...)|(...)|(...)|(...) |\n\nHaving the frequencies is one thing, but we also want to visualize them. For that, one can use the `plot_frequencies` function. It takes seven arguments in total, of which only one is required: the pandas.DataFrame of frequencies created above. If desired, the name of the created subfolder `barplots/<name>` can be changed. I personally recommend using the file names from `query_dict` and `hit_dict`, obtained with `query_dict.get(\"file_name\")` and `hit_dict.get(\"file_name\")`. 
The threshold is the minimum frequency an amino acid has to reach in order to appear in the plot. A value around 0.1, which equals 10%, is recommended. Next, we need to set `span_start` and `span_end` for our plot. As an example, we will pick the 1st residue as the start and the 50th as the end of the span. The filename is the name under which the file will be stored in the `/lastrun` folder. One can also add a title to the plot; I personally dislike this option, since it spoils the cleaner look. Depending on the span you choose, this process can take a considerable amount of time.\n\n```python\nplot_frequencies(query_frequencies,  # the pd.DataFrame of the frequencies\n                 name = query_dict.get(\"file_name\"),  # desired output name\n                 threshold = 0.1,  # 10% threshold\n                 span_start = 1,  # span starting @ 1\n                 span_end = 50,  # span ending @ 50\n                 filename = \"query_barplot.png\",\n                 title = False\n                 )\n\nplot_frequencies(hit_frequencies,  # the pd.DataFrame of the frequencies\n                 name = hit_dict.get(\"file_name\"),  # desired output name\n                 threshold = 0.1,  # 10% threshold\n                 span_start = 1,  # span starting @ 1\n                 span_end = 50,  # span ending @ 50\n                 filename = \"hit_barplot.png\",\n                 title = False\n                 )\n```\n\n| Query : d1e0ta1, 1-50, min. 10% | Hit : d2vgba1, 1-50, min. 10% |\n| - | - |\n| <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/query_barplot.png\" width=\"400\"> | <img src=\"https://raw.githubusercontent.com/MrRedPandabaer/hhsearch-python/master/hit_barplot.png\" width=\"400\"> |\n\n-----\n\n#### Contact Information: \n| Telegram | Email |\n|-|-|\n| [<img src=\"https://telegram.org/img/t_logo.png\" width=40>](https://t.me/MrRedPanda) | [<img src=\"https://www.witopia.com/wp-content/uploads/flat-email-icon.png\" width=40>](mailto:tim@besonders.net) |\n\n\n",
        "description_content_type": "text/markdown",
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/MrRedPandabaer/hhsearch-python",
        "keywords": "",
        "license": "",
        "maintainer": "",
        "maintainer_email": "",
        "name": "hhsearch-python",
        "package_url": "https://pypi.org/project/hhsearch-python/",
        "platform": "",
        "project_url": "https://pypi.org/project/hhsearch-python/",
        "project_urls": {
            "Homepage": "https://github.com/MrRedPandabaer/hhsearch-python"
        },
        "release_url": "https://pypi.org/project/hhsearch-python/1.12/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "A small package to deal with HHSearch files.",
        "version": "1.12"
    },
    "last_serial": 5168117,
    "releases": {
        "1.12": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "9e4c9ea8579a95d5a95b2ba1bfa1b83d",
                    "sha256": "7d7f9fe879718c8e2d6040be13312de66db7db83570333213e3f9ba044005beb"
                },
                "downloads": -1,
                "filename": "hhsearch_python-1.12-py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "9e4c9ea8579a95d5a95b2ba1bfa1b83d",
                "packagetype": "bdist_wheel",
                "python_version": "py3",
                "requires_python": null,
                "size": 16441,
                "upload_time": "2019-04-20T16:35:15",
                "url": "https://files.pythonhosted.org/packages/92/df/7f3d88521c9b7a1a3c8e30fb9ebaca4ee8aa31d4e1434a4168cc0abb8dbc/hhsearch_python-1.12-py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "108536d80ebd115158192853e296c659",
                    "sha256": "74b00bab9a43be4250bc9c158f06d87d88de1bf3fe29f29626c6f5fc39082acd"
                },
                "downloads": -1,
                "filename": "hhsearch_python-1.12.tar.gz",
                "has_sig": false,
                "md5_digest": "108536d80ebd115158192853e296c659",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 19519,
                "upload_time": "2019-04-20T16:35:18",
                "url": "https://files.pythonhosted.org/packages/d5/9e/f554598520a4f67603db575547be89bf7077d2f326a9ee178a47971f39dd/hhsearch_python-1.12.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "9e4c9ea8579a95d5a95b2ba1bfa1b83d",
                "sha256": "7d7f9fe879718c8e2d6040be13312de66db7db83570333213e3f9ba044005beb"
            },
            "downloads": -1,
            "filename": "hhsearch_python-1.12-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "9e4c9ea8579a95d5a95b2ba1bfa1b83d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 16441,
            "upload_time": "2019-04-20T16:35:15",
            "url": "https://files.pythonhosted.org/packages/92/df/7f3d88521c9b7a1a3c8e30fb9ebaca4ee8aa31d4e1434a4168cc0abb8dbc/hhsearch_python-1.12-py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "108536d80ebd115158192853e296c659",
                "sha256": "74b00bab9a43be4250bc9c158f06d87d88de1bf3fe29f29626c6f5fc39082acd"
            },
            "downloads": -1,
            "filename": "hhsearch_python-1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "108536d80ebd115158192853e296c659",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 19519,
            "upload_time": "2019-04-20T16:35:18",
            "url": "https://files.pythonhosted.org/packages/d5/9e/f554598520a4f67603db575547be89bf7077d2f326a9ee178a47971f39dd/hhsearch_python-1.12.tar.gz"
        }
    ]
}