{ "info": { "author": "Pavel Polishchuk", "author_email": "pavel_polishchuk@ukr.net", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Scientific/Engineering :: Chemistry" ], "description": "# CReM - chemically reasonable mutations\n\n**CReM** is an open-source Python framework to generate chemical structures using a fragment-based approach.\n\nThe main idea is similar to that of context-aware matched molecular pairs: fragments that occur in an identical context are interchangeable. Therefore, one can create a database of interchangeable fragments and use it to generate chemically valid structures.\n\n**Features:** \n1) Generation of a custom fragment database \n2) Three modes of structure generation: MUTATE, GROW, LINK \n3) Context radius to consider for replacement \n4) Fragment size to replace and the size of a replacing fragment \n5) Protection of atoms from modification (e.g. scaffold protection) \n6) Replacement only with fragments that occur in a fragment database with a certain minimal frequency \n7) Randomly chosen replacements up to a specified number \n\n## Installation\n\nInstallation provides several command-line utilities for creating fragment databases and makes the `crem` module importable in Python for generating structures.\n\nFrom the PyPI package\n```text\npip install crem\n```\n\nManually from the repository\n```text\ngit clone https://github.com/DrrDom/crem\ncd crem\npython3 setup.py sdist bdist_wheel\npip install dist/crem-0.1-py3-none-any.whl\n```\n\nUninstall\n```text\npip uninstall crem\n```\n\n## Dependencies\n\n`crem` requires `rdkit>=2017.09`. 
To run the guacamol test, `guacamol` should be installed.\n\n## Generation of a fragment database\n\nFragmentation of input structures:\n```text\nfragmentation -i input.smi -o frags.txt -c 32 -v\n```\n\nConvert fragments to a standardized representation of a core and a context of a given radius:\n```text\nfrag_to_env -i frags.txt -o r3.txt -r 3 -c 32 -v\n```\n\nRemove duplicated lines in the output file and count the frequency of occurrence of fragment-context pairs. `sort` and `uniq` are standard Unix utilities; on Windows 10 they can be run, for example, through the Windows Subsystem for Linux.\n```text\nsort r3.txt | uniq -c > r3_c.txt\n```\n\nCreate a DB and import the file into a database table\n```text\nenv_to_db -i r3_c.txt -o fragments.db -r 3 -c -v\n```\n\nThe last three steps should be executed for each radius. All tables can be stored in the same database.\n\n## Structure generation\n\nImport necessary functions from the main module\n```python\nfrom crem.crem import mutate_mol, grow_mol, link_mols\nfrom rdkit import Chem\n```\n\nCreate a molecule and **mutate** it. Only one heavy atom will be substituted. Default radius is 3.\n```python\nm = Chem.MolFromSmiles('c1cc(OC)ccc1C') # 4-methylanisole\nmols = list(mutate_mol(m, db_name='replacements.db', max_size=1))\n```\noutput example\n```text\n['CCc1ccc(C)cc1',\n 'CC#Cc1ccc(C)cc1',\n 'C=C(C)c1ccc(C)cc1',\n 'CCCc1ccc(C)cc1',\n 'CC=Cc1ccc(C)cc1',\n 'CCCCc1ccc(C)cc1',\n 'CCCOc1ccc(C)cc1',\n 'CNCCc1ccc(C)cc1',\n 'COCCc1ccc(C)cc1',\n ...\n 'Cc1ccc(C(C)(C)C)cc1']\n```\n\n\nAdd hydrogens to the molecule to **mutate hydrogens** as well\n```python\nmols = list(mutate_mol(Chem.AddHs(m), db_name='replacements.db', max_size=1))\n```\noutput\n```text\n['CCc1ccc(C)cc1',\n 'CC#Cc1ccc(C)cc1',\n 'C=C(C)c1ccc(C)cc1',\n 'CCCc1ccc(C)cc1',\n 'Cc1ccc(C(C)C)cc1',\n 'CC=Cc1ccc(C)cc1',\n ...\n 'COc1ccc(C)cc1C',\n 'C=Cc1cc(C)ccc1OC',\n 'COc1ccc(C)cc1Cl',\n 'COc1ccc(C)cc1CCl']\n```\n\n**Grow** a molecule. Only hydrogens will be replaced. 
Hydrogens should not be added explicitly.\n```python\nmols = list(grow_mol(m, db_name='replacements_sc2.db'))\n```\noutput\n```text\n['COc1ccc(C)c(Br)c1',\n 'COc1ccc(C)c(C)c1',\n 'COc1ccc(C)c(Cl)c1',\n 'COc1ccc(C)c(OC)c1',\n 'COc1ccc(C)c(N)c1',\n ...\n 'COc1ccc(CCN)cc1']\n```\n\nCreate a second molecule and **link** it to the first one\n```python\nm2 = Chem.MolFromSmiles('NCC(=O)O') # glycine\nmols = list(link_mols(m, m2, db_name='replacements.db'))\n```\noutput\n```text\n['Cc1ccc(OCC(=O)NCC(=O)O)cc1',\n 'Cc1ccc(OCCOC(=O)CN)cc1',\n 'COc1ccc(CC(=N)NCC(=O)O)cc1',\n 'COc1ccc(CC(=O)NCC(=O)O)cc1',\n 'COc1ccc(CC(=S)NCC(=O)O)cc1',\n 'COc1ccc(CCOC(=O)CN)cc1']\n```\n\nYou can vary the size of a linker and specify the distance between two attachment points in a linking fragment. These functions accept many other arguments; see their **docstrings** for details.\n\n##### Multiprocessing\nAll functions have an argument `ncores` and can make multiple replacements in one molecule in parallel. If you want to process several molecules in parallel you have to write your own code. However, the described functions are generators and cannot be used with the `multiprocessing` module. Therefore, three complementary functions `mutate_mol2`, `grow_mol2` and `link_mols2` were created. 
They return a list of results, so their output can be pickled and they can be used with `multiprocessing.Pool` or other tools.\n\nExample:\n```python\nfrom multiprocessing import Pool\nfrom functools import partial\nfrom crem.crem import mutate_mol2\nfrom rdkit import Chem\n\np = Pool(2)\ninput_smi = ['c1ccccc1N', 'NCC(=O)OC', 'NCCCO']\ninput_mols = [Chem.MolFromSmiles(s) for s in input_smi]\n\nres = list(p.imap(partial(mutate_mol2, db_name='replacements.db', max_size=1), input_mols))\n```\n\n`res` will be a list of lists with SMILES of the generated molecules\n\n## Benchmarks\n\n##### Guacamol\n\n|task|SMILES LSTM*|SMILES GA*|Graph GA*|Graph MCTS*|CReM\n|---|:---:|:---:|:---:|:---:|:---:|\n|Celecoxib rediscovery|**1.000**|0.732|**1.000**|0.355|**1.000**\n|Troglitazone rediscovery|**1.000**|0.515|**1.000**|0.311|**1.000**\n|Thiothixene rediscovery|**1.000**|0.598|**1.000**|0.311|**1.000**\n|Aripiprazole similarity|**1.000**|0.834|**1.000**|0.380|**1.000**\n|Albuterol similarity|**1.000**|0.907|**1.000**|0.749|**1.000**\n|Mestranol similarity|**1.000**|0.79|**1.000**|0.402|**1.000**\n|C11H24|**0.993**|0.829|0.971|0.410|0.966\n|C9H10N2O2PF2Cl|0.879|0.889|**0.982**|0.631|0.940\n|Median molecules 1|**0.438**|0.334|0.406|0.225|0.371\n|Median molecules 2|0.422|0.38|0.432|0.170|**0.434**\n|Osimertinib MPO|0.907|0.886|0.953|0.784|**0.995**\n|Fexofenadine MPO|0.959|0.931|0.998|0.695|**1.000**\n|Ranolazine MPO|0.855|0.881|0.92|0.616|**0.969**\n|Perindopril MPO|0.808|0.661|0.792|0.385|**0.815**\n|Amlodipine MPO|0.894|0.722|0.894|0.533|**0.902**\n|Sitagliptin MPO|0.545|0.689|**0.891**|0.458|0.763\n|Zaleplon MPO|0.669|0.413|0.754|0.488|**0.770**\n|Valsartan SMARTS|0.978|0.552|0.990|0.04|**0.994**\n|Deco Hop|0.996|0.970|**1.000**|0.590|**1.000**\n|Scaffold Hop|0.998|0.885|**1.000**|0.478|**1.000**\n|total score|17.341|14.398|17.983|9.011|17.919\n\n## License\nBSD-3\n\n## Citation\n\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, 
"last_month": -1, "last_week": -1 }, "home_page": "https://github.com/DrrDom/crem", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "crem", "package_url": "https://pypi.org/project/crem/", "platform": "", "project_url": "https://pypi.org/project/crem/", "project_urls": { "Homepage": "https://github.com/DrrDom/crem" }, "release_url": "https://pypi.org/project/crem/0.1/", "requires_dist": [ "rdkit (>=2017.09) ; extra == 'rdkit'" ], "requires_python": ">=3.6", "summary": "CReM: chemically reasonable mutations framework", "version": "0.1" }, "last_serial": 5713644, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "91755da1bc8ba74f10cff8d568d963d3", "sha256": "703a9d4462ace746406a92ccb2a77b50ea90a174af0744fa678f443f3f3839b8" }, "downloads": -1, "filename": "crem-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "91755da1bc8ba74f10cff8d568d963d3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 24458, "upload_time": "2019-08-22T07:14:54", "url": "https://files.pythonhosted.org/packages/a5/79/77c701c0c0eb2ce6701319a95cfd29493e8e51618c5bb621cd40bf4f0841/crem-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3dbb1c1bbdb85df4a7f9758636fee06e", "sha256": "55c4dcbe560c22209543219d705b9b24a78222adcb26e8e0531f455bf1bbfe15" }, "downloads": -1, "filename": "crem-0.1.tar.gz", "has_sig": false, "md5_digest": "3dbb1c1bbdb85df4a7f9758636fee06e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 20106, "upload_time": "2019-08-22T07:14:57", "url": "https://files.pythonhosted.org/packages/49/5e/1ae619500a3515623022dcfe0a8b1a9d1d5f008b2eb2655caf766fff025b/crem-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "91755da1bc8ba74f10cff8d568d963d3", "sha256": "703a9d4462ace746406a92ccb2a77b50ea90a174af0744fa678f443f3f3839b8" }, "downloads": -1, "filename": "crem-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": 
"91755da1bc8ba74f10cff8d568d963d3", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 24458, "upload_time": "2019-08-22T07:14:54", "url": "https://files.pythonhosted.org/packages/a5/79/77c701c0c0eb2ce6701319a95cfd29493e8e51618c5bb621cd40bf4f0841/crem-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3dbb1c1bbdb85df4a7f9758636fee06e", "sha256": "55c4dcbe560c22209543219d705b9b24a78222adcb26e8e0531f455bf1bbfe15" }, "downloads": -1, "filename": "crem-0.1.tar.gz", "has_sig": false, "md5_digest": "3dbb1c1bbdb85df4a7f9758636fee06e", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 20106, "upload_time": "2019-08-22T07:14:57", "url": "https://files.pythonhosted.org/packages/49/5e/1ae619500a3515623022dcfe0a8b1a9d1d5f008b2eb2655caf766fff025b/crem-0.1.tar.gz" } ] }