{ "info": { "author": "Andy Boughton", "author_email": "abought@umoch.edu", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7" ], "description": "# ZORP: A helpful GWAS parser\n\n[![Build Status](https://api.travis-ci.org/abought/zorp.svg?branch=develop)](https://api.travis-ci.org/abought/zorp)\n\n## Why?\nZORP is intended to abstract away differences in file formats and to help you work with GWAS data from many \ndifferent sources.\n\n- Provides a single unified interface to read text, gzip, or tabixed data\n- Separates reading from parsing (with parsers that can handle the most common file formats)\n- Includes helpers to auto-detect data format and filter for variants of interest\n\n## Why not?\nZORP provides a high-level abstraction: it trades speed for convenience.\n\nFor GWAS files, ZORP does not sort the data for you, because doing so in Python would be quite slow. You will still \nneed to do some basic data preparation (such as sorting) before using it.\n\n## Usage\n### Python\n```python\nfrom zorp import readers, parsers\n\n# Create a parser and a reader instance. This example specifies each option for clarity, but sniffers are provided to \n# auto-detect common format options.\nsample_parser = parsers.GenericGwasLineParser(marker_col=1, pvalue_col=2, is_neg_log_pvalue=True,\n delimiter='\\t')\nreader = readers.TabixReader('input.bgz', parser=sample_parser, skip_rows=1, skip_errors=True)\n\n# After parsing the data, values of pre-defined fields can be cleaned up or used to perform lookups\nreader.add_transform('rsid', lambda variant: some_rsid_finder(variant.chrom, variant.pos, variant.ref, variant.alt))\n\n# We can filter data to the variants of interest. 
If you use a domain-specific parser, columns can be referenced by name\nreader.add_filter('chrom', '19') # Keep only rows with the specified value for the \"chrom\" field\nreader.add_filter(lambda row: row.neg_log_pvalue > 7.301) # Provide a function that can operate on all parsed fields\nreader.add_filter('neg_log_pvalue') # Exclude rows with missing data for the named field\n\n# Iteration yields containers of cleaned, parsed data (with fields accessible by name).\nfor row in reader:\n    print(row.chrom)\n\n# Tabix files support iterating over all or part of the file\nfor row in reader.fetch('X', 500_000, 1_000_000):\n    print(row)\n\n# Write a compressed, tabix-indexed file containing the subset of variants that match filters, choosing only specific \n# columns. The data written out will be cleaned and standardized by the parser into a well-defined format.\nout_fn = reader.write('outfile.txt', columns=['chrom', 'pos', 'pvalue'], make_tabix=True)\n\n# Real data is often messy. If a line fails to parse, the problem will be recorded.\nfor number, message, raw_line in reader.errors:\n    print('Line {} failed to parse: {}'.format(number, message))\n\n```\n\n### Command line file conversion\nZORP's file conversion feature is also available as a command-line utility. See `zorp-convert --help` for details\nand the full list of supported options.\n\nThis utility is currently in beta; please inspect the results carefully.\n\nTo auto-detect columns based on a library of commonly known file formats:\n\n`$ zorp-convert --auto infile.txt --dest outfile.txt --compress`\n\nOr specify your data columns exactly:\n\n`$ zorp-convert infile.txt --dest outfile.txt --index --skip-rows 1 --chrom_col 1 --pos_col 2 --ref_col 3 --alt_col 4 --pvalue_col 5 --beta_col 6 --stderr_beta_col 7 --allele_freq_col 8`\n\nThe `--index` option requires that your file be sorted first. 
If not, you can sort, compress, and tabix-index the standard output format manually \nas follows.\n\n```\n$ (head -n 1 && tail -n +2 | sort -k1,1 -k 2,2n) | bgzip > \n$ tabix -p vcf\n```\n\n## Development\n\nTo install dependencies and run in development mode:\n\n`pip install -e '.[test,perf]'`\n\nTo run the linter, type checker, and unit tests:\n\n```bash\n$ flake8 zorp\n$ mypy zorp\n$ pytest tests/\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/abought/zorp", "keywords": "sample setuptools development", "license": "", "maintainer": "", "maintainer_email": "", "name": "zorp", "package_url": "https://pypi.org/project/zorp/", "platform": "", "project_url": "https://pypi.org/project/zorp/", "project_urls": { "Bug Reports": "https://github.com/abought/zorp/issues", "Homepage": "https://github.com/abought/zorp", "Source": "https://github.com/abought/zorp/" }, "release_url": "https://pypi.org/project/zorp/0.1.0/", "requires_dist": [ "pysam", "fastnumbers (==2.2.1) ; extra == 'perf'", "coverage ; extra == 'test'", "pytest ; extra == 'test'", "pytest-flake8 ; extra == 'test'", "pytest-mypy ; extra == 'test'" ], "requires_python": ">=3.6", "summary": "ZORP: A helpful GWAS parser", "version": "0.1.0" }, "last_serial": 5941583, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "9b6487e29b7f3b0f8ee5b2685c028c40", "sha256": "29a5761bdbb3d2d816277a946b24138ed9ca5c34cc41d9311fecf108e70268a1" }, "downloads": -1, "filename": "zorp-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9b6487e29b7f3b0f8ee5b2685c028c40", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 19252, "upload_time": "2019-10-07T21:27:45", "url": "https://files.pythonhosted.org/packages/1c/64/fa56530edd8238988923f4c2718502c0d93ece21d208955d22db09c93a7d/zorp-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"4beade8d7c56e804dee88bac5248fa15", "sha256": "bdf06ecb4e7e6eb4afb16be977b20e29fa4112b6fc42cf6c641e8350e47e23ef" }, "downloads": -1, "filename": "zorp-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4beade8d7c56e804dee88bac5248fa15", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 19658, "upload_time": "2019-10-07T21:27:48", "url": "https://files.pythonhosted.org/packages/1b/d6/c1d0e42427d616672788d9354f6cb071db3fd70f89b9ed22636505abf417/zorp-0.1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9b6487e29b7f3b0f8ee5b2685c028c40", "sha256": "29a5761bdbb3d2d816277a946b24138ed9ca5c34cc41d9311fecf108e70268a1" }, "downloads": -1, "filename": "zorp-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "9b6487e29b7f3b0f8ee5b2685c028c40", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 19252, "upload_time": "2019-10-07T21:27:45", "url": "https://files.pythonhosted.org/packages/1c/64/fa56530edd8238988923f4c2718502c0d93ece21d208955d22db09c93a7d/zorp-0.1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "4beade8d7c56e804dee88bac5248fa15", "sha256": "bdf06ecb4e7e6eb4afb16be977b20e29fa4112b6fc42cf6c641e8350e47e23ef" }, "downloads": -1, "filename": "zorp-0.1.0.tar.gz", "has_sig": false, "md5_digest": "4beade8d7c56e804dee88bac5248fa15", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 19658, "upload_time": "2019-10-07T21:27:48", "url": "https://files.pythonhosted.org/packages/1b/d6/c1d0e42427d616672788d9354f6cb071db3fd70f89b9ed22636505abf417/zorp-0.1.0.tar.gz" } ] }