{ "info": { "author": "Hyeshik Chang, Kamil Slowikowski", "author_email": "hyeshik@snu.ac.kr, slowikow@broadinstitute.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: C", "Programming Language :: Python", "Topic :: Scientific/Engineering :: Bio-Informatics", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "This module allows fast random access to files compressed with bgzip_ and\r\nindexed by tabix_. It includes a C extension with code from klib_. The bgzip\r\nand tabix programs are available here_.\r\n\r\nInstallation\r\n------------\r\n\r\n::\r\n\r\n pip install --user pytabix\r\n\r\n\r\nSynopsis\r\n--------\r\n\r\nGenomics data is often in a table where each row corresponds to a genomic\r\nregion (start, end) or a position::\r\n\r\n chrom pos snp\r\n 1 1000760 rs75316104\r\n 1 1000894 rs114006445\r\n 1 1000910 rs79750022\r\n 1 1001177 rs4970401\r\n 1 1001256 rs78650406\r\n\r\nWith tabix_, you can quickly retrieve all rows in a genomic region by\r\nspecifying a query with a sequence name, start, and end:\r\n\r\n.. code:: python\r\n\r\n import tabix\r\n\r\n # Open a remote or local file.\r\n url = \"ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/\"\r\n url += \"ALL.2of4intersection.20100804.genotypes.vcf.gz\"\r\n\r\n tb = tabix.open(url)\r\n\r\n # These queries are identical. A query returns an iterator over the results.\r\n records = tb.query(\"1\", 1000000, 1250000)\r\n records = tb.queryi(0, 1000000, 1250000)\r\n records = tb.querys(\"1:1000000-1250000\")\r\n\r\n # Each record is a list of strings.\r\n for record in records:\r\n print record[:3]\r\n\r\n.. 
code:: python\r\n\r\n ['1', '1000760', 'rs75316104']\r\n ['1', '1000760', 'rs75316104']\r\n ['1', '1000894', 'rs114006445']\r\n ['1', '1000910', 'rs79750022']\r\n ['1', '1001177', 'rs4970401']\r\n ['1', '1001256', 'rs78650406']\r\n\r\n\r\nExample\r\n-------\r\n\r\nLet's say you have a table of gene coordinates:\r\n\r\n.. code:: bash\r\n\r\n $ zcat example.bed.gz | shuf | head -n5 | column -t\r\n chr19 53611131 53636172 55786 ZNF415\r\n chr10 72149121 72150375 221017 CEP57L1P1\r\n chr4 185009858 185139113 133121 ENPP6\r\n chrX 132669772 133119672 2719 GPC3\r\n chr6 134924279 134925376 114182 FAM8A6P\r\n\r\nSort_ it by chromosome, then by start and end positions. Then, use bgzip_ to\r\ndeflate the file into compressed blocks:\r\n\r\n.. code:: bash\r\n\r\n $ zcat example.bed.gz | sort -k1V -k2n -k3n | bgzip > example.bed.bgz\r\n\r\nThe compressed size is usually slightly larger than that obtained with gzip.\r\n\r\nIndex the file with tabix_:\r\n\r\n.. code:: bash\r\n\r\n $ tabix -s 1 -b 2 -e 3 example.bed.bgz\r\n \r\n $ ls\r\n example.bed.gz example.bed.bgz example.bed.bgz.tbi\r\n\r\n.. _bgzip: http://samtools.sourceforge.net/tabix.shtml\r\n.. _tabix: http://samtools.sourceforge.net/tabix.shtml\r\n.. _klib: https://github.com/jmarshall/klib\r\n.. _here: http://sourceforge.net/projects/samtools/files/tabix/\r\n.. 
_Sort: https://www.gnu.org/software/coreutils/manual/html_node/Details-about-version-sort.html#Details-about-version-sort", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/slowkow/pytabix", "keywords": "tabix, bgzip, bioinformatics, genomics", "license": "MIT", "maintainer": "Kamil Slowikowski", "maintainer_email": "slowikow@broadinstitute.org", "name": "pytabix", "package_url": "https://pypi.org/project/pytabix/", "platform": "", "project_url": "https://pypi.org/project/pytabix/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/slowkow/pytabix" }, "release_url": "https://pypi.org/project/pytabix/0.1/", "requires_dist": null, "requires_python": null, "summary": "Python interface for tabix", "version": "0.1" }, "last_serial": 1471279, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "c558be1d55a8b72c92668837305054e2", "sha256": "7cdefa37f77e59c1ddea3c402e00cb352e5be2aa7fef41feb49ed930a86ace5f" }, "downloads": -1, "filename": "pytabix-0.0.2.tar.gz", "has_sig": false, "md5_digest": "c558be1d55a8b72c92668837305054e2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 46806, "upload_time": "2015-03-21T15:38:59", "url": "https://files.pythonhosted.org/packages/c5/37/e30f6b04237801d072938ca55fde9c3773cc981043152a9ccc16a028f321/pytabix-0.0.2.tar.gz" } ], "0.1": [ { "comment_text": "", "digests": { "md5": "bf9c069c3787c0c240255b917ef34405", "sha256": "0774f1687ebd41811fb07a0e50951b6be72d7cc7e22ed2b18972eaf7482eb7d1" }, "downloads": -1, "filename": "pytabix-0.1.tar.gz", "has_sig": false, "md5_digest": "bf9c069c3787c0c240255b917ef34405", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 45811, "upload_time": "2014-04-16T17:49:24", "url": 
"https://files.pythonhosted.org/packages/84/6a/520ecf75c2ada77492cb4ed21fb22aed178e791df434ca083b59fffadddd/pytabix-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bf9c069c3787c0c240255b917ef34405", "sha256": "0774f1687ebd41811fb07a0e50951b6be72d7cc7e22ed2b18972eaf7482eb7d1" }, "downloads": -1, "filename": "pytabix-0.1.tar.gz", "has_sig": false, "md5_digest": "bf9c069c3787c0c240255b917ef34405", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 45811, "upload_time": "2014-04-16T17:49:24", "url": "https://files.pythonhosted.org/packages/84/6a/520ecf75c2ada77492cb4ed21fb22aed178e791df434ca083b59fffadddd/pytabix-0.1.tar.gz" } ] }