{ "info": { "author": "Ryan Dale", "author_email": "dalerr@niddk.nih.gov", "bugtrack_url": null, "classifiers": [], "description": "ENCODE dataframe\n================\n\nI wanted a better way of exploring and downloading raw data from the ENCODE\nproject.\n\nFor example, I'd like to get the BAM files for all ChIP-seq experiments done in\nuninduced MEL cells (from the mm9 assembly).\n\nOne strategy would be to individually go through each track hub (e.g., histone\nmods from LICR, http://genome.cit.nih.gov/cgi-bin/hgFileUi?db=mm9&g=wgEncodeLicrHistone), filter data, and download files individually.\n\nAnother strategy would be to go directly to the download page\n(http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodeLicrHistone/)\nand extract the files that end in `.bam`.\n\nThis small package takes advantage of the `files.txt` files (here's an `example\n`_)\nthat describe all the metadata on the download page.\n\nThe `files.txt` files are downloaded from each ENCODE track hub in the assembly\nof interest. Then these files are parsed and concatenated together into one\nbig `pandas.DataFrame` that can be used to find the data you care about.\n\nInstallation\n------------\n\n::\n\n pip install encode-dataframe\n\n\nUsage\n-----\nMirror the files. This may take a minute or so. If you've cloned the git\nrepo, you already have a copy of the mm9 files.\n\n>>> import encode_dataframe as edf\n>>> edf.mirror_metadata_files('mm9')\n\nCreate a large DataFrame:\n\n>>> df = edf.encode_dataframe('mm9')\n\n>>> len(df)\n5865\n\nArmed with the dataframe, we can now slice and dice to get the data we care\nabout. Eventually I'd like to run a ChromHMM segmentation on MEL cells, but\nI need to get the data first . . .\n\nChoose a cell type\n\n>>> interesting = df.cell == 'MEL'\n\nAnd only BAM files\n\n>>> interesting &= df.type == 'bam'\n\nAnd only ChIP- or DNase-seq\n\n>>> interesting &= df.dataType.isin(['ChipSeq', 'DnaseSeq'])\n\nAnd only untreated (in this case, uninduced) cells:\n\n>>> interesting &= df.treatment != 'DMSO_2.0pct'\n\nAnd only one replicate (some have 2 or 3)\n\n>>> interesting &= df.replicate == '1'\n\nAnd only those that don't have some issue with them (looks like older versions\nhave some text in the objStatus field):\n\n>>> interesting &= df.objStatus.isnull()\n\nHow many do we have to work with?\n\n>>> m = df[interesting]\n>>> len(m)\n60\n\nSome of these are controls (input or IgG), and there are some duplicates (looks\nlike H3K4me3 ChIP-seq uses 2 different controls; CTCF was done by different\ngroups). How many unique antibodies?\n\n>>> len(m.antibody.unique())\n46\n\nSo here are the files I should download:\n\n>>> urls = m.url.values", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "UNKNOWN", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "encode-dataframe", "package_url": "https://pypi.org/project/encode-dataframe/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/encode-dataframe/", "project_urls": { "Download": "UNKNOWN", "Homepage": "UNKNOWN" }, "release_url": "https://pypi.org/project/encode-dataframe/0.1/", "requires_dist": null, "requires_python": null, "summary": "Convert UCSC's ENCODE metadata into pandas DataFrames", "version": "0.1" }, "last_serial": 1199150, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "09d15c1f341867d0927284f7fc736e4a", "sha256": "35531d104c716c6f42f1569dfb2dbc0b79924802a19fe03de36a573d150238aa" }, "downloads": -1, "filename": "encode_dataframe-0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "09d15c1f341867d0927284f7fc736e4a", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 5470, "upload_time": "2014-08-22T18:46:36", "url": "https://files.pythonhosted.org/packages/cc/12/ba0fd98f1843543812bb49279a1209bb4be9879db6026dd72829ca4d0cbf/encode_dataframe-0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "87d738103c364160648689bfe724b4f9", "sha256": "5350067688e83d3f093cb3181ab087de720aa43d6086bec3cbedfc0bdc987228" }, "downloads": -1, "filename": "encode-dataframe-0.1.tar.gz", "has_sig": false, "md5_digest": "87d738103c364160648689bfe724b4f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3265, "upload_time": "2014-08-22T18:46:33", "url": "https://files.pythonhosted.org/packages/d1/b4/e71596bcddc05ae1cf1dfe6dedc9f15dcb87b08f8a4e7111cdc4497590d3/encode-dataframe-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "09d15c1f341867d0927284f7fc736e4a", "sha256": "35531d104c716c6f42f1569dfb2dbc0b79924802a19fe03de36a573d150238aa" }, "downloads": -1, "filename": "encode_dataframe-0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "09d15c1f341867d0927284f7fc736e4a", "packagetype": "bdist_wheel", "python_version": "2.7", "requires_python": null, "size": 5470, "upload_time": "2014-08-22T18:46:36", "url": "https://files.pythonhosted.org/packages/cc/12/ba0fd98f1843543812bb49279a1209bb4be9879db6026dd72829ca4d0cbf/encode_dataframe-0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "87d738103c364160648689bfe724b4f9", "sha256": "5350067688e83d3f093cb3181ab087de720aa43d6086bec3cbedfc0bdc987228" }, "downloads": -1, "filename": "encode-dataframe-0.1.tar.gz", "has_sig": false, "md5_digest": "87d738103c364160648689bfe724b4f9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3265, "upload_time": "2014-08-22T18:46:33", "url": "https://files.pythonhosted.org/packages/d1/b4/e71596bcddc05ae1cf1dfe6dedc9f15dcb87b08f8a4e7111cdc4497590d3/encode-dataframe-0.1.tar.gz" } ] }