{
    "info": {
        "author": "Ryan Dale",
        "author_email": "dalerr@niddk.nih.gov",
        "bugtrack_url": null,
        "classifiers": [],
        "description": "ENCODE dataframe\n================\n\nI wanted a better way of exploring and downloading raw data from the ENCODE\nproject.\n\nFor example, I'd like to get the BAM files for all ChIP-seq experiments done in\nuninduced MEL cells (from the mm9 assembly).\n\nOne strategy would be to individually go through each track hub (e.g., histone\nmods from LICR, http://genome.cit.nih.gov/cgi-bin/hgFileUi?db=mm9&g=wgEncodeLicrHistone), filter data, and download files individually.\n\nAnother strategy would be to go directly to the download page\n(http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodeLicrHistone/)\nand extract the files that end in `.bam`.\n\nThis small package takes advantage of the `files.txt` files (here's an `example\n<http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodeLicrHistone/files.txt>`_)\nthat describe all the metadata on the download page.\n\nThe `files.txt` files are downloaded from each ENCODE track hub in the assembly\nof interest.  Then these files are parsed and concatenated together into one\nbig `pandas.DataFrame` that can be used to find the data you care about.\n\nInstallation\n------------\n\n::\n\n    pip install encode-dataframe\n\n\nUsage\n-----\nMirror the files.  This may take a minute or so.  If you've cloned the git\nrepo, you already have a copy of the mm9 files.\n\n>>> import encode_dataframe as edf\n>>> edf.mirror_metadata_files('mm9')\n\nCreate a large DataFrame:\n\n>>> df = edf.encode_dataframe('mm9')\n\n>>> len(df)\n5865\n\nArmed with the dataframe, we can now slice and dice to get the data we care\nabout.  Eventually I'd like to run a ChromHMM segmentation on MEL cells, but\nI need to get the data first . . .\n\nChoose a cell type\n\n>>> interesting = df.cell == 'MEL'\n\nAnd only BAM files\n\n>>> interesting &= df.type == 'bam'\n\nAnd only ChIP- or DNase-seq\n\n>>> interesting &= df.dataType.isin(['ChipSeq', 'DnaseSeq'])\n\nAnd only untreated (in this case, uninduced) cells:\n\n>>> interesting &= df.treatment != 'DMSO_2.0pct'\n\nAnd only one replicate (some have 2 or 3)\n\n>>> interesting &= df.replicate == '1'\n\nAnd only those that don't have some issue with them (looks like older versions\nhave some text in the objStatus field):\n\n>>> interesting &= df.objStatus.isnull()\n\nHow many do we have to work with?\n\n>>> m = df[interesting]\n>>> len(m)\n60\n\nSome of these are controls (input or IgG), and there are some duplicates (looks\nlike H3K4me3 ChIP-seq uses 2 different controls; CTCF was done by different\ngroups).  How many unique antibodies?\n\n>>> len(m.antibody.unique())\n46\n\nSo here are the files I should download:\n\n>>> urls = m.url.values",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "UNKNOWN",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "UNKNOWN",
        "keywords": null,
        "license": "MIT",
        "maintainer": null,
        "maintainer_email": null,
        "name": "encode-dataframe",
        "package_url": "https://pypi.org/project/encode-dataframe/",
        "platform": "UNKNOWN",
        "project_url": "https://pypi.org/project/encode-dataframe/",
        "project_urls": {
            "Download": "UNKNOWN",
            "Homepage": "UNKNOWN"
        },
        "release_url": "https://pypi.org/project/encode-dataframe/0.1/",
        "requires_dist": null,
        "requires_python": null,
        "summary": "Convert UCSC's ENCODE metadata into pandas DataFrames",
        "version": "0.1"
    },
    "last_serial": 1199150,
    "releases": {
        "0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "09d15c1f341867d0927284f7fc736e4a",
                    "sha256": "35531d104c716c6f42f1569dfb2dbc0b79924802a19fe03de36a573d150238aa"
                },
                "downloads": -1,
                "filename": "encode_dataframe-0.1-py2.py3-none-any.whl",
                "has_sig": false,
                "md5_digest": "09d15c1f341867d0927284f7fc736e4a",
                "packagetype": "bdist_wheel",
                "python_version": "2.7",
                "requires_python": null,
                "size": 5470,
                "upload_time": "2014-08-22T18:46:36",
                "url": "https://files.pythonhosted.org/packages/cc/12/ba0fd98f1843543812bb49279a1209bb4be9879db6026dd72829ca4d0cbf/encode_dataframe-0.1-py2.py3-none-any.whl"
            },
            {
                "comment_text": "",
                "digests": {
                    "md5": "87d738103c364160648689bfe724b4f9",
                    "sha256": "5350067688e83d3f093cb3181ab087de720aa43d6086bec3cbedfc0bdc987228"
                },
                "downloads": -1,
                "filename": "encode-dataframe-0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "87d738103c364160648689bfe724b4f9",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 3265,
                "upload_time": "2014-08-22T18:46:33",
                "url": "https://files.pythonhosted.org/packages/d1/b4/e71596bcddc05ae1cf1dfe6dedc9f15dcb87b08f8a4e7111cdc4497590d3/encode-dataframe-0.1.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "09d15c1f341867d0927284f7fc736e4a",
                "sha256": "35531d104c716c6f42f1569dfb2dbc0b79924802a19fe03de36a573d150238aa"
            },
            "downloads": -1,
            "filename": "encode_dataframe-0.1-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "09d15c1f341867d0927284f7fc736e4a",
            "packagetype": "bdist_wheel",
            "python_version": "2.7",
            "requires_python": null,
            "size": 5470,
            "upload_time": "2014-08-22T18:46:36",
            "url": "https://files.pythonhosted.org/packages/cc/12/ba0fd98f1843543812bb49279a1209bb4be9879db6026dd72829ca4d0cbf/encode_dataframe-0.1-py2.py3-none-any.whl"
        },
        {
            "comment_text": "",
            "digests": {
                "md5": "87d738103c364160648689bfe724b4f9",
                "sha256": "5350067688e83d3f093cb3181ab087de720aa43d6086bec3cbedfc0bdc987228"
            },
            "downloads": -1,
            "filename": "encode-dataframe-0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "87d738103c364160648689bfe724b4f9",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 3265,
            "upload_time": "2014-08-22T18:46:33",
            "url": "https://files.pythonhosted.org/packages/d1/b4/e71596bcddc05ae1cf1dfe6dedc9f15dcb87b08f8a4e7111cdc4497590d3/encode-dataframe-0.1.tar.gz"
        }
    ]
}