{ "info": { "author": "Shiqi Tu", "author_email": "tushiqi@picb.ac.cn", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Operating System :: OS Independent", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "=============================\nIntroduction to MAnorm2_utils\n=============================\n\n:Author: Shiqi Tu\n:Contact: tushiqi@picb.ac.cn\n:Version: 1.0.0\n:Date: 2018-08-24\n\n:code:`MAnorm2_utils` is designed to coordinate with MAnorm2_, an R package for\ndifferential analysis with ChIP-seq_ signals between two or more groups of\nreplicate samples. :code:`MAnorm2_utils` is primarily used for processing a set\nof ChIP-seq samples into a regular table recording the read abundances and\nenrichment states of a list of genomic bins in each of these samples.\n\n.. _MAnorm2: https://github.com/tushiqi/MAnorm2\n.. _ChIP-seq: https://en.wikipedia.org/wiki/ChIP-sequencing\n\n\nUsage\n------------------------------\n\nThe primary utility of :code:`MAnorm2_utils` comes from the two scripts bound\nwith it, named :code:`profile_bins` and :code:`sam2bed`, respectively.\n\n\nProfiling ChIP-seq signals in reference genomic regions\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nGiven the peak regions and mapping positions of reads of each of a set of\nChIP-seq_ samples, :code:`profile_bins` comes up with a list of reference\ngenomic bins (each being enriched for ChIP-seq signals in at least one of the\nsamples), and deduces the read count as well as enrichment status of each of\nthe bins in each sample. Refer to MACS_ for more information about the\ntechnical terms mentioned above.\n\n.. 
_MACS: https://genomebiology.biomedcentral.com/\n articles/10.1186/gb-2008-9-9-r137\n\nWe recommend `MACS 1.4`_ for identifying peaks for ChIP-seq samples associated\nwith narrow genomic regions of reads enrichment (e.g., samples for most\ntranscription factors and histone modifications like H3K4me3 and H3K27ac). In\nfact, although having a general applicability, :code:`profile_bins` is\nspecifically suited to processing the output files generated by MACS 1.4. For\nhistone modifications constituting broad enriched domains (e.g., H3K9me3 and\nH3K27me3), we recommend SICER_ as the peak caller.\n\n.. _MACS 1.4: https://github.com/taoliu/MACS/downloads\n.. _SICER: https://academic.oup.com/bioinformatics/article/25/15/1952/212783\n\nThe following is a sample usage of :code:`profile_bins` of the simplest form:\n\n.. code:: bash\n\n profile_bins --peaks=peak1.bed,peak2.bed \\\n --reads=read1.bed,read2.bed \\\n --labs=s1,s2 -n example\n\n.. Note::\n\n :code:`profile_bins` only recognizes BED-formatted_ input files. For read\n alignment results stored in SAM_ files, use first :code:`sam2bed` to\n transform them into BED files before calling :code:`profile_bins` (BED files\n created by :code:`sam2bed` have been specifically designed to suit\n :code:`profile_bins`; see also the `section below`__). For BAM-formatted_\n files, refer to Samtools_ for converting them into SAM files.\n\n.. _BED-formatted: BED_\n.. _BED: http://genome.ucsc.edu/FAQ/FAQformat.html#format1\n.. _BAM-formatted: SAM_\n.. _SAM: https://samtools.github.io/hts-specs/SAMv1.pdf\n.. _Samtools: https://www.htslib.org/\n__ `Transforming SAM into BED files`_\n\nIf everything goes smoothly, the command above will generate two files, named\n``example_profile_bins_log.txt`` and ``example_profile_bins.xls``,\nrespectively. The former records the full list of parameter settings for\ncalling :code:`profile_bins`, as well as some summary statistics regarding each\nof the supplied ChIP-seq samples. 
The latter gives the read count and\nenrichment status for each deduced reference genomic bin in each sample, and\nhas a format like the following (data shown here is only for illustration):\n\n.. table:: Example output of :code:`profile_bins`\n :align: right\n\n ====== ======= ======= ============ ============ ============= =============\n chrom start end s1.read_cnt s2.read_cnt s1.occupancy s2.occupancy\n ====== ======= ======= ============ ============ ============= =============\n chr1 28112 29788 115 4 1 0\n chr1 164156 166417 233 194 1 1\n chr1 166417 168417 465 577 1 1\n chr1 168417 169906 15 34 0 1\n ====== ======= ======= ============ ============ ============= =============\n\nTo clarify, a genomic bin is \"occupied\" by a ChIP-seq sample if and only if its\nmiddle point is covered by some peak region of the sample.\n\n:code:`profile_bins` supports a number of parameters for a customized\nconfiguration for deducing reference genomic bins as well as counting the reads\nfalling in them. Type :code:`profile_bins --help` in the command line for a\ncomplete list of these parameters and a brief description of each of them.\nAmong others, several parameters deserve specific attention:\n\n- By default, :code:`profile_bins` merges peaks from all the provided ChIP-seq\n samples into a consensus set of peak regions, and divides up each *broad*\n merged peak into consecutive genomic bins. Specify :code:`--typical-bin-size`\n to control the size of such genomic bins. Note that the merged peaks having a\n size comparable to this parameter are left untouched.\n\n The default value of :code:`--typical-bin-size`, which is 2000, suits well\n the ChIP-seq samples of histone modifications. 
For ChIP-seq samples of\n transcription factors, setting the parameter to 1000 is recommended.\n\n- In cases where summit positions of the supplied peaks are available (e.g.,\n when the peaks are called by using `MACS 1.4`_), you may provide\n :code:`profile_bins` with this information via specifying :code:`--summits`.\n Summit positions will be used to determine an appropriate start point for\n dividing up a broad merged peak.\n\n- Alternatively, you can directly specify a set of genomic regions as the\n reference bins to profile, by setting :code:`--bins` to a BED_ file. In this\n case, :code:`profile_bins` focuses on these provided bins and suppresses the\n peak merging procedure.\n\n :code:`--typical-bin-size` and :code:`--summits` are ignored when\n :code:`--bins` is specified.\n\n- Before being assigned to reference bins, each read (or read pair) is\n converted into a genomic locus representing the middle point of the\n underlying DNA fragment. By default, :code:`profile_bins` treats the supplied\n reads as single-end, and shifts downstream the 5' end of each of them by\n :code:`--shiftsize` to reach the putative middle point. :code:`--shiftsize`\n defaults to 100, and may be set to half of the practical DNA fragment size\n selected in the library preparation process.\n\n- Set :code:`--paired` to indicate the reads are paired-end. In this case,\n middle point of the underlying DNA fragment associated with each read pair\n could be accurately inferred. Note that two reads from the same ChIP-seq\n sample are considered as a read pair only if they have *exactly the same*\n name (i.e., the 4th column in a BED_ file).\n\n :code:`--shiftsize` is ignored when :code:`--paired` is set.\n\n- :code:`--keep-dup` controls the program's behavior regarding duplicate reads\n (or read pairs) potentially resulting from PCR amplification. 
For single-end\n reads, two reads are considered as duplicates if their 5' ends are mapped to\n the same genomic locus; for paired-end reads, two read pairs are considered\n as duplicates if their implied DNA fragments occupy the same genomic\n interval.\n\n By default, :code:`profile_bins` preserves all the reads (or read pairs) for\n the counting procedure. For both paired-end reads and deep-sequencing\n single-end reads, we strongly recommend setting :code:`--keep-dup` to 1 to\n enhance the specificity of downstream analyses. In that case, for each\n ChIP-seq sample only one read (or read pair) of a set of duplicates is\n retained for counting. Note also that the output log file records, for each\n sample, the ratio of reads (or read pairs) that are removed due to\n :code:`--keep-dup`.\n\n- :code:`profile_bins` supports the idea of using a configuration file to\n deliver parameters, to avoid repeated typing in the command line. To do that,\n write a configuration file following the format as demonstrated below, and\n pass it to :code:`--parameters`::\n\n peaks=peak1.bed,peak2.bed\n reads=read1.bed,read2.bed\n labs=s1,s2\n n=example\n summits=summit1.bed,summit2.bed\n paired\n keep-dup=1\n\n Note that :code:`--parameters` could be used in mixture with the other\n command-line arguments.\n\nRefer to the `Manual of MAnorm2_utils`_ for a full specification of the\nparameters supported by :code:`profile_bins`.\n\n.. _Manual of MAnorm2_utils: https://github.com/tushiqi/MAnorm2_utils/\n tree/master/docs\n\n\nTransforming SAM into BED files\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n:code:`sam2bed` is designed to coordinate with :code:`profile_bins`, since the\nlatter only accepts BED-formatted_ files. The simplest form of calling\n:code:`sam2bed` is as follows:\n\n.. 
code:: bash\n\n sam2bed -i File.sam -o File.bed\n\nThe program will read from the standard input stream if :code:`-i` is not\nspecified.\n\nIn the vast majority of cases, the default settings of most of the parameters\nsupported by :code:`sam2bed` should be used. The only parameter that may need\ncustomizing in practice is :code:`--min-qual`, which controls how the program\nfilters out SAM_ alignment records with a low mapping quality. Type\n:code:`sam2bed --help` in the command line for a brief description of each\nparameter supported by :code:`sam2bed`.\n", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tushiqi/MAnorm2_utils", "keywords": "ChIP-seq MAnorm2", "license": "", "maintainer": "", "maintainer_email": "", "name": "MAnorm2-utils", "package_url": "https://pypi.org/project/MAnorm2-utils/", "platform": "", "project_url": "https://pypi.org/project/MAnorm2-utils/", "project_urls": { "Homepage": "https://github.com/tushiqi/MAnorm2_utils" }, "release_url": "https://pypi.org/project/MAnorm2-utils/1.0.0/", "requires_dist": null, "requires_python": "", "summary": "To pre-process a set of ChIP-seq samples", "version": "1.0.0" }, "last_serial": 4191750, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "6ff0dea0d7513304ebe709f4c8d749c0", "sha256": "8582c65c17beb4675046998bdd7c64254a2b92342acb7837f01ac097c8effc46" }, "downloads": -1, "filename": "MAnorm2_utils-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6ff0dea0d7513304ebe709f4c8d749c0", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 36526, "upload_time": "2018-08-21T09:50:15", "url": "https://files.pythonhosted.org/packages/e1/eb/8f003c8779223322e337a354a18c9ccb7d35208ad1f440ac81cbd113df8b/MAnorm2_utils-1.0.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": 
"db654d0969a65803453ee880ef25c9db", "sha256": "139b85e898b12a7e97e131f26135a78302ae45d316eb89329670120c1139ae33" }, "downloads": -1, "filename": "MAnorm2_utils-1.0.0.tar.gz", "has_sig": false, "md5_digest": "db654d0969a65803453ee880ef25c9db", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1749754, "upload_time": "2018-08-21T09:50:20", "url": "https://files.pythonhosted.org/packages/d8/3d/a4f21016a54e347eb8a48f2c9d317263fe283e07c2ecff0612073f8ab89d/MAnorm2_utils-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "6ff0dea0d7513304ebe709f4c8d749c0", "sha256": "8582c65c17beb4675046998bdd7c64254a2b92342acb7837f01ac097c8effc46" }, "downloads": -1, "filename": "MAnorm2_utils-1.0.0-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "6ff0dea0d7513304ebe709f4c8d749c0", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 36526, "upload_time": "2018-08-21T09:50:15", "url": "https://files.pythonhosted.org/packages/e1/eb/8f003c8779223322e337a354a18c9ccb7d35208ad1f440ac81cbd113df8b/MAnorm2_utils-1.0.0-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "db654d0969a65803453ee880ef25c9db", "sha256": "139b85e898b12a7e97e131f26135a78302ae45d316eb89329670120c1139ae33" }, "downloads": -1, "filename": "MAnorm2_utils-1.0.0.tar.gz", "has_sig": false, "md5_digest": "db654d0969a65803453ee880ef25c9db", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1749754, "upload_time": "2018-08-21T09:50:20", "url": "https://files.pythonhosted.org/packages/d8/3d/a4f21016a54e347eb8a48f2c9d317263fe283e07c2ecff0612073f8ab89d/MAnorm2_utils-1.0.0.tar.gz" } ] }