{ "info": { "author": "Jens Preussner", "author_email": "jens.preussner@mpi-bn.mpg.de", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "# Streamed and parallel demultiplexing of fastq files\n\n## Quickstart\n\n```\npydemult --fastq input.fastq.gz --barcodes barcodes.txt --threads 4 --writer-threads 16\n```\n\n## Requirements and usage\n\n`pydemult` allows you to demultiplex fastq files in a streamed and parallel way. It expects that a **sample barcode** can be matched by a regular expression from the first line of each fastq entry and that sample barcodes are known in advance.\n\nSuppose we have a file containing **sample barcodes** like this:\n\n```\nSample Barcode\nsample1 CTTCAA\nsample2 CAACAA\nsample3 GTACGG\n```\n\nand a typical entry in the fastq file looks like this:\n\n```\n@HWI-ST808:140:H0L10ADXX:1:1101:8463:2:NNNNNN:CTTCCA\nTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATGATGCTGTGAGTTCC\n+\n@CCDDDDFHHHHHJIJFDDDDDDDDDBDDDDDBB0@B#####################\n```\n\nSince the sample barcode is six bases long, we have to set the corresponding `--barcode-regex` option to `(.*):(?P<CB>[ATGCN]{6})` in the call\n\n```\npydemult --fastq input.fastq.gz --barcodes barcodes.txt --barcode-regex \"(.*):(?P<CB>[ATGCN]{6})\"\n```\n\n### Barcode and UMI regular expressions\n\nBy default, `pydemult` parses the read name for the cell barcode with regular expressions. Cell barcodes are indicated by a capturing group called `CB`, while (optional) UMIs are indicated by a capturing group called `UMI`. Some examples include:\n\n- `(.*):(?P<CB>[ATGCN]{11})`, for a cell barcode of length 11 that is present after the last colon of the read name.\n- `(.*):CELL_(?P<CB>[ATGCN]{10}):UMI_(?P<UMI>[ATGCN]{8})`, for a cell barcode of length 10, followed by a UMI sequence of length 8. 
For DropSeq data preprocessed by the [umis](https://github.com/vals/umis) tool, a regex like this is advisable.\n\n### Output\n\n`pydemult` will create a compressed fastq file for each **sample barcode**, with the filename taken from the corresponding *Sample* column entry of `barcodes.txt`. \n\n### A note on multithreading\n\n`pydemult` divides its work into a demultiplexing part and an output part. The main thread streams the input and lazily distributes data blobs (of size `--buffer-size`) across `n` different demultiplexing threads (set with `--threads`), where the actual work happens. Demultiplexed reads are then sent to `m` writer threads (set with `--writer-threads`), which write them into individual output files. Reading and demultiplexing are fast and CPU-bound operations, while output speed is determined by how fast data can be written to the underlying file system. In our experience, output is much slower than demultiplexing itself and requires proportionally more cores to speed up the runtime. We obtained the best results when distributing output to three threads for each demultiplexing thread (`1:3` ratio of `--threads` to `--writer-threads`). \n\n## License\n\nThe project is licensed under the MIT license. 
See the `LICENSE` file for details.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jenzopr/pydemult", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "pydemult", "package_url": "https://pypi.org/project/pydemult/", "platform": "", "project_url": "https://pypi.org/project/pydemult/", "project_urls": { "Homepage": "https://github.com/jenzopr/pydemult" }, "release_url": "https://pypi.org/project/pydemult/0.6/", "requires_dist": null, "requires_python": "", "summary": "Streamed and parallel demultiplexing of fastq files in python", "version": "0.6", "yanked": false, "yanked_reason": null }, "last_serial": 7025256, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "77f4dcf30b63b04c427ce51f8e2e2215", "sha256": "1e51faa022fbf9ce84515686b0aafb82e974d60ef87d71d595e69ece65860ebe" }, "downloads": -1, "filename": "pydemult-0.1-py3.6.egg", "has_sig": false, "md5_digest": "77f4dcf30b63b04c427ce51f8e2e2215", "packagetype": "bdist_egg", "python_version": "3.6", "requires_python": null, "size": 12000, "upload_time": "2018-08-23T14:23:03", "upload_time_iso_8601": "2018-08-23T14:23:03.993476Z", "url": "https://files.pythonhosted.org/packages/24/06/7ac96c4210ec935f6f07b562995b51d913599bd678efa8bb9aeb4307dc35/pydemult-0.1-py3.6.egg", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "5f374837d8bbf83e1227de0441db678a", "sha256": "823fa6e0ac7bc096551d78d5721b27d774b1f7bcd9f249e21cb519ea80b70ae5" }, "downloads": -1, "filename": "pydemult-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "5f374837d8bbf83e1227de0441db678a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 6734, "upload_time": "2018-08-23T14:23:02", "upload_time_iso_8601": "2018-08-23T14:23:02.071372Z", "url": 
"https://files.pythonhosted.org/packages/1f/9d/81c187ee1bd681973fca6dd2b6c1c8a31bdcd046a79bd1bd4a9af11852d4/pydemult-0.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "7c1ebadd66fb1667b26039091287eda2", "sha256": "a255a88fe198f4af0b566c5990f305b8bc433e94473bbeb5931dfaf895b6ca04" }, "downloads": -1, "filename": "pydemult-0.1.tar.gz", "has_sig": false, "md5_digest": "7c1ebadd66fb1667b26039091287eda2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6319, "upload_time": "2018-08-23T14:23:05", "upload_time_iso_8601": "2018-08-23T14:23:05.257774Z", "url": "https://files.pythonhosted.org/packages/17/7f/b51eb4b9029fc4f234a0b513ac2834b26f52af60cfacbc29a6e4ebfb0cb3/pydemult-0.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.2": [ { "comment_text": "", "digests": { "md5": "3385bb152329ab22a80895122b9834c8", "sha256": "98aef3605d558ce8e456ac765ffd72eb43ab96e81cde5c46b085ee9220678b0f" }, "downloads": -1, "filename": "pydemult-0.2-py3-none-any.whl", "has_sig": false, "md5_digest": "3385bb152329ab22a80895122b9834c8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 8672, "upload_time": "2018-08-28T14:31:17", "upload_time_iso_8601": "2018-08-28T14:31:17.777079Z", "url": "https://files.pythonhosted.org/packages/cd/98/5135a65b8521200b0574dc92406b99da3691de665d32ef86548cfc1f152e/pydemult-0.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "41a50b48f42a4de5379bfee8a32e4b3f", "sha256": "befff5bbb6baec89456502417feda39d5c5df94c2ce52289024eacd6b667e811" }, "downloads": -1, "filename": "pydemult-0.2.tar.gz", "has_sig": false, "md5_digest": "41a50b48f42a4de5379bfee8a32e4b3f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6413, "upload_time": "2018-08-28T14:36:57", "upload_time_iso_8601": "2018-08-28T14:36:57.179270Z", "url": 
"https://files.pythonhosted.org/packages/92/60/117da5ea5f5aff8db03ce8d491bf65705244420d929fbfa1f8e4a606207f/pydemult-0.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.3": [ { "comment_text": "", "digests": { "md5": "4f408189bb85f67eaa8b9b2086004dbd", "sha256": "c9e58868516eaf691daaa649eee4d696f3e2c9c6d7ab1e613f9cf0ec826be1ab" }, "downloads": -1, "filename": "pydemult-0.3.tar.gz", "has_sig": false, "md5_digest": "4f408189bb85f67eaa8b9b2086004dbd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6822, "upload_time": "2018-10-04T07:44:34", "upload_time_iso_8601": "2018-10-04T07:44:34.082391Z", "url": "https://files.pythonhosted.org/packages/9b/66/918dca6bf58bb0fc35531c68af801ee8bea49c5be2d972f7015094733a8e/pydemult-0.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.4": [ { "comment_text": "", "digests": { "md5": "64f2e39a5e8597498809aa49520574e1", "sha256": "fe902ac1614f572ec0c4aefc950ef97fda0c9a3575667f1e6aa2c9fc33856ab3" }, "downloads": -1, "filename": "pydemult-0.4.tar.gz", "has_sig": false, "md5_digest": "64f2e39a5e8597498809aa49520574e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7002, "upload_time": "2018-10-17T13:09:15", "upload_time_iso_8601": "2018-10-17T13:09:15.016233Z", "url": "https://files.pythonhosted.org/packages/9a/10/5880497aca69ab64096cfe309b618abecaecce0f5e148d1a84e9b78d9ae0/pydemult-0.4.tar.gz", "yanked": false, "yanked_reason": null } ], "0.4.1": [ { "comment_text": "", "digests": { "md5": "1044393ea96c153e1e7cf84a1d8d48ec", "sha256": "5a44dc6c819fd4394b282d6aeff4b571b61654486ce6e4b782d3cc0731c3a0e2" }, "downloads": -1, "filename": "pydemult-0.4.1.tar.gz", "has_sig": false, "md5_digest": "1044393ea96c153e1e7cf84a1d8d48ec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7020, "upload_time": "2018-10-26T08:19:04", "upload_time_iso_8601": "2018-10-26T08:19:04.404198Z", "url": 
"https://files.pythonhosted.org/packages/9b/c0/787fd2902fb62cb2e5a5622fc328b904e99739188a4fabe972164c46ea5c/pydemult-0.4.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.5": [ { "comment_text": "", "digests": { "md5": "2c5a27a63741c717360cfd3fc119a457", "sha256": "feab11915351976ceb3b6559170ed7613cedf977b26d4b9884ab24887ae20154" }, "downloads": -1, "filename": "pydemult-0.5.tar.gz", "has_sig": false, "md5_digest": "2c5a27a63741c717360cfd3fc119a457", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7051, "upload_time": "2019-10-31T10:19:32", "upload_time_iso_8601": "2019-10-31T10:19:32.605259Z", "url": "https://files.pythonhosted.org/packages/0d/bc/b947b75912fd699644d28b432d5e611e8437e445686b164364b293393feb/pydemult-0.5.tar.gz", "yanked": false, "yanked_reason": null } ], "0.6": [ { "comment_text": "", "digests": { "md5": "34b05b2949f9405e48add90dca5686eb", "sha256": "59df3d38af23ee2cb6a2a383497ba868f788a2f1758600d859ae9767ad6376ae" }, "downloads": -1, "filename": "pydemult-0.6.tar.gz", "has_sig": false, "md5_digest": "34b05b2949f9405e48add90dca5686eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7112, "upload_time": "2020-04-15T15:03:20", "upload_time_iso_8601": "2020-04-15T15:03:20.821266Z", "url": "https://files.pythonhosted.org/packages/25/c4/130f6be7b897947b42c0ab957b14b9eb42ea65372758a150a6ff60df710f/pydemult-0.6.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "34b05b2949f9405e48add90dca5686eb", "sha256": "59df3d38af23ee2cb6a2a383497ba868f788a2f1758600d859ae9767ad6376ae" }, "downloads": -1, "filename": "pydemult-0.6.tar.gz", "has_sig": false, "md5_digest": "34b05b2949f9405e48add90dca5686eb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7112, "upload_time": "2020-04-15T15:03:20", "upload_time_iso_8601": "2020-04-15T15:03:20.821266Z", "url": 
"https://files.pythonhosted.org/packages/25/c4/130f6be7b897947b42c0ab957b14b9eb42ea65372758a150a6ff60df710f/pydemult-0.6.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }