{ "info": { "author": "Oren Ben-Kiki", "author_email": "oren@ben-kiki.org", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Bio-Informatics" ], "description": "TGUtils - Utilities for Tanay Group lab code\n============================================\n\n.. image:: https://travis-ci.org/tanaylab/tgutils.svg?branch=master\n :target: https://travis-ci.org/tanaylab/tgutils\n\n.. image:: https://readthedocs.org/projects/tgutils/badge/?version=latest\n :target: https://tgutils.readthedocs.io/en/latest/?badge=latest\n :alt: Documentation Status\n\nThis package contains common utilities used by the Tanay Group Python lab code (for example,\n``metacell``). These utilities are generally useful and not associated with any specific project.\n\nPhantom Types\n-------------\n\nThe vanilla ``np.ndarray``, ``pd.Series`` and ``pd.DataFrame`` types are very generic. They are the\nsame regardless of the element data type used, and in the case of ``np.ndarray``, the number of\ndimensions (array vs. matrix).\n\nTo understand the code, it is helpful to keep track of a more detailed data type - whether the\nvariable is an array or a matrix, and what the element data type is. To facilitate this, ``tgutils``\nprovides what is known as \"Phantom Types\". These types can be used in ``mypy`` type declarations,\neven though the actual data type of the variables remains the vanilla Numpy/Pandas data type.\n\nSee ``tgutils.numpy`` and ``tgutils.pandas`` for the list of provided phantom types. To\nuse them, instead of the vanilla:\n\n.. code-block:: python\n\n import numpy as np\n import pandas as pd\n\nWrite the modified:\n\n.. code-block:: python\n\n import tgutils.numpy as np\n import tgutils.pandas as pd\n\nThis this will provide access to the vanilla symbols using the ``np.`` and/or ``pd.`` prefixes, and\nwill also provide access to the enhanced functionality described below.\n\nFor example, instead of writing:\n\n.. code-block:: python\n\n def compute_some_integers() -> np.ndarray:\n ...\n\n def compute_some_floats() -> np.ndarray:\n ...\n\n def compute_stuff(foo: np.ndarray, bar: np.ndarray) -> ...:\n ...\n\n def workflow() -> ...:\n foo = compute_some_integers()\n bar = compute_some_floats()\n compute_stuff(foo, bar)\n\nYou should write:\n\n.. code-block:: python\n\n def compute_some_integers() -> np.ArrayInt32:\n ...\n\n def compute_some_floats() -> np.MatrixFloat64:\n ...\n\n def compute_stuff(foo: np.ArrayInt32, bar: np.MatrixFloat64) -> ...:\n ...\n\n def workflow() -> ...:\n foo = compute_some_integers()\n bar = compute_some_floats()\n compute_stuff(foo, bar)\n\nThis will allow the reader to understand the exact data types involved. Even better, it will allow\n``mypy`` to verify that you actually pass the correct data type to each function invocation.\nFor example, if you by mistake write ``compute_stuff(bar, foo)`` then ``mypy`` will complain that\nthe data types do not match - even though, under the covers, both ``foo`` and ``bar`` have exactly\nthe same data type at run-time: ``np.ndarray``.\n\nTo further help with ``mypy`` type checking, the ``tgutils`` package includes a ``stubs`` directory\ncontaining very partial quick-and-dirty type stubs for ``numpy`` and ``pandas`` (ideally, some brave\nsoul(s) would tackle the very difficult issue of providing proper stubs for these libraries,\nallowing for the removal of the ``tgutils`` stubs). Importing ``tgutils.setup_mypy`` module\nset ``MYPYPATH`` to this stubs directory, which is also a hack (see the ``metacell`` package for an\nexample of using this in your ``setup.py`` file).\n\nType Operations\n...............\n\nControl over the data types is also important when performing computations. It affects performance,\nmemory consumption and even the semantics of some operations. For example, integer elements can\nnever be ``NaN`` while floating point elements can, boolean elements have their own logic, and\nstring elements are different from numeric elements.\n\nTo help with this, ``tgutils`` provides two functions, ``am`` and ``be``. Both these functions\nreturn the requested data type, but ``am`` is just an assertion while ``be`` is a cast operation.\nThat is, writing ``ArrayInt32.am(foo)`` will return ``foo`` as an ``ArrayInt32``, or will raise an\nerror if ``foo`` is not an array of ``int32``; while writing ``ArrayInt32.be(foo)`` will always\nreturn an ``ArrayInt32``, which is either ``foo`` if it is an array of ``int32``, or a copy of\n``foo`` whose elements are the conversion of the elements of ``foo`` to ``int32``.\n\nDe/serialization\n................\n\nThe phantom types also provide read and write operations for efficiently storing data on the disk.\nThat is, writing ``ArrayInt32.read(path)`` will read an array of ``int32`` elements from the\nspecified path, and ``ArrayInt32.write(foo, path)`` will write an array of ``int32`` elements\ninto the specified path.\n\nDynaMake\n--------\n\nImport ``tgutils.make`` instead of ``dynamake.make``. This will achieve the following:\n\nUsing Qsub\n..........\n\nThe ``tgutils.tg_qsub`` script deals with submitting jobs to run on the SunGrid cluster in the\nTanay Group lab.\n\nA ``tgutils.make.tg_require`` function allows for collecting context for optimizing the slot\nallocation of ``tg_qsub`` for maximizing the cluster utilization and minimizing wait times. This has\nno effect unless the collected context values are explicitly used in the ``run_prefix`` and/or\n``run_suffix`` action wrapper of some step.\n\nThis is a convoluted and sub-optimal mechanism but has significant performance benefits in the\nspecific environment it was designed for.\n\nApplications\n------------\n\nImport ``tgutils.application`` instead of ``dynamake.application``. This will achieve the following:\n\nResources\n.........\n\nBy default, the Python process is restricted in the number of simultaneous open files. This\nis raised by ``tgutils`` to the maximum allowed by the operating system.\n\nNumpy Errors\n............\n\nBy default, ``numpy`` ignores several kinds of numeric errors. This is modified by ``tgutils``\nto raise an appropriate exception. This increases the robustness of the results.\n\nNumpy Random Number Generation\n..............................\n\nBy default, ``dynamake`` only handles the Python random number generator. This is extended by\n``tgutils`` so that the ``numpy`` random number generator is seeded with the same seeds as the\nPython random number generator, even in parallel calls. This seeding ensures results are replicable\n(when using the same non-zero seed).\n\nLogging\n.......\n\nThe default Python logging that prints to ``stderr`` works well for a single application. However,\nwhen running multiple applications in parallel, log messages may get interleaved resulting in\ngarbled output.\n\nThis is solved by ``tgutils`` using the ``tgutils.application.tg_qsub_logger``, which wraps\nthe default logger with a ``tgutils.application.FileLockLoggerAdapter``. This uses a file\nlock operation around each emitted log message to ensure it is atomic. The lock file is chosen\nto be compatible with the one used by the ``tgutils.tg_qsub`` script, so that log messages\nfrom this script will also be protected.\n\nParallel\n........\n\nWhen running a large number of very small tasks, it possible to let ``multiprocessing.Pool`` run\neach task on the much smaller number of available threads. However, this is less efficient. An\nalternative is to use ``tgutils.application.indexed_range`` which will partition the large\nrange of task indices into equal-sized sub-ranges, one per process. Reporting progress can be\ndone using the ``tgutils.application.ParallelCounter`` class.\n\nOther Utilities\n---------------\n\nTests\n.....\n\nThe provided ``tgutils.tests`` module provides ``TestWithReset`` which properly\nresets all the global state for each test, and ``TestWithFiles`` which also creates\na fresh temporary directory for each test. You can create new files using\n``tgutils.tests.write_file`` and verify file contents using\n``tgutils.tests.TestWithFiles.expect_file``.\n\nCaching\n.......\n\nYou can use the ``tgutils.cache.Cache`` class for a lightweight generic cache mechanism.\nIt uses weak references to hold onto expensive-to-compute data.\n\nYAML\n....\n\nYou can use ``tgutils.load_yaml.load_dictionary`` for a lightweight verification of loaded\nYAML data.", "description_content_type": "text/x-rst", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/tanaylab/tgutils.git", "keywords": "pandas numpy bioinformatics", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "tgutils", "package_url": "https://pypi.org/project/tgutils/", "platform": "", "project_url": "https://pypi.org/project/tgutils/", "project_urls": { "Homepage": "https://github.com/tanaylab/tgutils.git" }, "release_url": "https://pypi.org/project/tgutils/0.2.18/", "requires_dist": null, "requires_python": "", "summary": "Common utilities used by the Tanay Group Python lab code.", "version": "0.2.18" }, "last_serial": 5937306, "releases": { "0.2.17": [ { "comment_text": "", "digests": { "md5": "683d4387c227a5eb54f59e1225187285", "sha256": "b88b0835c41d4da1b0d55e10d76165a604676c2a7b3b5eb86f887206eca7c540" }, "downloads": -1, "filename": "tgutils-0.2.17.tar.gz", "has_sig": false, "md5_digest": "683d4387c227a5eb54f59e1225187285", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26944, "upload_time": "2019-10-06T10:33:46", "url": "https://files.pythonhosted.org/packages/b8/ef/07a305b057ea9e7d4c0c3cbebc5b0cdb83e68ee59fd8bf4353fa529fe7d5/tgutils-0.2.17.tar.gz" } ], "0.2.18": [ { "comment_text": "", "digests": { "md5": "ac1e4f4d03dd4140587d1725c4cd8aed", "sha256": "4ec70543c0f83919792bbff0472d353fd13927437837008d132a99410bb9f8d6" }, "downloads": -1, "filename": "tgutils-0.2.18.tar.gz", "has_sig": false, "md5_digest": "ac1e4f4d03dd4140587d1725c4cd8aed", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26959, "upload_time": "2019-10-07T06:46:35", "url": "https://files.pythonhosted.org/packages/92/c1/c8c37772f27d963b71ef630278d6db7d6eb737d8a8241f897e9f16436ad0/tgutils-0.2.18.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ac1e4f4d03dd4140587d1725c4cd8aed", "sha256": "4ec70543c0f83919792bbff0472d353fd13927437837008d132a99410bb9f8d6" }, "downloads": -1, "filename": "tgutils-0.2.18.tar.gz", "has_sig": false, "md5_digest": "ac1e4f4d03dd4140587d1725c4cd8aed", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26959, "upload_time": "2019-10-07T06:46:35", "url": "https://files.pythonhosted.org/packages/92/c1/c8c37772f27d963b71ef630278d6db7d6eb737d8a8241f897e9f16436ad0/tgutils-0.2.18.tar.gz" } ] }