{ "info": { "author": "Armel Lefebvre, Jonathan de Bruin", "author_email": "A.E.J.Lefebvre@uu.nl, j.debruin1@uu.nl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Topic :: Software Development :: Build Tools" ], "description": "Path2Insight \n============\n\n|travis| |readthedocs|\n\n.. |travis| image:: https://travis-ci.org/armell/path2insight.svg?branch=master\n :target: https://travis-ci.org/armell/path2insight\n.. |readthedocs| image:: https://readthedocs.org/projects/path2insight/badge/\n :target: https://readthedocs.org/projects/path2insight/badge/\n\nPath2Insight (p2i) is a modular and scalable python module which aims at\noffering a unified and comprehensive set of processing tools for analyzing\nfile paths. P2i supports static file systems analysis without requiring access\nto the original physical storage. Basically, a scan of the storage\u2019s content\nexported as a text file suffices to explore the saved resources. 
There is also\nno need to access the content of the files, as the p2i module imports file paths\nas strings.\n\nOnce loaded, the file paths are stored in memory as a Python object, enabling\npreprocessing, text processing and descriptive analysis of folders and files.\n\n**Preprocessing:** Sample, sort and select files based on multiple criteria (e.g.\nparent folder, depth).\n\n**Text processing:** Chunk file paths into tokens (full path, stem and name),\nn-grams or complete paths with the help of several extensible tokenizers.\nAlso, taggers offer the option to aggregate files based on their structure and\ncontent (which prepares paths for further analysis such as entity recognition\nor classification tasks).\n\n**Descriptive analysis:** P2i implements counters for tokens, stems and\nextensions. It also supports statistical features such as chi-squared tests on the\ndistribution of extensions, stems and names. Further, a representation of the\ncomplexity of the folder structure is facilitated by folder-depth analysis\nfunctionalities.\n\nThe table below shows how Path2Insight differs from and complements the functionality\noffered by lower-level Python modules (pathlib and os.path).\n\n+--------------------------------------------+-----------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+\n| Functionality | P2i | Pathlib | os.path |\n+============================================+=========================================================================================+==========================================+==============================================+\n| Preprocessing | Pathlib + Sampling, sorting, selection | match, joinpath | Normcase, norm path 
|\n+--------------------------------------------+-----------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+\n| Descriptive statistics | Counters: stem, extension, name. Taggers. Tokenizers | os.stat | os.stat |\n+--------------------------------------------+-----------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+\n| Text processing | Pathlib + Tokens, n-grams, taggers, lower, upper,... | Stem, name, parent, extension drive, ... | Split |\n+--------------------------------------------+-----------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+\n| Access or modify information on the system | No, can be linked to additional metadata (datetimes, users) by joining on the full path | Yes, chmod, current folder. ... | Yes, user, size, datetimes, descriptors, ... |\n+--------------------------------------------+-----------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+\n\nP2i is a dependency-free (only pathlib2 is required for Python 2.7 users), fast\nand scalable path processing toolkit. It is compatible with the major data\nanalysis Python modules such as pandas, scikit-learn, nltk and matplotlib, which\nextend the analytical possibilities of path2insight.\n\nExample\n=======\n\nImport the module and load a demo dataset with static file paths (or use\n`path2insight.walk` to collect paths from your file system).\n\n.. code:: python\n\n >>> import path2insight\n >>> from path2insight.datasets import load_ensembl\n\n >>> filepaths = load_ensembl()\n\n.. 
code:: python \n\n >>> path2insight.depth_counts(filepaths)\n Counter({3: 1, 4: 11, 5: 39424, 6: 5543, 7: 2733, 8: 3388})\n\n.. code:: python\n\n >>> path2insight.token_counts(filepaths).most_common(10)\n [('txt', 31977),\n ('gene', 13798),\n ('ensembl', 12727),\n ('dm', 12500),\n ('homolog', 7380),\n ('fa', 5890),\n ('chromosome', 5011),\n ('feature', 4878),\n ('dna', 4608),\n ('90', 3404)]\n\n.. code:: python\n\n >>> path2insight.extension_counts(filepaths).most_common(10)\n [('.gz', 44427),\n ('', 3094),\n ('.bb', 847),\n ('.nsq', 349),\n ('.nin', 349),\n ('.nhr', 349),\n ('.tsv', 336),\n ('.psq', 250),\n ('.pin', 250),\n ('.phr', 250)]\n\n.. code:: python\n\n >>> path2insight.select_re(filepaths, level5='micro.*')\n [PosixFilePath('/Volumes/release-90/variation/VEP/microtus_ochrogaster_vep_90_MicOch1.0.tar.gz'),\n PosixFilePath('/Volumes/release-90/variation/VEP/microtus_ochrogaster_refseq_vep_90_MicOch1.0.tar.gz'),\n PosixFilePath('/Volumes/release-90/variation/VEP/microtus_ochrogaster_merged_vep_90_MicOch1.0.tar.gz'),\n PosixFilePath('/Volumes/release-90/variation/VEP/microcebus_murinus_vep_90_Mmur_2.0.tar.gz'),\n PosixFilePath('/Volumes/release-90/rdf/microtus_ochrogaster/microtus_ochrogaster_xrefs.ttl.gz.graph'),\n\n\n.. code:: python\n\n >>> path2insight.distance_on_token(filepaths[0:10]) \n array([[ 0. , 2. , 1.41421356, 3. , 3. ],\n [ 2. , 0. , 2.44948974, 3.31662479, 3.31662479],\n [ 1.41421356, 2.44948974, 0. , 3. , 3. ],\n [ 3. , 3.31662479, 3. , 0. , 1.41421356],\n [ 3. , 3.31662479, 3. , 1.41421356, 0. ]])\n\n\nInstallation and dependencies\n=============================\n\nPath2Insight is available on PyPI and can be installed with pip:\n\n.. code:: bash\n\n pip install path2insight\n\nTo upgrade path2insight, use:\n\n.. code:: bash\n\n pip install --upgrade path2insight\n\nPath2Insight is available for Python 2.7 and Python 3.4+. Path2Insight depends\nheavily on the pathlib_ module. 
This module is part of Python 3.4 or higher.\nFor Python 2, the backport pathlib2_ is used. Therefore, it is advised to use\nPath2Insight with Python 3.4 or higher.\n\n.. _pathlib: https://docs.python.org/3/library/pathlib.html\n.. _pathlib2: https://pypi.python.org/pypi/pathlib2/\n\nSome of the submodules of Path2Insight depend on other Python packages (numpy,\npandas, sklearn, scipy, jellyfish). One can get a full installation by\ninstalling the packages in the `requirements-full.txt` file.\n\n.. code:: bash\n\n pip install -r requirements-full.txt\n\n\nCite\n====\n\nFollows. \n\nAuthors\n=======\n\n- Armel Lefebvre\n- Jonathan de Bruin\n\n\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/armell/path2insight", "keywords": "filepath pathlib datamanagement exploration", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "path2insight", "package_url": "https://pypi.org/project/path2insight/", "platform": "", "project_url": "https://pypi.org/project/path2insight/", "project_urls": { "Homepage": "https://github.com/armell/path2insight" }, "release_url": "https://pypi.org/project/path2insight/1.0b2/", "requires_dist": [ "parameterized; extra == 'test'", "pytest; extra == 'test'" ], "requires_python": "", "summary": "A set of tools to retrieve information from filepaths.", "version": "1.0b2" }, "last_serial": 3581517, "releases": { "1.0b1": [ { "comment_text": "", "digests": { "md5": "c9ecc47ba443272e1a15df230b79e027", "sha256": "bdced5a25cde8512a446493c529c553163a91dbf9f16129f38e9fe7df86789f4" }, "downloads": -1, "filename": "path2insight-1.0b1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "c9ecc47ba443272e1a15df230b79e027", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 1596908, "upload_time": "2018-02-14T14:06:27", "url": 
"https://files.pythonhosted.org/packages/fe/a5/622530e7d0ad434758337e4c63fac4dac2dffc55c567894791e929d43487/path2insight-1.0b1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "854effc226af0785c580138c883b00b4", "sha256": "e96f39eff113e8e8699f2d9da24824afb99a947497c1591757a02d3590ab886b" }, "downloads": -1, "filename": "path2insight-1.0b1.tar.gz", "has_sig": false, "md5_digest": "854effc226af0785c580138c883b00b4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1591813, "upload_time": "2018-02-14T14:06:38", "url": "https://files.pythonhosted.org/packages/3c/ad/d1090769d04604fb85fd695040a4d918285b5b47a09c4bb1ca0e4ce28eb6/path2insight-1.0b1.tar.gz" } ], "1.0b2": [ { "comment_text": "", "digests": { "md5": "ee0de692aa8889d317ac07bd89341eed", "sha256": "1d618de89d5d131abacb717f6a4b0e5451f29a4eb76a5e2660e1972c34f4801b" }, "downloads": -1, "filename": "path2insight-1.0b2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ee0de692aa8889d317ac07bd89341eed", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 1601691, "upload_time": "2018-02-14T14:20:43", "url": "https://files.pythonhosted.org/packages/a3/eb/a21d19530150551e9eaca37f433abd99b09d0d29c7d7213ce99711532fef/path2insight-1.0b2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "061a4cf5c41fcde4c0655a20244cdce2", "sha256": "32db51b2dce585b5e9f4ba446590695ecf0c9877ee4f7b6745ac92bb4627e747" }, "downloads": -1, "filename": "path2insight-1.0b2.tar.gz", "has_sig": false, "md5_digest": "061a4cf5c41fcde4c0655a20244cdce2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1591816, "upload_time": "2018-02-14T14:20:46", "url": "https://files.pythonhosted.org/packages/05/ed/6f919bc4de9f0b0a21e2e68786c9ea96ca4634ca3e9c1eb288152a76fe88/path2insight-1.0b2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ee0de692aa8889d317ac07bd89341eed", "sha256": 
"1d618de89d5d131abacb717f6a4b0e5451f29a4eb76a5e2660e1972c34f4801b" }, "downloads": -1, "filename": "path2insight-1.0b2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "ee0de692aa8889d317ac07bd89341eed", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 1601691, "upload_time": "2018-02-14T14:20:43", "url": "https://files.pythonhosted.org/packages/a3/eb/a21d19530150551e9eaca37f433abd99b09d0d29c7d7213ce99711532fef/path2insight-1.0b2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "061a4cf5c41fcde4c0655a20244cdce2", "sha256": "32db51b2dce585b5e9f4ba446590695ecf0c9877ee4f7b6745ac92bb4627e747" }, "downloads": -1, "filename": "path2insight-1.0b2.tar.gz", "has_sig": false, "md5_digest": "061a4cf5c41fcde4c0655a20244cdce2", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1591816, "upload_time": "2018-02-14T14:20:46", "url": "https://files.pythonhosted.org/packages/05/ed/6f919bc4de9f0b0a21e2e68786c9ea96ca4634ca3e9c1eb288152a76fe88/path2insight-1.0b2.tar.gz" } ] }