{ "info": { "author": "Jacopo De Luca", "author_email": "jacopo.delu@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: GNU General Public License v3 (GPLv3)", "Programming Language :: Python :: 3" ], "description": "# DirtyText # \nSearches for [ab]using of Unicode glyphs.\n\n## Installation\nDirtyText package can be installed through pip :snake: :\n\n $ pip install dirtytext\n\nor downloaded from [GitHub](https://github.com/jacopodl/dirtytext).\n\n# Quick tour: #\n\n## Common options: ##\n- Read from file: -f \\\n- Save modified text: -s \\\n- Text filter: --filter\n- Pipeline mode: -p\n\n### :mag_right: Looks for ZERO-WIDTH characters: ###\n $> echo \"This text\u200c\u200c\u200c\u200c\u200d\u200c\u202c\u200c\u200c\u200c\u200c\u200c\u200d\u202c\u200d\u200d \u200c\u200c\u200c\u200c\u200d\u202c\ufeff\u200ccontains\u200c\u200c\u200c\u200c\u200d\u202c\ufeff\u200c\u200c\u200c\u200c\u200c\u200d\u202c\ufeff\ufeff\u200c\u200c\u200c\u200c\u200c\u202c\u200c\u200c\u200c\u200c\u200c\u200c\u200d\u200d\u200d\ufeff\u200c\u200c\u200c\u200c\u200d\u202c\ufeff\ufeff \u200c\u200c\u200c\u200c\u200d\ufeff\u200c\u202c\u200c\u200c\u200c\u200c\u200d\u202c\ufeff\u200czero-width\u200c\u200c\u200c\u200c\u200d\u202c\u200d\u200c chars\" | dirtytext --zero -v\n\nwill produce the following output:\n\n```text\nContains zero-width characters: True\nJSON: \n[{\"idx\": 0, \"char\": \"\\ufeff\", \"cval\": \"FEFF\", \"infos\": null}, \n{\"idx\": 10, \"char\": \"\\u200c\", \"cval\": \"200C\", \"infos\": null}, \n{\"idx\": 11, \"char\": \"\\u200c\", \"cval\": \"200C\", \"infos\": null}, ...]\n```\n\n### :mag_right: Looks for CONFUSABLES characters: ###\n\n $> echo \"hello\" | dirtytext --confusables greek -v\n\nwill produce the following output:\n\n```text\nContains confusables characters: True\nJSON:\n[{\"idx\": 2, \"char\": \"l\", \"cval\": \"006C\", \"infos\": [{\"target\": \"0399\", 
\"description\": \"GREEK CAPITAL LETTER IOTA\"}]}, \n{\"idx\": 3, \"char\": \"l\", \"cval\": \"006C\", \"infos\": [{\"target\": \"0399\", \"description\": \"GREEK CAPITAL LETTER IOTA\"}]}, \n{\"idx\": 4, \"char\": \"o\", \"cval\": \"006F\", \"infos\": [{\"target\": \"03BF\", \"description\": \"GREEK SMALL LETTER OMICRON\"}, \n{\"target\": \"03C3\", \"description\": \"GREEK SMALL LETTER SIGMA\"}]}]\n```\n\n### :mag_right: Looks and filter anomalies in LATIN text: ###\n```text\nexample.txt:\n\nIt\u2004\u217dan\u205fbe\u2004argue\u217e\u2009that\u2008the\u2004\u217domputer\u2009\u2170s human\u2170ty\u2019s\u2008attempt to\u2007rep\u217c\u2170\u217date\u2007the\u2008human\u2004brain.\nThis\u2005\u2170s perhaps\u2009an unattainable goal. \nHowever, unattainable goals often lead to outstanding accomplishment.\n```\n $> dirtytext -f example.txt --lsubs --filter -s out.txt\n\n```text\nout.txt:\n\nIt\u2004can\u205fbe\u2004argued\u2009that\u2008the\u2004computer\u2009is humanity\u2019s\u2008attempt to\u2007replicate\u2007the\u2008human\u2004brain.\nThis\u2005is perhaps\u2009an unattainable goal. 
\nHowever, unattainable goals often lead to outstanding accomplishment.\n```\n\n# UnicodeDB #\nThe Unicode data that makes up the dirtytext database is extracted from the Unicode Consortium. \nIn particular, there are two database files in the dirtytext/data directory:\n\n- categories.json: built from data extracted from [here](https://unicode.org/Public/UNIDATA/Scripts.txt)\n- confusables.json: built from data extracted from [here](https://unicode.org/Public/security/latest/confusables.txt)\n\nIf dirtytext/data doesn't exist, DT downloads and builds the database before performing the requested operations. \nAfterwards, you can force a database update by adding the --update option.\n\n# License #\nReleased under GPL-3.0", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jacopodl/dirtytext", "keywords": "dirty,text,tool,unicode,UTF-8,glyph", "license": "GNU General Public License v3", "maintainer": "", "maintainer_email": "", "name": "dirtytext", "package_url": "https://pypi.org/project/dirtytext/", "platform": "", "project_url": "https://pypi.org/project/dirtytext/", "project_urls": { "Homepage": "https://github.com/jacopodl/dirtytext" }, "release_url": "https://pypi.org/project/dirtytext/1.0.0/", "requires_dist": null, "requires_python": "", "summary": "Searches for [ab]using of Unicode glyphs", "version": "1.0.0" }, "last_serial": 4000344, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "36689a57b99fc1e91b70128c5351e87c", "sha256": "559b2f0d04890070230639352d0eb40c64eb8bbf125a408bda6fc521ce91e3a6" }, "downloads": -1, "filename": "dirtytext-1.0.0.tar.gz", "has_sig": false, "md5_digest": "36689a57b99fc1e91b70128c5351e87c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 111415, "upload_time": "2018-06-25T13:39:41", "url": 
"https://files.pythonhosted.org/packages/e0/e6/776b07af3b3d0c61765f0d8a8a9c139e984310c061d7687ac13ea5710cf8/dirtytext-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "36689a57b99fc1e91b70128c5351e87c", "sha256": "559b2f0d04890070230639352d0eb40c64eb8bbf125a408bda6fc521ce91e3a6" }, "downloads": -1, "filename": "dirtytext-1.0.0.tar.gz", "has_sig": false, "md5_digest": "36689a57b99fc1e91b70128c5351e87c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 111415, "upload_time": "2018-06-25T13:39:41", "url": "https://files.pythonhosted.org/packages/e0/e6/776b07af3b3d0c61765f0d8a8a9c139e984310c061d7687ac13ea5710cf8/dirtytext-1.0.0.tar.gz" } ] }