{ "info": { "author": "Ahmed TAHRI @Ousret", "author_email": "ahmed.tahri@cloudnursery.dev", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic", "Topic :: Utilities" ], "description": "

Welcome to Charset Detection for Humans \ud83d\udc4b

\n\n

\n The Real First Universal Charset Detector

\n\n> A library that helps you read text from an unknown charset encoding.
Motivated by `chardet`,\n> I'm trying to resolve the issue by taking a new approach.\n> All IANA character set names for which the Python core library provides codecs are supported.\n\n

\n >>>>> \u2764\ufe0f Try Me Online Now, Then Adopt Me \u2764\ufe0f <<<<<\n

\n\nThis project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.\n\n| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |\n| ------------- | :-------------: | :------------------: | :------------------: |\n| `Fast` | \u274c | \u274c | \u2705 |\n| `Universal**` | \u274c | \u2705 | \u274c |\n| `Reliable` **without** distinguishable standards | \u274c | \u2705 | \u2705 |\n| `Reliable` **with** distinguishable standards | \u2705 | \u2705 | \u2705 |\n| `Free & Open` | \u2705 | \u2705 | \u2705 |\n| `Native Python` | \u2705 | \u2705 | \u274c |\n| `Detect spoken language` | \u274c | \u2705 | N/A |\n| `Supported Encoding` | 30 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/support.html) | 40 |\n\n| Package | Accuracy | Mean per file (ns) | File per sec (est) |\n| ------------- | :-------------: | :------------------: | :------------------: |\n| [chardet](https://github.com/chardet/chardet) | 93.5 % | 126 081 168 ns | 7.931 file/sec |\n| [cchardet](https://github.com/PyYoshi/cChardet) | 97.0 % | 1 668 145 ns | **599.468 file/sec** |\n| charset-normalizer | **97.25 %** | 209 503 253 ns | 4.773 file/sec |\n\n

\n*\\*\\* : They clearly use encoding-specific code, even though they cover most of the commonly used encodings.*
\n\n## Your support\n\nPlease \u2b50 this repository if this project helped you!\n\n## \u2728 Installation\n\nUsing PyPI\n```sh\npip install charset_normalizer\n```\n\n## \ud83d\ude80 Basic Usage\n\n### CLI\nThis package comes with a CLI.\n\n```\nusage: normalizer [-h] [--verbose] [--normalize] [--replace] [--force]\n file [file ...]\n```\n\n```bash\nnormalizer ./data/sample.1.fr.srt\n```\n\n```\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n| Filename | Encoding | Language | Alphabets | Chaos | Coherence |\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n| data/sample.1.fr.srt | cp1252 | French | Basic Latin and Latin-1 Supplement | 0.0 % | 84.924 % |\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n```\n\n### Python\n*Just print out normalized text*\n```python\nfrom charset_normalizer import CharsetNormalizerMatches as CnM\nprint(CnM.from_path('./my_subtitle.srt').best().first())\n```\n\n*Normalize any text file*\n```python\nfrom charset_normalizer import CharsetNormalizerMatches as CnM\ntry:\n    CnM.normalize('./my_subtitle.srt') # should write to disk my_subtitle-***.srt\nexcept IOError as e:\n    print('Sadly, we are unable to perform charset normalization.', str(e))\n```\n\n*Upgrade your code without effort*\n```python\nfrom charset_normalizer import detect\n```\n\nThe above code will behave the same as **chardet**.\n\nSee the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)\n\n## \ud83d\ude07 Why\n\nWhen I started using Chardet, I noticed that it had become unreliable and that\nit is unmaintained, and will most likely stay that way.\n\nI **don't care** about the **originating charset** encoding, because **two different tables** can\nproduce **two identical files.**\nWhat I want is to get readable text, the best I can. 
\n\nIn a way, **I'm brute forcing text decoding.** How cool is that? \ud83d\ude0e\n\nDon't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer converts a raw file in an unknown encoding to Unicode.\n\n## \ud83c\udf70 How\n\n - Discard every charset encoding table that cannot fit the binary content.\n - Measure chaos, that is, the mess produced once the content is opened with a candidate charset encoding.\n - Extract the matches with the lowest mess detected.\n - Finally, if too many matches are left, measure coherence.\n\n**Wait a minute**, what is chaos/mess and coherence according to **YOU?**\n\n*Chaos:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then\n**I established** some ground rules about **what is obvious** when **it seems like** a mess.\n I know that my interpretation of what is chaotic is very subjective; feel free to contribute in order to\n improve or rewrite it.\n\n*Coherence:* For each language on earth, we have computed ranked letter occurrence frequencies (the best we can). I thought\nthat this information was worth something here, so I compare those records against the decoded text to check whether I can detect intelligent design.\n\n## \u26a1 Known limitations\n\n - Not intended to work on content that is not human-language text, e.g. encrypted text.\n - Language detection is unreliable when text contains two or more languages sharing identical letters.\n - Not well tested with tiny content.\n\n## \ud83d\udc64 Contributing\n\nContributions, issues and feature requests are very much welcome.
\nFeel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.\n\n## \ud83d\udcdd License\n\nCopyright \u00a9 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).
\nThis project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.\n\nLetter appearance frequencies used in this project \u00a9 2012 [Denny Vrande\u010di\u0107](http://denny.vrandecic.de)", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/ousret/charset_normalizer", "keywords": "encoding,i18n,txt,text,charset,charset-detector,normalization,unicode,chardet", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "charset-normalizer", "package_url": "https://pypi.org/project/charset-normalizer/", "platform": "", "project_url": "https://pypi.org/project/charset-normalizer/", "project_urls": { "Homepage": "https://github.com/ousret/charset_normalizer" }, "release_url": "https://pypi.org/project/charset-normalizer/1.3.1/", "requires_dist": null, "requires_python": ">=3.5.0", "summary": "The Real First Universal Charset Detector. 
No Cpp Bindings, Using Voodoo and Magical Artifacts.", "version": "1.3.1" }, "last_serial": 5961372, "releases": { "0.1.1a0": [ { "comment_text": "", "digests": { "md5": "5df64817b06caa5063e228b58b22dd89", "sha256": "a230d9d0c39ea5f23325e69ef60c52b1a563f74c06673cb1ecdd7ce41089da03" }, "downloads": -1, "filename": "charset_normalizer-0.1.1a0.tar.gz", "has_sig": false, "md5_digest": "5df64817b06caa5063e228b58b22dd89", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 19747, "upload_time": "2019-08-03T15:54:08", "url": "https://files.pythonhosted.org/packages/bc/d2/19888cb9cd17d269aa3a95859a136f42d6755a691ad1b563caae709129c7/charset_normalizer-0.1.1a0.tar.gz" } ], "0.1.2b0": [ { "comment_text": "", "digests": { "md5": "505111a101390ebfa882b61bda082dd2", "sha256": "2e57d67d55af976be3e5e11fb1dc5a4b02e5e10fed0e0746bbe9de76dc0aba9b" }, "downloads": -1, "filename": "charset_normalizer-0.1.2b0.tar.gz", "has_sig": false, "md5_digest": "505111a101390ebfa882b61bda082dd2", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 35557, "upload_time": "2019-08-07T21:51:58", "url": "https://files.pythonhosted.org/packages/c1/74/a14be7cec44a9b04c4ee0e2be4486fad7a2a1ef09726a081af5528ec559d/charset_normalizer-0.1.2b0.tar.gz" } ], "0.1.4b0": [ { "comment_text": "", "digests": { "md5": "ad46440fab8bfac75f6e099f67d7d841", "sha256": "42dfc3e9ae1a9680394938e412a004f11ddfb9f544a064ac5a733f4d1307f308" }, "downloads": -1, "filename": "charset_normalizer-0.1.4b0.tar.gz", "has_sig": false, "md5_digest": "ad46440fab8bfac75f6e099f67d7d841", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 35145, "upload_time": "2019-08-08T20:21:57", "url": "https://files.pythonhosted.org/packages/d2/bc/bb704adfbc4ea947ea20b3cba487108527ead6ba77c0acdfd7fa808a1df3/charset_normalizer-0.1.4b0.tar.gz" } ], "0.1.5b0": [ { "comment_text": "", "digests": { "md5": "02c5d1f3e53779d15a943a3930bad794", 
"sha256": "72ee724392aeeaebac8eb2a79c3fcba2677efdbecf5b4873d7fd2e8181c32d00" }, "downloads": -1, "filename": "charset_normalizer-0.1.5b0.tar.gz", "has_sig": false, "md5_digest": "02c5d1f3e53779d15a943a3930bad794", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 35115, "upload_time": "2019-08-08T20:48:36", "url": "https://files.pythonhosted.org/packages/64/d2/61e1ec31b452d28156c5fc1d44bfd9701b555f4e9b9820344d1d281c793a/charset_normalizer-0.1.5b0.tar.gz" } ], "0.1.7": [ { "comment_text": "", "digests": { "md5": "46f1201020d4cbd211be1303256c0621", "sha256": "fed1bb228f058a50e5f59789b25bd960778719019d1adb8740954e4b077f7776" }, "downloads": -1, "filename": "charset_normalizer-0.1.7.tar.gz", "has_sig": false, "md5_digest": "46f1201020d4cbd211be1303256c0621", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 37668, "upload_time": "2019-08-27T17:42:10", "url": "https://files.pythonhosted.org/packages/70/e3/77cabec39aa08d4c91fa018eaa6cd8cd365144d0188313f027a3a6a33688/charset_normalizer-0.1.7.tar.gz" } ], "0.1.8": [ { "comment_text": "", "digests": { "md5": "1d0ea407171f9960cab2359f7a1fcdc7", "sha256": "f830db9291cce51366fc669033629d1a7dfbb3dbd431798b0e592d9d429e72cc" }, "downloads": -1, "filename": "charset_normalizer-0.1.8.tar.gz", "has_sig": false, "md5_digest": "1d0ea407171f9960cab2359f7a1fcdc7", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 343664, "upload_time": "2019-08-28T10:36:08", "url": "https://files.pythonhosted.org/packages/dc/e2/c098aedba1ca959389a40de15baf22db072d29c19c04213435a379f54859/charset_normalizer-0.1.8.tar.gz" } ], "0.1a0": [ { "comment_text": "", "digests": { "md5": "fda35165d4cab813112ec19383271541", "sha256": "2e9474d6ea0730c9e6b691423823fcc0a012ab5281e4cf451d047ccca593e185" }, "downloads": -1, "filename": "charset_normalizer-0.1a0.tar.gz", "has_sig": false, "md5_digest": "fda35165d4cab813112ec19383271541", "packagetype": 
"sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 19073, "upload_time": "2019-08-02T16:03:53", "url": "https://files.pythonhosted.org/packages/7e/8d/faa8cc13b03896e65dcd0f67d56ef70f4ee9c14301f5ac4540c8aaaaf1aa/charset_normalizer-0.1a0.tar.gz" } ], "0.2.0": [ { "comment_text": "", "digests": { "md5": "db573fd7fac1bbb2fa382d887bb52691", "sha256": "434b06617f57bdb88b8a597967d1d087ba0294b85a8dcef5207b4992f4b38f23" }, "downloads": -1, "filename": "charset_normalizer-0.2.0.tar.gz", "has_sig": false, "md5_digest": "db573fd7fac1bbb2fa382d887bb52691", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 334951, "upload_time": "2019-08-31T16:02:14", "url": "https://files.pythonhosted.org/packages/d1/b6/981818a28689fcedf8fa5f51e2919731a417d0e7e30305fb7e1697135abb/charset_normalizer-0.2.0.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "cfbcbd8068376f1c84c1da04b9b20118", "sha256": "7f1bd3c3f67bd1551f1371a82f53a8924d0d82fddfc58ffdd93639bc744f5a00" }, "downloads": -1, "filename": "charset_normalizer-0.2.1.tar.gz", "has_sig": false, "md5_digest": "cfbcbd8068376f1c84c1da04b9b20118", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 335192, "upload_time": "2019-09-03T20:10:58", "url": "https://files.pythonhosted.org/packages/fb/7c/cbdf18cf2c022c0be552810028dff9d992a8b50655101c5d29fbe765ee0d/charset_normalizer-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "51e515b7500a883235e06d0b92a93cdc", "sha256": "b94e704202fb1edeb0775046f98233465f4f5654b4db91a220789fb2b3f7714e" }, "downloads": -1, "filename": "charset_normalizer-0.2.2.tar.gz", "has_sig": false, "md5_digest": "51e515b7500a883235e06d0b92a93cdc", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 336341, "upload_time": "2019-09-04T17:12:39", "url": 
"https://files.pythonhosted.org/packages/85/93/e7b0d12dbb8a1cb95d9784a11ff83f92fa01e5d1793cc39adac17bfae4e6/charset_normalizer-0.2.2.tar.gz" } ], "0.2.3": [ { "comment_text": "", "digests": { "md5": "448132cabc215b5ba80c34967d943082", "sha256": "d7d69887f824b34c750a2ae62094cb4f2d856a4c79153067273ad3e36136d172" }, "downloads": -1, "filename": "charset_normalizer-0.2.3.tar.gz", "has_sig": false, "md5_digest": "448132cabc215b5ba80c34967d943082", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.4.0", "size": 337656, "upload_time": "2019-09-06T20:18:33", "url": "https://files.pythonhosted.org/packages/57/04/1d7d583b1dfb19c6dec3acad8dad7d3cf3b50fcd330607a957ef4ec3ffbc/charset_normalizer-0.2.3.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "ef96e193b0879e86bde1f6fb8ecca4cd", "sha256": "a51dbca96758edbb2cadf0b03fd52a0bdb090063851c84053b617763c346a8f3" }, "downloads": -1, "filename": "charset_normalizer-0.3.0.tar.gz", "has_sig": false, "md5_digest": "ef96e193b0879e86bde1f6fb8ecca4cd", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 339517, "upload_time": "2019-09-12T17:07:14", "url": "https://files.pythonhosted.org/packages/95/46/097469b432eccd421982be45b60fabbdb923b0ae0d8971732116ef01b234/charset_normalizer-0.3.0.tar.gz" } ], "1.0.0": [ { "comment_text": "", "digests": { "md5": "74ecc49185ab45b10f6efda76d561976", "sha256": "1d0bff871cfdc0d45402e0d4b776c0cb87271cacc648b990bc2d8eba83c4f70e" }, "downloads": -1, "filename": "charset_normalizer-1.0.0.tar.gz", "has_sig": false, "md5_digest": "74ecc49185ab45b10f6efda76d561976", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 341984, "upload_time": "2019-09-17T17:13:04", "url": "https://files.pythonhosted.org/packages/11/59/92a0165a32588f87f242344a4c58d2d188a8509b497d0b296120c21045ea/charset_normalizer-1.0.0.tar.gz" } ], "1.1.0": [ { "comment_text": "", "digests": { "md5": 
"90f1e25f274e8df6354d5ec67a6c3168", "sha256": "c0f1c7447a41c79fe8f267cb155d350af2c9f5e526c3b19d42f8c846ac06549f" }, "downloads": -1, "filename": "charset_normalizer-1.1.0.tar.gz", "has_sig": false, "md5_digest": "90f1e25f274e8df6354d5ec67a6c3168", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 342327, "upload_time": "2019-09-21T16:13:57", "url": "https://files.pythonhosted.org/packages/8f/93/0dfe9cb2c68e2f8cc13697d50aff67977156a8937dbecaa3dbe835868e88/charset_normalizer-1.1.0.tar.gz" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "fcb30bcd6429d27393003bbd791a8edd", "sha256": "1537f9cc91b1875ab27dfdd91ec27491d0d003eaedc4b11de704c6a4f292cfd3" }, "downloads": -1, "filename": "charset_normalizer-1.1.1.tar.gz", "has_sig": false, "md5_digest": "fcb30bcd6429d27393003bbd791a8edd", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 342796, "upload_time": "2019-09-23T19:07:38", "url": "https://files.pythonhosted.org/packages/b7/c2/8976bc70a6d8869c91ba93c983cf088f92cb057e5a111b5e0495acf7f2f4/charset_normalizer-1.1.1.tar.gz" } ], "1.2.0": [ { "comment_text": "", "digests": { "md5": "7746d2d64f4b2b864939829ca0225da9", "sha256": "ceb0cd1be394b6cfc90a55da90d86e5c6724658cb13182165ff62e97f27640ab" }, "downloads": -1, "filename": "charset_normalizer-1.2.0.tar.gz", "has_sig": false, "md5_digest": "7746d2d64f4b2b864939829ca0225da9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 393578, "upload_time": "2019-09-28T19:18:21", "url": "https://files.pythonhosted.org/packages/14/c7/4d2ab0289ee2d27830f139f8ac581844b8217fb20c2423c12793385025b3/charset_normalizer-1.2.0.tar.gz" } ], "1.3.0": [ { "comment_text": "", "digests": { "md5": "a8bf171846647aed2b40f52384843397", "sha256": "d9eacb91d83ca5f39df63be74183b42b14af7a37a6dc1a8b536ec522644cf555" }, "downloads": -1, "filename": "charset_normalizer-1.3.0.tar.gz", "has_sig": false, "md5_digest": 
"a8bf171846647aed2b40f52384843397", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 394671, "upload_time": "2019-09-30T18:20:35", "url": "https://files.pythonhosted.org/packages/a6/68/331ab9666a76ebb91aff855160fe89c8234e011dc295de75e3fc4f4eee03/charset_normalizer-1.3.0.tar.gz" } ], "1.3.1": [ { "comment_text": "", "digests": { "md5": "95427576e1fc13db870cafddcbc9e542", "sha256": "70b903da5a9329aa42487050def981be18ed6acc313fb7430514b2c4beb05f9e" }, "downloads": -1, "filename": "charset_normalizer-1.3.1.tar.gz", "has_sig": false, "md5_digest": "95427576e1fc13db870cafddcbc9e542", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 393311, "upload_time": "2019-10-11T17:31:46", "url": "https://files.pythonhosted.org/packages/ca/48/a6211e2eec832176837d08dfc1a02799705618eef389e9937ba7a4b0c38d/charset_normalizer-1.3.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "95427576e1fc13db870cafddcbc9e542", "sha256": "70b903da5a9329aa42487050def981be18ed6acc313fb7430514b2c4beb05f9e" }, "downloads": -1, "filename": "charset_normalizer-1.3.1.tar.gz", "has_sig": false, "md5_digest": "95427576e1fc13db870cafddcbc9e542", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5.0", "size": 393311, "upload_time": "2019-10-11T17:31:46", "url": "https://files.pythonhosted.org/packages/ca/48/a6211e2eec832176837d08dfc1a02799705618eef389e9937ba7a4b0c38d/charset_normalizer-1.3.1.tar.gz" } ] }