{ "info": { "author": "Ahmed TAHRI @Ousret", "author_email": "ahmed.tahri@cloudnursery.dev", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic", "Topic :: Utilities" ], "description": "
\n The Real First Universal Charset Detector
\n >>>>> \u2764\ufe0f Try Me Online Now, Then Adopt Me \u2764\ufe0f <<<<<\n
\n\nThis project offers you an alternative to **Universal Charset Encoding Detector**, also known as **Chardet**.\n\n| Feature | [Chardet](https://github.com/chardet/chardet) | Charset Normalizer | [cChardet](https://github.com/PyYoshi/cChardet) |\n| ------------- | :-------------: | :------------------: | :------------------: |\n| `Fast` | \u274c\n
\n\n*\\*\\* : They clearly rely on encoding-specific code, even though it covers most of the encodings in common use.*
\n\n## Your support\n\nPlease \u2b50 this repository if this project helped you!\n\n## \u2728 Installation\n\nUsing PyPI\n```sh\npip install charset_normalizer\n```\n\n## \ud83d\ude80 Basic Usage\n\n### CLI\nThis package comes with a CLI\n\n```\nusage: normalizer [-h] [--verbose] [--normalize] [--replace] [--force]\n file [file ...]\n```\n\n```bash\nnormalizer ./data/sample.1.fr.srt\n```\n\n```\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n| Filename | Encoding | Language | Alphabets | Chaos | Coherence |\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n| data/sample.1.fr.srt | cp1252 | French | Basic Latin and Latin-1 Supplement | 0.0 % | 84.924 % |\n+----------------------+----------+----------+------------------------------------+-------+-----------+\n```\n\n### Python\n*Just print out normalized text*\n```python\nfrom charset_normalizer import CharsetNormalizerMatches as CnM\nprint(CnM.from_path('./my_subtitle.srt').best().first())\n```\n\n*Normalize any text file*\n```python\nfrom charset_normalizer import CharsetNormalizerMatches as CnM\ntry:\n    CnM.normalize('./my_subtitle.srt')  # should write my_subtitle-***.srt to disk\nexcept IOError as e:\n    print('Sadly, we are unable to perform charset normalization.', str(e))\n```\n\n*Upgrade your code without effort*\n```python\nfrom charset_normalizer import detect\n```\n\nThe above code will behave the same as **chardet**.\n\nSee the docs for advanced usage: [readthedocs.io](https://charset-normalizer.readthedocs.io/en/latest/)\n\n## \ud83d\ude07 Why\n\nWhen I started using Chardet, I noticed that it was no longer reliable and that\nit is unmaintained, and most likely will remain so.\n\nI **don't care** about the **originating charset** encoding, because **two different tables** can\nproduce **two identical files.**\nWhat I want is to get readable text, the best I can. 
\n\nIn a way, **I'm brute forcing text decoding.** How cool is that? \ud83d\ude0e\n\nDon't confuse the **ftfy** package with charset-normalizer or chardet. ftfy's goal is to repair Unicode strings, whereas charset-normalizer converts a raw file in an unknown encoding to Unicode.\n\n## \ud83c\udf70 How\n\n - Discard all charset encoding tables that cannot fit the binary content.\n - Measure chaos, or the mess once opened with a corresponding charset encoding.\n - Extract matches with the lowest mess detected.\n - Finally, if too many matches remain, we measure coherence.\n\n**Wait a minute**, what is chaos/mess and coherence according to **YOU?**\n\n*Chaos:* I opened hundreds of text files, **written by humans**, with the wrong encoding table. **I observed**, then\n**I established** some ground rules about **what is obvious** when **it seems like** a mess.\n I know that my interpretation of what is chaotic is very subjective, feel free to contribute in order to\n improve or rewrite it.\n\n*Coherence:* For each language on earth, we have computed ranked letter frequencies (as best we can). I thought\nthat this information is worth something here, so I use those records against the decoded text to check whether I can detect intelligent design.\n\n## \u26a1 Known limitations\n\n - Not intended to work on content that is not human-readable language text, e.g. encrypted text.\n - Language detection is unreliable when text contains two or more languages sharing identical letters.\n - Not well tested with tiny content.\n\n## \ud83d\udc64 Contributing\n\nContributions, issues and feature requests are very much welcome.
\nFeel free to check the [issues page](https://github.com/ousret/charset_normalizer/issues) if you want to contribute.\n\n## \ud83d\udcdd License\n\nCopyright \u00a9 2019 [Ahmed TAHRI @Ousret](https://github.com/Ousret).
\nThis project is [MIT](https://github.com/Ousret/charset_normalizer/blob/master/LICENSE) licensed.\n\nLetter appearance frequencies used in this project \u00a9 2012 [Denny Vrande\u010di\u0107](http://denny.vrandecic.de)",
"description_content_type": "text/markdown",
"docs_url": null,
"download_url": "",
"downloads": {
"last_day": -1,
"last_month": -1,
"last_week": -1
},
"home_page": "https://github.com/ousret/charset_normalizer",
"keywords": "encoding,i18n,txt,text,charset,charset-detector,normalization,unicode,chardet",
"license": "MIT",
"maintainer": "",
"maintainer_email": "",
"name": "charset-normalizer",
"package_url": "https://pypi.org/project/charset-normalizer/",
"platform": "",
"project_url": "https://pypi.org/project/charset-normalizer/",
"project_urls": {
"Homepage": "https://github.com/ousret/charset_normalizer"
},
"release_url": "https://pypi.org/project/charset-normalizer/1.3.1/",
"requires_dist": null,
"requires_python": ">=3.5.0",
"summary": "The Real First Universal Charset Detector. No Cpp Bindings, Using Voodoo and Magical Artifacts.",
"version": "1.3.1"
},
"last_serial": 5961372,
"releases": {
"0.1.1a0": [
{
"comment_text": "",
"digests": {
"md5": "5df64817b06caa5063e228b58b22dd89",
"sha256": "a230d9d0c39ea5f23325e69ef60c52b1a563f74c06673cb1ecdd7ce41089da03"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.1a0.tar.gz",
"has_sig": false,
"md5_digest": "5df64817b06caa5063e228b58b22dd89",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 19747,
"upload_time": "2019-08-03T15:54:08",
"url": "https://files.pythonhosted.org/packages/bc/d2/19888cb9cd17d269aa3a95859a136f42d6755a691ad1b563caae709129c7/charset_normalizer-0.1.1a0.tar.gz"
}
],
"0.1.2b0": [
{
"comment_text": "",
"digests": {
"md5": "505111a101390ebfa882b61bda082dd2",
"sha256": "2e57d67d55af976be3e5e11fb1dc5a4b02e5e10fed0e0746bbe9de76dc0aba9b"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.2b0.tar.gz",
"has_sig": false,
"md5_digest": "505111a101390ebfa882b61bda082dd2",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.6.0",
"size": 35557,
"upload_time": "2019-08-07T21:51:58",
"url": "https://files.pythonhosted.org/packages/c1/74/a14be7cec44a9b04c4ee0e2be4486fad7a2a1ef09726a081af5528ec559d/charset_normalizer-0.1.2b0.tar.gz"
}
],
"0.1.4b0": [
{
"comment_text": "",
"digests": {
"md5": "ad46440fab8bfac75f6e099f67d7d841",
"sha256": "42dfc3e9ae1a9680394938e412a004f11ddfb9f544a064ac5a733f4d1307f308"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.4b0.tar.gz",
"has_sig": false,
"md5_digest": "ad46440fab8bfac75f6e099f67d7d841",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 35145,
"upload_time": "2019-08-08T20:21:57",
"url": "https://files.pythonhosted.org/packages/d2/bc/bb704adfbc4ea947ea20b3cba487108527ead6ba77c0acdfd7fa808a1df3/charset_normalizer-0.1.4b0.tar.gz"
}
],
"0.1.5b0": [
{
"comment_text": "",
"digests": {
"md5": "02c5d1f3e53779d15a943a3930bad794",
"sha256": "72ee724392aeeaebac8eb2a79c3fcba2677efdbecf5b4873d7fd2e8181c32d00"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.5b0.tar.gz",
"has_sig": false,
"md5_digest": "02c5d1f3e53779d15a943a3930bad794",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 35115,
"upload_time": "2019-08-08T20:48:36",
"url": "https://files.pythonhosted.org/packages/64/d2/61e1ec31b452d28156c5fc1d44bfd9701b555f4e9b9820344d1d281c793a/charset_normalizer-0.1.5b0.tar.gz"
}
],
"0.1.7": [
{
"comment_text": "",
"digests": {
"md5": "46f1201020d4cbd211be1303256c0621",
"sha256": "fed1bb228f058a50e5f59789b25bd960778719019d1adb8740954e4b077f7776"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "46f1201020d4cbd211be1303256c0621",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 37668,
"upload_time": "2019-08-27T17:42:10",
"url": "https://files.pythonhosted.org/packages/70/e3/77cabec39aa08d4c91fa018eaa6cd8cd365144d0188313f027a3a6a33688/charset_normalizer-0.1.7.tar.gz"
}
],
"0.1.8": [
{
"comment_text": "",
"digests": {
"md5": "1d0ea407171f9960cab2359f7a1fcdc7",
"sha256": "f830db9291cce51366fc669033629d1a7dfbb3dbd431798b0e592d9d429e72cc"
},
"downloads": -1,
"filename": "charset_normalizer-0.1.8.tar.gz",
"has_sig": false,
"md5_digest": "1d0ea407171f9960cab2359f7a1fcdc7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 343664,
"upload_time": "2019-08-28T10:36:08",
"url": "https://files.pythonhosted.org/packages/dc/e2/c098aedba1ca959389a40de15baf22db072d29c19c04213435a379f54859/charset_normalizer-0.1.8.tar.gz"
}
],
"0.1a0": [
{
"comment_text": "",
"digests": {
"md5": "fda35165d4cab813112ec19383271541",
"sha256": "2e9474d6ea0730c9e6b691423823fcc0a012ab5281e4cf451d047ccca593e185"
},
"downloads": -1,
"filename": "charset_normalizer-0.1a0.tar.gz",
"has_sig": false,
"md5_digest": "fda35165d4cab813112ec19383271541",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 19073,
"upload_time": "2019-08-02T16:03:53",
"url": "https://files.pythonhosted.org/packages/7e/8d/faa8cc13b03896e65dcd0f67d56ef70f4ee9c14301f5ac4540c8aaaaf1aa/charset_normalizer-0.1a0.tar.gz"
}
],
"0.2.0": [
{
"comment_text": "",
"digests": {
"md5": "db573fd7fac1bbb2fa382d887bb52691",
"sha256": "434b06617f57bdb88b8a597967d1d087ba0294b85a8dcef5207b4992f4b38f23"
},
"downloads": -1,
"filename": "charset_normalizer-0.2.0.tar.gz",
"has_sig": false,
"md5_digest": "db573fd7fac1bbb2fa382d887bb52691",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 334951,
"upload_time": "2019-08-31T16:02:14",
"url": "https://files.pythonhosted.org/packages/d1/b6/981818a28689fcedf8fa5f51e2919731a417d0e7e30305fb7e1697135abb/charset_normalizer-0.2.0.tar.gz"
}
],
"0.2.1": [
{
"comment_text": "",
"digests": {
"md5": "cfbcbd8068376f1c84c1da04b9b20118",
"sha256": "7f1bd3c3f67bd1551f1371a82f53a8924d0d82fddfc58ffdd93639bc744f5a00"
},
"downloads": -1,
"filename": "charset_normalizer-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "cfbcbd8068376f1c84c1da04b9b20118",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 335192,
"upload_time": "2019-09-03T20:10:58",
"url": "https://files.pythonhosted.org/packages/fb/7c/cbdf18cf2c022c0be552810028dff9d992a8b50655101c5d29fbe765ee0d/charset_normalizer-0.2.1.tar.gz"
}
],
"0.2.2": [
{
"comment_text": "",
"digests": {
"md5": "51e515b7500a883235e06d0b92a93cdc",
"sha256": "b94e704202fb1edeb0775046f98233465f4f5654b4db91a220789fb2b3f7714e"
},
"downloads": -1,
"filename": "charset_normalizer-0.2.2.tar.gz",
"has_sig": false,
"md5_digest": "51e515b7500a883235e06d0b92a93cdc",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 336341,
"upload_time": "2019-09-04T17:12:39",
"url": "https://files.pythonhosted.org/packages/85/93/e7b0d12dbb8a1cb95d9784a11ff83f92fa01e5d1793cc39adac17bfae4e6/charset_normalizer-0.2.2.tar.gz"
}
],
"0.2.3": [
{
"comment_text": "",
"digests": {
"md5": "448132cabc215b5ba80c34967d943082",
"sha256": "d7d69887f824b34c750a2ae62094cb4f2d856a4c79153067273ad3e36136d172"
},
"downloads": -1,
"filename": "charset_normalizer-0.2.3.tar.gz",
"has_sig": false,
"md5_digest": "448132cabc215b5ba80c34967d943082",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.4.0",
"size": 337656,
"upload_time": "2019-09-06T20:18:33",
"url": "https://files.pythonhosted.org/packages/57/04/1d7d583b1dfb19c6dec3acad8dad7d3cf3b50fcd330607a957ef4ec3ffbc/charset_normalizer-0.2.3.tar.gz"
}
],
"0.3.0": [
{
"comment_text": "",
"digests": {
"md5": "ef96e193b0879e86bde1f6fb8ecca4cd",
"sha256": "a51dbca96758edbb2cadf0b03fd52a0bdb090063851c84053b617763c346a8f3"
},
"downloads": -1,
"filename": "charset_normalizer-0.3.0.tar.gz",
"has_sig": false,
"md5_digest": "ef96e193b0879e86bde1f6fb8ecca4cd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 339517,
"upload_time": "2019-09-12T17:07:14",
"url": "https://files.pythonhosted.org/packages/95/46/097469b432eccd421982be45b60fabbdb923b0ae0d8971732116ef01b234/charset_normalizer-0.3.0.tar.gz"
}
],
"1.0.0": [
{
"comment_text": "",
"digests": {
"md5": "74ecc49185ab45b10f6efda76d561976",
"sha256": "1d0bff871cfdc0d45402e0d4b776c0cb87271cacc648b990bc2d8eba83c4f70e"
},
"downloads": -1,
"filename": "charset_normalizer-1.0.0.tar.gz",
"has_sig": false,
"md5_digest": "74ecc49185ab45b10f6efda76d561976",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 341984,
"upload_time": "2019-09-17T17:13:04",
"url": "https://files.pythonhosted.org/packages/11/59/92a0165a32588f87f242344a4c58d2d188a8509b497d0b296120c21045ea/charset_normalizer-1.0.0.tar.gz"
}
],
"1.1.0": [
{
"comment_text": "",
"digests": {
"md5": "90f1e25f274e8df6354d5ec67a6c3168",
"sha256": "c0f1c7447a41c79fe8f267cb155d350af2c9f5e526c3b19d42f8c846ac06549f"
},
"downloads": -1,
"filename": "charset_normalizer-1.1.0.tar.gz",
"has_sig": false,
"md5_digest": "90f1e25f274e8df6354d5ec67a6c3168",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 342327,
"upload_time": "2019-09-21T16:13:57",
"url": "https://files.pythonhosted.org/packages/8f/93/0dfe9cb2c68e2f8cc13697d50aff67977156a8937dbecaa3dbe835868e88/charset_normalizer-1.1.0.tar.gz"
}
],
"1.1.1": [
{
"comment_text": "",
"digests": {
"md5": "fcb30bcd6429d27393003bbd791a8edd",
"sha256": "1537f9cc91b1875ab27dfdd91ec27491d0d003eaedc4b11de704c6a4f292cfd3"
},
"downloads": -1,
"filename": "charset_normalizer-1.1.1.tar.gz",
"has_sig": false,
"md5_digest": "fcb30bcd6429d27393003bbd791a8edd",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 342796,
"upload_time": "2019-09-23T19:07:38",
"url": "https://files.pythonhosted.org/packages/b7/c2/8976bc70a6d8869c91ba93c983cf088f92cb057e5a111b5e0495acf7f2f4/charset_normalizer-1.1.1.tar.gz"
}
],
"1.2.0": [
{
"comment_text": "",
"digests": {
"md5": "7746d2d64f4b2b864939829ca0225da9",
"sha256": "ceb0cd1be394b6cfc90a55da90d86e5c6724658cb13182165ff62e97f27640ab"
},
"downloads": -1,
"filename": "charset_normalizer-1.2.0.tar.gz",
"has_sig": false,
"md5_digest": "7746d2d64f4b2b864939829ca0225da9",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 393578,
"upload_time": "2019-09-28T19:18:21",
"url": "https://files.pythonhosted.org/packages/14/c7/4d2ab0289ee2d27830f139f8ac581844b8217fb20c2423c12793385025b3/charset_normalizer-1.2.0.tar.gz"
}
],
"1.3.0": [
{
"comment_text": "",
"digests": {
"md5": "a8bf171846647aed2b40f52384843397",
"sha256": "d9eacb91d83ca5f39df63be74183b42b14af7a37a6dc1a8b536ec522644cf555"
},
"downloads": -1,
"filename": "charset_normalizer-1.3.0.tar.gz",
"has_sig": false,
"md5_digest": "a8bf171846647aed2b40f52384843397",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 394671,
"upload_time": "2019-09-30T18:20:35",
"url": "https://files.pythonhosted.org/packages/a6/68/331ab9666a76ebb91aff855160fe89c8234e011dc295de75e3fc4f4eee03/charset_normalizer-1.3.0.tar.gz"
}
],
"1.3.1": [
{
"comment_text": "",
"digests": {
"md5": "95427576e1fc13db870cafddcbc9e542",
"sha256": "70b903da5a9329aa42487050def981be18ed6acc313fb7430514b2c4beb05f9e"
},
"downloads": -1,
"filename": "charset_normalizer-1.3.1.tar.gz",
"has_sig": false,
"md5_digest": "95427576e1fc13db870cafddcbc9e542",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 393311,
"upload_time": "2019-10-11T17:31:46",
"url": "https://files.pythonhosted.org/packages/ca/48/a6211e2eec832176837d08dfc1a02799705618eef389e9937ba7a4b0c38d/charset_normalizer-1.3.1.tar.gz"
}
]
},
"urls": [
{
"comment_text": "",
"digests": {
"md5": "95427576e1fc13db870cafddcbc9e542",
"sha256": "70b903da5a9329aa42487050def981be18ed6acc313fb7430514b2c4beb05f9e"
},
"downloads": -1,
"filename": "charset_normalizer-1.3.1.tar.gz",
"has_sig": false,
"md5_digest": "95427576e1fc13db870cafddcbc9e542",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.5.0",
"size": 393311,
"upload_time": "2019-10-11T17:31:46",
"url": "https://files.pythonhosted.org/packages/ca/48/a6211e2eec832176837d08dfc1a02799705618eef389e9937ba7a4b0c38d/charset_normalizer-1.3.1.tar.gz"
}
]
}