{ "info": { "author": "Joo-Won Jung", "author_email": "sanori@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: Apache Software License", "Programming Language :: Python :: 2", "Programming Language :: Python :: 3" ], "description": "UnZip for non-UTF8 encoding\r\n===========================\r\n\r\nExtract zip files that MBCS(multi-byte character set) encoded file\r\nnames, such as ZIP files created in MS Windows, especially East Asian\r\nenvironment.\r\n\r\nMajor non-UTF8 encodings by languages: \\* Korean: cp949, euc-kr \\*\r\nJapanese: sjis (shift\\_jis), cp932, euc-jp \\* Chinese: gbk, gb18030,\r\ngb2312, cp936, hkscs, big5, cp950\r\n\r\nInstall\r\n-------\r\n\r\n::\r\n\r\n pip install unzipmbcs\r\n\r\nCLI Usage\r\n---------\r\n\r\n::\r\n\r\n usage: unzipmbcs [-h] [-e ENCODING] cmd zipfile [target [target ...]]\r\n\r\n unzip for non-UTF8 filenames in zip archive\r\n\r\n positional arguments:\r\n cmd commands: l(list), x(extract)\r\n zipfile .zip file to unzip\r\n target file prefix to extract\r\n\r\n optional arguments:\r\n -h, --help show this help message and exit\r\n -e ENCODING, --encoding ENCODING\r\n character encoding of filename in the .zip\r\n\r\nAPI\r\n---\r\n\r\nlistZip(filename, encoding='utf-8')\r\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n\r\nReturn the information of the files in zip archive ``filename`` with\r\ncharacter ``encoding``\r\n\r\nextractZip(filename, encoding='utf-8', filters=None)\r\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n\r\nExtract files in zip archive ``filename`` on current directory. Assume\r\nthat the file names in zip archive are encoded as ``encoding``. Only the\r\nfiles prefixed the values of ``filters`` list are extracted if\r\n``filters`` are provided.\r\n\r\nfixZipFilename(filename, enc)\r\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n\r\nFix ``filename`` as UNICODE string which is originally encoded as\r\n``enc``. Works for both Python 2 and 3.\r\n\r\nMotivation\r\n----------\r\n\r\nThe .ZIP format, PKZIP compression, have been widely used. Some valuable\r\ndata are archived as .zip file. But, in non-ASCII, non-Western\r\nenvironment, it makes trouble due to filenames.\r\n\r\nSince ZIP format was created too old (1993), there is no standard\r\ncharacter encoding about the file name of zip archive entries. Most of\r\nzip file entries are encoded as legacy character encoding, local\r\ncharset.\r\n\r\nIn modern UNICODE based environment or global data processing\r\nenvironment such as Linux, this makes inconvinience, less portability,\r\nmangled file names, fail to extract the file, and so on.\r\n\r\nThis module may mitigate the inconviniences.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sanori/unzip-mbcs", "keywords": "unzip,pkzip,non-UTF8,mbcs,cp949,sjis,shift_jis,gbk,gb18030", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "unzipmbcs", "package_url": "https://pypi.org/project/unzipmbcs/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/unzipmbcs/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/sanori/unzip-mbcs" }, "release_url": "https://pypi.org/project/unzipmbcs/0.1.1/", "requires_dist": null, "requires_python": null, "summary": "UnZip for non-UTF8 encoding such as cp949, sjis, gbk, euc-kr, euc-jp, and gb2312", "version": "0.1.1" }, "last_serial": 2380331, "releases": { "0.1.1": [] }, "urls": [] }