{ "info": { "author": "Ivan Begtin", "author_email": "ivan@begtin.tech", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "# Russian Names\n\n\n`russiannames` is a Python 3 library dedicated to parse Russian names, surnames and midnames, identify person gender by fullname and how name is written. It uses MongoDB as backend to speed-up name parsing.\n\n\n\n## Documentation\n\nDocumentation is built automatically and can be found on\nhttps://russiannames.readthedocs.org/en/latest/\n\n## Installation\n\nTo install Python library use `pip install russiannames` via pip or `python setup.py install` \n\nTo use database you need MongoDB instance. \nUnpack db_data_bson.zip file from https://github.com/datacoon/russiannames/blob/master/data/bson/db_dump_bson.zip\n\nand use `mongorestore` command to restore `names` database with 3 collections: names, surnames and midnames\n\n## Features\n\nDatabase of names used for identification\n\n* 375449 surnames - collection: surnames\n* 32134 first names - collection: names\n* 48274 midnames - collection: midnames\n\nDetailed database statistics by gender and collection\n\n| collection| total | males|females|universal or unidentified |\n| --- | --- | --- | --- | --- |\n| names | 32134 | 19297 | 8278 | 1196 |\n| midnames | 48274 | 30114 | 16143 | 0 |\n| surnames | 375274 | 124662 | 111534 | 38827 |\n\n\nSupports 12 formats of Russian full names writing style\n\n| Format | Example | Description |\n| ------ | -------------- | ------------ |\n| f | \u041e\u043b\u044c\u0433\u0430 | only first name |\n| s | \u041f\u0435\u0442\u0440\u043e\u0432 | only surname |\n| Fs | \u041e. \u0421\u0438\u0434\u043e\u0440\u043e\u0432\u0430 | first letter of first name and full surname |\n| sF | \u041d\u0438\u043a\u043e\u043b\u0430\u0435\u0432 \u0421. | full surname and first letter of surname |\n| sf | \u0410\u0431\u0440\u0430\u043c\u043e\u0432 \u0421\u0435\u043c\u0435\u043d | full surname and full first name |\n| fs | \u0421\u043e\u043d\u044f \u041a\u0430\u043c\u0438\u0443\u043b\u043b\u0438\u043d\u0430 | full first name and full surname |\n| fm | \u0418\u0432\u0430\u043d \u041f\u0435\u0442\u0440\u043e\u0432\u0438\u0447 | full first name and full middlename |\n| SFM | \u041c.\u0414.\u041c. | first letters of surname, first name, middlename |\n| FMs | \u0410.\u041d. \u0415\u0433\u043e\u0440\u043e\u0432\u0430 | first letters of first and middle name and full furname |\n| sFM | \u041d\u0438\u043a\u043e\u043b\u0430\u0435\u043d\u043a\u043e \u0421.\u041f. | full surname and first letters of first and middle names |\n| sfM | \u041f\u0435\u0442\u0440\u0430\u043a\u043e\u0432\u0430 \u0417\u0438\u043d\u0430\u0438\u0434\u0430 \u041c. | full surname, first name and first letter of middle name |\n| sfm | \u041a\u0430\u0437\u0430\u043a\u043e\u0432 \u0420\u0438\u043d\u0430\u0442 \u0410\u0440\u0442\u0443\u0440\u043e\u0432\u0438\u0447 | full name as surname, first name and middle name |\n| fms | \u0421\u0432\u0435\u0442\u043b\u0430\u043d\u0430 \u0410\u0440\u0445\u0438\u043f\u043e\u0432\u043d\u0430 \u0412\u043e\u043b\u043a\u043e\u0432\u0430 | full name as first name, middle name and surname |\n\n\nSupports names with following ethnics identification\n\n9 ethnic types in names, surnames and middle names supported\n\n| key | name (en) | name (rus)\n| ---- | --------- | ----------\n| arab | Arabic | \u0410\u0440\u0430\u0431\u0441\u043a\u043e\u0435\n| arm | Armenian | \u0410\u0440\u043c\u044f\u043d\u0441\u043a\u043e\u0435\n| geor | Georgian | \u0413\u0440\u0443\u0437\u0438\u043d\u0441\u043a\u043e\u0435\n| germ | German | \u041d\u0435\u043c\u0435\u0446\u043a\u0438\u0435\n| greek | Greek | \u0413\u0440\u0435\u0447\u0435\u0441\u043a\u0438\u0435\n| jew | Jew | \u0415\u0432\u0440\u0435\u0439\u0441\u043a\u0438\u0435\n| polsk | Polish | \u041f\u043e\u043b\u044c\u0441\u043a\u0438\u0435\n| slav | Slavic (Russian) | \u0421\u043b\u0430\u0432\u044f\u043d\u0441\u043a\u0438\u0435\n| tur | Turkic | \u0422\u044e\u0440\u043a\u0441\u043a\u0438\u0435 (\u0442\u044e\u0440\u043a\u043e\u044f\u0437\u044b\u0447\u043d\u044b\u0435)\n\n\n## Limitations\n\n* very rare names, surnames or middlenames could be not parsed \n* ethnic identification is still on early stage\n\n\n## Speed optimization\n\n* preconfigured and preindexed MongoDb collections used\n\n\n## Usage and Examples\n\n### Parse name and identify gender\n\nParses names and returns: format, surname, first name, middle name, parsed (True/False) and gender \n\n >>> from russiannames.parser import NamesParser\n >>> parser = NamesParser()\n >>> parser.parse('\u041d\u0438\u0433\u043c\u0430\u0442\u0443\u043b\u043b\u0438\u043d \u0420\u0438\u043d\u0430\u0442 \u0410\u0445\u043c\u0435\u0442\u043e\u0432\u0438\u0447')\n {'format': 'sfm', 'sn': '\u041d\u0438\u0433\u043c\u0430\u0442\u0443\u043b\u043b\u0438\u043d', 'fn': '\u0420\u0438\u043d\u0430\u0442', 'mn': '\u0410\u0445\u043c\u0435\u0442\u043e\u0432\u0438\u0447', 'gender': 'm', 'text': '\u041d\u0438\u0433\u043c\u0430\u0442\u0443\u043b\u043b\u0438\u043d \u0420\u0438\u043d\u0430\u0442 \u0410\u0445\u043c\u0435\u0442\u043e\u0432\u0438\u0447', 'parsed': True}\n >>> parser.parse('\u041f\u0435\u0442\u0440\u043e\u0432\u0430 C.\u042f.')\n {'format': 'sFM', 'sn': '\u041f\u0435\u0442\u0440\u043e\u0432\u0430', 'fn_s': 'C', 'mn_s': '\u042f', 'gender': 'f', 'text': '\u041f\u0435\u0442\u0440\u043e\u0432\u0430 C.\u042f.', 'parsed': True}\n\nGender field could have one of following values:\n\n* m: Male\n* f: Female\n* u: Unknown / unidentified\n* -: Impossible to identify\n \n### Ethnic identification (experimental)\nParses surname, first name and middle name and tries to identify person ethic affilation of the person\n\n >>> from russiannames.parser import NamesParser\n >>> parser = NamesParser()\n >>> parser.classify('\u041d\u0438\u0433\u043c\u0430\u0442\u0443\u043b\u043b\u0438\u043d', '\u0420\u0438\u043d\u0430\u0442', '\u0410\u0445\u043c\u0435\u0442\u043e\u0432\u0438\u0447')\n {'ethnics': ['tur'], 'gender': 'm'}\n >>> parser.classify('\u0410\u043b\u0435\u043a\u0441\u0435\u0435\u0432\u0430', '\u041e\u043b\u044c\u0433\u0430', '\u0418\u0432\u0430\u043d\u043e\u0432\u043d\u0430')\n {'ethnics': ['slav'], 'gender': 'f'}\n\n\n\n\n## Supported languages\n* Russian\n\n\n\n## Requirements\n* pymongo\n* click\n\n\n## Acknowledgements", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/datacoon/russiannames", "keywords": "russian names surnames gender", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "russiannames", "package_url": "https://pypi.org/project/russiannames/", "platform": "", "project_url": "https://pypi.org/project/russiannames/", "project_urls": { "Homepage": "https://github.com/datacoon/russiannames" }, "release_url": "https://pypi.org/project/russiannames/1.0/", "requires_dist": null, "requires_python": "", "summary": "Russian names parser, gender identification and processing tools", "version": "1.0" }, "last_serial": 4891226, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "2fc3f936c5a6d1a66f1b37d2409c1006", "sha256": "1c38c91bac064535d318e593a55b9b8eb1c54f184c13d2e1516b88033db0bbae" }, "downloads": -1, "filename": "russiannames-1.0.tar.gz", "has_sig": false, "md5_digest": "2fc3f936c5a6d1a66f1b37d2409c1006", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12746, "upload_time": "2019-03-03T16:10:52", "url": "https://files.pythonhosted.org/packages/01/e9/d89e9b3f6e7dbbbb0fa565c197ac024ea86fc741805b2e0c9572e677779c/russiannames-1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2fc3f936c5a6d1a66f1b37d2409c1006", "sha256": "1c38c91bac064535d318e593a55b9b8eb1c54f184c13d2e1516b88033db0bbae" }, "downloads": -1, "filename": "russiannames-1.0.tar.gz", "has_sig": false, "md5_digest": "2fc3f936c5a6d1a66f1b37d2409c1006", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12746, "upload_time": "2019-03-03T16:10:52", "url": "https://files.pythonhosted.org/packages/01/e9/d89e9b3f6e7dbbbb0fa565c197ac024ea86fc741805b2e0c9572e677779c/russiannames-1.0.tar.gz" } ] }