{ "info": { "author": "Mustafa Atik", "author_email": "muatik@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "Genderizer\n======================\n\nGenderizer is a language independent module which tries to detect gender by looking given first names and/or analyzing sample texts. \n\n##Installation\nYou can install this package using the following ```pip``` command:\n\n```sh\n$ sudo pip install genderizer\n```\n\n\n##Examples\n\n```python\nfrom genderizer.genderizer import Genderizer\n\nprint Genderizer.detect(firstName = 'John')\n# >>> male\n\nprint Genderizer.detect(firstName = 'Marry')\n# >>> female\n\nprint Genderizer.detect(firstName = 'mustafa')\n# >>> male\n\nprint Genderizer.detect(firstName = 'fatma')\n# >>> female\n\nprint Genderizer.detect(firstName = 'fikret')\n# >>> None\n\nprint Genderizer.detect(firstName = 'fikret', text='galatasary ma\u00e7\u0131 ka\u00e7maz')\n# >>> male\n\nprint Genderizer.detect(firstName = 'fikret', text='annemlerle yemek yedik')\n# >>> female\n\nprint Genderizer.detect(text='askerlik yoklamas\u0131n\u0131 ka\u00e7\u0131rd\u0131m mk')\n# >>> male\n\nprint Genderizer.detect(text='bana \u00e7i\u00e7ek alan erkek i\u00e7in can\u0131m feda')\n# >>> female\n\nprint Genderizer.detect(firstName='fatma', text='askerlik yoklamas\u0131n\u0131 ka\u00e7\u0131rd\u0131m mk')\n# >>> female\n\nprint Genderizer.detect(text='futbol sevgi')\n# >>> None\n\nprint Genderizer.detect(text='lan bi siktir git')\n# >>> male\n\n```\n***Note***: You may claim that women can say ```lan bi siktir git```, of course. But the probability of being female is less than the probability of being male according the trained data of the classifier.\n\nBy the way, in Turkish saying 'lan bi siktir git' makes you quite rude.\n\n\n## How It Works\nGenderizer is a module which tries to detect gender by looking given first names and/or analyzing sample text of a person. \n\nIf a first name is definitely used for only one gender, the system will accept this gender and will not make any further analysis. For example, while 'Mustafa', 'Osman', 'Hasan' is used in Turkish only for male; 'Fatma', 'Ceyda', 'Elif' only for female.\n\nWhen looking at first names does not infer any gender for sure, the system will make text analysis if it is given. For example; 'Ekim', 'Meric', or 'T\u00fcmay' is used for both male and female.\n\nThe text analysis is the classification of sample texts. It simply try to compute the probability of being male or female mining the sample text. In this system Naive Bayesian Classification is adopted and [naiveBayesClassifier][1] is used.\n\n## How To Improve It\nTODO: write a step by step guide\n\n##Preparing Language Dependent Training Sets\nTODO: give a few examples\n\n## Customization and Optimization\nTODO: Tell what options are there and when to choose\n```python\nfrom genderizer.memcachedNamesCollection import MemcachedNamesCollection\nfrom genderizer.mongoNamesCollection import MongoNamesCollection\n\nMongoNamesCollection.mongodbURL = 'mongodb://192.168.1.170'\nGenderizer.init(\n lang='tr',\n namesCollection=MongoNamesCollection\n)\n\n#For memcached, you need to setup a memcached server.\nMemcachedNamesCollection.memcacheHost = '127.0.0.1:11211'\nGenderizer.init(\n lang='tr',\n namesCollection=MemcachedNamesCollection\n)\n\nGenderizer.init(\n lang='tr',\n namesCollection=NamesCollection,\n classifier=Classifier(CachedModel.get('tr'), tokenizer)\n)\n\nprint Genderizer.detect(firstName = 'fikret', text='annem')\n\n```\n\n## TODO\n* inline docs\n* unit-tests\n\n## AUTHORS\n* Mustafa Atik [@muatik][2]\n* Nejdet Yucesoy @nejdetckenobi\n\n\n [1]: https://github.com/muatik/naive-bayes-classifier\n [2]: https://twitter.com/muatik2", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/muatik/genderizer", "keywords": null, "license": "MIT", "maintainer": null, "maintainer_email": null, "name": "genderizer", "package_url": "https://pypi.org/project/genderizer/", "platform": "any", "project_url": "https://pypi.org/project/genderizer/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/muatik/genderizer" }, "release_url": "https://pypi.org/project/genderizer/0.1.2.3/", "requires_dist": null, "requires_python": null, "summary": "Genderizer tries to infer gender information looking at first name and/or making text analysis", "version": "0.1.2.3" }, "last_serial": 1063867, "releases": { "0.1.1": [], "0.1.2.3": [ { "comment_text": "", "digests": { "md5": "9a7f1c6ca76990c24caf9dbd04e5faf1", "sha256": "d003c4c0742988d159e17547c554d6c581d5fdf57d34df50bd1985a50bbe9f48" }, "downloads": -1, "filename": "genderizer-0.1.2.3.tar.gz", "has_sig": false, "md5_digest": "9a7f1c6ca76990c24caf9dbd04e5faf1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9430249, "upload_time": "2014-04-17T20:26:28", "url": "https://files.pythonhosted.org/packages/64/25/bd9c6b986ead941ed23a81f300506b73d07fc71e479bbbf70d865725f387/genderizer-0.1.2.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9a7f1c6ca76990c24caf9dbd04e5faf1", "sha256": "d003c4c0742988d159e17547c554d6c581d5fdf57d34df50bd1985a50bbe9f48" }, "downloads": -1, "filename": "genderizer-0.1.2.3.tar.gz", "has_sig": false, "md5_digest": "9a7f1c6ca76990c24caf9dbd04e5faf1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 9430249, "upload_time": "2014-04-17T20:26:28", "url": "https://files.pythonhosted.org/packages/64/25/bd9c6b986ead941ed23a81f300506b73d07fc71e479bbbf70d865725f387/genderizer-0.1.2.3.tar.gz" } ] }