{ "info": { "author": "Josh Carroll, Mark Dredze and R. Knowles", "author_email": "mdredze@cs.jhu.edu", "bugtrack_url": null, "classifiers": [], "description": "# Demographer\n### Extremely simple demographics from Twitter names\n### Authors: Josh Carroll , Mark Dredze , Rebecca Knowles \n##### Supports: Python 2.7+\n\n> **Demographer**: one who studies subjects including the geographical distribution of people, birth and death rates, socioeconomic status, and age and sex distributions in order to identify the influences on population growth, structure, and development. (Dictionary.com)\n\nDemographer is a Python package that identifies demographic characteristics based on a name. It's designed for Twitter, where it takes the name of the user and returns information about his or her likely demographics.\n\n###Why demographics?\nMany downstream applications that consume Twitter data benefit from knowing information about a user. Analyzing opinions and trends based on demographics is a cornerstone analysis common in many areas of social science. While some social media platforms provide demographics for users, Twitter does not.\n\n### Why only look at the name?\nThere are *many* papers that have proposed methods for demographic identification using what a user's posts or who they follow. These methods can be quite good, and likely do better than only looking at the name for many tasks. Why only look at the name? Often times you only have a single tweet for a user, and it's not feasible to get more data. In these settings, content analysis isn't feasible; one tweet isn't enough. Demographer is designed for these settings. We can produce a reasonable guess even if we only have a single tweet.\n\n### Name or Username?\nWe use the \"name\" field within the \"user\" field, rather than the \"username\" field. This is because even twitter users who have usernames wholly unrelated to their own names sometimes use their real names (or some part of their name, or something name-like) in the \"name\" field.\n\n### How good is it?\nOverall, the tool is quite good, possibly the best tool available for this task (see our paper cited below.) for specific names, it depends. Some names make it easy to identify some demographic characteristics, and some make it impossible. The distribution over these names changes by task, so its hard to give accuracy guarantees across datasets. The tool is focused on western names, so it may do poorly on Chinese or Arabic names. \n\n### Can I extend demographer?\nYes! We designed the package to be highly extensible. If you have new types of training data, or a different approach entirely, you should be able to add it to the package. You need to subclass `Demographer`.\n\n### What demographics does it include?\nThe current release includes gender based on US census data. You can often guess gender accurately as many names are almost exclusively used with one gender.\n\n### I want to learn more!\nTo find out more, check out our paper:\n\n Rebecca Knowles, Josh Carroll, Mark Dredze. Demographer: Extremely Simple Name Demographics. EMNLP Workshop on Natural Language Processing and Computational Social Science, 2016. [[PDF](http://aclweb.org/anthology/W16-5614)]\n\n\n### Gender.c\nWe also include code that (roughly) implements `gender.c`, which is described here: \n[http://www.heise.de/ct/ftp/07/17/182/](http://www.heise.de/ct/ftp/07/17/182/)\n\nThe code is available in `demographer/gender_c.py`. The data for `gender.c` is here:\n```\ndemographer/data/nam_dict.txt.gz\n```\nand is redistributed under a GNU Free Documentation License, Version 1.2.\n\n### Please cite our work\nIf you use demographer in a paper, you should cite it as follows:\n\n```\n@inproceedings{W16-5614,\n author = \t\"Knowles, Rebecca and Carroll, Josh and Dredze, Mark\",\n title = \t\"Demographer: Extremely Simple Name Demographics\",\n booktitle = \t\"EMNLP Workshop on NLP and Computational Social Science\",\n year = \t\"2016\",\n publisher = \t\"Association for Computational Linguistics\",\n pages = \t\"108--113\",\n location = \t\"Austin, Texas\",\n url = \t\"http://aclweb.org/anthology/W16-5614\"\n}\n```\n\n\n## Installation\nWith pip:\n\n```\npip install demographer\n```\nFrom source, you can use `setuptools`\n\n```\npython setup.py install\n```\n\nThis uses pre-built data files and classifiers. If you want to build your own data files and classifiers from scratch, run the script `get_data.sh`\n\n\n## Examples\n\n### API Access\n\nWe provide a simple API to demographer. Here is an example of how you annotate a single tweet. Note that this examples uses a sample tweet file distributed with the library (users installing via pip can download `faketweets.txt` from the `data` directory at the root of this repository).\n \n> from demographer import process_tweet\n> \n> import json\n> \n> tweet = json.loads(open('data/faketweets.txt').readline())\n> \n> result = process_tweet(tweet)\n\nThe first time you call `process_tweet` expect to wait a bit. This command loads the data to initialize the underlying demographers.\n\n`result` stores a list of dictionary objects, each corresponding to the output from one demographer. Here is an example of `result`:\n\n> [{u'gender': [{u'count': 1099936, u'prob': 0.9977160044772844, u'value': u'F'}, {u'count': 2518, u'prob': 0.002283995522715687, u'value': u'M'}]}]\n\nIn this example, the first dictionary in the list is the output for gender. It contains probability estimates (M)ale and (F)emale.\n\n### Command Line Access\nWe also provide a script that processes a file containing tweets and adds demographic information to each one. The input file contains tweets, each encoded in json, one object per line. The output file contains the same tweets with a new field `demographics` which contains the list from above.\n\n> python -m demographer.cli.process\\_tweets --input INPUT_FILE --output OUTPUT_FILE\n> ", "description_content_type": null, "docs_url": null, "download_url": "https://bitbucket.org/mdredze/demographer/get/0.1.3.zip", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://bitbucket.org/mdredze/demographer", "keywords": null, "license": "LICENSE.txt", "maintainer": null, "maintainer_email": null, "name": "demographer", "package_url": "https://pypi.org/project/demographer/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/demographer/", "project_urls": { "Download": "https://bitbucket.org/mdredze/demographer/get/0.1.3.zip", "Homepage": "https://bitbucket.org/mdredze/demographer" }, "release_url": "https://pypi.org/project/demographer/0.1.3/", "requires_dist": null, "requires_python": null, "summary": "Extremely simple name demographics for Twitter names", "version": "0.1.3" }, "last_serial": 2463310, "releases": { "0.1.0": [], "0.1.2": [ { "comment_text": "", "digests": { "md5": "b69702e93d387c6341269504e2d221e1", "sha256": "81439673729f0631418f92866d057426a8ac630b182e59e84e51d055514f6049" }, "downloads": -1, "filename": "demographer-0.1.2.tar.gz", "has_sig": false, "md5_digest": "b69702e93d387c6341269504e2d221e1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10406643, "upload_time": "2016-11-04T04:46:02", "url": "https://files.pythonhosted.org/packages/2c/ed/ac0ce7b1f900bbf9cd188356a3b2efb288a9dd864ef8f7cd51f987f8f7f7/demographer-0.1.2.tar.gz" } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "ad10c28beb2a6ed4ab943da14b0c05b1", "sha256": "b0439ef6689f87c5086c9f58c2a73a5d4200b6b98553a8617fab552e51873e8b" }, "downloads": -1, "filename": "demographer-0.1.3.tar.gz", "has_sig": false, "md5_digest": "ad10c28beb2a6ed4ab943da14b0c05b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10406670, "upload_time": "2016-11-16T03:22:16", "url": "https://files.pythonhosted.org/packages/c5/8a/83d7d05caa7bb919a393efab3480b28f639d27321bd963e1c126905dc037/demographer-0.1.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ad10c28beb2a6ed4ab943da14b0c05b1", "sha256": "b0439ef6689f87c5086c9f58c2a73a5d4200b6b98553a8617fab552e51873e8b" }, "downloads": -1, "filename": "demographer-0.1.3.tar.gz", "has_sig": false, "md5_digest": "ad10c28beb2a6ed4ab943da14b0c05b1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10406670, "upload_time": "2016-11-16T03:22:16", "url": "https://files.pythonhosted.org/packages/c5/8a/83d7d05caa7bb919a393efab3480b28f639d27321bd963e1c126905dc037/demographer-0.1.3.tar.gz" } ] }