{ "info": { "author": "Jonathan Soma", "author_email": "jonathan.soma@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Developers", "Intended Audience :: End Users/Desktop", "Intended Audience :: Science/Research", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Utilities" ], "description": "# fuzzy_pandas\n\nA razor-thin layer over [csvmatch](https://github.com/maxharlow/csvmatch/) that allows you to do fuzzy mathing with pandas dataframes.\n\n## Installation\n\n```\npip install fuzzy_pandas\n```\n\n## Usage\n\nTo borrow 100% from the [original repo](https://github.com/maxharlow/csvmatch), say you have one CSV file such as:\n\n```\nname,location,codename\nGeorge Smiley,London,Beggerman\nPercy Alleline,London,Tinker\nRoy Bland,London,Soldier\nToby Esterhase,Vienna,Poorman\nPeter Guillam,Brixton,none\nBill Haydon,London,Tailor\nOliver Lacon,London,none\nJim Prideaux,Slovakia,none\nConnie Sachs,Oxford,none\n```\n\nAnd another such as:\n\n```\nPerson Name,Location\nMaria Andreyevna Ostrakova,Russia\nOtto Leipzig,Estonia\nGeorge SMILEY,London\nPeter Guillam,Brixton\nKonny Saks,Oxford\nSaul Enderby,London\nSam Collins,Vietnam\nTony Esterhase,Vienna\nClaus Kretzschmar,Hamburg\n```\n\nYou can then find which names are in both files:\n\n```python\nimport pandas as pd\nimport fuzzy_pandas as fpd\n\ndf1 = pd.read_csv(\"data1.csv\")\ndf2 = pd.read_csv(\"data2.csv\")\n\nmatches = fpd.fuzzy_merge(df1, df2,\n left_on=['name'],\n right_on=['Person Name'],\n ignore_case=True,\n keep='match')\n\nprint(matches)\n```\n\n|.|name|Person Name|\n|---|---|---|\n|0|George Smiley|George SMILEY|\n|1|Peter Guillam|Peter Guillam|\n\n### Options\n\nDumping this out of the code itself, apologies for lack of pretty formatting.\n\n* **left** : DataFrame\n* **right** : DataFrame\n - Object to merge left with\n* **on** : str or list\n - Column names to compare. These must be found in both DataFrames.\n* **left_on** : str or list\n - Column names to compare in the left DataFrame.\n* **right_on** : str or list\n - Column names to compare in the right DataFrame.\n* **left_cols** : list, default None\n - List of columns to preserve from the left DataFrame.\n - Defaults to `left_on`.\n* **right_cols** : list, default None\n - List of columns to preserve from the right DataFrame. \n - Defaults to `right_on`.\n* **method** : str or list, default 'exact'\n - Perform a fuzzy match, and an optional specified algorithm.\n - Multiple algorithms can be specified which will apply to each field\n respectively.\n - Options:\n * **exact**: exact matches\n * **levenshtein**: string distance metric\n * **jaro**: string distance metric\n * **metaphone**: phoenetic matching algorithm\n * **bilenko**: prompts for matches\n* **threshold** : float or list, default `0.6`\n - The threshold for a fuzzy match as a number between 0 and 1. Multiple numbers will be applied to each field respectively.\n* **ignore_case** : bool, default False\n - Ignore case (default is case-sensitive)\n* **ignore_nonalpha** : bool, default False\n - Ignore non-alphanumeric characters\n* **ignore_nonlatin** : bool, default False\n - Ignore characters from non-latin alphabets. Accented characters are compared to their unaccented equivalent\n* **ignore_order_words** : bool, default False\n - Ignore the order words are given in\n* **ignore_order_letters** : bool, default False\n - Ignore the order the letters are given in, regardless of word order\n* **ignore_titles** : bool, default False\n - Ignore a predefined list of name titles (such as Mr, Ms, etc)\n* **join** : { 'inner', 'left-outer', 'right-outer', 'full-outer' }\n```\n\nFor more how-to information, check out [the examples folder](https://github.com/jsoma/fuzzy_pandas/tree/master/examples) or the [the original repo](https://github.com/maxharlow/csvmatch).\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/jsoma/fuzzy_pandas", "keywords": "fuzzy matching pandas", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "fuzzy-pandas", "package_url": "https://pypi.org/project/fuzzy-pandas/", "platform": "", "project_url": "https://pypi.org/project/fuzzy-pandas/", "project_urls": { "Homepage": "http://github.com/jsoma/fuzzy_pandas" }, "release_url": "https://pypi.org/project/fuzzy-pandas/0.1/", "requires_dist": [ "pandas", "csvmatch" ], "requires_python": "", "summary": "Fuzzy matching in pandas using csvmatch", "version": "0.1" }, "last_serial": 5635868, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "4bc903e326cc9755d30a417b03787bbe", "sha256": "d4454dac4c9cf2e6d8fe9245eebcdde9312d52e4caea814b79a901ef6255558d" }, "downloads": -1, "filename": "fuzzy_pandas-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4bc903e326cc9755d30a417b03787bbe", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5206, "upload_time": "2019-08-05T19:04:23", "url": "https://files.pythonhosted.org/packages/dc/70/c9de848dea88eb02ecb7b4993d188789fe35857f5b3b6dc616d957c55769/fuzzy_pandas-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "70b0e9aa8b147a283bbcf0e2e7f6b61f", "sha256": "a9ffdfb327829d11dc4a15e3b049dcdebd7622771cdd33bad6adb56b848fced3" }, "downloads": -1, "filename": "fuzzy_pandas-0.1.tar.gz", "has_sig": false, "md5_digest": "70b0e9aa8b147a283bbcf0e2e7f6b61f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3946, "upload_time": "2019-08-05T19:04:25", "url": "https://files.pythonhosted.org/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4bc903e326cc9755d30a417b03787bbe", "sha256": "d4454dac4c9cf2e6d8fe9245eebcdde9312d52e4caea814b79a901ef6255558d" }, "downloads": -1, "filename": "fuzzy_pandas-0.1-py3-none-any.whl", "has_sig": false, "md5_digest": "4bc903e326cc9755d30a417b03787bbe", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 5206, "upload_time": "2019-08-05T19:04:23", "url": "https://files.pythonhosted.org/packages/dc/70/c9de848dea88eb02ecb7b4993d188789fe35857f5b3b6dc616d957c55769/fuzzy_pandas-0.1-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "70b0e9aa8b147a283bbcf0e2e7f6b61f", "sha256": "a9ffdfb327829d11dc4a15e3b049dcdebd7622771cdd33bad6adb56b848fced3" }, "downloads": -1, "filename": "fuzzy_pandas-0.1.tar.gz", "has_sig": false, "md5_digest": "70b0e9aa8b147a283bbcf0e2e7f6b61f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 3946, "upload_time": "2019-08-05T19:04:25", "url": "https://files.pythonhosted.org/packages/37/1c/e0e1ea616ff1d09a33b53915258dd5e4cf586aed6237358e3312a5c90be6/fuzzy_pandas-0.1.tar.gz" } ] }