{ "info": { "author": "dchaplinsky", "author_email": "chaplinsky.dmitry@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Natural Language :: Russian", "Natural Language :: Ukrainian", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: Implementation :: PyPy", "Topic :: Text Processing :: General", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Linguistic" ], "description": "comparator\n================\n\nSimple tool to compare almost similar names which are coming **from the same source** (for example list of all company owners and officers of that company). Helps to cluster together persons with a slight difference in name spelling/typos. Better suited for Cyrillic names, but should work everywhere.\n\nIt's heuristic based and as any algorithm of such a nature it **make errors** sometime (see Accuracy section below for details). \n\nInstallation\n==================================\nInstall from PyPI.\n\n.. code-block:: bash\n\n $ pip install names_comparator\n\nAccuracy\n==================================\n\nYou can test your installation on a ground truth data and check confusion matrix by running:\n\n.. code-block:: bash\n\n $ python comparator/__init__.py comparator/data/ground_truth.csv\n\t+--------------------+----------+----------+\n\t| | Positive | Negative |\n\t+--------------------+----------+----------+\n\t| Predicted positive | 290 | 9 |\n\t| Predicted negative | 14 | 687 |\n\t+--------------------+----------+----------+\n\tPrecision: 0.97\n\tRecall: 0.95\n\tF1 score: 0.96\n\nYou can also run it with debug flag to see all the errors algorithm made:\n\n.. code-block:: bash\n\n $ python comparator/__init__.py comparator/data/ground_truth.csv yes\n\nUsage\n==================================\n\n.. code-block:: python\n\n >>> from comparator import full_compare\n >>> full_compare(\"Barack Hussein Obama\", \"Obama, Barak\")\n True\n >>> full_compare(\"\u041f\u0435\u0442\u0440\u043e \u041c\u0430\u0437\u0435\u043f\u0430\", \"\u041c\u0430\u0437\u0435\u043f\u0430 \u041f\u0435\u0442\u0440\u043e\")\n True\n >>> full_compare(\"\u041c\u0430\u0440\u0447\u0435\u043d\u043a\u043e \u041f\u0435\u0442\u0440\u043e \u041c\u0438\u043a\u043e\u043b\u0430\u0439\u043e\u0432\u0438\u0447\", \"\u041f\u0430\u043d\u0447\u0435\u043d\u043a\u043e \u041f\u0435\u0442\u0440\u043e \u041c\u0438\u043a\u043e\u043b\u0430\u0439\u043e\u0432\u0438\u0447\")\n False\n >>> full_compare(\"\u041e\u0432\u0434\u0456\u0454\u043d\u043a\u043e \u0421\u0435\u0440\u0433\u0456\u0439 \u041a\u043e\u0441\u0442\u0430\u043d\u0442\u0438\u043d\u043e\u0432\u0438\u0447\", \"\u041e\u0432\u0434\u0456\u0454\u043d\u043a\u043e \u0421\u0435\u0440\u0433\u0456\u0439 \u041a\u043e\u0441\u0442\u044f\u043d\u0442\u0438\u043d\u043e\u0432\u0438\u0447\")\n True\n >>> full_compare(\"\u0406\u0432\u0430\u043d\u043e\u0432 \u041c\u0438\u0445\u0430\u0439\u043b\u043e \u042e\u0440\u0456\u0439\u043e\u0432\u0438\u0447\", \"\u0406\u0432\u0430\u043d\u043e\u0432 \u042e\u0440\u0456\u0439 \u041c\u0438\u0445\u0430\u0439\u043b\u043e\u0432\u0438\u0447\")\n False\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dchaplinsky/comparator", "keywords": "names fuzzy comparison levenshtein", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "names_comparator", "package_url": "https://pypi.org/project/names_comparator/", "platform": "", "project_url": "https://pypi.org/project/names_comparator/", "project_urls": { "Homepage": "https://github.com/dchaplinsky/comparator" }, "release_url": "https://pypi.org/project/names_comparator/1.0.0/", "requires_dist": null, "requires_python": "", "summary": "Simple tool to compare almost similar names which are coming from the same source (for example list of all company owners and officers of that company). Helps to cluster together persons with a slight difference in name spelling/typos. Better suited for Cyrillic names, but should work everywhere", "version": "1.0.0" }, "last_serial": 5896580, "releases": { "1.0.0": [ { "comment_text": "", "digests": { "md5": "ff4c3af4a4c38a588932078663663ffe", "sha256": "b4c036215cae08efd07d110b84e01d75b5815ec0cbadde88fc495904b5b7c9b8" }, "downloads": -1, "filename": "names_comparator-1.0.0.tar.gz", "has_sig": false, "md5_digest": "ff4c3af4a4c38a588932078663663ffe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30462, "upload_time": "2019-09-27T14:56:11", "url": "https://files.pythonhosted.org/packages/71/06/d29a804841b19961339ca6c2d86ae85208d83f5fe2c93ff612097e4a0b80/names_comparator-1.0.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ff4c3af4a4c38a588932078663663ffe", "sha256": "b4c036215cae08efd07d110b84e01d75b5815ec0cbadde88fc495904b5b7c9b8" }, "downloads": -1, "filename": "names_comparator-1.0.0.tar.gz", "has_sig": false, "md5_digest": "ff4c3af4a4c38a588932078663663ffe", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 30462, "upload_time": "2019-09-27T14:56:11", "url": "https://files.pythonhosted.org/packages/71/06/d29a804841b19961339ca6c2d86ae85208d83f5fe2c93ff612097e4a0b80/names_comparator-1.0.0.tar.gz" } ] }