{ "info": { "author": "Sidharth Mudgal, Han Li", "author_email": "uwmagellan@gmail.com", "bugtrack_url": null, "classifiers": [], "description": "DeepMatcher\n=============\n\n.. image:: https://travis-ci.org/anhaidgroup/deepmatcher.svg?branch=master\n :target: https://travis-ci.org/anhaidgroup/deepmatcher\n\n.. image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg\n :target: https://opensource.org/licenses/BSD-3-Clause\n\nDeepMatcher is a Python package for performing entity and text matching using deep learning.\nIt provides built-in neural networks and utilities that enable you to train and apply\nstate-of-the-art deep learning models for entity matching in fewer than 10 lines of code.\nThe models are also easily customizable: the modular design allows any subcomponent to be\naltered or swapped out for a custom implementation.\n\nAs an example, consider labeled tuple pairs such as the following:\n\n.. image:: https://raw.githubusercontent.com/anhaidgroup/deepmatcher/master/docs/source/_static/match_input_ex.png\n\nDeepMatcher uses such labeled tuple pairs to train a neural network that performs matching, i.e.,\npredicts match / non-match labels. The trained network can then be used to obtain labels for\nunlabeled tuple pairs.\n\nPaper and Data\n****************\n\nFor details on the architecture of the models used, take a look at our paper `Deep\nLearning for Entity Matching`_ (SIGMOD '18). All public datasets used in\nthe paper can be downloaded from the `datasets page `__.\n\nQuick Start: DeepMatcher in 30 seconds\n******************************************\n\nThere are four main steps in using DeepMatcher:\n\n1. Data processing: Load and process labeled training, validation and test CSV data.\n\n.. code-block:: python\n\n import deepmatcher as dm\n train, validation, test = dm.data.process(path='data_directory',\n train='train.csv', validation='validation.csv', test='test.csv')\n\n2. Model definition: Specify the neural network architecture. 
By default, DeepMatcher uses the built-in hybrid\n model (as discussed in section 4.4 of `our paper\n `__). The model can\n be customized to your heart's desire.\n\n.. code-block:: python\n\n model = dm.MatchingModel()\n\n3. Model training: Train the neural network.\n\n.. code-block:: python\n\n model.run_train(train, validation, best_save_path='best_model.pth')\n\n4. Application: Evaluate the model on the test set and apply it to unlabeled data.\n\n.. code-block:: python\n\n model.run_eval(test)\n\n unlabeled = dm.data.process_unlabeled(path='data_directory/unlabeled.csv', trained_model=model)\n model.run_prediction(unlabeled)\n\nInstallation\n**************\n\nDeepMatcher currently supports only Python 3.5 and 3.6. We recommend installing with pip:\n\n.. code-block::\n\n pip install deepmatcher\n\nNote that during installation you may see an error message that says \"Failed building wheel for fasttextmirror\". You can safely ignore it; it does NOT indicate a problem with the installation.\n\nTutorials\n**********\n\n**Using DeepMatcher:**\n\n1. `Getting Started`_: A more in-depth guide to help you get familiar with the basics of\n using DeepMatcher.\n2. `Data Processing`_: Advanced guide on what data processing involves and how to\n customize it.\n3. `Matching Models`_: Advanced guide on neural network architectures for entity matching\n and how to customize them.\n\n**Entity Matching Workflow:**\n\n`End to End Entity Matching`_: A guide to developing a complete entity\nmatching workflow. The tutorial discusses how to use DeepMatcher with `Magellan`_ to\nperform blocking, sampling, labeling and matching to obtain matching tuple pairs from two\ntables.\n\n**DeepMatcher for other matching tasks:**\n\n`Question Answering with DeepMatcher`_: A tutorial on how to use DeepMatcher for question\nanswering. 
Specifically, we will look at `WikiQA`_, a benchmark dataset for the task of\nAnswer Selection.\n\nAPI Reference\n***************\n\nAPI docs `are here`_.\n\nSupport\n**********\n\nThis package is under active development. If you run into any issues or have questions,\nplease `file GitHub issues`_.\n\nThe Team\n**********\n\nDeepMatcher was developed by University of Wisconsin-Madison grad students Sidharth Mudgal\nand Han Li, under the supervision of Prof. AnHai Doan and Prof. Theodoros Rekatsinas.\n\n.. _`Deep Learning for Entity Matching`: http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf\n.. _`Prof. AnHai Doan's data repository`: https://sites.google.com/site/anhaidgroup/useful-stuff/data\n.. _`Magellan`: https://sites.google.com/site/anhaidgroup/projects/magellan\n.. _`Getting Started`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/getting_started.ipynb\n.. _`Data Processing`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/data_processing.ipynb\n.. _`Matching Models`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/matching_models.ipynb\n.. _`End to End Entity Matching`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/end_to_end_em.ipynb\n.. _`are here`: https://anhaidgroup.github.io/deepmatcher/html/\n.. _`Question Answering with DeepMatcher`: https://nbviewer.jupyter.org/github/anhaidgroup/deepmatcher/blob/master/examples/question_answering.ipynb\n.. _`WikiQA`: https://aclweb.org/anthology/D15-1237\n.. 
_`file GitHub issues`: https://github.com/anhaidgroup/deepmatcher/issues", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://deepmatcher.ml", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "deepmatcher", "package_url": "https://pypi.org/project/deepmatcher/", "platform": "", "project_url": "https://pypi.org/project/deepmatcher/", "project_urls": { "Homepage": "http://deepmatcher.ml" }, "release_url": "https://pypi.org/project/deepmatcher/0.1.0.post1/", "requires_dist": null, "requires_python": ">=3.5", "summary": "A deep learning package for entity matching", "version": "0.1.0.post1" }, "last_serial": 4041647, "releases": { "0.0.1a2": [ { "comment_text": "", "digests": { "md5": "195bd151e3155a18b49f0fcadeeab283", "sha256": "ef319ca202d78f8f50e525c8e6bf83ecce22e46f9b920b5eb27ea649af425c81" }, "downloads": -1, "filename": "deepmatcher-0.0.1a2.tar.gz", "has_sig": false, "md5_digest": "195bd151e3155a18b49f0fcadeeab283", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 33995, "upload_time": "2018-04-24T20:15:51", "url": "https://files.pythonhosted.org/packages/1d/cc/326b9eb7e6fde17234cea109b7703a3719bb216b0f4b6604d3e828b2402f/deepmatcher-0.0.1a2.tar.gz" } ], "0.1.0.post1": [ { "comment_text": "", "digests": { "md5": "c15cee57cff1f004dabbef43c26a09c9", "sha256": "198b4a52a8aaf9f7f0a47582df2368b511533076cc1b81ce26aec8ca34a70da4" }, "downloads": -1, "filename": "deepmatcher-0.1.0.post1.tar.gz", "has_sig": false, "md5_digest": "c15cee57cff1f004dabbef43c26a09c9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 51948, "upload_time": "2018-07-08T21:54:27", "url": "https://files.pythonhosted.org/packages/56/79/0b5d108fc2c8d4ac9d0917315a85578a07e797efc5bb05d5654185c86d3e/deepmatcher-0.1.0.post1.tar.gz" } ], "0.1.0rc1": [ { "comment_text": "", "digests": { 
"md5": "353333d0ef51647395ee4ee196f14b57", "sha256": "a20885324ca3d25f5919f2f687d6ecf116f5cef056acf3c5d3f68ebd249a8d89" }, "downloads": -1, "filename": "deepmatcher-0.1.0rc1.tar.gz", "has_sig": false, "md5_digest": "353333d0ef51647395ee4ee196f14b57", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 51051, "upload_time": "2018-06-06T02:40:50", "url": "https://files.pythonhosted.org/packages/a1/fb/b8f554b9d71cc497a0d7e69ac66b8611bfe6e29e0af8e838b21b1d6700d3/deepmatcher-0.1.0rc1.tar.gz" } ], "0.1.0rc2": [ { "comment_text": "", "digests": { "md5": "cf8cabef313967ae4379b9ff1c3e009a", "sha256": "8a76c754b9bd76e252093c915bbef59f844c92bf0cb99d99111ff7c5686f2a38" }, "downloads": -1, "filename": "deepmatcher-0.1.0rc2.tar.gz", "has_sig": false, "md5_digest": "cf8cabef313967ae4379b9ff1c3e009a", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 51350, "upload_time": "2018-07-04T18:34:18", "url": "https://files.pythonhosted.org/packages/83/03/2095f3d030a8bc09db1f3596dda4efd6a67f941175e951d2942c37aa23f4/deepmatcher-0.1.0rc2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "c15cee57cff1f004dabbef43c26a09c9", "sha256": "198b4a52a8aaf9f7f0a47582df2368b511533076cc1b81ce26aec8ca34a70da4" }, "downloads": -1, "filename": "deepmatcher-0.1.0.post1.tar.gz", "has_sig": false, "md5_digest": "c15cee57cff1f004dabbef43c26a09c9", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 51948, "upload_time": "2018-07-08T21:54:27", "url": "https://files.pythonhosted.org/packages/56/79/0b5d108fc2c8d4ac9d0917315a85578a07e797efc5bb05d5654185c86d3e/deepmatcher-0.1.0.post1.tar.gz" } ] }