{ "info": { "author": "Dirko Coetsee", "author_email": "dpcoetsee@gmail.com", "bugtrack_url": null, "classifiers": [ "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Scientific/Engineering" ], "description": "pyhacrf\n=======\n\nHidden alignment conditional random field for classifying string pairs -\na learnable edit distance.\n\nThis package aims to implement the HACRF machine learning model with a\n``sklearn``-like interface. It includes ways to fit a model to training\nexamples and score new examples.\n\nThe model takes string pairs as input and classifies them into any number\nof classes. In McCallum's original paper the model was applied to the\ndatabase deduplication problem. Each database entry was paired with\nevery other entry and the model then classified whether the pair was a\n'match' or a 'mismatch' based on training examples of matches and\nmismatches.\n\nI also tried to use it as a learnable string edit distance for normalizing\nnoisy text. See *A Conditional Random Field for Discriminatively-trained\nFinite-state String Edit Distance* by McCallum, Bellare, and Pereira,\nand the report *Conditional Random Fields for Noisy text normalisation*\nby Dirko Coetsee.\n\nExample\n-------\n\n.. 
code:: python\n\n from pyhacrf import StringPairFeatureExtractor, Hacrf\n\n training_X = [('helloooo', 'hello'), # Matching examples\n ('h0me', 'home'),\n ('krazii', 'crazy'),\n ('non matching string example', 'no really'), # Non-matching examples\n ('and another one', 'yep')]\n training_y = ['match',\n 'match',\n 'match',\n 'non-match',\n 'non-match']\n\n # Extract features\n feature_extractor = StringPairFeatureExtractor(match=True, numeric=True)\n training_X_extracted = feature_extractor.fit_transform(training_X)\n\n # Train model\n model = Hacrf(l2_regularization=1.0)\n model.fit(training_X_extracted, training_y)\n\n # Evaluate\n from sklearn.metrics import confusion_matrix\n predictions = model.predict(training_X_extracted)\n\n print(confusion_matrix(training_y, predictions))\n > [[0 3]\n > [2 0]]\n\n print(model.predict_proba(training_X_extracted))\n > [[ 0.94914812 0.05085188]\n > [ 0.92397711 0.07602289]\n > [ 0.86756034 0.13243966]\n > [ 0.05438812 0.94561188]\n > [ 0.02641275 0.97358725]]\n\nDependencies\n------------\n\nThis package depends on ``numpy``. 
The LBFGS optimizer in ``pylbfgs`` is\nused, but alternative optimizers can be passed.\n\nInstall\n-------\n\nInstall by running:\n\n::\n\n python setup.py install\n\nor from pypi:\n\n::\n\n pip install pyhacrf\n\nDeveloping\n----------\nClone from repository, then\n\n::\n\n pip install -r requirements-dev.txt\n cython pyhacrf/*.pyx\n python setup.py install\n\nTo deploy to pypi, make sure you have compiled the \\*.pyx files to \\*.c", "description_content_type": null, "docs_url": null, "download_url": "https://github.com/dirko/pyhacrf/tarball/0.1.2", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/dirko/pyhacrf", "keywords": null, "license": "BSD", "maintainer": null, "maintainer_email": null, "name": "pyhacrf", "package_url": "https://pypi.org/project/pyhacrf/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/pyhacrf/", "project_urls": { "Download": "https://github.com/dirko/pyhacrf/tarball/0.1.2", "Homepage": "https://github.com/dirko/pyhacrf" }, "release_url": "https://pypi.org/project/pyhacrf/0.1.2/", "requires_dist": null, "requires_python": null, "summary": "Hidden alignment conditional random field, a discriminative string edit distance", "version": "0.1.2" }, "last_serial": 1731996, "releases": { "0.0.10": [ { "comment_text": "", "digests": { "md5": "08f40088f0a7a554d5df115e43a08d2d", "sha256": "ea51cdfe989d3e59919e23837eec8fdc35f798c31f7f3feca23e9a215c010b29" }, "downloads": -1, "filename": "pyhacrf-0.0.10.tar.gz", "has_sig": false, "md5_digest": "08f40088f0a7a554d5df115e43a08d2d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 76883, "upload_time": "2015-08-02T11:12:18", "url": "https://files.pythonhosted.org/packages/91/5b/2789e87e12c5c51be24a8191c58eaf386b23ac61a93b3d5e6a4fc99b2cf8/pyhacrf-0.0.10.tar.gz" } ], "0.0.11": [ { "comment_text": "", "digests": { "md5": "4fb83ad32ab214d0d9d860bd951c786f", "sha256": 
"b39ef775c871950acad828243b6e26dbb5b6d37913ffdca4388d541b32e8a0b7" }, "downloads": -1, "filename": "pyhacrf-0.0.11.tar.gz", "has_sig": false, "md5_digest": "4fb83ad32ab214d0d9d860bd951c786f", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 76894, "upload_time": "2015-08-04T18:40:38", "url": "https://files.pythonhosted.org/packages/4a/18/ec17e7c3a7d2a3d557a9cec9d26f3db8c5bf36891c959fc4bfdf40ca5423/pyhacrf-0.0.11.tar.gz" } ], "0.0.12": [ { "comment_text": "", "digests": { "md5": "b24cc6156f8a1edee30e32ff69362e66", "sha256": "fa9391e274246b8a5f70a7f3b36c1b59093d2310a4bd87c343a0a2cfdd2a932c" }, "downloads": -1, "filename": "pyhacrf-0.0.12.tar.gz", "has_sig": false, "md5_digest": "b24cc6156f8a1edee30e32ff69362e66", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 134091, "upload_time": "2015-08-18T16:46:19", "url": "https://files.pythonhosted.org/packages/5d/ae/bcd2a0f5cc38f0ac96e162d52b91ffac664666cf8706c420ecd02ef5e2ec/pyhacrf-0.0.12.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "22a9e0d904f7403e40f09b6a42809cac", "sha256": "136282db444c23e6a1f19ac493b3d5eb199f5a81346bc0983a63b8d2c654fa06" }, "downloads": -1, "filename": "pyhacrf-0.0.2.tar.gz", "has_sig": false, "md5_digest": "22a9e0d904f7403e40f09b6a42809cac", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6133, "upload_time": "2015-03-31T21:05:09", "url": "https://files.pythonhosted.org/packages/bf/99/1a7401aa1f691785a777af52746e90922bdf4ade3d362b4123e7e25c44a3/pyhacrf-0.0.2.tar.gz" } ], "0.0.5": [ { "comment_text": "", "digests": { "md5": "c06c384f6ba5f50ef472e7c99290ef80", "sha256": "9e640f282651190225bd9083648a3efb42e7849b51bf83640f2bf31ca31c2ada" }, "downloads": -1, "filename": "pyhacrf-0.0.5.tar.gz", "has_sig": false, "md5_digest": "c06c384f6ba5f50ef472e7c99290ef80", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8269, "upload_time": 
"2015-04-24T18:54:33", "url": "https://files.pythonhosted.org/packages/19/fb/2e2dbd810b6216ae5d56ded875db7837b497188b14c0261baf23b0524ad4/pyhacrf-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "963156de82d244a1e36a8f461abbda3b", "sha256": "f90e1ef8cbaff5f296223f7af1dcf369445b95917d3638a8c824d29abf329e31" }, "downloads": -1, "filename": "pyhacrf-0.0.6.tar.gz", "has_sig": false, "md5_digest": "963156de82d244a1e36a8f461abbda3b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8219, "upload_time": "2015-04-24T19:35:22", "url": "https://files.pythonhosted.org/packages/77/e0/3732f5b42e5a4163e3dfbd38303d3f793abc6cc447b954cb09b023137ad6/pyhacrf-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "73c916cdb605fc1b4cbd5b415dedecb9", "sha256": "e987b5ae3cb1f067d658e5e8b5f9ff126986754d908e4cb2a6b680fe2ff40b4c" }, "downloads": -1, "filename": "pyhacrf-0.0.7.tar.gz", "has_sig": false, "md5_digest": "73c916cdb605fc1b4cbd5b415dedecb9", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56433, "upload_time": "2015-06-11T17:14:06", "url": "https://files.pythonhosted.org/packages/69/b0/8fd876a78dcb6f67aedb4bcbb392f236a3f534b003112ac2d86a14e64a1e/pyhacrf-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "18fbaa4f7e8a9ca9981631cf36d1d0e6", "sha256": "dcf59850a9106ed900a1f20b9b0fe831e67a999d531d0d032aa3695506d4c052" }, "downloads": -1, "filename": "pyhacrf-0.0.8.tar.gz", "has_sig": false, "md5_digest": "18fbaa4f7e8a9ca9981631cf36d1d0e6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 56583, "upload_time": "2015-06-12T17:42:49", "url": "https://files.pythonhosted.org/packages/15/7f/b043fb3e325b5ffa17019d0ceb60133d6c10db49bb0c85a88ee186a6cdcc/pyhacrf-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "0446be3e8176784c2157b98848907fa5", "sha256": 
"417498163a623e26cfe25ba2f6dd090e7770162799a40c1fcd67deed08a1b69f" }, "downloads": -1, "filename": "pyhacrf-0.0.9.tar.gz", "has_sig": false, "md5_digest": "0446be3e8176784c2157b98848907fa5", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 76776, "upload_time": "2015-07-16T20:16:21", "url": "https://files.pythonhosted.org/packages/02/2e/49d3b69ed842c7c86969ab79c1e6629ca51ef531b9cbefcec1b5b1f15dfe/pyhacrf-0.0.9.tar.gz" } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "570974ab9a100ac84a4823dbb559ab53", "sha256": "0fa750dba9722d4f27010fc09808ae38201c61edb0711e8ac626d828af465a5c" }, "downloads": -1, "filename": "pyhacrf-0.1.1.tar.gz", "has_sig": false, "md5_digest": "570974ab9a100ac84a4823dbb559ab53", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 134095, "upload_time": "2015-09-19T15:11:35", "url": "https://files.pythonhosted.org/packages/4b/cd/eeda3f873334db9cf28c1e4e3c7bc8529664b5fc20855386da72d99a2fcd/pyhacrf-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "1706f33f7be3de7fe018d286a2f9db70", "sha256": "696e08162e772fc624eebd7b2772b0f48063ae0f1340ff2b30982aae6216d6d9" }, "downloads": -1, "filename": "pyhacrf-0.1.2.tar.gz", "has_sig": false, "md5_digest": "1706f33f7be3de7fe018d286a2f9db70", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 134094, "upload_time": "2015-09-21T17:00:58", "url": "https://files.pythonhosted.org/packages/25/64/43561938fd81545aab1a4a2fda7bf54e3ed9fdc5d75c6806bdce67b5b25d/pyhacrf-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "1706f33f7be3de7fe018d286a2f9db70", "sha256": "696e08162e772fc624eebd7b2772b0f48063ae0f1340ff2b30982aae6216d6d9" }, "downloads": -1, "filename": "pyhacrf-0.1.2.tar.gz", "has_sig": false, "md5_digest": "1706f33f7be3de7fe018d286a2f9db70", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 134094, "upload_time": 
"2015-09-21T17:00:58", "url": "https://files.pythonhosted.org/packages/25/64/43561938fd81545aab1a4a2fda7bf54e3ed9fdc5d75c6806bdce67b5b25d/pyhacrf-0.1.2.tar.gz" } ] }