{ "info": { "author": "Diane Napolitano", "author_email": "dnapolitano@ets.org", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: Apache Software License", "Topic :: Text Processing" ], "description": "match\n=====\n\nThe purpose of the module `Match` is to get the offsets (as well as the string between those offsets, for debugging) of a cleaned-up, tokenized string from its original, untokenized source. \"Big deal,\" you might say, but this is actually a pretty difficult task if the original text is sufficiently messy, not to mention rife with Unicode characters.\n\nConsider some text, stored in a variable `original_text`, like:\n\nI am writing a letter ! Sometimes,I forget to put spaces (and do weird stuff with punctuation) ? J'aurai une pomme, s'il vous pl\u00e2it !\n\nThis will/should/might be properly tokenized as:\n\n[[u'I', u'am', u'writing', u'a', u'letter', u'!'],\n [u'Sometimes', u',', u'I', u'forget', u'to', u'put', u'spaces', u'-LRB-', u'and', u'do', u'weird', u'stuff', u'with', u'punctuation', u'-RRB-', u'?'],\n [u\"J'aurai\", u'une', u'pomme', u',', u\"s'il\", u'vous', u'pl\\xe2it', u'!']]\n\nNow:\n\nIn [22]: Match.match(original_text, [u'-LRB-', u'and', u'do', u'weird', u'stuff', u'with', u'punctuation', u'-RRB-'])\nOut[22]: [(60, 97, u'(and do weird stuff with punctuation)')]\n\nIn [23]: Match.match(original_text, [u'I', u'am', u'writing', u'a', u'letter', u'!'])\nOut[23]: [(0, 25, u'I am writing a letter !')]\n\nIn [24]: Match.match(original_text, [u\"s'il\", u'vous', u'pl\\xe2it', u'!'])\nOut[24]: [(121, 138, u\"s'il vous pl\\xe2it !\")]", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/EducationalTestingService/match", "keywords": "tokenization", "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "match", "package_url": "https://pypi.org/project/match/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/match/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/EducationalTestingService/match" }, "release_url": "https://pypi.org/project/match/0.2.2/", "requires_dist": null, "requires_python": null, "summary": "UNKNOWN", "version": "0.2.2" }, "last_serial": 1770732, "releases": { "0.2": [ { "comment_text": "", "digests": { "md5": "629a697dab4ee59b0a4fda3f73eec268", "sha256": "63384f638e15070d57f1cbefabac7b08a4bab0184f166f31a7df0693d6e0a072" }, "downloads": -1, "filename": "match-0.2.tar.gz", "has_sig": false, "md5_digest": "629a697dab4ee59b0a4fda3f73eec268", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5784, "upload_time": "2014-11-23T17:13:51", "url": "https://files.pythonhosted.org/packages/78/57/c8ff77083b1cd22fb26a3939789cdbc973086952cddb601ae247eec23a77/match-0.2.tar.gz" } ], "0.2.1": [ { "comment_text": "", "digests": { "md5": "44d267c2e2e2a05a073224812fc1ca16", "sha256": "7e326974e16226daa8cdcec83168dd9e443e2d3844de472ebd0b750bd37e991e" }, "downloads": -1, "filename": "match-0.2.1.tar.gz", "has_sig": false, "md5_digest": "44d267c2e2e2a05a073224812fc1ca16", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5826, "upload_time": "2015-05-20T14:42:15", "url": "https://files.pythonhosted.org/packages/90/9e/eaa7c8814a3ff7f95c84b385cf888ed14768a5745729933c85782627940a/match-0.2.1.tar.gz" } ], "0.2.2": [ { "comment_text": "", "digests": { "md5": "cc66e80944ea8c3b0df8ae3f0be293c3", "sha256": "52f61683fb76343a6e2a61fc402cad50ba056a9ca1fd6a0792a19d4b339cfc44" }, "downloads": -1, "filename": "match-0.2.2.tar.gz", "has_sig": false, "md5_digest": "cc66e80944ea8c3b0df8ae3f0be293c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5914, "upload_time": "2015-10-15T20:47:22", "url": "https://files.pythonhosted.org/packages/56/a2/7bacc7bdcffabff610c50862aeb33921a25a00f912003f29e9d0a71bb5ba/match-0.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "cc66e80944ea8c3b0df8ae3f0be293c3", "sha256": "52f61683fb76343a6e2a61fc402cad50ba056a9ca1fd6a0792a19d4b339cfc44" }, "downloads": -1, "filename": "match-0.2.2.tar.gz", "has_sig": false, "md5_digest": "cc66e80944ea8c3b0df8ae3f0be293c3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 5914, "upload_time": "2015-10-15T20:47:22", "url": "https://files.pythonhosted.org/packages/56/a2/7bacc7bdcffabff610c50862aeb33921a25a00f912003f29e9d0a71bb5ba/match-0.2.2.tar.gz" } ] }