{ "info": { "author": "Robert Lujo", "author_email": "trebor74hr@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Environment :: Console", "Intended Audience :: Developers", "Intended Audience :: Education", "Intended Audience :: End Users/Desktop", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Internet :: WWW/HTTP :: Indexing/Search", "Topic :: Scientific/Engineering :: Information Analysis", "Topic :: Text Processing", "Topic :: Text Processing :: Indexing", "Topic :: Text Processing :: Linguistic" ], "description": "Text tokenizer and sentence splitter \n====================================\nLibrary \"text-sentence\" is a text tokenizer and sentence splitter. \n\nInput for the main function is text plus a list of known names and abbreviations. \nThe result is a list of tokens. Each token has a type and other attributes, e.g.:\n\n - is word, \n - is number, \n - is roman number, \n - is sentence end, \n - is abbreviation, \n - is name, \n - is contraction,\n - is end of chapter \n - etc. 
\n \n**Determining end of sentence** needs special logic and care, which is the main\nreason for naming the package \"text-sentence\".\n\nTAGS \n----\n tokenization, sentence splitter, sentencer, chapter, names, abbreviation\n\nAUTHOR\n======\nRobert Lujo, Zagreb, Croatia, find mail address in LICENCE\n\n\nFEATURES\n========\nTo name the most important:\n - TODO: ...\n\nThe system is based on Unicode strings.\n\nCheck `Getting started`_.\n\nINSTALLATION\n============\nInstallation instructions - if you have the pip package installed \nhttp://pypi.python.org/pypi/pip::\n\n pip install text-sentence\n\nIf not, then do it the old-fashioned way:\n - download zip from http://pypi.python.org/pypi/text-sentence/\n - unzip\n - open shell\n - go to distribution directory\n - python setup.py install\n\nThe development version is available at http://bitbucket.org/trebor74hr/text-sentence.\n\nor clone it with Mercurial::\n\n hg clone https://bitbucket.org/trebor74hr/text-sentence\n\nGETTING STARTED\n===============\nUsage example - start a Python shell::\n\n >>> from text_sentence import Tokenizer\n >>> t = Tokenizer()\n >>> list(t.tokenize(\"This is first sentence. This is second one!And this is third, is it?\"))\n [T('this'/sent_start), T('is'), T('first'), T('sentence'), T('.'/sent_end), \n T('this'/sent_start), T('is'), T('second'), T('one'), T('!'/sent_end), \n T('and'/sent_start), T('this'), T('is'), T('third'), T(','/inner_sep), \n T('is'), T('it'), T('?'/sent_end)]\n\nMore samples can be found in the tests:\n\n http://bitbucket.org/trebor74hr/text-sentence/src/tip/text_sentence/test_sentence.txt\n\nFurther\n-------\nSince there is currently no good documentation, the best source of \nfurther information is the doctests inside the module and\nthe tests in test_sentence.txt. More information in `Running tests`_.\nYou can always read the source.\n\n\nDOCUMENTATION\n=============\nCurrently there is no documentation. 
In progress ...\n\n\nSUPPORT\n=======\nSince this project is limited by my free time, support is limited. \n\n\nREPORT BUG OR REQUEST FEATURE\n-----------------------------\nIf you encounter a bug, the best option is to report it on the Bitbucket web page\nhttp://bitbucket.org/trebor74hr/text-sentence.\n\nThe best way to contact me is by mail (the address is in LICENCE).\n\nThe TODO list is in readme.txt (dev version).\n\n\nCONTRIBUTION\n============\nSince the API of this project is not yet stable, contributions\nshould wait for a while.\n\n\nRUNNING TESTS\n=============\nAll tests are doctests (not unittests). There are two types of tests in the\npackage: \n\n 1. doctests in the module, i.e. in __init__.py\n 2. doctests in test_sentence.txt \n\nRunning the module directly will run both 1. and 2. \n\nTo run tests:\n - go to the text_sentence directory\n - run tests by running the module, e.g.::\n\n > python __init__.py\n __main__: running doctests\n test_sentence.txt: running doctests\n\n - or with::\n\n > python -m\"text_sentence\"\n\nTODO\n====\nVarious things; see readme.txt in the dev version for details.\n\nCHANGES\n=======\n0.14\n----\nulr1 100621: \n - is_contraction token attribute - e.g. 
isn't or o\u0161'\n\n0.13\n----\nulr1 100619:\n - sample in getting started\n\n0.12\n----\nulr1 100619:\n - test_sentence.txt installation\n - readme fix main title\n\n0.11\n----\nulr1 100618:\n - adapted tests\n - __init__.py and sentence.py\n\n0.10\n----\nulr1 100617:\n - first installable release", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://bitbucket.org/trebor74hr/text-sentence/", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "text-sentence", "package_url": "https://pypi.org/project/text-sentence/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/text-sentence/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://bitbucket.org/trebor74hr/text-sentence/" }, "release_url": "https://pypi.org/project/text-sentence/0.14/", "requires_dist": null, "requires_python": null, "summary": "text-sentence is text tokenizer and sentence splitter", "version": "0.14" }, "last_serial": 800503, "releases": { "0.10": [ { "comment_text": "", "digests": { "md5": "7edb1bfe8493c715b1973bbe68141fb0", "sha256": "b578f3c222fca3014ea424cd1e098ac2a93e996c8a3f8694b6d27c1469651fea" }, "downloads": -1, "filename": "text-sentence-0.10.zip", "has_sig": false, "md5_digest": "7edb1bfe8493c715b1973bbe68141fb0", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24346, "upload_time": "2010-06-17T21:12:28", "url": "https://files.pythonhosted.org/packages/97/c3/328c8494522257e6c7d03d0f5fcce13ae73b74f0fab7090b46d9d9fe28b2/text-sentence-0.10.zip" } ], "0.11": [ { "comment_text": "", "digests": { "md5": "d3182bd077d46041f8e9500292c162a8", "sha256": "635f7cd505a046771a5e4f4a0434124dd396c6aafdb87a83a061a5891dfd81e0" }, "downloads": -1, "filename": "text-sentence-0.11.zip", "has_sig": false, "md5_digest": "d3182bd077d46041f8e9500292c162a8", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 24823, "upload_time": "2010-06-18T08:47:42", "url": "https://files.pythonhosted.org/packages/12/64/515b4bfc22b50b9582ebb75447ec7d6dc17bffbe159650cbd1cb97d38c06/text-sentence-0.11.zip" } ], "0.12": [ { "comment_text": "", "digests": { "md5": "f555a91be3e189f0c4656715c6123d31", "sha256": "6bd437c627435cabd123a0b143b50d5743b15007a10a64b9a53d51b17dd4d076" }, "downloads": -1, "filename": "text-sentence-0.12.zip", "has_sig": false, "md5_digest": "f555a91be3e189f0c4656715c6123d31", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 24950, "upload_time": "2010-06-19T00:29:06", "url": "https://files.pythonhosted.org/packages/71/20/a594d069817a66c2b9b52ecc396e885370db5de9c6c559cf0afd5aed375b/text-sentence-0.12.zip" } ], "0.13": [ { "comment_text": "", "digests": { "md5": "eb097ac00b9dbf6a4ff0cd871a94f06d", "sha256": "90ac366bcde8f65950494a97d884c0b11d818bcfe3e2ab67badb7077707ac486" }, "downloads": -1, "filename": "text-sentence-0.13.zip", "has_sig": false, "md5_digest": "eb097ac00b9dbf6a4ff0cd871a94f06d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25412, "upload_time": "2010-06-19T00:45:55", "url": "https://files.pythonhosted.org/packages/46/eb/b83564fb721c12136ae5c1a9ee5c4dae3f851cdbec629e01c7d6c8324ae0/text-sentence-0.13.zip" } ], "0.14": [ { "comment_text": "", "digests": { "md5": "039393aca75378813ca17d1c09c7b9df", "sha256": "76a88662c42c8e9d62b0eb122ae3ee782d3ebce5af90e2db594966bf7361abf5" }, "downloads": -1, "filename": "text-sentence-0.14.zip", "has_sig": false, "md5_digest": "039393aca75378813ca17d1c09c7b9df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25787, "upload_time": "2010-06-21T15:18:42", "url": "https://files.pythonhosted.org/packages/0a/71/95691d938ba0f47e1573fd34af9c7bdc0f94be27a3d36eeedc64eb2fed64/text-sentence-0.14.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": 
"039393aca75378813ca17d1c09c7b9df", "sha256": "76a88662c42c8e9d62b0eb122ae3ee782d3ebce5af90e2db594966bf7361abf5" }, "downloads": -1, "filename": "text-sentence-0.14.zip", "has_sig": false, "md5_digest": "039393aca75378813ca17d1c09c7b9df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 25787, "upload_time": "2010-06-21T15:18:42", "url": "https://files.pythonhosted.org/packages/0a/71/95691d938ba0f47e1573fd34af9c7bdc0f94be27a3d36eeedc64eb2fed64/text-sentence-0.14.zip" } ] }