{ "info": { "author": "Tobias Rodaebel", "author_email": "rodaebel@users.sourceforge.net", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)", "Operating System :: OS Independent", "Programming Language :: C++", "Programming Language :: Python", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Markup :: HTML", "Topic :: Text Processing :: Markup :: SGML", "Topic :: Text Processing :: Markup :: XML" ], "description": "=====================\nPackage haufe.stripml\n=====================\n\nPython extension for stripping HTML markup from text.\n\nThis package simply removes HTML-like markup from text in a very quick and\nbrutal manner. It could quite easily be used to strip XML or SGML from text as\nwell. It does not do any syntax checking.\n\nThe core functionalities are implemented with the C++ programming language, and\nthus much quicker than using SGMLParser or regular expressions for the same\ntask.\n\n\nCopyright\n---------\nhaufe.stripml is (C) Tobias Rodaebel & Haufe Mediengruppe, D-79111 Freiburg, Germany\n\n\nLicence\n-------\nThis package is released under the LGPL 3 See LICENSE.txt.\n\n\nInstallation\n------------\n\nUse easy_install::\n\n easy_install haufe.stripml\n\n\nTesting\n-------\n\nhaufe.stripml can be tested by typing::\n\n python setup.py test -m haufe.stripml.tests\n\n\nCredits\n-------\nThanks to Gottfried Ganssauge for translating to the C++ programming language.\n\n\nFirst we want to test the stripml method.\n\n >>> from haufe.stripml import stripml\n >>> stripml.__doc__\n 'stripml(s) -> string'\n\nThe one and only argument is a string.\n\n >>> stripml('foo')\n 'foo'\n >>> type(stripml('foo')) == type('')\n True\n\nThe stripml method supports unicode, too.\n\n >>> stripml(u'bar')\n u'bar'\n >>> type(stripml(u'foo')) == type(u'')\n True\n\nTrying an integer as first argument. A TypeError should be raised.\n\n >>> try:\n ... stripml(10)\n ... except TypeError, strerror:\n ... print strerror\n String or unicode string required.\n\nEmpty script\n\n >>> stripml ('')\n ''\n >>> stripml (u'')\n u''\n\nTry some huge element name\n\n >>> stripml ('')\n ''\n >>> stripml (u'')\n u''\n\nNow we try some dumb HTML ...\n\n >>> stripml('foo')\n 'foo'\n >>> stripml('foo bar.')\n 'foo bar.'\n >>> stripml('''Really big string\n ... ''')\n 'Really big string\\n'\n\n... and now as unicode.\n\n >>> stripml(u'foo')\n u'foo'\n >>> stripml(u'foo bar.')\n u'foo bar.'\n >>> stripml(u'''Really big string\n ... ''')\n u'Really big string\\n'\n\nSometimes we have `script` tags, which contents nobody needs.\n\n >>> stripml('''We have a script in here , dude.''')\n 'We have a script in here , dude.'\n\nUnicode.\n\n >>> stripml(u'''We have a script in here , dude.''')\n u'We have a script in here , dude.'\n\nBut on the other hand the contents of `scrip`-Tags (without the trailing 't')\nshould not be stripped\n\n >>> stripml('KEEP THIS')\n 'KEEP THIS'\n >>> stripml(u'KEEP THIS')\n u'KEEP THIS'\n\nAnd neither should `scripting`-Tags\n\n >>> stripml('KEEP THIS')\n 'KEEP THIS'\n >>> stripml(u'KEEP THIS')\n u'KEEP THIS'\n\nHow about forgotten -Tags\n >>> stripml('KEEP THIS')\n 'KEEP THIS'\n >>> stripml(u'KEEP THIS')\n u'KEEP THIS'\n\nA much longer string.\n\n >>> result = stripml(u'''\n ... \n ... \n ... \n ... \n ... \n ... \n ... \n ... \n ... Test document\n ... \n ... \n ... \n ...

Test document

\n ...

This document is
only for testing!

\n ... \n ... \n ... \n ... ''')\n >>> result.strip()\n u'Test document\\n\\n\\n\\n Test document\\n This document is only for testing!'\n >>> type(result)\n \n\nA single 'less then' or 'greater then' will be passed through.\n\n >>> stripml(u'hundred < thousand < million.')\n u'hundred < thousand < million.'\n >>> stripml(u'thousand > hundred.')\n u'thousand > hundred.'\n >>> stripml('hundred < thousand < million.')\n 'hundred < thousand < million.'\n >>> stripml('thousand > hundred.')\n 'thousand > hundred.'\n\nLet's see if a really long string can be handled well.\n\n >>> s = 5000 * u'

This is a span within a paragraph.

\\n'\n >>> stripml(s) == 5000 * u'This is a span within a paragraph.\\n'\n True\n\nAnd we should have a look at entities and encodings.\n\n >>> stripml(u'In Straße und Überführung haben wir Umlaute.')\n u'In Straße und Überführung haben wir Umlaute.'\n >>> stripml('In Straße und Überführung haben wir Umlaute.')\n 'In Straße und Überführung haben wir Umlaute.'\n >>> print stripml(u'In Stra\u00dfe und \u00dcberf\u00fchrung haben wir Umlaute.').encode('ISO-8859-1') == u'In Stra\u00dfe und \u00dcberf\u00fchrung haben wir Umlaute.'.encode('ISO-8859-1')\n True\n\n=======\nChanges\n=======\n\n1.2.2 (2012-11-07)\n------------------\n- support for visual studio 2010 by M.Honeck\n\n1.2.1 (2008-03-20)\n------------------\n- added nose testrunner support\n\n1.2.0 (2007-10-23)\n------------------\n- First public release.\n- License added.", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://pypi.python.org/pypi/haufe.stripml", "keywords": "SGML,HTML,XML,XHTML", "license": "LGPL 3", "maintainer": null, "maintainer_email": null, "name": "haufe.stripml", "package_url": "https://pypi.org/project/haufe.stripml/", "platform": "All", "project_url": "https://pypi.org/project/haufe.stripml/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://pypi.python.org/pypi/haufe.stripml" }, "release_url": "https://pypi.org/project/haufe.stripml/1.2.2/", "requires_dist": null, "requires_python": null, "summary": "Python extension for removing markup from strings.", "version": "1.2.2" }, "last_serial": 792846, "releases": { "1.2.0": [ { "comment_text": "", "digests": { "md5": "5d3cb912d09b0f761b2ddb2227ebd038", "sha256": "e5624d46e678bcfedae40b37bfb2e8c864ec2832fb8356d6e3e0f1f2ae69d0ce" }, "downloads": -1, "filename": "haufe.stripml-1.2.0.tar.gz", "has_sig": false, "md5_digest": "5d3cb912d09b0f761b2ddb2227ebd038", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 10677, "upload_time": "2007-10-24T08:58:08", "url": "https://files.pythonhosted.org/packages/62/ff/c61ee96eb49327f4d324c5dcbc2c70acae6187936c647449e725aab7969d/haufe.stripml-1.2.0.tar.gz" } ], "1.2.1": [ { "comment_text": "", "digests": { "md5": "70ae33048944924bac91ea74fadd35df", "sha256": "517d6cf73766b45c368d40d86cd93dd2dfe1376326d412d8798d5014cf894bbf" }, "downloads": -1, "filename": "haufe.stripml-1.2.1.tar.gz", "has_sig": false, "md5_digest": "70ae33048944924bac91ea74fadd35df", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 11911, "upload_time": "2009-03-17T08:51:28", "url": "https://files.pythonhosted.org/packages/94/33/0981a9e7b9c751a25216033cee5241554d5bd5368e1f3bdce0456ee18564/haufe.stripml-1.2.1.tar.gz" } ], "1.2.2": [ { "comment_text": "", "digests": { "md5": "bdbddeef52c80f83c299fb25d502cfab", "sha256": "288223fc8ea690171892dde354c282a215b130e9fe3d40fa96e32a98e068a8a2" }, "downloads": -1, "filename": "haufe.stripml-1.2.2.tar.gz", "has_sig": false, "md5_digest": "bdbddeef52c80f83c299fb25d502cfab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12098, "upload_time": "2012-11-08T09:23:04", "url": "https://files.pythonhosted.org/packages/4c/d3/bc60d301db4d5ad080e59bfaffdf0f98d6c495e303e80fb4a7f90f32379c/haufe.stripml-1.2.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "bdbddeef52c80f83c299fb25d502cfab", "sha256": "288223fc8ea690171892dde354c282a215b130e9fe3d40fa96e32a98e068a8a2" }, "downloads": -1, "filename": "haufe.stripml-1.2.2.tar.gz", "has_sig": false, "md5_digest": "bdbddeef52c80f83c299fb25d502cfab", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 12098, "upload_time": "2012-11-08T09:23:04", "url": "https://files.pythonhosted.org/packages/4c/d3/bc60d301db4d5ad080e59bfaffdf0f98d6c495e303e80fb4a7f90f32379c/haufe.stripml-1.2.2.tar.gz" } ] }