{ "info": { "author": "Michael Ruppert", "author_email": "michael.ruppert@fau.de", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)", "Programming Language :: Python :: 3", "Topic :: Software Development :: Libraries :: Python Modules", "Topic :: Text Processing :: Linguistic" ], "description": "OSPLO: OneSentencePerLineOpener\n===============================\n\n- Viele Computerlinguistische Verfahren profitieren davon, wenn der Text im one-sentence-per line Format vorliegt (z.b die Erstellung von d\u00fcnnbesetzten Scipy Matrizen f\u00fcr Dokument-Term Matrizen, oder auch einfach Textauswertungen bis hin zu dem Input Format f\u00fcr Gensim.\n\n- Oft ist das nicht der Fall, und in vielen F\u00e4llen ist es zu speicherintensiv, ein komplettes Korpus einzulesen, um dann eine Tokenisierung und Satztrennung durchzuf\u00fchren.\n\n- OSPLO ist die naivste L\u00f6sung f\u00fcr dieses Problem.\nEs greift auf smart_open (Mit dem man auch komprimierte Archive und auch Webressourcen \u00f6ffnen kann) zur\u00fcck und verwendet den sehr guten Tokenizer und Satzsplitter SoMaJo.\nDas Endergebnis ist nicht besonders schnell, erlaubt aber direkte Iteration \u00fcber eine Datei als w\u00e4re sie im One-Sentence-per-Line Format, ohne die Datei gleichzeitig einlesen zu m\u00fcssen.\n\nz.B:\n```python\nfrom osplo import OSLOpen\nfor sentence in OSLOpen(\"https://www.gnu.org/licenses/gpl.txt\"): \n print(sentence)\n\n```\nDie ersten Zeilen des Outputs w\u00fcrden dann so aussehen:\n\n```\n\n GNU GENERAL PUBLIC LICENSE Version 3 , 29 June 2007 Copyright ( C ) 2007 Free Software Foundation , Inc. < https://fsf.org/> Everyone is permitted to copy and distribute verbatim copies of this license document , but changing it is not allowed .\n Preamble The GNU General Public License is a free , copyleft license for software and other kinds of works .\n The licenses for most software and other practical works are designed to take away your freedom to share and change the works .\n By contrast , the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users .\n We , the Free Software Foundation , use the GNU General Public License for most of our software ; it applies also to any other work released this way by its authors .\n You can apply it to your programs , too .\n When we speak of free software , we are referring to freedom , not price .\n Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software ( and charge for them if you wish ) , that you receive source code or can get it if you want it , that you can change the software or use pieces of it in new free programs , and that you know you can do these things .\n To protect your rights , we need to prevent others from denying you these rights or asking you to surrender the rights .\n Therefore , you have certain responsibilities if you distribute copies of the software , or if you modify it : responsibilities to respect the freedom of others .\n For example , if you distribute copies of such a program , whether gratis or for a fee , you must pass on to the recipients the same freedoms that you received .\n You must make sure that they , too , receive or can get the source code .\n``` \n\n\nHinweis:\nDie Software ist unter GPL3.0 lizensiert, ist allerdings nur ein Demo-Projekt f\u00fcr die Nutzung von Packaging.\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/miweru/osplo", "keywords": "", "license": "GNU GENERAL PUBLIC LICENSE V3 OR LATER (GPLV3+)", "maintainer": "", "maintainer_email": "", "name": "osplo", "package_url": "https://pypi.org/project/osplo/", "platform": "", "project_url": "https://pypi.org/project/osplo/", "project_urls": { "Homepage": "https://github.com/miweru/osplo" }, "release_url": "https://pypi.org/project/osplo/0.1.12/", "requires_dist": [ "somajo (>=1.8.3)", "smart-open (>=1.8.0)" ], "requires_python": ">=3.5", "summary": "Stellt einen langsamen Iterator \u00fcber Textdateien (komprimiert, txt, online) zur Verf\u00fcgung, der einen Satz pro Iteration zur\u00fcckliefert.", "version": "0.1.12" }, "last_serial": 4831889, "releases": { "0.1.12": [ { "comment_text": "", "digests": { "md5": "2022b582668c8266e33f94332caf218e", "sha256": "c50478c488363e9a736dca355562a2366e2df7581615221da2ae34540962d8a2" }, "downloads": -1, "filename": "osplo-0.1.12-py3-none-any.whl", "has_sig": false, "md5_digest": "2022b582668c8266e33f94332caf218e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 15982, "upload_time": "2019-02-08T08:16:09", "url": "https://files.pythonhosted.org/packages/0f/8a/0f51ce8f484f53c2895b2ce288660d85981a3cadcfe163d890531445bbb1/osplo-0.1.12-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "92498c25a659509dd8bda3edcbbd5345", "sha256": "56aafe679273fb50dcf7a0819576394e30163380b0b46d99ccc951987af1cded" }, "downloads": -1, "filename": "osplo-0.1.12.tar.gz", "has_sig": false, "md5_digest": "92498c25a659509dd8bda3edcbbd5345", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 3629, "upload_time": "2019-02-08T08:16:10", "url": "https://files.pythonhosted.org/packages/a5/d8/c61904624d08ba23f11cef9526c74f166d6f68af5ca5fb44b609331afc3c/osplo-0.1.12.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "2022b582668c8266e33f94332caf218e", "sha256": "c50478c488363e9a736dca355562a2366e2df7581615221da2ae34540962d8a2" }, "downloads": -1, "filename": "osplo-0.1.12-py3-none-any.whl", "has_sig": false, "md5_digest": "2022b582668c8266e33f94332caf218e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.5", "size": 15982, "upload_time": "2019-02-08T08:16:09", "url": "https://files.pythonhosted.org/packages/0f/8a/0f51ce8f484f53c2895b2ce288660d85981a3cadcfe163d890531445bbb1/osplo-0.1.12-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "92498c25a659509dd8bda3edcbbd5345", "sha256": "56aafe679273fb50dcf7a0819576394e30163380b0b46d99ccc951987af1cded" }, "downloads": -1, "filename": "osplo-0.1.12.tar.gz", "has_sig": false, "md5_digest": "92498c25a659509dd8bda3edcbbd5345", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.5", "size": 3629, "upload_time": "2019-02-08T08:16:10", "url": "https://files.pythonhosted.org/packages/a5/d8/c61904624d08ba23f11cef9526c74f166d6f68af5ca5fb44b609331afc3c/osplo-0.1.12.tar.gz" } ] }