{ "info": { "author": "", "author_email": "", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "License :: OSI Approved :: BSD License", "Programming Language :: Python :: 3.6" ], "description": "ExtractContent3\n===============\n\n.. image:: https://img.shields.io/badge/License-BSD%202--Clause-orange.svg\n :target: https://opensource.org/licenses/BSD-2-Clause\n\n.. image:: https://img.shields.io/badge/python-3.6-blue.svg\n\n.. image:: https://travis-ci.org/kanjirz50/python-extractcontent3.svg?branch=master\n :target: https://travis-ci.org/kanjirz50/python-extractcontent3\n\nExtractContent3 is a Python 3 module that extracts the main text from HTML.\nThis module is a modified version of python-extractcontent, a Python port of the ExtractContent Ruby module.\n\nUsage\n------------\n\n.. code-block:: python\n\n from extractcontent3 import ExtractContent\n extractor = ExtractContent()\n\n # Set option values\n opt = {\"threshold\":50}\n extractor.set_default(opt)\n\n html = open(\"index.html\").read() # HTML to analyze\n extractor.analyse(html)\n text, title = extractor.as_text()\n html, title = extractor.as_html()\n title = extractor.extract_title(html)\n\nInstallation\n------------\n\n.. code-block:: bash\n\n # PyPI\n $ pip install extractcontent3\n\n # Install from GitHub\n $ pip install git+https://github.com/kanjirz50/python-extractcontent3\n\nOption\n-------------\n\n.. 
code-block:: python\n\n \"\"\"\n Available options:\n name / default value\n\n threshold / 100\n Score threshold above which a block is treated as main text\n\n min_length / 80\n Minimum length of a block to be evaluated\n\n decay_factor / 0.73\n Decay factor\n The smaller the value, the higher the score of blocks near the beginning\n\n continuous_factor / 1.62\n Continuous-block factor\n The larger the value, the less likely blocks are judged to be continuous\n\n punctuation_weight / 10\n Score assigned to punctuation\n The larger the value, the more likely a block containing punctuation is judged to be main text\n\n punctuations / r\"(?is)([\\u3001\\u3002\\uff01\\uff0c\\uff0e\\uff1f]|\\.[^A-Za-z0-9]|,[^0-9]|!|\\?)\"\n Regular expression that matches punctuation\n\n waste_expressions / r\"(?i)Copyright|All Rights Reserved\"\n Regular expression of characteristic keywords found in footers\n\n debug / False\n If True, prints block information\n \"\"\"\n\nAcknowledgements\n----------------\n\nThanks to the author of the original version and to those who improved it in their forks.\n\n- Copyright of the original implementation:: (c)2007/2008/2009 Nakatani Shuyo / Cybozu labs Inc. 
All rights reserved\n - http://rubyforge.org/projects/extractcontent/\n - http://labs.cybozu.co.jp/blog/nakatani/2007/09/web_1.html\n- https://github.com/petitviolet/python-extractcontent\n- https://github.com/yono/python-extractcontent\n\n\n\n\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/kanjirz50/python-extractcontent3", "keywords": "", "license": "BSD 2-Clause", "maintainer": "", "maintainer_email": "", "name": "extractcontent3", "package_url": "https://pypi.org/project/extractcontent3/", "platform": "", "project_url": "https://pypi.org/project/extractcontent3/", "project_urls": { "Homepage": "https://github.com/kanjirz50/python-extractcontent3" }, "release_url": "https://pypi.org/project/extractcontent3/0.0.2/", "requires_dist": null, "requires_python": "~=3.3", "summary": "ExtractContent for Python 3", "version": "0.0.2" }, "last_serial": 3544179, "releases": { "0.0.2": [ { "comment_text": "", "digests": { "md5": "3aba9d9bd537410d46738f65756206c9", "sha256": "480f3cb7a9f41e75a0d07b6c942d2ba01167fbb0b3861b20e612cc4eeb7230e2" }, "downloads": -1, "filename": "extractcontent3-0.0.2-py2-none-any.whl", "has_sig": false, "md5_digest": "3aba9d9bd537410d46738f65756206c9", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": "~=3.3", "size": 8072, "upload_time": "2018-02-02T06:02:57", "url": "https://files.pythonhosted.org/packages/55/46/73a84dc2652075a58e5eb34a541c7abcdb32b061116a488fd47c2920d490/extractcontent3-0.0.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9398c5607bba7af6a512bf499ebc8964", "sha256": "0621768838275e0bbf470070c7cb3492adc95c513b34f212baec16698557fe02" }, "downloads": -1, "filename": "extractcontent3-0.0.2.tar.gz", "has_sig": false, "md5_digest": "9398c5607bba7af6a512bf499ebc8964", "packagetype": "sdist", "python_version": "source", "requires_python": "~=3.3", "size": 5598, 
"upload_time": "2018-02-02T06:02:59", "url": "https://files.pythonhosted.org/packages/08/90/e98f213762bccb12ff09a5ff315e40c0e2a7acc7a3eaa57f4db2e45e991e/extractcontent3-0.0.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "3aba9d9bd537410d46738f65756206c9", "sha256": "480f3cb7a9f41e75a0d07b6c942d2ba01167fbb0b3861b20e612cc4eeb7230e2" }, "downloads": -1, "filename": "extractcontent3-0.0.2-py2-none-any.whl", "has_sig": false, "md5_digest": "3aba9d9bd537410d46738f65756206c9", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": "~=3.3", "size": 8072, "upload_time": "2018-02-02T06:02:57", "url": "https://files.pythonhosted.org/packages/55/46/73a84dc2652075a58e5eb34a541c7abcdb32b061116a488fd47c2920d490/extractcontent3-0.0.2-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "9398c5607bba7af6a512bf499ebc8964", "sha256": "0621768838275e0bbf470070c7cb3492adc95c513b34f212baec16698557fe02" }, "downloads": -1, "filename": "extractcontent3-0.0.2.tar.gz", "has_sig": false, "md5_digest": "9398c5607bba7af6a512bf499ebc8964", "packagetype": "sdist", "python_version": "source", "requires_python": "~=3.3", "size": 5598, "upload_time": "2018-02-02T06:02:59", "url": "https://files.pythonhosted.org/packages/08/90/e98f213762bccb12ff09a5ff315e40c0e2a7acc7a3eaa57f4db2e45e991e/extractcontent3-0.0.2.tar.gz" } ] }