{ "info": { "author": "chimera0", "author_email": "ai-brain-lab@accel-brain.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 5 - Production/Stable", "Intended Audience :: Information Technology", "Intended Audience :: Science/Research", "License :: OSI Approved :: GNU General Public License v2 (GPLv2)", "Programming Language :: Python :: 3", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing" ], "description": "# Automatic Summarization Library: pysummarization\n\n`pysummarization` is a Python 3 library for automatic summarization, document abstraction, and text filtering.\n\n## Description\n\nThis library performs automatic summarization using natural language processing and a neural network language model. It enables you to create a summary containing the major points of an original document or of web-scraped text filtered by text clustering. It also uses [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern) to implement an **Encoder/Decoder based on LSTM** (with an Attention mechanism), improving the accuracy of summarization by **Sequence-to-Sequence** (**Seq2Seq**) learning.\n\n## Documentation\n\nFull documentation is available at [https://code.accel-brain.com/Automatic-Summarization/](https://code.accel-brain.com/Automatic-Summarization/). 
This document contains information on functional reusability, functional scalability, and functional extensibility.\n\n## Installation\n\nInstall using pip:\n\n```sh\npip install pysummarization\n```\n\n### Source code\n\nThe source code is currently hosted on GitHub.\n\n- [accel-brain-code/Automatic-Summarization](https://github.com/chimera0/accel-brain-code/tree/master/Automatic-Summarization)\n\n### Python package index (PyPI)\n\nInstallers for the latest released version are available at the Python package index.\n\n- [pysummarization : Python Package Index](https://pypi.python.org/pypi/pysummarization/)\n\n### Dependencies\n\n- numpy: v1.13.3 or higher.\n- nltk: v3.2.3 or higher.\n\n#### Options\n\n- mecab-python3: v0.7 or higher.\n * Relevant only for Japanese.\n- pdfminer2\n * Relevant only for PDF files.\n- pyquery: v1.2.17 or higher.\n * Relevant only for web scraping.\n- pydbm: v1.5.0 or higher.\n * Only when using **Encoder/Decoder based on LSTM**, **skip-gram**, **Re-Seq2Seq**, and **EncDec-AD**.\n\n## Usecase: Summarize an English string argument.\n\nImport Python modules.\n\n```python\nfrom pysummarization.nlpbase.auto_abstractor import AutoAbstractor\nfrom pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer\nfrom pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor\n```\n\nPrepare an English string argument.\n\n```python\ndocument = \"Natural language generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. 
Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.\"\n```\n\nThen instantiate the objects and call the method.\n\n```python\n# Object of automatic summarization.\nauto_abstractor = AutoAbstractor()\n# Set tokenizer.\nauto_abstractor.tokenizable_doc = SimpleTokenizer()\n# Set delimiters for making a list of sentences.\nauto_abstractor.delimiter_list = [\".\", \"\\n\"]\n# Object of abstracting and filtering document.\nabstractable_doc = TopNRankAbstractor()\n# Summarize document.\nresult_dict = auto_abstractor.summarize(document, abstractable_doc)\n\n# Output result.\nfor sentence in result_dict[\"summarize_result\"]:\n print(sentence)\n```\n\nThe `result_dict` is a `dict` in the following format.\n\n```python\n{\n    \"summarize_result\": \"The list of summarized sentences.\",\n    \"scoring_data\": \"The list of scores (rank of importance).\"\n}\n```\n\n## Usecase: Summarize a Japanese string argument.\n\nImport Python modules.\n\n```python\nfrom pysummarization.nlpbase.auto_abstractor import AutoAbstractor\nfrom pysummarization.tokenizabledoc.mecab_tokenizer import MeCabTokenizer\nfrom pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor\n```\n\nPrepare a Japanese string argument.\n\n```python\ndocument = \"\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\uff08\u3057\u305c\u3093\u3052\u3093\u3054\u3057\u3087\u308a\u3001\u82f1\u8a9e: natural language processing\u3001\u7565\u79f0\uff1aNLP\uff09\u306f\u3001\u4eba\u9593\u304c\u65e5\u5e38\u7684\u306b\u4f7f\u3063\u3066\u3044\u308b\u81ea\u7136\u8a00\u8a9e\u3092\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u306b\u51e6\u7406\u3055\u305b\u308b\u4e00\u9023\u306e\u6280\u8853\u3067\u3042\u308a\u3001\u4eba\u5de5\u77e5\u80fd\u3068\u8a00\u8a9e\u5b66\u306e\u4e00\u5206\u91ce\u3067\u3042\u308b\u3002\u300c\u8a08\u7b97\u8a00\u8a9e\u5b66\u300d\uff08computational 
linguistics\uff09\u3068\u306e\u985e\u4f3c\u3082\u3042\u308b\u304c\u3001\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u306f\u5de5\u5b66\u7684\u306a\u8996\u70b9\u304b\u3089\u306e\u8a00\u8a9e\u51e6\u7406\u3092\u3055\u3059\u306e\u306b\u5bfe\u3057\u3066\u3001\u8a08\u7b97\u8a00\u8a9e\u5b66\u306f\u8a00\u8a9e\u5b66\u7684\u8996\u70b9\u3092\u91cd\u8996\u3059\u308b\u624b\u6cd5\u3092\u3055\u3059\u4e8b\u304c\u591a\u3044[1]\u3002\u30c7\u30fc\u30bf\u30d9\u30fc\u30b9\u5185\u306e\u60c5\u5831\u3092\u81ea\u7136\u8a00\u8a9e\u306b\u5909\u63db\u3057\u305f\u308a\u3001\u81ea\u7136\u8a00\u8a9e\u306e\u6587\u7ae0\u3092\u3088\u308a\u5f62\u5f0f\u7684\u306a\uff08\u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u304c\u7406\u89e3\u3057\u3084\u3059\u3044\uff09\u8868\u73fe\u306b\u5909\u63db\u3059\u308b\u3068\u3044\u3063\u305f\u51e6\u7406\u304c\u542b\u307e\u308c\u308b\u3002\u5fdc\u7528\u4f8b\u3068\u3057\u3066\u306f\u4e88\u6e2c\u5909\u63db\u3001IME\u306a\u3069\u306e\u6587\u5b57\u5909\u63db\u304c\u6319\u3052\u3089\u308c\u308b\u3002\"\n```\n\nAnd instantiate objects and call the method.\n\n```python\n# Object of automatic summarization.\nauto_abstractor = AutoAbstractor()\n# Set tokenizer for Japanese.\nauto_abstractor.tokenizable_doc = MeCabTokenizer()\n# Set delimiter for making a list of sentence.\nauto_abstractor.delimiter_list = [\"\u3002\", \"\\n\"]\n# Object of abstracting and filtering document.\nabstractable_doc = TopNRankAbstractor()\n# Summarize document.\nresult_dict = auto_abstractor.summarize(document, abstractable_doc)\n\n# Output result.\nfor sentence in result_dict[\"summarize_result\"]:\n print(sentence)\n```\n\n## Usecase: English Web-Page Summarization\n\nRun the batch program: [demo/demo_summarization_english_web_page.py](https://github.com/chimera0/accel-brain-code/blob/master/Automatic-Summarization/demo/demo_summarization_english_web_page.py)\n\n```\npython demo/demo_summarization_english_web_page.py {URL}\n```\n\n- {URL}: web site URL.\n\n### Demo\n\nLet's summarize this page: 
[Natural_language_generation - Wikipedia](https://en.wikipedia.org/wiki/Natural_language_generation).\n\n```\npython demo/demo_summarization_english_web_page.py https://en.wikipedia.org/wiki/Natural_language_generation\n```\n\nThe result is as follows.\n```\nNatural language generation From Wikipedia, the free encyclopedia Jump to: navigation , search Natural language generation ( NLG ) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form .\n\n Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.\n\n It could be said an NLG system is like a translator that converts data into a natural language representation.\n```\n\n## Usecase: Japanese Web-Page Summarization\n\nRun the batch program: [demo/demo_summarization_japanese_web_page.py](https://github.com/chimera0/accel-brain-code/blob/master/Automatic-Summarization/demo/demo_summarization_japanese_web_page.py)\n\n```\npython demo/demo_summarization_japanese_web_page.py {URL}\n```\n- {URL}: web site URL.\n\n### Demo\n\nLet's summarize this page: [\u81ea\u52d5\u8981\u7d04 - Wikipedia](https://ja.wikipedia.org/wiki/%E8%87%AA%E5%8B%95%E8%A6%81%E7%B4%84).\n\n```\npython demo/demo_summarization_japanese_web_page.py https://ja.wikipedia.org/wiki/%E8%87%AA%E5%8B%95%E8%A6%81%E7%B4%84\n```\n\nThe result is as follows.\n```\n \u81ea\u52d5\u8981\u7d04 \uff08\u3058\u3069\u3046\u3088\u3046\u3084\u304f\uff09\u306f\u3001 \u30b3\u30f3\u30d4\u30e5\u30fc\u30bf\u30d7\u30ed\u30b0\u30e9\u30e0 \u3092\u7528\u3044\u3066\u3001\u6587\u66f8\u304b\u3089\u305d\u306e\u8981\u7d04\u3092\u4f5c\u6210\u3059\u308b\u51e6\u7406\u3067\u3042\u308b\u3002\n\n\u81ea\u52d5\u8981\u7d04\u306e\u5fdc\u7528\u5148\u306e1\u3064\u306f Google \u306a\u3069\u306e \u691c\u7d22\u30a8\u30f3\u30b8\u30f3 
\u3067\u3042\u308b\u304c\u3001\u3082\u3061\u308d\u3093\u72ec\u7acb\u3057\u305f1\u3064\u306e\u8981\u7d04\u30d7\u30ed\u30b0\u30e9\u30e0\u3068\u3044\u3063\u305f\u3082\u306e\u3082\u3042\u308a\u3046\u308b\u3002\n\n \u5358\u4e00\u6587\u66f8\u8981\u7d04\u3068\u8907\u6570\u6587\u66f8\u8981\u7d04 [ \u7de8\u96c6 ] \u5358\u4e00\u6587\u66f8\u8981\u7d04 \u306f\u3001\u5358\u4e00\u306e\u6587\u66f8\u3092\u8981\u7d04\u306e\u5bfe\u8c61\u3068\u3059\u308b\u3082\u306e\u3067\u3042\u308b\u3002\n\n\u4f8b\u3048\u3070\u30011\u3064\u306e\u65b0\u805e\u8a18\u4e8b\u3092\u8981\u7d04\u3059\u308b\u4f5c\u696d\u306f\u5358\u4e00\u6587\u66f8\u8981\u7d04\u3067\u3042\u308b\u3002\n```\n\n## Usecase: Japanese Web-Page Summarization with N-gram\n\nThe minimum unit of token is not necessarily `a word` in automatic summarization. `N-gram` is also applicable to the tokenization.\n\nRun the batch program: [demo/demo_with_n_gram_japanese_web_page.py](https://github.com/chimera0/accel-brain-code/blob/master/Automatic-Summarization/demo/demo_with_n_gram_japanese_web_page.py)\n\n```\npython demo_with_n_gram_japanese_web_page.py {URL}\n```\n- {URL}: web site URL.\n\n### Demo\n\nLet's summarize this page:[\u60c5\u5831\u691c\u7d22 - Wikipedia](https://ja.wikipedia.org/wiki/%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2).\n\n```\npython demo/demo_with_n_gram_japanese_web_page.py https://ja.wikipedia.org/wiki/%E6%83%85%E5%A0%B1%E6%A4%9C%E7%B4%A2\n```\n\nThe result is as follows.\n\n```\n\u60c5\u5831\u691c\u7d22\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u8a73\u7d30\u306b\u3064\u3044\u3066\u306f \u60c5\u5831\u691c\u7d22\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0 \u3092\u53c2\u7167\u306e\u3053\u3068\u3002\n\n \u30d1\u30bf\u30fc\u30f3\u30de\u30c3\u30c1\u30f3\u30b0 \u691c\u7d22\u8cea\u554f\u3068\u3057\u3066\u5165\u529b\u3055\u308c\u305f\u8868\u73fe\u3092\u305d\u306e\u307e\u307e\u542b\u3080\u6587\u66f8\u3092\u691c\u7d22\u3059\u308b\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3002\n\n \u30d9\u30af\u30c8\u30eb\u7a7a\u9593\u30e2\u30c7\u30eb 
\u30ad\u30fc\u30ef\u30fc\u30c9\u7b49\u3092\u5404 \u6b21\u5143 \u3068\u3057\u3066\u8a2d\u5b9a\u3057\u305f\u9ad8\u6b21\u5143 \u30d9\u30af\u30c8\u30eb\u7a7a\u9593 \u3092\u60f3\u5b9a\u3057\u3001\u691c\u7d22\u306e\u5bfe\u8c61\u3068\u3059\u308b\u30c7\u30fc\u30bf\u3084\u30e6\u30fc\u30b6\u306b\u3088\u308b\u691c\u7d22\u8cea\u554f\u306b\u4f55\u3089\u304b\u306e\u52a0\u5de5\u3092\u884c\u3044 \u30d9\u30af\u30c8\u30eb \u3092\u751f\u6210\u3059\u308b\n```\n\n## Usecase: Summarization, filtering the mutually similar, tautological, pleonastic, or redundant sentences\n\nIf the sentences you want to summarize repeat the same or a similar meaning in different words, the summary may also be redundant. Before summarization, you should therefore filter out the mutually similar, tautological, pleonastic, or redundant sentences so that the extracted features carry information. The function of `SimilarityFilter` is to cut off sentences that resemble one another by calculating a similarity measure.\n\nBut there is no reason to stick to a single similarity concept. *Modal logically*, the definition of this concept is *contingent*, like the concept of *distance*. Even if one similarity or distance function is defined in relation to a problem setting, there are always *functionally equivalent* algorithms to solve the problem setting. This library therefore provides a wide variety of polymorphic subtypes of `SimilarityFilter`.\n\n### Dice, Jaccard, and Simpson\n\nThere are some classes for calculating the similarity measure. 
In this library, the **Dice coefficient**, **Jaccard coefficient**, and **Simpson coefficient** between two sentences are calculated as follows.\n\nImport Python modules for calculating the similarity measure and instantiate the object.\n\n```python\nfrom pysummarization.similarityfilter.dice import Dice\nsimilarity_filter = Dice()\n```\n\nor\n\n```python\nfrom pysummarization.similarityfilter.jaccard import Jaccard\nsimilarity_filter = Jaccard()\n```\n\nor\n\n```python\nfrom pysummarization.similarityfilter.simpson import Simpson\nsimilarity_filter = Simpson()\n```\n\n### Functional equivalent: Combination of Tf-Idf and Cosine similarity\n\nIf you want to calculate similarity with **Tf-Idf cosine similarity**, instantiate `TfIdfCosine`.\n\n```python\nfrom pysummarization.similarityfilter.tfidf_cosine import TfIdfCosine\nsimilarity_filter = TfIdfCosine()\n```\n\n### Calculating similarity\n\nIf you want to calculate the similarity between two sentences, call the `calculate` method as follows.\n\n```python\n# Tokenized sentences\ntoken_list_x = [\"Dice\", \"coefficient\", \"is\", \"a\", \"similarity\", \"measure\", \".\"]\ntoken_list_y = [\"Jaccard\", \"coefficient\", \"is\", \"a\", \"similarity\", \"measure\", \".\"]\n# 0.75\nsimilarity_num = similarity_filter.calculate(token_list_x, token_list_y)\n```\n\n### Filtering similar sentences and summarization\n\nThe function of these methods is to cut off mutually similar sentences. In text summarization, the basic usage of this function is as follows. 
In short, `SimilarityFilter` is delegated in the same way as in GoF's Strategy Pattern.\n\nImport Python modules for NLP and text summarization.\n\n```python\nfrom pysummarization.nlp_base import NlpBase\nfrom pysummarization.nlpbase.auto_abstractor import AutoAbstractor\nfrom pysummarization.tokenizabledoc.mecab_tokenizer import MeCabTokenizer\nfrom pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor\nfrom pysummarization.similarityfilter.tfidf_cosine import TfIdfCosine\n```\n\nInstantiate the NLP object.\n\n```python\n# The object of the NLP.\nnlp_base = NlpBase()\n# Set tokenizer. This is a Japanese tokenizer with MeCab.\nnlp_base.tokenizable_doc = MeCabTokenizer()\n```\n\nInstantiate an object of `SimilarityFilter` and set the cut-off threshold.\n\n```python\n# The object of `SimilarityFilter`.\n# The similarity observed by this object is the so-called cosine similarity of Tf-Idf vectors.\nsimilarity_filter = TfIdfCosine()\n\n# Set the object of NLP.\nsimilarity_filter.nlp_base = nlp_base\n\n# If the similarity exceeds this value, the sentence will be cut off.\nsimilarity_filter.similarity_limit = 0.25\n```\n\nPrepare the sentences you want to summarize.\n\n```python\n# Sentences to be summarized (cited from http://ja.uncyclopedia.info/wiki/%E5%86%97%E8%AA%9E%E6%B3%95).\ndocument = 
\"\u5197\u8a9e\u6cd5\uff08\u3058\u3087\u3046\u3054\u307b\u3046\u3001\u30ec\u30c7\u30e5\u30f3\u30c0\u30f3\u30b7\u30fc\u3001redundancy\u3001j\u014dgoh\u014d\uff09\u3068\u306f\u3001\u4f55\u5ea6\u3082\u4f55\u5ea6\u3082\u7e70\u308a\u8fd4\u3057\u91cd\u306d\u3066\u91cd\u8907\u3057\u3066\u524d\u8ff0\u3055\u308c\u305f\u306e\u3068\u540c\u3058\u610f\u5473\u306e\u540c\u69d8\u3067\u3042\u308b\u540c\u610f\u7fa9\u306e\u6587\u7ae0\u3092\u3001\u5fc5\u8981\u3042\u308b\u3044\u306f\u8aac\u660e\u304b\u7406\u89e3\u3092\u8981\u6c42\u3055\u308c\u305f\u4ee5\u4e0a\u304b\u3001\u4f1d\u3048\u4f1d\u9054\u3057\u305f\u3044\u3068\u610f\u56f3\u3055\u308c\u305f\u3001\u3042\u308b\u3044\u306f\u8868\u3057\u8868\u73fe\u3057\u305f\u3044\u610f\u5473\u4ee5\u4e0a\u306b\u3001\u7e70\u308a\u8fd4\u3057\u91cd\u306d\u3066\u91cd\u8907\u3057\u3066\u7e70\u308a\u8fd4\u3059\u3053\u3068\u306b\u3088\u308b\u3001\u4e0d\u5fc5\u8981\u3067\u3042\u308b\u304b\u3001\u307e\u305f\u306f\u4f59\u5206\u306a\u4f59\u8a08\u3067\u3042\u308b\u6587\u7ae0\u306e\u3001\u5fc5\u8981\u4ee5\u4e0a\u306e\u4f7f\u7528\u3067\u3042\u308a\u3001\u4f55\u5ea6\u3082\u4f55\u5ea6\u3082\u7e70\u308a\u8fd4\u3057\u91cd\u306d\u3066\u91cd\u8907\u3057\u3066\u524d\u8ff0\u3055\u308c\u305f\u306e\u3068\u540c\u3058\u610f\u5473\u306e\u540c\u69d8\u306e\u6587\u7ae0\u3092\u3001\u5fc5\u8981\u3042\u308b\u3044\u306f\u8aac\u660e\u304b\u7406\u89e3\u3092\u8981\u6c42\u3055\u308c\u305f\u4ee5\u4e0a\u304b\u3001\u4f1d\u3048\u4f1d\u9054\u3057\u305f\u3044\u3068\u610f\u56f3\u3055\u308c\u305f\u3001\u3042\u308b\u3044\u306f\u8868\u3057\u8868\u73fe\u3057\u305f\u3044\u610f\u5473\u4ee5\u4e0a\u306b\u3001\u7e70\u308a\u8fd4\u3057\u91cd\u306d\u3066\u91cd\u8907\u3057\u3066\u7e70\u308a\u8fd4\u3059\u3053\u3068\u306b\u3088\u308b\u3001\u4e0d\u5fc5\u8981\u3067\u3042\u308b\u304b\u3001\u307e\u305f\u306f\u4f59\u5206\u306a\u6587\u7ae0\u306e\u3001\u5fc5\u8981\u4ee5\u4e0a\u306e\u4f7f\u7528\u3067\u3042\u308b\u3002\u3053\u308c\u304c\u5197\u8a9e\u6cd5\uff08\u3058\u3087\u3046\u3054\u307b\u3046\u3001\u30ec\u30c7\u30
e5\u30f3\u30c0\u30f3\u30b7\u30fc\u3001redundancy\u3001j\u014dgoh\u014d\uff09\u3067\u3042\u308b\u3002\u57fa\u672c\u7684\u306b\u3001\u5197\u8a9e\u6cd5\uff08\u3058\u3087\u3046\u3054\u307b\u3046\u3001\u30ec\u30c7\u30e5\u30f3\u30c0\u30f3\u30b7\u30fc\u3001redundancy\u3001j\u014dgoh\u014d\uff09\u304c\u591a\u304f\u306e\u5834\u5408\u306b\u304a\u3044\u3066\u6982\u3057\u3066\u4e00\u822c\u7684\u306b\u7e70\u308a\u8fd4\u3055\u308c\u308b\u901a\u5e38\u306e\u5834\u5408\u306f\u3001\u666e\u901a\u3001\u540c\u3058\u540c\u69d8\u306e\u767a\u60f3\u3084\u601d\u8003\u3084\u6982\u5ff5\u3084\u7269\u4e8b\u3092\u8868\u3057\u8868\u73fe\u3059\u308b\u5225\u3005\u306e\u7570\u306a\u3063\u305f\u6587\u7ae0\u3084\u5358\u8a9e\u3084\u8a00\u8449\u304c\u4f55\u56de\u3082\u4f55\u5ea6\u3082\u4f59\u5206\u306b\u7e70\u308a\u8fd4\u3055\u308c\u3001\u305d\u306e\u7d50\u679c\u3068\u3057\u3066\u767a\u8a00\u8005\u306e\u8003\u3048\u304c\u4f55\u56de\u3082\u4f55\u5ea6\u3082\u8a00\u3044\u76f4\u3055\u308c\u3001\u4e8b\u5b9f\u4e0a\u3001\u5b9f\u969b\u306b\u540c\u3058\u540c\u69d8\u306e\u767a\u8a00\u304c\u4f55\u56de\u3082\u4f55\u5ea6\u306b\u3082\u308f\u305f\u308a\u3001\u5e7e\u91cd\u306b\u3082\u8a00\u3044\u63db\u3048\u3089\u308c\u3001\u304b\u3064\u3001\u540c\u3058\u3053\u3068\u304c\u4f55\u56de\u3082\u4f55\u5ea6\u3082\u7e70\u308a\u8fd4\u3057\u91cd\u8907\u3057\u3066\u904e\u5270\u306b\u56de\u6570\u3092\u91cd\u306d\u524d\u8ff0\u3055\u308c\u305f\u306e\u3068\u540c\u3058\u610f\u5473\u306e\u540c\u69d8\u306e\u6587\u7ae0\u304c\u4f55\u5ea6\u3082\u4f55\u5ea6\u3082\u4e0d\u5fc5\u8981\u306b\u7e70\u308a\u8fd4\u3055\u308c\u308b\u3002\u901a\u5e38\u306e\u5834\u5408\u3001\u591a\u304f\u306e\u5834\u5408\u306b\u304a\u3044\u3066\u6982\u3057\u3066\u4e00\u822c\u7684\u306b\u3053\u306e\u3088\u3046\u306b\u5197\u8a9e\u6cd5\uff08\u3058\u3087\u3046\u3054\u307b\u3046\u3001\u30ec\u30c7\u30e5\u30f3\u30c0\u30f3\u30b7\u30fc\u3001redundancy\u3001j\u014dgoh\u014d\uff09\u304c\u7e70\u308a\u8fd4\u3055\u308c\u308b\u3002\"\n```\n\nInstantiate object of `AutoAbstractor` and 
call the method.\n\n```python\n# The object of automatic sumamrization.\nauto_abstractor = AutoAbstractor()\n# Set tokenizer. This is japanese tokenizer with MeCab.\nauto_abstractor.tokenizable_doc = MeCabTokenizer()\n# Object of abstracting and filtering document.\nabstractable_doc = TopNRankAbstractor()\n# Delegate the objects and execute summarization.\nresult_dict = auto_abstractor.summarize(document, abstractable_doc, similarity_filter)\n```\n\n### Demo\n\nLet's summarize this page:[\u5faa\u74b0\u8ad6\u6cd5 - Wikipedia](https://ja.wikipedia.org/wiki/%E5%BE%AA%E7%92%B0%E8%AB%96%E6%B3%95).\n\nRun the batch program: [demo/demo_similarity_filtering_japanese_web_page.py](https://github.com/chimera0/accel-brain-code/blob/master/Automatic-Summarization/demo/demo_similarity_filtering_japanese_web_page.py)\n\n```\npython demo/demo_similarity_filtering_japanese_web_page.py {URL} {SimilarityFilter} {SimilarityLimit}\n```\n- {URL}: web site URL.\n- {SimilarityFilter}: The object of `SimilarityFilter`:\n * `Dice`\n * `Jaccard`\n * `Simpson`\n * `TfIdfCosine`\n- {SimilarityLimit}: The cut-off threshold.\n\nFor instance, command line argument is as follows:\n\n```\npython demo/demo_similarity_filtering_japanese_web_page.py https://ja.wikipedia.org/wiki/%E5%BE%AA%E7%92%B0%E8%AB%96%E6%B3%95 Jaccard 0.3\n```\n\nThe result is as follows.\n\n```\n\u5faa\u74b0\u8ad6\u6cd5 \u51fa\u5178: \u30d5\u30ea\u30fc\u767e\u79d1\u4e8b\u5178\u300e\u30a6\u30a3\u30ad\u30da\u30c7\u30a3\u30a2\uff08Wikipedia\uff09\u300f \u79fb\u52d5\u5148: \u6848\u5185 \u3001 \u691c\u7d22 \u5faa\u74b0\u8ad6\u6cd5 \uff08\u3058\u3085\u3093\u304b\u3093\u308d\u3093\u307d\u3046\u3001circular reasoning, circular logic, vicious circle [1] \uff09\u3068\u306f\u3001 \u3042\u308b\u547d\u984c\u306e \u8a3c\u660e \u306b\u304a\u3044\u3066\u3001\u305d\u306e\u547d\u984c\u3092\u4eee\u5b9a\u3057\u305f\u8b70\u8ad6\u3092\u7528\u3044\u308b\u3053\u3068 [1] 
\u3002\n\n\u8a3c\u660e\u3059\u3079\u304d\u7d50\u8ad6\u3092\u524d\u63d0\u3068\u3057\u3066\u7528\u3044\u308b\u8ad6\u6cd5 [2] \u3002\n\n \u3042\u308b\u7528\u8a9e\u306e \u5b9a\u7fa9 \u3092\u4e0e\u3048\u308b\u8868\u73fe\u306e\u4e2d\u306b\u305d\u306e\u7528\u8a9e\u81ea\u4f53\u304c\u672c\u8cea\u7684\u306b\u767b\u5834\u3057\u3066\u3044\u308b\u3053\u3068 [1]\n```\n\n## Usecase: Summarization with Neural Network Language Model.\n\nAccording to neural network theory, and in relation to the manifold hypothesis, it is well known that multilayer neural networks can learn features of observed data points and hold the feature points in hidden layers. High-dimensional data can be converted to low-dimensional codes by training a model such as a **Stacked Auto-Encoder** or an **Encoder/Decoder** with a small central layer to reconstruct high-dimensional input vectors. This dimensionality reduction yields feature expressions that make it easier to calculate the similarity of each data point.\n\nThis library provides an **Encoder/Decoder based on LSTM**, which makes it possible to extract series features of natural sentences embedded in deeper layers by **sequence-to-sequence learning**. *Intuitively* speaking, similarities of the series feature points correspond to similarities of the observed data points. If we accept this hypothesis, the following models are possible in principle.\n\n### retrospective sequence-to-sequence learning(re-seq2seq).\n\nThe concept of re-seq2seq (Zhang, K. et al., 2018) provided inspiration for this library. This model is a new sequence learning model, mainly in the field of video summarization. \"The key idea behind re-seq2seq is to measure how well the machine-generated summary is similar to the original video in an abstract semantic space\" (Zhang, K. et al., 2018, p3).\n\nThe encoder of a seq2seq model observes the original video and outputs feature points which represent the semantic meaning of the observed data points. 
The feature points are then observed by the decoder of this model. Additionally, in the re-seq2seq model, the outputs of the decoder are propagated to a retrospective encoder, which infers feature points that represent the semantic meaning of the summary. \"If the summary preserves the important and relevant information in the original video, then we should expect that the two embeddings are similar (e.g. in Euclidean distance)\" (Zhang, K. et al., 2018, p3).\n\n
Figure source: Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 383-399), p2.
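A minimal sketch of the retrospective idea quoted above, assuming the embeddings of the original document and of each candidate summary are already available as NumPy vectors. The helper names `euclidean_distance` and `rank_by_retrospection` and the toy vectors are hypothetical illustrations; they are not part of `pysummarization` or `pydbm`.

```python
import numpy as np


def euclidean_distance(original_embedding, summary_embedding):
    '''Euclidean distance between two embeddings of the same shape.'''
    diff = np.asarray(original_embedding, dtype=float) - np.asarray(summary_embedding, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def rank_by_retrospection(original_embedding, candidate_embeddings):
    '''Return candidate indices sorted by ascending distance to the original embedding.'''
    distances = [euclidean_distance(original_embedding, c) for c in candidate_embeddings]
    return [int(i) for i in np.argsort(distances)]


# Toy vectors standing in for encoder outputs (not produced by a real model).
original = np.array([1.0, 0.0, 2.0])
candidates = np.array([
    [4.0, 4.0, 2.0],  # far from the original
    [1.0, 0.0, 2.0],  # identical to the original
    [1.0, 1.0, 2.0],  # close to the original
])

ranking = rank_by_retrospection(original, candidates)
print(ranking)
```

Under this reading, the candidate whose embedding lies closest to the embedding of the original is the one that preserves the most information.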
\n\nThis library draws on this intuition to apply the model to text summarization. As with videos, semantic feature representations based on representation learning of manifolds are also possible for text.\n\nThe intuition behind the design of their loss function is also suggestive. \"The intuition behind our modeling is that the outputs should convey the same amount of information as the inputs. For summarization, this is precisely the goal: a good summary should be such that after viewing the summary, users would get about the same amount of information as if they had viewed the original video\" (Zhang, K. et al., 2018, p7).\n\nHowever, the model in this library differs from that of Zhang, K. et al. (2018) in some respects, owing to the specification of the deep learning library [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the Encoder/Decoder based on LSTM is not designed as a hierarchical structure. Second, it is possible to introduce regularization techniques that are not discussed in Zhang, K. et al. (2018), such as dropout, gradient clipping, and weight limitation. 
Third, the regression loss function for matching summaries is simplified in this library for the sake of calculation efficiency.\n\n#### Building retrospective sequence-to-sequence learning(re-seq2seq).\n\nImport Python modules.\n\n```python\nfrom pysummarization.abstractablesemantics.re_seq_2_seq import ReSeq2Seq\n```\n\nImport NumPy, a tokenizer, and a vectorizer.\n\n```python\nimport numpy as np\n\nfrom pysummarization.nlp_base import NlpBase\nfrom pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer\nfrom pysummarization.vectorizabletoken.t_hot_vectorizer import THotVectorizer\n\n# `str` of your document.\ndocument = \"Your document.\"\n\nnlp_base = NlpBase()\nnlp_base.delimiter_list = [\".\", \"\\n\"]\ntokenizable_doc = SimpleTokenizer()\nsentence_list = nlp_base.listup_sentence(document)\ntoken_list = tokenizable_doc.tokenize(document)\ntoken_arr = np.array(token_list)\n```\n\nSet up the vectorizer.\n\n```python\nvectorizable_token = THotVectorizer(token_list=token_arr.tolist())\nvector_list = vectorizable_token.vectorize(token_list=token_arr.tolist())\nvector_arr = np.array(vector_list)\n```\n\nThe `ReSeq2Seq` has a `learn` method, which learns from observed data points. This method can receive a `np.ndarray` of observed data points, which is a rank-3 array-like or sparse matrix of shape: (`The number of samples`, `The length of cycle`, `The number of features`). For example, the `np.ndarray` can be built as follows. 
\n\n```python\n# The length of sequences.\nseq_len = 5\n\nobserved_list = []\nfor i in range(seq_len, vector_arr.shape[0]):\n observed_list.append(vector_arr[i-seq_len:i])\nobserved_arr = np.array(observed_list)\n```\n\nInstantiate `ReSeq2Seq` and input hyperparameters.\n\n```python\nabstractable_semantics = ReSeq2Seq(\n # A margin parameter for the mismatched pairs penalty.\n margin_param=0.01,\n # Tradeoff parameter for loss function.\n retrospective_lambda=0.5,\n # Tradeoff parameter for loss function.\n retrospective_eta=0.5,\n # is-a `EncoderDecoderController`.\n # If `None`, this class will build the model with default parameters.\n encoder_decoder_controller=None,\n # is-a `LSTMModel` as a retrospective encoder(or re-encoder).\n # If `None`, this class will build the model with default parameters.\n retrospective_encoder=None,\n # The default parameter. The number of units in input layers.\n input_neuron_count=observed_arr.shape[-1],\n # The default parameter. The number of units in hidden layers.\n hidden_neuron_count=observed_arr.shape[-1],\n # The default parameter. Regularization for weights matrix to repeat multiplying\n # the weights matrix and `0.9` until $\\sum_{j=0}^{n}w_{ji}^2 < weight\\_limit$.\n weight_limit=0.5,\n # The default parameter. Probability of dropout.\n dropout_rate=0.0,\n # The default parameter.\n # The epochs in mini-batch pre-learning Encoder/Decoder.\n # If this value is `0`, no pre-learning will be executed\n # in this class's method `learn`. In this case, you should \n # do pre-learning before calling `learn`.\n pre_learning_epochs=50,\n # The default parameter. \n # The epochs in mini-batch training Encoder/Decoder and retrospective encoder.\n epochs=500,\n # The default parameter. Batch size.\n batch_size=20,\n # The default parameter. Learning rate.\n learning_rate=1e-05,\n # The default parameter. 
\n # Attenuate the `learning_rate` by a factor of this value every `attenuate_epoch`.\n learning_attenuate_rate=0.1,\n # The default parameter.\n # Attenuate the `learning_rate` by a factor of `learning_attenuate_rate` every `attenuate_epoch`.\n # Additionally, in relation to regularization,\n # this class constrains weight matrices every `attenuate_epoch`.\n attenuate_epoch=50,\n # The default parameter. Threshold of the gradient clipping.\n grad_clip_threshold=1e+10,\n # The default parameter.\n # The length of sequences in Decoder with Attention model.\n seq_len=seq_len,\n # The default parameter.\n # Referred maximum step `t` in Backpropagation Through Time(BPTT).\n # If `0`, this class refers to all past data in BPTT.\n bptt_tau=seq_len,\n # Size of Test data set. If this value is `0`, the validation will not be executed.\n test_size_rate=0.3,\n # Tolerance for the optimization.\n # When the loss or score is not improving by at least tol\n # for two consecutive iterations, convergence is considered\n # to be reached and training stops.\n tol=1e-10,\n # Tolerance for deviation of loss.\n tld=1e-10\n)\n```\n\nExecute the `learn` method.\n\n```python\nabstractable_semantics.learn(\n observed_arr=observed_arr,\n target_arr=observed_arr\n)\n```\n\nExecute the `summarize` method to extract summaries.\n\n```python\nabstract_list = abstractable_semantics.summarize(\n # `np.ndarray` of observed data points.\n observed_arr,\n # is-a `VectorizableToken`.\n vectorizable_token,\n # A `list` that contains `str`s of all sentences.\n sentence_list,\n # The number of extracted sentences.\n limit=5\n)\n```\n\nThe `abstract_list` is a `list` that contains `str`s of sentences.\n\n### Functional equivalent: LSTM-based Encoder/Decoder scheme for Anomaly Detection (EncDec-AD).\n\nThis library intuitively applies the Encoder-Decoder scheme for Anomaly Detection (EncDec-AD) to text summarization. 
In this scheme, an LSTM-based Encoder/Decoder, the so-called sequence-to-sequence (Seq2Seq) model, learns to reconstruct normal time-series behavior and thereafter uses the reconstruction error to detect anomalies.\n\nMalhotra, P., et al. (2016) showed that the EncDec-AD paradigm is robust and can detect anomalies in predictable, unpredictable, periodic, aperiodic, and quasi-periodic time-series. Further, they showed that the paradigm is able to detect anomalies in short time-series (length as small as 30) as well as long time-series (length as large as 500).\n\n
Figure source: Cho, K., Van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078., p2.
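The reconstruction-error idea described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the arrays stand in for the inputs and outputs of a trained Encoder/Decoder, the per-sequence mean squared error and the fixed threshold are simplifications chosen here, and the helper names `reconstruction_errors` and `flag_anomalies` are hypothetical (they are not the scoring used by Malhotra, P., et al. (2016) or an API of `pysummarization`).

```python
import numpy as np


def reconstruction_errors(observed_arr, reconstructed_arr):
    '''Mean squared error per sample, averaged over the time and feature axes.'''
    observed_arr = np.asarray(observed_arr, dtype=float)
    reconstructed_arr = np.asarray(reconstructed_arr, dtype=float)
    return np.mean((observed_arr - reconstructed_arr) ** 2, axis=(1, 2))


def flag_anomalies(errors, threshold):
    '''Boolean mask that is True where the error exceeds the threshold.'''
    return np.asarray(errors) > threshold


# Toy data of shape (samples, sequence length, features) = (3, 2, 2).
observed = np.zeros((3, 2, 2))
reconstructed = np.zeros((3, 2, 2))
reconstructed[1] += 1.0        # every element of sample 1 is off by 1
reconstructed[2, 0, 0] = 1.0   # a single element of sample 2 is off by 1

errors = reconstruction_errors(observed, reconstructed)
flags = flag_anomalies(errors, threshold=0.5)
# Indices ordered from the largest to the smallest reconstruction error.
most_anomalous_first = [int(i) for i in np.argsort(-errors)]

print(errors, flags, most_anomalous_first)
```

Whether the highly ranked sequences are treated as informative sentences or as noise is the design choice discussed in the surrounding text.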
\n\nThis library draws on the above use of reconstruction error for anomaly detection to apply the model to text summarization. As exemplified by the Seq2Seq paradigm, documents and sentences, which contain tokens of text, can be considered as time-series features. The anomalous data detected by EncDec-AD should therefore express something about the text.\n\nFrom the above analogy, this library introduces two conflicting intuitions. On the one hand, the anomalous data may catch the observer's eye because of its rarity or amount of information, as indicators in natural language processing such as TF-IDF show. On the other hand, the anomalous data may be ignorable noise, a mere outlier.\n\nIn any case, this library takes the function and potential of EncDec-AD in text summarization to be drawing the distinction between normal and anomalous texts and filtering the one from the other.\n\nNote that the model in this library differs from that of Malhotra, P., et al. (2016) in some respects, owing to the specification of the deep learning library [pydbm](https://github.com/chimera0/accel-brain-code/tree/master/Deep-Learning-by-means-of-Design-Pattern). First, the weight matrices of the encoder and decoder are not shared. Second, it is possible to introduce regularization techniques that are not discussed in Malhotra, P., et al. (2016), such as dropout, gradient clipping, and weight limitation. Third, the loss function for reconstruction error is not limited to the L2 norm.\n\n#### Building LSTM-based Encoder/Decoder scheme for Anomaly Detection (EncDec-AD).\n\nImport Python modules and instantiate `EncDecAD`.\n\n```python\nfrom pysummarization.abstractablesemantics.enc_dec_ad import EncDecAD\nabstractable_semantics = EncDecAD(\n # is-a `EncoderDecoderController`.\n # If `None`, this class will build the model with default parameters.\n encoder_decoder_controller=None,\n # The default parameter. 
The number of units in input layers.\n input_neuron_count=observed_arr.shape[-1],\n # The default parameter. The number of units in hidden layers.\n hidden_neuron_count=observed_arr.shape[-1],\n # The default parameter. Regularization of the weight matrix: repeatedly multiply\n # the weight matrix by `0.9` until $\\sum_{j=0}^{n}w_{ji}^2 < weight\\_limit$.\n weight_limit=0.5,\n # The default parameter. Probability of dropout.\n dropout_rate=0.0,\n # The default parameter. \n # The epochs in mini-batch training Encoder/Decoder and retrospective encoder.\n epochs=500,\n # The default parameter. Batch size.\n batch_size=20,\n # The default parameter. Learning rate.\n learning_rate=1e-05,\n # The default parameter. \n # Attenuate the `learning_rate` by a factor of this value every `attenuate_epoch`.\n learning_attenuate_rate=0.1,\n # The default parameter.\n # Attenuate the `learning_rate` by a factor of `learning_attenuate_rate` every `attenuate_epoch`.\n # Additionally, in relation to regularization,\n # this class constrains weight matrices every `attenuate_epoch`.\n attenuate_epoch=50,\n # The default parameter. The length of sequences in Decoder with Attention model.\n seq_len=seq_len,\n # The default parameter. \n # The maximum step `t` referred to in Backpropagation Through Time (BPTT).\n # If `0`, this class refers to all past data in BPTT.\n bptt_tau=seq_len,\n # The default parameter. \n # Size of Test data set. 
If this value is `0`, the validation will not be executed.\n test_size_rate=0.3\n)\n```\n\nExecute `learn` method.\n\n```python\nabstractable_semantics.learn(\n observed_arr=observed_arr, \n target_arr=observed_arr\n)\n```\n\nExecute `summarize` method to extract summaries.\n\n```python\nabstract_list = abstractable_semantics.summarize(\n # `np.ndarray` of observed data points.\n observed_arr,\n # is-a `VectorizableToken`.\n vectorizable_token,\n # A `list` that contains `str`s of all sentences.\n sentence_list,\n # The number of extracted sentences.\n limit=5\n)\n```\n\nThe `abstract_list` is a `list` that contains `str`s of sentences.\n\n# References\n\n- Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012). Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.\n- Cho, K., Van Merri\u00ebnboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.\n- Luhn, Hans Peter. \"The automatic creation of literature abstracts.\" IBM Journal of research and development 2.2 (1958): 159-165.\n- Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., & Shroff, G. (2016). LSTM-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148.\n- Matthew A. Russell\u3000\u8457\u3001\u4f50\u85e4 \u654f\u7d00\u3001\u702c\u6238\u53e3 \u5149\u5b8f\u3001\u539f\u5ddd \u6d69\u4e00\u3000\u76e3\u8a33\u3001\u9577\u5c3e \u9ad8\u5f18\u3000\u8a33\u300e\u5165\u9580 \u30bd\u30fc\u30b7\u30e3\u30eb\u30c7\u30fc\u30bf \u7b2c2\u7248\u2015\u2015\u30bd\u30fc\u30b7\u30e3\u30eb\u30a6\u30a7\u30d6\u306e\u30c7\u30fc\u30bf\u30de\u30a4\u30cb\u30f3\u30b0\u300f 2014\u5e7406\u6708 \u767a\u884c\n- Sutskever, I., Hinton, G. E., & Taylor, G. W. (2009). The recurrent temporal restricted boltzmann machine. 
In Advances in Neural Information Processing Systems (pp. 1601-1608).\n- Zhang, K., Grauman, K., & Sha, F. (2018). Retrospective Encoders for Video Summarization. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 383-399).\n\n## More detailed demos\n\n- [Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\uff1a\u30ad\u30e1\u30e9\u30fb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u4ed5\u69d8](https://media.accel-brain.com/_chimera-network-is-web-crawling-ai/) (Japanese)\n - 20001 bots are running as 20001 web-crawlers and 20001 web-scrapers.\n- [\u300c\u4ee3\u7406\u6f14\u7b97\u300d\u4e00\u89a7 | Welcome to Singularity](https://media.accel-brain.com/category/agency-operation/) (Japanese)\n - 20001 bots are running as 20001 bloggers and 20001 \"content curation writers\".\n\n## Related PoC\n\n- [Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\u306b\u3088\u308b\u30d1\u30e9\u30c9\u30c3\u30af\u30b9\u63a2\u7d22\u66b4\u9732\u6a5f\u80fd\u306e\u793e\u4f1a\u9032\u5316\u8ad6](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/) (Japanese)\n - [World-Wide Web\u306e\u793e\u4f1a\u69cb\u9020\u3068Web\u30af\u30ed\u30fc\u30e9\u578b\u4eba\u5de5\u77e5\u80fd\u306e\u610f\u5473\u8ad6](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/sozialstruktur-des-world-wide-web-und-semantik-der-kunstlichen-intelligenz-des-web-crawlers/)\n - [\u610f\u5473\u8ad6\u306e\u610f\u5473\u8ad6\u3001\u89b3\u5bdf\u306e\u89b3\u5bdf](https://accel-brain.com/social-evolution-of-exploration-and-exposure-of-paradox-by-web-crawling-type-artificial-intelligence/semantik-der-semantik-und-beobachtung-der-beobachtung/)\n- 
[\u6df1\u5c64\u5f37\u5316\u5b66\u7fd2\u306e\u30d9\u30a4\u30ba\u4e3b\u7fa9\u7684\u306a\u60c5\u5831\u63a2\u7d22\u306b\u99c6\u52d5\u3055\u308c\u305f\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u306e\u610f\u5473\u8ad6](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/) (Japanese)\n - [\u5e73\u5747\u5834\u8fd1\u4f3c\u63a8\u8ad6\u306e\u7d71\u8a08\u529b\u5b66\u3001\u81ea\u5df1\u7b26\u53f7\u5316\u5668\u3068\u3057\u3066\u306e\u6df1\u5c64\u30dc\u30eb\u30c4\u30de\u30f3\u30de\u30b7\u30f3](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/tiefe-boltzmann-maschine-als-selbstkodierer/)\n - [\u6b63\u5247\u5316\u554f\u984c\u306b\u304a\u3051\u308b\u6575\u5bfe\u7684\u751f\u6210\u30cd\u30c3\u30c8\u30ef\u30fc\u30af(GANs)\u3068\u6575\u5bfe\u7684\u81ea\u5df1\u7b26\u53f7\u5316\u5668(AAEs)\u306e\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u69cb\u9020](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/regularisierungsproblem-und-gan/)\n - [\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u8a00\u8a9e\u30e2\u30c7\u30eb\u306e\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u3068\u518d\u5e30\u7684\u30cb\u30e5\u30fc\u30e9\u30eb\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u306e\u30cd\u30c3\u30c8\u30ef\u30fc\u30af\u69cb\u9020](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/naturliche-sprachverarbeitung-des-neuronalen-netzwerkmodells-und-der-netzwerkstruktur-eines-rekursiven-neuronalen-netzwerks/)\n - 
[\u81ea\u7136\u8a00\u8a9e\u51e6\u7406\u306e\u30d1\u30e9\u30c9\u30c3\u30af\u30b9\u3001\u30d1\u30e9\u30c9\u30c3\u30af\u30b9\u306e\u81ea\u7136\u8a00\u8a9e\u51e6\u7406](https://accel-brain.com/semantics-of-natural-language-processing-driven-by-bayesian-information-search-by-deep-reinforcement-learning/naturliche-sprachverarbeitung-von-paradoxien-und-paradoxien-durch-naturliche-sprachverarbeitung/)\n- [\u300c\u4eba\u5de5\u306e\u7406\u60f3\u300d\u3092\u80cc\u666f\u3068\u3057\u305f\u300c\u4e07\u7269\u7167\u5fdc\u300d\u306e\u30c7\u30fc\u30bf\u30e2\u30c7\u30ea\u30f3\u30b0](https://accel-brain.com/data-modeling-von-korrespondenz-in-artificial-paradise/) (Japanese)\n - [\u904a\u6b69\u8005\u306e\u6a5f\u80fd\u7684\u7b49\u4fa1\u7269\u3068\u3057\u3066\u306eWeb\u30af\u30ed\u30fc\u30e9\u3001\u63a2\u7d22\u306e\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u3068\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306e\u63a2\u7d22](https://accel-brain.com/data-modeling-von-korrespondenz-in-artificial-paradise/web-crawler-als-funktionelles-aquivalent-des-flaneurs/)\n\n## Author\n\n- chimera0(RUM)\n\n## Author URI\n\n- http://accel-brain.com/\n\n## License\n\n- GNU General Public License v2.0", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/chimera0/accel-brain-code/tree/master/Automatic-Summarization", "keywords": "Automatic summarization document abstraction abstract text filtering", "license": "GPL2", "maintainer": "", "maintainer_email": "", "name": "pysummarization", "package_url": "https://pypi.org/project/pysummarization/", "platform": "", "project_url": "https://pypi.org/project/pysummarization/", "project_urls": { "Homepage": "https://github.com/chimera0/accel-brain-code/tree/master/Automatic-Summarization" }, "release_url": "https://pypi.org/project/pysummarization/1.1.4/", "requires_dist": null, "requires_python": "", "summary": "pysummarization is Python library 
for the automatic summarization, document abstraction, and text filtering in relation to Encoder/Decoder based on LSTM and LSTM-RTRBM.", "version": "1.1.4" }, "last_serial": 5629744, "releases": { "1.0.1": [ { "comment_text": "", "digests": { "md5": "3edb91cf238a03111b7b67f0eda90394", "sha256": "7e81ebf2ac5c5f2ed247a0285161a67f9f7cac1ea804787e93ba217145b23727" }, "downloads": -1, "filename": "pysummarization-1.0.1.tar.gz", "has_sig": false, "md5_digest": "3edb91cf238a03111b7b67f0eda90394", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 1331, "upload_time": "2017-09-23T07:25:27", "url": "https://files.pythonhosted.org/packages/27/2e/8822a277c1f53e71690a39eed2e72d881b71744717c4678cc160be96fba3/pysummarization-1.0.1.tar.gz" } ], "1.0.2": [], "1.0.3": [ { "comment_text": "", "digests": { "md5": "849d55c27f6cc3db0fa5cf407c0c127c", "sha256": "17f897c14a0e3223efad5cda50f6600359e00a6c1fa642d292c9983a327bedd5" }, "downloads": -1, "filename": "pysummarization-1.0.3-py3-none-any.whl", "has_sig": false, "md5_digest": "849d55c27f6cc3db0fa5cf407c0c127c", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13036, "upload_time": "2017-09-23T11:43:45", "url": "https://files.pythonhosted.org/packages/3e/9a/8dbe18d67a527937959d16f7af2343bca6d16a5d41a73d17902839f91725/pysummarization-1.0.3-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "81c60f9407e379181ac1f567dc6cda34", "sha256": "ae9222114e9b670efb8684419cb3cf67c5107dc79d9b3d3929d92c697b9e5ad7" }, "downloads": -1, "filename": "pysummarization-1.0.3.tar.gz", "has_sig": false, "md5_digest": "81c60f9407e379181ac1f567dc6cda34", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8890, "upload_time": "2017-09-23T11:43:50", "url": "https://files.pythonhosted.org/packages/24/40/2d75a3712c625e1441ff5082c603a700641d102851baa01eb942efee50c6/pysummarization-1.0.3.tar.gz" } ], "1.0.4": [ { "comment_text": "", "digests": { 
"md5": "230d5e37b3c2851726210b7648411da8", "sha256": "48ddb51d0ab7f45f924187ae91274e453dc88e7542981e341cd7e40770768241" }, "downloads": -1, "filename": "pysummarization-1.0.4-py3-none-any.whl", "has_sig": false, "md5_digest": "230d5e37b3c2851726210b7648411da8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 13068, "upload_time": "2017-09-23T11:55:07", "url": "https://files.pythonhosted.org/packages/7a/1d/08a806d9de12b1aed93ecba18e7a5e7230f78077ebbd7359f66ce0e0fff8/pysummarization-1.0.4-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "03610879425f7ddff3678a50687e546c", "sha256": "73e49f787f08516495cf53f9bb9a1747ee8917d35a81c4a943769ccecc52a172" }, "downloads": -1, "filename": "pysummarization-1.0.4.tar.gz", "has_sig": false, "md5_digest": "03610879425f7ddff3678a50687e546c", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8916, "upload_time": "2017-09-23T11:55:10", "url": "https://files.pythonhosted.org/packages/c1/f7/f8b973e9493d075ba9c1af40f25cff56670622c5c6c436e976ae60da557f/pysummarization-1.0.4.tar.gz" } ], "1.0.5": [ { "comment_text": "", "digests": { "md5": "d4167e1ecd7d636b3eee5ad63fe690bb", "sha256": "5e8601e3a7699d49be4c239e51ba71289b9fde2ec8ab28c3580e44681ee1009b" }, "downloads": -1, "filename": "pysummarization-1.0.5-py2-none-any.whl", "has_sig": false, "md5_digest": "d4167e1ecd7d636b3eee5ad63fe690bb", "packagetype": "bdist_wheel", "python_version": "py2", "requires_python": null, "size": 18592, "upload_time": "2018-01-31T13:05:09", "url": "https://files.pythonhosted.org/packages/1b/2a/21f6e7726789747b5475ca675dea1777b8fced997b123cb2425cf3eb7e95/pysummarization-1.0.5-py2-none-any.whl" }, { "comment_text": "", "digests": { "md5": "61bc53386eade938762070a4e7798afd", "sha256": "8bc278e50dba6ed97598bc360d55a7e5c6baddce95f4b3b25d2bda5ab3d333d2" }, "downloads": -1, "filename": "pysummarization-1.0.5-py3-none-any.whl", "has_sig": false, "md5_digest": 
"61bc53386eade938762070a4e7798afd", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29423, "upload_time": "2018-05-06T01:19:12", "url": "https://files.pythonhosted.org/packages/bd/68/e043593de7d1cdc878ee1f829462dbc891a722d43de9b467f02fe522ba5f/pysummarization-1.0.5-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "3163c0e2fd853de0114323fd3c859cbb", "sha256": "6e651d03073d8babd188b5c66cb2db946bc83da5965ab4623ab134df8f39fa71" }, "downloads": -1, "filename": "pysummarization-1.0.5.tar.gz", "has_sig": false, "md5_digest": "3163c0e2fd853de0114323fd3c859cbb", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 13776, "upload_time": "2018-01-31T13:05:11", "url": "https://files.pythonhosted.org/packages/18/eb/903a6ae86db424d4b6d8edbe8ca38806de7a38bbb3e0225757b23774bf72/pysummarization-1.0.5.tar.gz" } ], "1.0.6": [ { "comment_text": "", "digests": { "md5": "ee56ecaab9ef56ac6118299de0b164ee", "sha256": "6281b93e7a10b936c3001c69710abddeaebffd8c25ed2baa75cdcc94f33b42c7" }, "downloads": -1, "filename": "pysummarization-1.0.6.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "ee56ecaab9ef56ac6118299de0b164ee", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 26580, "upload_time": "2018-05-06T01:20:37", "url": "https://files.pythonhosted.org/packages/c7/e7/d016e639b2c7579e3319ce94ca1d0d849bcaf260d2e9f094fbe2be470f6b/pysummarization-1.0.6.linux-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": "30f788589d0f871cba19b8ba9d7ab2f4", "sha256": "c9651e67891418da37c066451ba031db2d8e068ebf3131df38328a7e39702675" }, "downloads": -1, "filename": "pysummarization-1.0.6-py3-none-any.whl", "has_sig": false, "md5_digest": "30f788589d0f871cba19b8ba9d7ab2f4", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 29428, "upload_time": "2018-05-06T01:20:35", "url": 
"https://files.pythonhosted.org/packages/17/a2/21402b26e51fa963ee4d9d0caf8bca3984bcc1f38be19d9f76b23f12d988/pysummarization-1.0.6-py3-none-any.whl" } ], "1.0.7": [ { "comment_text": "", "digests": { "md5": "9ea498a8c45303410c0c89b711889084", "sha256": "32e7fe0c2d6b807b3dd97dac34137bf9baece240ba192a5b53924845167737a0" }, "downloads": -1, "filename": "pysummarization-1.0.7-py3-none-any.whl", "has_sig": false, "md5_digest": "9ea498a8c45303410c0c89b711889084", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 32628, "upload_time": "2018-08-05T02:05:28", "url": "https://files.pythonhosted.org/packages/1e/87/3cc7b1ad98a0410b0c2ccb3240834929e196df64da38ad6953778646dfd0/pysummarization-1.0.7-py3-none-any.whl" } ], "1.0.8": [ { "comment_text": "", "digests": { "md5": "49114f2f9ec07fddeb7706bb0c264318", "sha256": "e6bdd2e376a51213127d1a9dfbac69f29a8baf4abdc10b5ffc2a6f5c0a731d62" }, "downloads": -1, "filename": "pysummarization-1.0.8-py3-none-any.whl", "has_sig": false, "md5_digest": "49114f2f9ec07fddeb7706bb0c264318", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 32628, "upload_time": "2018-08-19T12:54:35", "url": "https://files.pythonhosted.org/packages/42/78/8e7fbbda310e3fcd5e29b8d3b263c80784a6113c7a2bd2833b2de116800e/pysummarization-1.0.8-py3-none-any.whl" } ], "1.0.9": [ { "comment_text": "", "digests": { "md5": "ade566c939d57a20909fb6b760d5ee70", "sha256": "0fec75fa7efdc75d99b41961357d8b51bf2ca950669fa9afb5d083fac10d9cc7" }, "downloads": -1, "filename": "pysummarization-1.0.9.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "ade566c939d57a20909fb6b760d5ee70", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 42465, "upload_time": "2018-09-02T07:01:47", "url": "https://files.pythonhosted.org/packages/e9/63/90bb29d295b7008e9cd141481e181e0516654cfe6a3504254b3a93cf26d1/pysummarization-1.0.9.linux-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": 
"978b9cc2870bb256d71ec3ec12df2e64", "sha256": "b3db5e46b22b769bbdf73b47a3fd6be9b0af1440f3fd85ab9e62c5d0d839c9fc" }, "downloads": -1, "filename": "pysummarization-1.0.9-py3-none-any.whl", "has_sig": false, "md5_digest": "978b9cc2870bb256d71ec3ec12df2e64", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 36126, "upload_time": "2018-09-02T06:58:31", "url": "https://files.pythonhosted.org/packages/8c/7f/5f0fb6554ba0810a48ac739dcfb5d94784109b49956df393c0b8bc7c0b00/pysummarization-1.0.9-py3-none-any.whl" } ], "1.1.1": [ { "comment_text": "", "digests": { "md5": "14501949ab75053fd4df137874752dd9", "sha256": "48f9c8ef7a2dfd018eed46e7e85ef0a09ed7b12859aa25b76d1c82563e198d42" }, "downloads": -1, "filename": "pysummarization-1.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "14501949ab75053fd4df137874752dd9", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 38495, "upload_time": "2018-09-16T08:29:48", "url": "https://files.pythonhosted.org/packages/17/50/2b70caf3092c8b6f01022ba0adff9312fc3172be59322d60857a00bf3cab/pysummarization-1.1.1-py3-none-any.whl" } ], "1.1.2": [ { "comment_text": "", "digests": { "md5": "4e951c53d7606a7e81f3e435ba266ae4", "sha256": "185ba0db0781d8997bc981b32fdb0f3862735c5dfcaa50ad3df8e5fa1f74a80f" }, "downloads": -1, "filename": "pysummarization-1.1.2.linux-x86_64.tar.gz", "has_sig": false, "md5_digest": "4e951c53d7606a7e81f3e435ba266ae4", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 77364, "upload_time": "2019-03-31T06:41:19", "url": "https://files.pythonhosted.org/packages/24/bd/69008882c64202d6a0f7c0d43b45aad3b0887795014c1b75f608db73dee7/pysummarization-1.1.2.linux-x86_64.tar.gz" }, { "comment_text": "", "digests": { "md5": "b653170328883eb17eab4769c4023c63", "sha256": "be442384fef0f6015313b71328dda89c6eadcc7dadcc8dfd86141bbe9457ecab" }, "downloads": -1, "filename": "pysummarization-1.1.2-py3-none-any.whl", "has_sig": false, 
"md5_digest": "b653170328883eb17eab4769c4023c63", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 62442, "upload_time": "2019-03-23T15:02:23", "url": "https://files.pythonhosted.org/packages/04/b3/e0742cad55ef827a31bd768fe2aa0039b5215b787f53c8457ff66846d1f4/pysummarization-1.1.2-py3-none-any.whl" } ], "1.1.3": [ { "comment_text": "", "digests": { "md5": "a4b5768ede346ee6e15327f11e53141d", "sha256": "b0c6c8f5e0483c9361db0d1bdbec1ffca57b074ce1b3d4ecf411ace939a0b55a" }, "downloads": -1, "filename": "pysummarization-1.1.3.tar.gz", "has_sig": false, "md5_digest": "a4b5768ede346ee6e15327f11e53141d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 55220, "upload_time": "2019-06-23T11:29:25", "url": "https://files.pythonhosted.org/packages/e5/d0/09db4b4bac05ff092864fde5e6ff102c5035c97dc13ec5aba68b3d92d447/pysummarization-1.1.3.tar.gz" } ], "1.1.4": [ { "comment_text": "", "digests": { "md5": "ecbaab7d0ee3b585b647ef62a8f4b8bd", "sha256": "6aef436f8617ead9feb72249aef4a4016d0af713d43c127d60723b5e46b5e2e8" }, "downloads": -1, "filename": "pysummarization-1.1.4.tar.gz", "has_sig": false, "md5_digest": "ecbaab7d0ee3b585b647ef62a8f4b8bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57538, "upload_time": "2019-08-04T04:08:19", "url": "https://files.pythonhosted.org/packages/7b/38/131f8574e0e12f27fa2d35b11a91055a67c8e55b205a505669c6df7881cb/pysummarization-1.1.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "ecbaab7d0ee3b585b647ef62a8f4b8bd", "sha256": "6aef436f8617ead9feb72249aef4a4016d0af713d43c127d60723b5e46b5e2e8" }, "downloads": -1, "filename": "pysummarization-1.1.4.tar.gz", "has_sig": false, "md5_digest": "ecbaab7d0ee3b585b647ef62a8f4b8bd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 57538, "upload_time": "2019-08-04T04:08:19", "url": 
"https://files.pythonhosted.org/packages/7b/38/131f8574e0e12f27fa2d35b11a91055a67c8e55b205a505669c6df7881cb/pysummarization-1.1.4.tar.gz" } ] }