{ "info": { "author": "Alessandro Scoccia Pappagallo", "author_email": "aless@ndro.xyz", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Multi-View Network in Keras\n\nThis package is based on [End-to-End Multi-View Networks for Text Classification](https://arxiv.org/abs/1704.05907) by Hongyu Guo, Colin Cherry and Jiang Su (2017). The paper does not describe the overall architecture of the Multi-View Network (MVN) in full detail, so I had to do some guesswork.\n\nFeel free to reach out to me at aless@ndro.xyz with any feedback.\n\n# Basic Usage\n\nAssuming you have your corpus prepared as a list of documents, each represented by a list of embeddings (one per token), you can train the MVN this way:\n\n```python\nimport multi_view_network\nimport numpy as np\n\n# Very important: the documents in embedded_corpus **need** to have\n# the same number of embedded_tokens. If this is not the case\n# you can use multi_view_network.pad_embedded_corpus() to pad\n# the documents with 0-filled mock embeddings.\ndata = np.array(embedded_corpus)\n\n# The output of the MVN is softmaxed so it's important to\n# make sure the labels are one-hot encoded (one row per document).\nlabels = np.array([[0, 1], [0, 1], [1, 0]])\n\nmodel = multi_view_network.BuildMultiViewNetwork(\n    embeddings_dim=300, hidden_units=16, dropout_rate=0.2, output_units=2)\nmodel.compile(optimizer='sgd', loss='categorical_crossentropy')\nmodel.fit(data, labels, epochs=200, batch_size=32)\n```\n\n# More Complex Architectures\n\nThe `models.py` module contains all the necessary Layers to build MVNs of arbitrary size and complexity. 
For example:\n\n```python\nimport keras\n\n# Assumed import path: the layers are defined in models.py.\nfrom multi_view_network.models import SelectionLayer, ViewLayer\n\nembeddings_dim = 300\nhidden_units = 64\ndropout_rate = 0.2\noutput_units = 2\n\ninputs = keras.layers.Input(shape=(None, embeddings_dim))\ns1 = SelectionLayer(name='s1')(inputs)\ns2 = SelectionLayer(name='s2')(inputs)\ns3 = SelectionLayer(name='s3')(inputs)\ns4 = SelectionLayer(name='s4')(inputs)\ns5 = SelectionLayer(name='s5')(inputs)\ns6 = SelectionLayer(name='s6')(inputs)\ns7 = SelectionLayer(name='s7')(inputs)\ns8 = SelectionLayer(name='s8')(inputs)\nv1 = ViewLayer(view_index=1, name='v1')(s1)\nv2 = ViewLayer(view_index=2, name='v2')([s1, s2])\nv3 = ViewLayer(view_index=3, name='v3')([s1, s2, s3])\nv4 = ViewLayer(view_index=4, name='v4')([s1, s2, s3, s4])\nv5 = ViewLayer(view_index=5, name='v5')([s1, s2, s3, s4, s5])\nv6 = ViewLayer(view_index=6, name='v6')([s1, s2, s3, s4, s5, s6])\nv7 = ViewLayer(view_index=7, name='v7')([s1, s2, s3, s4, s5, s6, s7])\nv8 = ViewLayer(view_index='Last', name='v8')(s8)\nconcatenation = keras.layers.concatenate(\n    [v1, v2, v3, v4, v5, v6, v7, v8], name='concatenation')\nfully_connected = keras.layers.Dense(\n    units=hidden_units, name='fully_connected')(concatenation)\ndropout = keras.layers.Dropout(rate=dropout_rate)(fully_connected)\nanother_dense_layer = keras.layers.Dense(\n    units=hidden_units, name='another_dense_layer')(dropout)\n# Feed the second dense layer (not the dropout output) into the softmax.\nsoftmax = keras.layers.Dense(\n    units=output_units, activation='softmax',\n    name='softmax')(another_dense_layer)\n\nmodel = keras.models.Model(inputs=inputs, outputs=softmax)\n```\n\n# Utilities\n\nThe `utils.py` module contains a couple of functions that could come in handy when pre-processing your input. As mentioned above, **it's important that when you coerce your list of embedded_documents to `np.array()` all the documents have the same number of embedded_tokens**. 
Otherwise, the resulting array will have an incorrect `.shape`, which would cause [Keras](https://keras.io/) to throw an error (as the input wouldn't match the expected shape).\n\nThere are two utility functions you can use to solve this problem: `pad_embedded_corpus()` and `cap_embedded_corpus()`. The first one adds 0-filled mock embedded_tokens to each document until all documents have the same length. The second one crops each document so that only the first X tokens are kept, achieving the same result.\n\nFor example:\n\n```python\nimport multi_view_network\n\nembedded_corpus = [\n [\n [0, 0]\n ],\n [\n [0, 0],\n [1, 1]\n ],\n [\n [0, 0],\n [1, 1],\n [2, 1]\n ]\n]\n\npadded_corpus = multi_view_network.pad_embedded_corpus(embedded_corpus, embeddings_dim=2)\npadded_corpus_sizes = [len(lst) for lst in padded_corpus]\n# padded_corpus_sizes\n# >>> [3, 3, 3]\n\ncapped_corpus = multi_view_network.cap_embedded_corpus(embedded_corpus)\ncapped_corpus_sizes = [len(lst) for lst in capped_corpus]\n# capped_corpus_sizes\n# >>> [1, 1, 1]\n```\n\nAdding 0-filled vectors to the documents has no effect on the output or training performance of the MVN, so padding is the recommended way to make sure all embedded_documents have the same length.\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/annoys-parrot/multi_view_network", "keywords": "keras tensorflow machine-learning NLP research", "license": "", "maintainer": "", "maintainer_email": "", "name": "multi-view-network", "package_url": "https://pypi.org/project/multi-view-network/", "platform": "", "project_url": "https://pypi.org/project/multi-view-network/", "project_urls": { "Homepage": "https://github.com/annoys-parrot/multi_view_network" }, "release_url": "https://pypi.org/project/multi-view-network/1.0/", "requires_dist": [ "tensorflow", "keras" ], "requires_python": "", 
"summary": "Keras implementation of Multi-View Network by Guo et al.", "version": "1.0" }, "last_serial": 4252800, "releases": { "1.0": [ { "comment_text": "", "digests": { "md5": "db59a0550ebe067a5271778288e291d0", "sha256": "a203ad4678fb4fef01d7e047a88e41823948361058325935661fc212f63af1c6" }, "downloads": -1, "filename": "multi_view_network-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "db59a0550ebe067a5271778288e291d0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10654, "upload_time": "2018-09-09T02:55:16", "url": "https://files.pythonhosted.org/packages/c4/7d/df27ae33730204d121a96ee9c60672b252fff8f3f4581ba87f347ddd833c/multi_view_network-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "32de434d7a941e93c45d49d5fb12876b", "sha256": "2cda56695417dee306b200b114ca7801145b89ff720d27d988f06ba480d5b67e" }, "downloads": -1, "filename": "multi_view_network-1.0.tar.gz", "has_sig": false, "md5_digest": "32de434d7a941e93c45d49d5fb12876b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8631, "upload_time": "2018-09-09T02:55:17", "url": "https://files.pythonhosted.org/packages/84/6b/dd6344f886eac349b9abb1a3af1df89a79b76d224c28e69a5842829bdaaa/multi_view_network-1.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "db59a0550ebe067a5271778288e291d0", "sha256": "a203ad4678fb4fef01d7e047a88e41823948361058325935661fc212f63af1c6" }, "downloads": -1, "filename": "multi_view_network-1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "db59a0550ebe067a5271778288e291d0", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 10654, "upload_time": "2018-09-09T02:55:16", "url": "https://files.pythonhosted.org/packages/c4/7d/df27ae33730204d121a96ee9c60672b252fff8f3f4581ba87f347ddd833c/multi_view_network-1.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "32de434d7a941e93c45d49d5fb12876b", "sha256": 
"2cda56695417dee306b200b114ca7801145b89ff720d27d988f06ba480d5b67e" }, "downloads": -1, "filename": "multi_view_network-1.0.tar.gz", "has_sig": false, "md5_digest": "32de434d7a941e93c45d49d5fb12876b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8631, "upload_time": "2018-09-09T02:55:17", "url": "https://files.pythonhosted.org/packages/84/6b/dd6344f886eac349b9abb1a3af1df89a79b76d224c28e69a5842829bdaaa/multi_view_network-1.0.tar.gz" } ] }