{ "info": { "author": "Paolo Ripamonti", "author_email": "paolo.ripamonti93@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Word2Vec-Keras Text Classifier\nWord2Vec-Keras is a simple Word2Vec and LSTM wrapper for text classification.\n\nIt combines Gensim Word2Vec model with Keras neural network trhough an Embedding layer as input.\nThe Neural Network contains with LSTM layer\n\n## How install\n```git\npip3 install git+https://github.com/paoloripamonti/word2vec-keras\n```\n\n## Usage\n\n```python\nfrom word2vec_keras import Word2VecKeras\n\nmodel = Word2VecKeras()\n\nmodel.train(x_train, y_train)\n```\n\nTrain Word2Vec and Keras models.\nThe Keras model has _EralyStopping_ callback for stopping training after 6 epochs that not improve accuracy.\n\nTrain parameters:\n- **x_train**: list of raw sentences, no text cleaning will be perfomed\n- **y_train**: list of labels\n- **w2v_size**: (Default: 300) Word2Vec - Dimensionality of the word vectors\n- **w2v_window**: (Default: 5) Word2Vec - Maximum distance between the current and predicted word within a sentence.\n- **w2v_min_count**: (Default: 1) Word2Vec - Ignores all words with total frequency lower than this.\n- **w2v_epochs**: (Default: 100) Word2Vec - Number of iterations (epochs) over the corpus.\n- **k_max_sequence_len**: (Default: 500) Keras - Maximum length of all sequences\n- **k_batch_size**:(Default: 128) Keras - Number of samples per gradient update\n- **k_epochs**:(Default: 32) Keras - Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided\n- **k_lstm_neurons**: (Default: 128) Keras - LSTM neurons per layer\n- **k_hidden_layer_neurons**: (Default: \\[128, 64, 32]) Keras - Number of Dense layers after LSTM layer.\n- **verbose**: (Default: 1) Keras- 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch\n\n### Evaluate\n\n```python\n\nmodel.evaluate(x_test, y_test)\n```\n\nEvaluate model\n\nEvaluate parameters:\n- **x_test**: list of raw sentences, no text cleaning will be perfomed\n- **y_test**: list of labels\n\nEvaluate result:\n- Return a dictionary with ACCURAY, CLASSIFICATION_REPORT and CONFUSION_MATRIX\n\n\n### Predict\n\n```python\nmodel.predict('lorem ipsum dolor sit amet consectetur adipiscing elit...', threshold=0.6)\n```\n\nMake prediction of give text\n\nPredict parameters:\n- **x_text**: Raw text, no text cleaning will be perfomed\n- **threshold**: (Default: 0.0) Cut-off threshold, if confidence il less than given value return __OTHER__ as label\n\nPredict result:\n- Return a dictionary with LABEL, CONFIDENCE and ELAPSED_TIME, i.e. {label: LABEL, confidence: CONFIDENCE, elapsed_time: TIME}\n\n### Save & load model\n\n```python\nmodel.save('/path/model.tar.gz')\n```\n\nSave model as compressed **tar.gz** file that contains several utility pickles, keras model and Word2Vec model\n\n```python\nmodel = Word2VecKeras()\n\nmodel.load('/path/model.tar.gz')\n```\n\nLoad model from saved tar.gz file\n\n## Example\n\n```python\nfrom sklearn.datasets import fetch_20newsgroups\nfrom word2vec_keras import Word2VecKeras\nfrom pprint import pprint\n\n# fetch the dataset using scikit-learn\ncategories = ['alt.atheism', 'soc.religion.christian',\n 'comp.graphics', 'sci.med']\n\ntrain_b = fetch_20newsgroups(subset='train',\n categories=categories, shuffle=True, random_state=42)\ntest_b = fetch_20newsgroups(subset='test',\n categories=categories, shuffle=True, random_state=42)\n\nprint('size of training set: %s' % (len(train_b['data'])))\nprint('size of validation set: %s' % (len(test_b['data'])))\nprint('classes: %s' % (train_b.target_names))\n\nx_train = train_b.data\ny_train = [train_b.target_names[idx] for idx in train_b.target]\nx_test = test_b.data\ny_test = [train_b.target_names[idx] for idx in test_b.target]\n\nmodel = Word2VecKeras()\nmodel.train(x_train, y_train)\n\npprint(model.evaluate(x_test, y_test))\n\nmodel.save('./model.tar.gz')\n\n```\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1--KCXDEXQHg-ueAwnyRrRmFbcqwvLhcZ)", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/javatechy/dokr", "keywords": "keras,word2vec,deep learning,machine learning", "license": "", "maintainer": "", "maintainer_email": "", "name": "word2vec-keras", "package_url": "https://pypi.org/project/word2vec-keras/", "platform": "", "project_url": "https://pypi.org/project/word2vec-keras/", "project_urls": { "Homepage": "https://github.com/javatechy/dokr" }, "release_url": "https://pypi.org/project/word2vec-keras/0.1/", "requires_dist": null, "requires_python": "", "summary": "Word2Vec Keras Text Classifier", "version": "0.1" }, "last_serial": 5895558, "releases": { "0.1": [ { "comment_text": "", "digests": { "md5": "980c72f6572758d52b362f74bd0f7bec", "sha256": "80703becfb7a9efdccf11a86fb8b9048daf379e4c0a595e3dd4879d3fa31a9d9" }, "downloads": -1, "filename": "word2vec-keras-0.1.tar.gz", "has_sig": false, "md5_digest": "980c72f6572758d52b362f74bd0f7bec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6059, "upload_time": "2019-09-27T11:09:09", "url": "https://files.pythonhosted.org/packages/70/35/ee0d3a2d7a68bf6055d67b65120d87a0ae66203a46fc282169c66411bd01/word2vec-keras-0.1.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "980c72f6572758d52b362f74bd0f7bec", "sha256": "80703becfb7a9efdccf11a86fb8b9048daf379e4c0a595e3dd4879d3fa31a9d9" }, "downloads": -1, "filename": "word2vec-keras-0.1.tar.gz", "has_sig": false, "md5_digest": "980c72f6572758d52b362f74bd0f7bec", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6059, "upload_time": "2019-09-27T11:09:09", "url": "https://files.pythonhosted.org/packages/70/35/ee0d3a2d7a68bf6055d67b65120d87a0ae66203a46fc282169c66411bd01/word2vec-keras-0.1.tar.gz" } ] }