{ "info": { "author": "sally14", "author_email": "sally14.doe@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# Embeddings\n\n\nThis package is designed to provide easy-to-use python class and cli\ninterfaces to:\n\n- clean corpuses in an efficient way in terms of computation time\n\n- generate word2vec embeddings (based on gensim) and directly write them to a format that is compatible with [Tensorflow Projector](http://projector.tensorflow.org/)\n\nThus, with two classes, or two commands, anyone should be able clean a corpus and generate embeddings that can be uploaded and visualized with Tensorflow Projector.\n\n## Getting started\n\n### Requirements\n\nThis packages requires ```gensim```, ```nltk```, and ```docopt``` to run. If\npip doesn't install this dependencies automatically, you can install it by\nrunning :\n```bash\npip install nltk docopt gensim\n```\n\n### Installation\n\nTo install this package, simply run :\n```bash\npip install embeddingsprep\n```\n\nFurther versions might include conda builds, but it's currently not the case.\n\n\n## Main features\n\n### Preprocessing\n\nFor Word2Vec, we want a soft yet important preprocessing. We want to denoise the text while keeping as much variety and information as possible. A detailed version of what is done during the preprocessing is available [here](./preprocessing/index.html)\n\n\n#### Usage example :\n\nCreating and saving a loadable configuration:\n```python\nfrom embeddingsprep.preprocessing.preprocessor import PreprocessorConfig, Preprocessor\nconfig = PreprocessorConfig('/tmp/logdir')\nconfig.set_config(writing_dir='/tmp/outputs')\nconfig.save_config()\n```\n\n```python\nprep = Preprocessor('/tmp/logdir') # Loads the config object in /tmp/logdir if it exists\nprep.fit('~/mydata/') # Fits the unigram & bigrams occurences\nprep.filter() # Filters with all the config parameters\nprep.transform('~/mydata') # Transforms the texts with the filtered vocab. \n```\n\n\n### Word2Vec\n\nFor the Word2Vec, we just wrote a simple wrapper that takes the\npreprocessed files as an input, trains a Word2Vec model with gensim and writes the vocab, embeddings .tsv files that can be visualized with tensorflow projector (http://projector.tensorflow.org/)\n\n#### Usage example:\n\n\n```python\nfrom embeddingsprep.models.word2vec import Word2Vec\nmodel = Word2Vec(emb_size=300, window=5, epochs=3)\nmodel.train('./my-preprocessed-data/')\nmodel.save('./my-output-dir')\n```\n\n## Contributing\n\nAny github issue, contribution or suggestion is welcomed! You can open issues on the [github repository](https://github.com/sally14/embeddings).\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/sally14/embeddings", "keywords": "", "license": "", "maintainer": "", "maintainer_email": "", "name": "embeddingsprep", "package_url": "https://pypi.org/project/embeddingsprep/", "platform": "", "project_url": "https://pypi.org/project/embeddingsprep/", "project_urls": { "Homepage": "https://github.com/sally14/embeddings" }, "release_url": "https://pypi.org/project/embeddingsprep/0.1.4/", "requires_dist": [ "docopt (>=0.6.2)", "gensim (>=3.8.1)", "nltk (>=3.4.5)", "glob" ], "requires_python": ">=3.6", "summary": "A word2vec preprocessing and training package", "version": "0.1.4", "yanked": false, "yanked_reason": null }, "last_serial": 6030582, "releases": { "0.1.0": [ { "comment_text": "", "digests": { "md5": "d4a77b1a5c1ba551c1929f2416687373", "sha256": "c7138dec391ec4dac6997eba97753e78fe7bd299aacd0f1bc37fce4e8b118fee" }, "downloads": -1, "filename": "embeddingsprep-0.1.0-py3-none-any.whl", "has_sig": false, "md5_digest": "d4a77b1a5c1ba551c1929f2416687373", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 2357, "upload_time": "2019-10-25T14:34:25", "upload_time_iso_8601": "2019-10-25T14:34:25.819027Z", "url": "https://files.pythonhosted.org/packages/9d/7d/bf83028bf0399eb3be7109b861b1cef3e256858f7746c1d86630891671af/embeddingsprep-0.1.0-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "d44d32e8affb9d13dc543e62b1972bda", "sha256": "608edc8cd11441fcd7f6bc371e0954372e7a61dc0eeb7991e2a4fff2b78faa20" }, "downloads": -1, "filename": "embeddingsprep-0.1.0.tar.gz", "has_sig": false, "md5_digest": "d44d32e8affb9d13dc543e62b1972bda", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1937, "upload_time": "2019-10-25T14:34:28", "upload_time_iso_8601": "2019-10-25T14:34:28.201066Z", "url": "https://files.pythonhosted.org/packages/98/1a/9d00e45332131f21fbd5276078277fb0a51133abaf76fdad672488513217/embeddingsprep-0.1.0.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.1": [ { "comment_text": "", "digests": { "md5": "336bdf3188a20e1d3bd1625365089af8", "sha256": "b1bb73e7788031b350973941f7f95b689e5cab7cd1af4359c190d8166e3772a2" }, "downloads": -1, "filename": "embeddingsprep-0.1.1-py3-none-any.whl", "has_sig": false, "md5_digest": "336bdf3188a20e1d3bd1625365089af8", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 2357, "upload_time": "2019-10-25T14:48:28", "upload_time_iso_8601": "2019-10-25T14:48:28.044872Z", "url": "https://files.pythonhosted.org/packages/40/58/ecb91bcbeec87ee6e88ac59d42fe657046d475826b1e2ed16be49bc124fd/embeddingsprep-0.1.1-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "24c53ef7c99c3b18d2c18ab03287c8d5", "sha256": "24cf0e5b6b20feb09a10b61aa5e8d4cc2248385e1f7bc3150331a41aafbb2c3b" }, "downloads": -1, "filename": "embeddingsprep-0.1.1.tar.gz", "has_sig": false, "md5_digest": "24c53ef7c99c3b18d2c18ab03287c8d5", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 1939, "upload_time": "2019-10-25T14:48:29", "upload_time_iso_8601": "2019-10-25T14:48:29.633869Z", "url": "https://files.pythonhosted.org/packages/05/9d/78c4d4072387b6bd59070fd925b9d98b8752a7816e51fec68c707c4a13e9/embeddingsprep-0.1.1.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "53b67f1b5a7063e169c077970b78bbbd", "sha256": "04048f16870707832dc426e37f024c5a5e36ed82569c7801c03e44457a8a5c90" }, "downloads": -1, "filename": "embeddingsprep-0.1.2-py3-none-any.whl", "has_sig": false, "md5_digest": "53b67f1b5a7063e169c077970b78bbbd", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 15736, "upload_time": "2019-10-25T15:00:07", "upload_time_iso_8601": "2019-10-25T15:00:07.129407Z", "url": "https://files.pythonhosted.org/packages/2b/05/b1bf5db818d6899df9f187562e7abeec32c7650d5a395f928972dc919a0c/embeddingsprep-0.1.2-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "de3c8c4c8de6b4ab5768e66efc878d6f", "sha256": "93919168811aeab853275330a6ce2343ea18a1efa9a0c3dc0ce5cea4116f7c6a" }, "downloads": -1, "filename": "embeddingsprep-0.1.2.tar.gz", "has_sig": false, "md5_digest": "de3c8c4c8de6b4ab5768e66efc878d6f", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 11981, "upload_time": "2019-10-25T15:00:08", "upload_time_iso_8601": "2019-10-25T15:00:08.646583Z", "url": "https://files.pythonhosted.org/packages/d1/73/cc8ed279a93db7fa9b59d54c63e49a6d0efa1276bedf9dd3b655c8764e2a/embeddingsprep-0.1.2.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.3": [ { "comment_text": "", "digests": { "md5": "fc8a9ad62b438ee0e511b19c6e87940f", "sha256": "03f9bf8cefb1ce749d0851791ca4ce846411e8debe0bb3a68efb4d7bf2d386d8" }, "downloads": -1, "filename": "embeddingsprep-0.1.3-py3-none-any.whl", "has_sig": false, "md5_digest": "fc8a9ad62b438ee0e511b19c6e87940f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 17991, "upload_time": "2019-10-25T15:49:18", "upload_time_iso_8601": "2019-10-25T15:49:18.757543Z", "url": "https://files.pythonhosted.org/packages/e0/75/3e1e0f2755e97220cee2c35e94e3adabc06464794eff1af4d75c26bb687d/embeddingsprep-0.1.3-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e2c71c830e7c60b86cacf7634e63a956", "sha256": "528f516c0fea844c2e92ef5aa966a50dcc30b175e0cce30d0583a8c064db0597" }, "downloads": -1, "filename": "embeddingsprep-0.1.3.tar.gz", "has_sig": false, "md5_digest": "e2c71c830e7c60b86cacf7634e63a956", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 13679, "upload_time": "2019-10-25T15:49:20", "upload_time_iso_8601": "2019-10-25T15:49:20.351825Z", "url": "https://files.pythonhosted.org/packages/9b/6a/0392653585515194fb12675b9e9942ecfd967e9beb84061421bbeb39724f/embeddingsprep-0.1.3.tar.gz", "yanked": false, "yanked_reason": null } ], "0.1.4": [ { "comment_text": "", "digests": { "md5": "8140e4e87b446bef6c795aab32d1ab66", "sha256": "5de4e24900b9afaf845dc73105367abe4b6301d714cbeaabd5ae9037ef0778c1" }, "downloads": -1, "filename": "embeddingsprep-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8140e4e87b446bef6c795aab32d1ab66", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 17997, "upload_time": "2019-10-25T16:38:31", "upload_time_iso_8601": "2019-10-25T16:38:31.771324Z", "url": "https://files.pythonhosted.org/packages/e9/57/afeb276d062e6e21613f820082379190f7a59a2b1b16e218eca26bf9b528/embeddingsprep-0.1.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e26ca2c0e9731177c41ac1276bc2968c", "sha256": "7388278ac508a168c5bd9b014b32667ca29362946bbb32fd70667ed5d833fc9f" }, "downloads": -1, "filename": "embeddingsprep-0.1.4.tar.gz", "has_sig": false, "md5_digest": "e26ca2c0e9731177c41ac1276bc2968c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 13693, "upload_time": "2019-10-25T16:38:34", "upload_time_iso_8601": "2019-10-25T16:38:34.561446Z", "url": "https://files.pythonhosted.org/packages/c2/2a/55d5b720ff7e3061169c9ffdc2c1c923553bee01abec1e24ee799253c401/embeddingsprep-0.1.4.tar.gz", "yanked": false, "yanked_reason": null } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "8140e4e87b446bef6c795aab32d1ab66", "sha256": "5de4e24900b9afaf845dc73105367abe4b6301d714cbeaabd5ae9037ef0778c1" }, "downloads": -1, "filename": "embeddingsprep-0.1.4-py3-none-any.whl", "has_sig": false, "md5_digest": "8140e4e87b446bef6c795aab32d1ab66", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": ">=3.6", "size": 17997, "upload_time": "2019-10-25T16:38:31", "upload_time_iso_8601": "2019-10-25T16:38:31.771324Z", "url": "https://files.pythonhosted.org/packages/e9/57/afeb276d062e6e21613f820082379190f7a59a2b1b16e218eca26bf9b528/embeddingsprep-0.1.4-py3-none-any.whl", "yanked": false, "yanked_reason": null }, { "comment_text": "", "digests": { "md5": "e26ca2c0e9731177c41ac1276bc2968c", "sha256": "7388278ac508a168c5bd9b014b32667ca29362946bbb32fd70667ed5d833fc9f" }, "downloads": -1, "filename": "embeddingsprep-0.1.4.tar.gz", "has_sig": false, "md5_digest": "e26ca2c0e9731177c41ac1276bc2968c", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6", "size": 13693, "upload_time": "2019-10-25T16:38:34", "upload_time_iso_8601": "2019-10-25T16:38:34.561446Z", "url": "https://files.pythonhosted.org/packages/c2/2a/55d5b720ff7e3061169c9ffdc2c1c923553bee01abec1e24ee799253c401/embeddingsprep-0.1.4.tar.gz", "yanked": false, "yanked_reason": null } ], "vulnerabilities": [] }