{ "info": { "author": "Wenchen Li", "author_email": "wenchen.li.cs@gmail.com", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.7", "Programming Language :: Python :: Implementation :: CPython", "Programming Language :: Python :: Implementation :: PyPy" ], "description": "\n\n
\n\n-----------------\n\n# capricorn\n\ncapricorn is a lightweight library that builds a vocabulary from a\ncorpus and prepares word embeddings for use by learning models.\n\n1. Build a vocabulary from a corpus.\n2. Load the necessary word embeddings with word indices consistent with the vocabulary.\n\n\n## getting started\n```\npip install capricorn\n```\n\n```python\nimport capricorn\nimport os\n\n# Specify filepaths\nVocab_path = \"vocab_processor\"\nembedding_vector_path = \"path/to/embedding\"\n\n# Load vocab\nif os.path.isfile(Vocab_path):\n    print(\"Loading Vocabulary ...\")\n    vocab_processor = capricorn.VocabularyProcessor.restore(Vocab_path)\n\nelse:  # build vocab\n    print(\"Building Vocabulary ...\")\n\n    x_text = [\"Saudi Arabia Equity Movers: Almarai, Jarir Marketing and Spimaco.\",\n              \"Orange, Thales to Get French Cloud Computing Funds, Figaro Says.\",\n              \"Stansted Could Double Passengers on Deregulation, Times Reports.\"]\n\n    # Vocabulary settings\n    max_document_length = 11\n    min_freq_filter = 2\n\n    vocab_processor = capricorn.VocabularyProcessor(max_document_length=max_document_length,\n                                                    min_frequency=min_freq_filter)\n    # only fit\n    # vocab_processor.fit(x_text)\n    # or fit_transform to get the transformed corpus\n    x_text_transformed = vocab_processor.fit_transform(x_text)\n    vocab_processor.save(Vocab_path)\n    print(\"vocab_processor saved at:\", Vocab_path)\n\n# build the embedding matrix so that its row indices are consistent with the vocab word2index mapping\nembedding_matrix = vocab_processor.prepare_embedding_matrix_with_dim(embedding_vector_path, 300)\n\n```\n# User input\n\nBy default the library uses the special tokens \\_\\_UNK__ and \\_\\_PAD__. \nIf an input sequence is shorter than the max_document_length set when the\nVocabularyProcessor was initialized, the sequence is automatically padded with \\_\\_PAD__. 
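The padding idea can be sketched in plain Python as follows. This is a minimal illustration, not capricorn's implementation: `pad_tokens` is a hypothetical helper, and both padding on the right and truncating over-long input are assumptions about the behavior.

```python
PAD_TOKEN = '__PAD__'  # default padding token named in the text above


def pad_tokens(tokens, max_document_length, pad_token=PAD_TOKEN):
    # Hypothetical helper, not part of capricorn.
    # Assumption: over-long input is cut to max_document_length.
    clipped = list(tokens[:max_document_length])
    # Assumption: pad tokens are appended on the right.
    return clipped + [pad_token] * (max_document_length - len(clipped))


padded = pad_tokens('We like it very much'.split(), 11)
print(len(padded), padded[-1])
```

Whitespace tokenization is used here only to keep the sketch short; the library's own tokenizer may split text differently.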
\n\n\nIf you define additional special tokens when initializing the vocabulary, you \nneed to pre-process each sequence yourself, namely by adding your self-defined special \ntokens to the input sequence. For example, if you define \\_\\_START__\nand \\_\\_END__ as additional special tokens and max_document_length=11, you have to change the original\nsentence from: \n\n\"We like it very much\" \n\nto:\n\n\"\\_\\_START__ \\_\\_PAD__ \\_\\_PAD__ We like it very much \\_\\_PAD__ \\_\\_PAD__ \\_\\_END__\"\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/WenchenLi/capricorn", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "capricorn", "package_url": "https://pypi.org/project/capricorn/", "platform": "", "project_url": "https://pypi.org/project/capricorn/", "project_urls": { "Homepage": "https://github.com/WenchenLi/capricorn" }, "release_url": "https://pypi.org/project/capricorn/0.1.2/", "requires_dist": null, "requires_python": ">=3.6.0", "summary": "nlp vocabulary builder and embedding loader", "version": "0.1.2" }, "last_serial": 4633077, "releases": { "0.1.1": [ { "comment_text": "", "digests": { "md5": "a9231f08faec9ba9fca97438a551231f", "sha256": "682bc39549989747232cc824c459a4504577165c88569f88968769452f05b22e" }, "downloads": -1, "filename": "capricorn-0.1.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "a9231f08faec9ba9fca97438a551231f", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 8894, "upload_time": "2018-12-24T13:01:21", "url": "https://files.pythonhosted.org/packages/a0/14/0f6f397a3f6b265436e1c4ee3eb91b6b571a91f8db3dcdb5297aeff65e88/capricorn-0.1.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "60496c4ff37968e5558a83d9eee87aca", "sha256": "b81e2e8d385ea7cee58ff4c3f7129ceb2bdf37240fb7db275d0c35fac7cdcb19" }, 
"downloads": -1, "filename": "capricorn-0.1.1.tar.gz", "has_sig": false, "md5_digest": "60496c4ff37968e5558a83d9eee87aca", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 7917, "upload_time": "2018-12-24T13:01:24", "url": "https://files.pythonhosted.org/packages/4e/8d/97ada5aa1b6b3b13133429c4f7d28cdf2d228f6382c23e9ae162e65b6998/capricorn-0.1.1.tar.gz" } ], "0.1.2": [ { "comment_text": "", "digests": { "md5": "dfdb8c64cc25c088428efa2c1edc2e50", "sha256": "83f11430142796f07a016b38fbb2a44c589dfcd95a5bffb5235d650cb6a6f895" }, "downloads": -1, "filename": "capricorn-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dfdb8c64cc25c088428efa2c1edc2e50", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 8962, "upload_time": "2018-12-25T09:10:10", "url": "https://files.pythonhosted.org/packages/3b/3c/0f893128deaa05bd1e0c4b758e18ee948b2ad72dc3fe8e80365c59a2c3e8/capricorn-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "16f0a4b36f1e0bc2e608e1d2e78fec32", "sha256": "3cd8523f9199c759d01df490c5f8e6c0c18dfba9617da6276804b2d4294d2fa0" }, "downloads": -1, "filename": "capricorn-0.1.2.tar.gz", "has_sig": false, "md5_digest": "16f0a4b36f1e0bc2e608e1d2e78fec32", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 8003, "upload_time": "2018-12-25T09:10:12", "url": "https://files.pythonhosted.org/packages/be/60/766fb7ee5d9c3846bb9ad003eb6ded93455601aaa281cfb5bc97423dd26f/capricorn-0.1.2.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "dfdb8c64cc25c088428efa2c1edc2e50", "sha256": "83f11430142796f07a016b38fbb2a44c589dfcd95a5bffb5235d650cb6a6f895" }, "downloads": -1, "filename": "capricorn-0.1.2-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "dfdb8c64cc25c088428efa2c1edc2e50", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": ">=3.6.0", "size": 8962, "upload_time": 
"2018-12-25T09:10:10", "url": "https://files.pythonhosted.org/packages/3b/3c/0f893128deaa05bd1e0c4b758e18ee948b2ad72dc3fe8e80365c59a2c3e8/capricorn-0.1.2-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "16f0a4b36f1e0bc2e608e1d2e78fec32", "sha256": "3cd8523f9199c759d01df490c5f8e6c0c18dfba9617da6276804b2d4294d2fa0" }, "downloads": -1, "filename": "capricorn-0.1.2.tar.gz", "has_sig": false, "md5_digest": "16f0a4b36f1e0bc2e608e1d2e78fec32", "packagetype": "sdist", "python_version": "source", "requires_python": ">=3.6.0", "size": 8003, "upload_time": "2018-12-25T09:10:12", "url": "https://files.pythonhosted.org/packages/be/60/766fb7ee5d9c3846bb9ad003eb6ded93455601aaa281cfb5bc97423dd26f/capricorn-0.1.2.tar.gz" } ] }