{
    "info": {
        "author": "Edward Newell",
        "author_email": "edward.newell@gmail.com",
        "bugtrack_url": null,
        "classifiers": [
            "Development Status :: 3 - Alpha",
            "Intended Audience :: Developers",
            "License :: OSI Approved :: MIT License",
            "Programming Language :: Python :: 2.7",
            "Topic :: Software Development :: Build Tools"
        ],
        "description": "# theano-word2vec\nAn implementation of Mikolov's word2vec in Python 2 using Theano and Lasagne.\n\n## About this package\nThis package has been written with care for modularity of it's components, \nwith the hope that they will be re-usable in creating variations on standard\nword2vec.  Soon I'll provide full documentation with guidelines on \ncustomization and extension, as well as a tour of how the package is setup.\nFor now, please enjoy this quickstart guide\n\n## Quickstart\n*NOTE: This package is only available for Python 2 right now.*\n\n### Install\nInstall from the Python Package Index:\n```bash\npip install theano-word2vec\n```\n\nAlternatively, install a version you can hack on:\n```bash\ngit clone https://github.com/enewe101/word2vec.git\ncd word2vec\npython setup.py develop\n```\n\n### Use\n\nThe simplest way to train a word2vec embedding:\n```python\n>>> from word2vec import word2vec\n>>> embedder, dictionary = word2vec(files=['corpus/file1.txt', 'corpus/file2.txt'])\n```\nWhere the input files should be formatted with one sentence per line, with\ntokens space-separated.\n\nOnce trained, the embedder can be used to convert words to vectors:\n```python\n>>> tokens = 'A sentence to embed'.split()\n>>> token_ids = dictionary.get_ids(tokens)\n>>> vectors = word2vec_embedder.embed(token_ids)\n```\n\nThe `word2vec()` function exposes most of the basic parameters appearing\nin Mikolov's skip-gram model based on noise contrastive estimation:\n```python\n>>> embedder, dictionary = word2vec(\n...\t\t# directory in which to save embedding parameters (deepest dir created if doesn't exist)\n...\t\tsavedir='data/my-embedding',\n...\n...\t\t# List of files comprising the corpus\n...\t\tfiles=['corpus/file1.txt', 'corpus/file2.txt'],\t\n...\n...\t\t# Include whole directories of files (deep files not included)\n...\t\tdirectories=['corpus', 'corpus/additional'],\n...\n...\t\t# Indicate files to exclude using regexes\n...\t\tskip=[re.compile('*.bk$'), re.compile('exclude-from-corpus')],\t\n...\n...\t\t# Number of passes through training corpus\n...\t\tnum_epochs=5,\t\t\t\t\n...\n...\t\t# Specify the mapping from tokens to ints (else create it automatically)\n...\t\tunigram_dictionary=preexisting_dictionary,\t\n...\n...\t\t# Number of \"noise\" examples included for every \"signal\" example\n...\t\tnoise_ratio=15,\t\n...\n...\t\t# Relative probability of skip-gram sampling centered on query word\n...\t\tkernel=[1,2,3,3,2,1],\t\t\n...\n...\t\t# Threshold used to calculate discard-probability for query words\n...\t\tt=1.0e-5,\t\t\t\t\n...\n...\t\t# Size of minibatches during training\n...\t\tbatch_size = 1000,\n...\n...\t\t# Dimensionality of the embedding vector space \n...\t\tnum_embedding_dimensions = 500, \n...\n...\t\t# Initializer for embedding parameters (can be a numpy array too)\n...\t\tword_embedding_init=lasagne.init.Normal(),\t\n...\n...\t\t# Initializer for context embedding parameters (can be numpy array)\n...\t\tcontext_embedding_init=lasagne.init.Normal(),\t\n...\n...\t\t# Size of stochastic gradient descent steps during training\n...\t\tlearning_rate = 0.1,\t\n...\n...\t\t# Amount of Nesterov momentum during training\n...\t\tmomentum=0.9,\t\t\n...\n...\t\t# Print messages during training\n...\t\tverbose=True,\n...\n...\t\t# Number of parrallel corpus-reading processes \n...\t\tnum_example_generators=3\t\n...\t)\n```\n\nFor more customization, check out the documentation (soon) to see how to \nassemble your own training setup using the classes provided in word2vec.",
        "description_content_type": null,
        "docs_url": null,
        "download_url": "",
        "downloads": {
            "last_day": -1,
            "last_month": -1,
            "last_week": -1
        },
        "home_page": "https://github.com/enewe101/word2vec",
        "keywords": "word2vec word embeddings deep learning nlp",
        "license": "MIT",
        "maintainer": "",
        "maintainer_email": "",
        "name": "theano-word2vec",
        "package_url": "https://pypi.org/project/theano-word2vec/",
        "platform": "",
        "project_url": "https://pypi.org/project/theano-word2vec/",
        "project_urls": {
            "Homepage": "https://github.com/enewe101/word2vec"
        },
        "release_url": "https://pypi.org/project/theano-word2vec/0.2.2/",
        "requires_dist": null,
        "requires_python": "",
        "summary": "word2vec using Theano and Lasagne",
        "version": "0.2.2"
    },
    "last_serial": 3029281,
    "releases": {
        "0.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "8a1635da8408b749ae27ea7c9f895161",
                    "sha256": "106d906c3f1d8762ec4bde7ad8d55a1f18d8781a52cb37aba8689155ed8adc45"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.tar.gz",
                "has_sig": false,
                "md5_digest": "8a1635da8408b749ae27ea7c9f895161",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 7314,
                "upload_time": "2016-04-11T04:36:25",
                "url": "https://files.pythonhosted.org/packages/7a/15/83191d85e83f1dd6d6a092b8dceee2a10178c009f0ee95f7911f18fd2d5a/theano-word2vec-0.1.tar.gz"
            }
        ],
        "0.1.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "4970cad7cf1c9822dddd487e9b3ecd1a",
                    "sha256": "bb1e9a82b540222b0d9100f447527322fce70d4115fa44e3e18d1babfa3d7e12"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.1.tar.gz",
                "has_sig": false,
                "md5_digest": "4970cad7cf1c9822dddd487e9b3ecd1a",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 7358,
                "upload_time": "2016-04-11T04:46:45",
                "url": "https://files.pythonhosted.org/packages/5d/0b/07cf87e7698eeb57210fb939af318a7ecd30f9bda208367435613bf9b935/theano-word2vec-0.1.1.tar.gz"
            }
        ],
        "0.1.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "1894c473525acef9f850e310046ab5e0",
                    "sha256": "96891617731138d95bbb187c3f29bac4834bf7ef59d84ac5a53bd9191a431642"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.2.tar.gz",
                "has_sig": false,
                "md5_digest": "1894c473525acef9f850e310046ab5e0",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 22562,
                "upload_time": "2016-04-18T04:17:40",
                "url": "https://files.pythonhosted.org/packages/f2/a7/311bfa18afe115a856e4caf93c125434d1f3fc49e847ab8d8be0d02340df/theano-word2vec-0.1.2.tar.gz"
            }
        ],
        "0.1.3": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "77acdaad29ccb509a43c9f458d59c879",
                    "sha256": "c394a6f181e424a8e298521ac11bc99f4d1b96afb3cd451bef9f9b7cefc1d346"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.3.tar.gz",
                "has_sig": false,
                "md5_digest": "77acdaad29ccb509a43c9f458d59c879",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 31739,
                "upload_time": "2016-05-04T16:17:23",
                "url": "https://files.pythonhosted.org/packages/bc/9e/970f48922ed385c2eceb6dda17fbce2755b7488f88fc197bb407321773c1/theano-word2vec-0.1.3.tar.gz"
            }
        ],
        "0.1.4": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "f36a7396739d99c47354cb4dc8e786df",
                    "sha256": "aabe3a6e2876482c74233e56adaba71313faf7dd54d08fdc9917b5329d19b177"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.4.tar.gz",
                "has_sig": false,
                "md5_digest": "f36a7396739d99c47354cb4dc8e786df",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 32128,
                "upload_time": "2016-05-04T16:38:20",
                "url": "https://files.pythonhosted.org/packages/bf/ad/d7ed763b43516bbf2c6e185f4b513fcb7c1e27f88b0acd504112f43c18b4/theano-word2vec-0.1.4.tar.gz"
            }
        ],
        "0.1.5": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "459dab5ba18a3ecdf16765f9314c7313",
                    "sha256": "18efbf5244584e9771d6af11142f31e7e4d732e1f037962fba1667894eb69a71"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.1.5.tar.gz",
                "has_sig": false,
                "md5_digest": "459dab5ba18a3ecdf16765f9314c7313",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 32216,
                "upload_time": "2016-05-04T16:57:24",
                "url": "https://files.pythonhosted.org/packages/ec/51/ca84bafb95d8e1d0870537d52288e619912ca8862f3b35fa5b41f2a96683/theano-word2vec-0.1.5.tar.gz"
            }
        ],
        "0.2.1": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "3f65843c300759c6f25124035edfaf25",
                    "sha256": "f8387be665b91902785204965c0b3719aa894cc489198dcad3e16f24fae9df15"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.2.1.tar.gz",
                "has_sig": false,
                "md5_digest": "3f65843c300759c6f25124035edfaf25",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 38053,
                "upload_time": "2016-10-05T02:47:24",
                "url": "https://files.pythonhosted.org/packages/9d/71/740635afbcdcf200b1664b5a92a8eaee11e8a9c5f9dcd4876ebcb65e1e1f/theano-word2vec-0.2.1.tar.gz"
            }
        ],
        "0.2.2": [
            {
                "comment_text": "",
                "digests": {
                    "md5": "95db195793e40836fb0787b17fb5ca18",
                    "sha256": "b5ef5dd940c18c1c39d3d73f06c62ef9012f9ea281253734dff92205eef73211"
                },
                "downloads": -1,
                "filename": "theano-word2vec-0.2.2.tar.gz",
                "has_sig": false,
                "md5_digest": "95db195793e40836fb0787b17fb5ca18",
                "packagetype": "sdist",
                "python_version": "source",
                "requires_python": null,
                "size": 39039,
                "upload_time": "2017-07-17T17:50:02",
                "url": "https://files.pythonhosted.org/packages/01/69/40dea6b9c4bc992fbc2fa6c15f1d404bceb389aa0d86f85fc103064570a9/theano-word2vec-0.2.2.tar.gz"
            }
        ]
    },
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "95db195793e40836fb0787b17fb5ca18",
                "sha256": "b5ef5dd940c18c1c39d3d73f06c62ef9012f9ea281253734dff92205eef73211"
            },
            "downloads": -1,
            "filename": "theano-word2vec-0.2.2.tar.gz",
            "has_sig": false,
            "md5_digest": "95db195793e40836fb0787b17fb5ca18",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 39039,
            "upload_time": "2017-07-17T17:50:02",
            "url": "https://files.pythonhosted.org/packages/01/69/40dea6b9c4bc992fbc2fa6c15f1d404bceb389aa0d86f85fc103064570a9/theano-word2vec-0.2.2.tar.gz"
        }
    ]
}