{ "info": { "author": "Bayu Aldi Yansyah", "author_email": "bayualdiyansyah@gmail.com", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: BSD License", "Programming Language :: C++", "Programming Language :: Cython", "Programming Language :: Python :: 2.6", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.2", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Topic :: Scientific/Engineering :: Artificial Intelligence" ], "description": "fasttext |Build Status| |PyPI version|\n======================================\n\nfasttext is a Python interface for `Facebook\nfastText `__.\n\nRequirements\n------------\n\nfasttext support Python 2.6 or newer. It requires\n`Cython `__ in order to build the\nC++ extension.\n\nInstallation\n------------\n\n.. code:: shell\n\n pip install fasttext\n\nExample usage\n-------------\n\nThis package has two main use cases: word representation learning and\ntext classification.\n\nThese were described in the two papers\n`1 <#enriching-word-vectors-with-subword-information>`__ and\n`2 <#bag-of-tricks-for-efficient-text-classification>`__.\n\nWord representation learning\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nIn order to learn word vectors, as described in\n`1 <#enriching-word-vectors-with-subword-information>`__, we can use\n``fasttext.skipgram`` and ``fasttext.cbow`` function like the following:\n\n.. 
code:: python\n\n import fasttext\n\n # Skipgram model\n model = fasttext.skipgram('data.txt', 'model')\n print(model.words) # list of words in dictionary\n\n # CBOW model\n model = fasttext.cbow('data.txt', 'model')\n print(model.words) # list of words in dictionary\n\nwhere ``data.txt`` is a training file containing ``utf-8`` encoded text.\nBy default, the word vectors will take into account character n-grams\nfrom 3 to 6 characters.\n\nAt the end of optimization, the program will save two files:\n``model.bin`` and ``model.vec``.\n\n``model.vec`` is a text file containing the word vectors, one per line.\n``model.bin`` is a binary file containing the parameters of the model\nalong with the dictionary and all hyperparameters.\n\nThe binary file can be used later to compute word vectors or to restart\nthe optimization.\n\nThe following ``fasttext(1)`` command is equivalent:\n\n.. code:: shell\n\n # Skipgram model\n ./fasttext skipgram -input data.txt -output model\n\n # CBOW model\n ./fasttext cbow -input data.txt -output model\n\nObtaining word vectors for out-of-vocabulary words\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe previously trained model can be used to compute word vectors for\nout-of-vocabulary words.\n\n.. code:: python\n\n print(model['king']) # get the vector of the word 'king'\n\nThe following ``fasttext(1)`` command is equivalent:\n\n.. code:: shell\n\n echo \"king\" | ./fasttext print-vectors model.bin\n\nThis will output the vector of the word ``king`` to the standard output.\n\nLoad pre-trained model\n~~~~~~~~~~~~~~~~~~~~~~\n\nWe can use ``fasttext.load_model`` to load a pre-trained model:\n\n.. 
code:: python\n\n model = fasttext.load_model('model.bin')\n print(model.words) # list of words in dictionary\n print(model['king']) # get the vector of the word 'king'\n\nText classification\n~~~~~~~~~~~~~~~~~~~\n\nThis package can also be used to train supervised text classifiers and\nto load a pre-trained classifier generated by fastText.\n\nIn order to train a text classifier using the method described in\n`2 <#bag-of-tricks-for-efficient-text-classification>`__, we can use the\nfollowing function:\n\n.. code:: python\n\n classifier = fasttext.supervised('data.train.txt', 'model')\n\nThe equivalent ``fasttext(1)`` command:\n\n.. code:: shell\n\n ./fasttext supervised -input data.train.txt -output model\n\nwhere ``data.train.txt`` is a text file containing a training sentence\nper line along with the labels. By default, we assume that labels are\nwords that are prefixed by the string ``__label__``.\n\nWe can specify the label prefix with the ``label_prefix`` param:\n\n.. code:: python\n\n classifier = fasttext.supervised('data.train.txt', 'model', label_prefix='__label__')\n\nThe equivalent ``fasttext(1)`` command:\n\n.. code:: shell\n\n ./fasttext supervised -input data.train.txt -output model -label '__label__'\n\nThis will output two files: ``model.bin`` and ``model.vec``.\n\nOnce the model is trained, we can evaluate it by computing the\nprecision at 1 (P@1) and the recall on a test set using the\n``classifier.test`` function:\n\n.. code:: python\n\n result = classifier.test('test.txt')\n print('P@1:', result.precision)\n print('R@1:', result.recall)\n print('Number of examples:', result.nexamples)\n\nThis will print the same output to stdout as:\n\n.. code:: shell\n\n ./fasttext test model.bin test.txt\n\nIn order to obtain the most likely label for a list of texts, we can use\nthe ``classifier.predict`` method:\n\n.. 
code:: python\n\n texts = ['example very long text 1', 'example very long text 2']\n labels = classifier.predict(texts)\n print(labels)\n\n # Or with the probability\n labels = classifier.predict_proba(texts)\n print(labels)\n\nWe can specify the ``k`` value to get the k best labels from the classifier:\n\n.. code:: python\n\n labels = classifier.predict(texts, k=3)\n print(labels)\n\n # Or with the probability\n labels = classifier.predict_proba(texts, k=3)\n print(labels)\n\nThis interface is equivalent to the ``fasttext(1)`` predict command. The\nsame model with the same input set will produce the same predictions.\n\nAPI documentation\n-----------------\n\nSkipgram model\n~~~~~~~~~~~~~~\n\nTrain & load a skipgram model\n\n.. code:: python\n\n model = fasttext.skipgram(params)\n\nList of available ``params`` and their default values:\n\n::\n\n input_file     training file path (required)\n output         output file path (required)\n lr             learning rate [0.05]\n lr_update_rate change the rate of updates for the learning rate [100]\n dim            size of word vectors [100]\n ws             size of the context window [5]\n epoch          number of epochs [5]\n min_count      minimal number of word occurrences [5]\n neg            number of negatives sampled [5]\n word_ngrams    max length of word ngram [1]\n loss           loss function {ns, hs, softmax} [ns]\n bucket         number of buckets [2000000]\n minn           min length of char ngram [3]\n maxn           max length of char ngram [6]\n thread         number of threads [12]\n t              sampling threshold [0.0001]\n silent         disable the log output from the C++ extension [1]\n encoding       specify input_file encoding [utf-8]\n\nExample usage:\n\n.. code:: python\n\n model = fasttext.skipgram('train.txt', 'model', lr=0.1, dim=300)\n\nCBOW model\n~~~~~~~~~~\n\nTrain & load a CBOW model\n\n.. 
code:: python\n\n model = fasttext.cbow(params)\n\nList of available ``params`` and their default values:\n\n::\n\n input_file     training file path (required)\n output         output file path (required)\n lr             learning rate [0.05]\n lr_update_rate change the rate of updates for the learning rate [100]\n dim            size of word vectors [100]\n ws             size of the context window [5]\n epoch          number of epochs [5]\n min_count      minimal number of word occurrences [5]\n neg            number of negatives sampled [5]\n word_ngrams    max length of word ngram [1]\n loss           loss function {ns, hs, softmax} [ns]\n bucket         number of buckets [2000000]\n minn           min length of char ngram [3]\n maxn           max length of char ngram [6]\n thread         number of threads [12]\n t              sampling threshold [0.0001]\n silent         disable the log output from the C++ extension [1]\n encoding       specify input_file encoding [utf-8]\n\nExample usage:\n\n.. code:: python\n\n model = fasttext.cbow('train.txt', 'model', lr=0.1, dim=300)\n\nLoad pre-trained model\n~~~~~~~~~~~~~~~~~~~~~~\n\nA ``.bin`` file that was previously trained by this package or generated\nby fastText can be loaded using this function:\n\n.. code:: python\n\n model = fasttext.load_model('model.bin', encoding='utf-8')\n\nAttributes and methods for the model\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSkipgram and CBOW models have the following attributes & methods\n\n.. 
code:: python\n\n model.model_name # Model name\n model.words # List of words in the dictionary\n model.dim # Size of word vector\n model.ws # Size of context window\n model.epoch # Number of epochs\n model.min_count # Minimal number of word occurrences\n model.neg # Number of negatives sampled\n model.word_ngrams # Max length of word ngram\n model.loss_name # Loss function name\n model.bucket # Number of buckets\n model.minn # Min length of char ngram\n model.maxn # Max length of char ngram\n model.lr_update_rate # Rate of updates for the learning rate\n model.t # Value of sampling threshold\n model.encoding # Encoding of the model\n model[word] # Get the vector of the specified word\n\nSupervised model\n~~~~~~~~~~~~~~~~\n\nTrain & load the classifier\n\n.. code:: python\n\n classifier = fasttext.supervised(params)\n\nList of available ``params`` and their default values:\n\n::\n\n input_file         training file path (required)\n output             output file path (required)\n label_prefix       label prefix ['__label__']\n lr                 learning rate [0.1]\n lr_update_rate     change the rate of updates for the learning rate [100]\n dim                size of word vectors [100]\n ws                 size of the context window [5]\n epoch              number of epochs [5]\n min_count          minimal number of word occurrences [1]\n neg                number of negatives sampled [5]\n word_ngrams        max length of word ngram [1]\n loss               loss function {ns, hs, softmax} [softmax]\n bucket             number of buckets [0]\n minn               min length of char ngram [0]\n maxn               max length of char ngram [0]\n thread             number of threads [12]\n t                  sampling threshold [0.0001]\n silent             disable the log output from the C++ extension [1]\n encoding           specify input_file encoding [utf-8]\n pretrained_vectors pretrained word vectors (.vec file) for supervised learning []\n\nExample usage:\n\n.. 
code:: python\n\n classifier = fasttext.supervised('train.txt', 'model', label_prefix='__myprefix__',\n thread=4)\n\nLoad pre-trained classifier\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nA ``.bin`` file that was previously trained by this package or generated\nby fastText can be loaded using this function.\n\n.. code:: shell\n\n ./fasttext supervised -input train.txt -output classifier -label 'some_prefix'\n\n.. code:: python\n\n classifier = fasttext.load_model('classifier.bin', label_prefix='some_prefix')\n\nTest classifier\n~~~~~~~~~~~~~~~\n\nThis is equivalent to the ``fasttext(1)`` test command. Testing with the\nsame model and test set will produce the same values for the precision at\none and the number of examples.\n\n.. code:: python\n\n result = classifier.test(params)\n\n # Properties\n result.precision # Precision at one\n result.recall # Recall at one\n result.nexamples # Number of test examples\n\nThe param ``k`` is optional, and equal to ``1`` by default.\n\nPredict the most-likely label of texts\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis interface is equivalent to the ``fasttext(1)`` predict command.\n\n``texts`` is an array of strings.\n\n.. code:: python\n\n labels = classifier.predict(texts, k)\n\n # Or with probability\n labels = classifier.predict_proba(texts, k)\n\nThe param ``k`` is optional, and equal to ``1`` by default.\n\nAttributes and methods for the classifier\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe classifier has the following attributes & methods\n\n.. 
code:: python\n\n classifier.labels # List of labels\n classifier.label_prefix # Prefix of the label\n classifier.dim # Size of word vector\n classifier.ws # Size of context window\n classifier.epoch # Number of epochs\n classifier.min_count # Minimal number of word occurrences\n classifier.neg # Number of negatives sampled\n classifier.word_ngrams # Max length of word ngram\n classifier.loss_name # Loss function name\n classifier.bucket # Number of buckets\n classifier.minn # Min length of char ngram\n classifier.maxn # Max length of char ngram\n classifier.lr_update_rate # Rate of updates for the learning rate\n classifier.t # Value of sampling threshold\n classifier.encoding # Encoding used by the classifier\n classifier.test(filename, k) # Test the classifier\n classifier.predict(texts, k) # Predict the most likely label\n classifier.predict_proba(texts, k) # Predict the most likely labels with their probabilities\n\nThe param ``k`` for ``classifier.test``, ``classifier.predict`` and\n``classifier.predict_proba`` is optional, and equal to ``1`` by default.\n\nReferences\n----------\n\nEnriching Word Vectors with Subword Information\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n[1] P. Bojanowski\\*, E. Grave\\*, A. Joulin, T. Mikolov, `*Enriching Word\nVectors with Subword\nInformation* <https://arxiv.org/abs/1607.04606>`__\n\n::\n\n @article{bojanowski2016enriching,\n title={Enriching Word Vectors with Subword Information},\n author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},\n journal={arXiv preprint arXiv:1607.04606},\n year={2016}\n }\n\nBag of Tricks for Efficient Text Classification\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n[2] A. Joulin, E. Grave, P. Bojanowski, T. 
Mikolov, `*Bag of Tricks for\nEfficient Text\nClassification* <https://arxiv.org/abs/1607.01759>`__\n\n::\n\n @article{joulin2016bag,\n title={Bag of Tricks for Efficient Text Classification},\n author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},\n journal={arXiv preprint arXiv:1607.01759},\n year={2016}\n }\n\n(\\* These authors contributed equally.)\n\nJoin the fastText community\n---------------------------\n\n- Facebook page: https://www.facebook.com/groups/1174547215919768\n- Google group:\n https://groups.google.com/forum/#!forum/fasttext-library\n\n.. |Build Status| image:: https://travis-ci.org/salestock/fastText.py.svg?branch=master\n :target: https://travis-ci.org/salestock/fastText.py\n.. |PyPI version| image:: https://badge.fury.io/py/fasttext.svg\n :target: https://badge.fury.io/py/fasttext\n\n\n", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/pyk/fastText.py", "keywords": "", "license": "BSD 3-Clause License", "maintainer": "", "maintainer_email": "", "name": "fasttext-win", "package_url": "https://pypi.org/project/fasttext-win/", "platform": "", "project_url": "https://pypi.org/project/fasttext-win/", "project_urls": { "Homepage": "https://github.com/pyk/fastText.py" }, "release_url": "https://pypi.org/project/fasttext-win/0.8.3/", "requires_dist": [ "numpy (>=1)", "future" ], "requires_python": "", "summary": "A Python interface for Facebook fastText library", "version": "0.8.3" }, "last_serial": 4051749, "releases": { "0.8.3": [ { "comment_text": "", "digests": { "md5": "7af2711d168921023be5131cff8aab85", "sha256": "ba510d486565b8518f258b676c30ef0f502113f4610863ab9a9af9cc954665e4" }, "downloads": -1, "filename": "fasttext_win-0.8.3-cp36-cp36m-win_amd64.whl", "has_sig": false, "md5_digest": "7af2711d168921023be5131cff8aab85", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 106819, 
"upload_time": "2018-07-11T17:32:41", "url": "https://files.pythonhosted.org/packages/10/dd/c2765ebd81fc54dcd22dafb3a818c6e4dce405cbb1174645275193654e18/fasttext_win-0.8.3-cp36-cp36m-win_amd64.whl" }, { "comment_text": "", "digests": { "md5": "82c042579a578cc60933763ccf14a1dd", "sha256": "6c65751f1306e42809e15d98f04480a2471f5421f938db71f8e7b3c8ee930255" }, "downloads": -1, "filename": "fasttext_win-0.8.3.tar.gz", "has_sig": false, "md5_digest": "82c042579a578cc60933763ccf14a1dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 82316, "upload_time": "2018-07-11T17:32:42", "url": "https://files.pythonhosted.org/packages/a0/32/000658b6c380a987afdf1c7efee163d2014f18de450bffaca5f247ae8570/fasttext_win-0.8.3.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "7af2711d168921023be5131cff8aab85", "sha256": "ba510d486565b8518f258b676c30ef0f502113f4610863ab9a9af9cc954665e4" }, "downloads": -1, "filename": "fasttext_win-0.8.3-cp36-cp36m-win_amd64.whl", "has_sig": false, "md5_digest": "7af2711d168921023be5131cff8aab85", "packagetype": "bdist_wheel", "python_version": "cp36", "requires_python": null, "size": 106819, "upload_time": "2018-07-11T17:32:41", "url": "https://files.pythonhosted.org/packages/10/dd/c2765ebd81fc54dcd22dafb3a818c6e4dce405cbb1174645275193654e18/fasttext_win-0.8.3-cp36-cp36m-win_amd64.whl" }, { "comment_text": "", "digests": { "md5": "82c042579a578cc60933763ccf14a1dd", "sha256": "6c65751f1306e42809e15d98f04480a2471f5421f938db71f8e7b3c8ee930255" }, "downloads": -1, "filename": "fasttext_win-0.8.3.tar.gz", "has_sig": false, "md5_digest": "82c042579a578cc60933763ccf14a1dd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 82316, "upload_time": "2018-07-11T17:32:42", "url": "https://files.pythonhosted.org/packages/a0/32/000658b6c380a987afdf1c7efee163d2014f18de450bffaca5f247ae8570/fasttext_win-0.8.3.tar.gz" } ] }