{ "info": { "author": "Marek Wydmuch", "author_email": "mwydmuch@cs.put.poznan.pl", "bugtrack_url": null, "classifiers": [ "Development Status :: 3 - Alpha", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: MacOS", "Operating System :: POSIX", "Operating System :: Unix", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5", "Programming Language :: Python :: 3.6", "Programming Language :: Python :: 3.7", "Topic :: Scientific/Engineering", "Topic :: Software Development" ], "description": "extremeText\n===========\n\n`extremeText `__ is an\nextension of `fastText `__\nlibrary for multi-label classification including extreme cases with\nhundreds of thousands and millions of labels.\n\n`extremeText `__ implements:\n\n- Probabilistic Labels Tree (PLT) loss for extreme multi-Label\n classification with top-down hierarchical clustering (k-means) for\n tree building,\n- sigmoid loss for multi-label classification,\n- L2 regularization and FOBOS update for all losses,\n- ensemble of loss layers with bagging,\n- calculation of hidden (document) vector as a weighted average of the\n word vectors,\n- calculation of TF-IDF weights for words.\n\nRequirements\n------------\n\n`extremeText `__ builds on\nmodern Mac OS and Linux distributions. Since it uses C++11 features, it\nrequires a compiler with good C++11 support. These include:\n\n- (gcc-4.8 or newer) or (clang-3.3 or newer)\n\nYou will need:\n\n- `Python `__ version 2.7 or >=3.4\n- `NumPy `__ &\n `SciPy `__\n- `pybind11 `__\n\nInstalling extremeText\n----------------------\n\nThe easiest way to get\n`extremeText `__ is to use\n`pip `__.\n\n::\n\n $ pip install extremetext\n\nInstalling on MacOS may require setting\n``MACOSX_DEPLOYMENT_TARGET=10.9`` first:\n\n::\n\n $ export MACOSX_DEPLOYMENT_TARGET=10.9\n $ pip install extremetext\n\nThe latest version of\n`extremeText `__ can be build\nfrom sources using pip or alternatively setuptools.\n\n::\n\n $ git clone https://github.com/mwydmuch/extremeText.git\n $ cd extremeText\n $ pip install .\n (or) $ python setup.py install\n\nNow you can import this library with:\n\n::\n\n import extremeText\n\nExamples\n--------\n\nIn general it is assumed that the reader already has good knowledge of\nfastText/extremeText. For this consider the main\n`README `__\nand `the tutorials on fastText\nwebsite `__.\n\nWe recommend you look at the `examples within the doc\nfolder `__.\n\nAs with any package you can get help on any Python function using the\nhelp function.\n\nFor example:\n\n::\n\n +>>> import extremeText\n +>>> help(extremeText.ExtremeText)\n\n Help on module extremeText.ExtremeText in extremeText:\n\n NAME\n extremeText.ExtremeText\n\n DESCRIPTION\n # Copyright (c) 2017-present, Facebook, Inc.\n # All rights reserved.\n #\n # This source code is licensed under the BSD-style license found in the\n # LICENSE file in the root directory of this source tree. An additional grant\n # of patent rights can be found in the PATENTS file in the same directory.\n\n FUNCTIONS\n load_model(path)\n Load a model given a filepath and return a model object.\n\n tokenize(text)\n Given a string of text, tokenize it and return a list of tokens\n [...]\n\nIMPORTANT: Preprocessing data / enconding conventions\n-----------------------------------------------------\n\nIn general it is important to properly preprocess your data. Example\nscripts in the `root\nfolder `__ do this.\n\nextremeText like fastText assumes UTF-8 encoded text. All text must be\n`unicode for\nPython2 `__\nand `str for\nPython3 `__.\nThe passed text will be `encoded as UTF-8 by\npybind11 `__\nbefore passed to the extremeText C++ library. This means it is important\nto use UTF-8 encoded text when building a model. On Unix-like systems\nyou can convert text using\n`iconv `__.\n\nextremeText will tokenize (split text into pieces) based on the\nfollowing ASCII characters (bytes). In particular, it is not aware of\nUTF-8 whitespace. We advice the user to convert UTF-8 whitespace / word\nboundaries into one of the following symbols as appropiate.\n\n- space\n- tab\n- vertical tab\n- carriage return\n- formfeed\n- the null character\n\nThe newline character is used to delimit lines of text. In particular,\nthe EOS token is appended to a line of text if a newline character is\nencountered. The only exception is if the number of tokens exceeds the\nMAX\\_LINE\\_SIZE constant as defined in the `Dictionary\nheader `__.\nThis means if you have text that is not separate by newlines, such as\nthe `fil9 dataset `__, it will be\nbroken into chunks with MAX\\_LINE\\_SIZE of tokens and the EOS token is\nnot appended.\n\nThe length of a token is the number of UTF-8 characters by considering\nthe `leading two bits of a\nbyte `__ to identify\n`subsequent bytes of a multi-byte\nsequence `__.\nKnowing this is especially important when choosing the minimum and\nmaximum length of subwords. Further, the EOS token (as specified in the\n`Dictionary\nheader `__)\nis considered a character and will not be broken into subwords.\n\nReference\n---------\n\nPlease cite below work if using this package for extreme classification.\n\nM. Wydmuch, K. Jasinska, M. Kuznetsov, R. Busa-Fekete, K. Dembczy\u0144ski.\n`*A no-regret generalization of hierarchical softmax to extreme\nmulti-label\nclassification* `__.\nAdvances in Neural Information Processing Systems 31, 2018.", "description_content_type": "", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/mwydmuch/extremeText", "keywords": "", "license": "BSD", "maintainer": "", "maintainer_email": "", "name": "extremetext", "package_url": "https://pypi.org/project/extremetext/", "platform": "", "project_url": "https://pypi.org/project/extremetext/", "project_urls": { "Homepage": "https://github.com/mwydmuch/extremeText" }, "release_url": "https://pypi.org/project/extremetext/0.8.4/", "requires_dist": null, "requires_python": "", "summary": "A Python interface for extremeText library", "version": "0.8.4" }, "last_serial": 4651605, "releases": { "0.8.3": [ { "comment_text": "", "digests": { "md5": "c99c29a0cad007a8ab4ed1c6d6f6abe6", "sha256": "3390e57f89f08996467440d93c0030dd3c931cc3cf10abe0e4fd6e9c87280377" }, "downloads": -1, "filename": "extremetext-0.8.3.tar.gz", "has_sig": false, "md5_digest": "c99c29a0cad007a8ab4ed1c6d6f6abe6", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 66151, "upload_time": "2018-11-29T02:49:12", "url": "https://files.pythonhosted.org/packages/f4/d5/466e8c6ccb8c8f68b9f10fe01a11c1424d377d644bb8a96e8f883f9ac87d/extremetext-0.8.3.tar.gz" } ], "0.8.4": [ { "comment_text": "", "digests": { "md5": "a048169eb808c311138c0c647537db2e", "sha256": "bb0843c232586df8c56ece6d59ddcc57b5c7d28c7c65e4b92fe237dc649b838c" }, "downloads": -1, "filename": "extremetext-0.8.4.tar.gz", "has_sig": false, "md5_digest": "a048169eb808c311138c0c647537db2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 66357, "upload_time": "2019-01-02T03:49:28", "url": "https://files.pythonhosted.org/packages/89/b9/037923bbb7fad83e46a864e27aac0d5f332d07530d3d6c4b7dcd643b0503/extremetext-0.8.4.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a048169eb808c311138c0c647537db2e", "sha256": "bb0843c232586df8c56ece6d59ddcc57b5c7d28c7c65e4b92fe237dc649b838c" }, "downloads": -1, "filename": "extremetext-0.8.4.tar.gz", "has_sig": false, "md5_digest": "a048169eb808c311138c0c647537db2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 66357, "upload_time": "2019-01-02T03:49:28", "url": "https://files.pythonhosted.org/packages/89/b9/037923bbb7fad83e46a864e27aac0d5f332d07530d3d6c4b7dcd643b0503/extremetext-0.8.4.tar.gz" } ] }