{ "info": { "author": "Benjamin Heinzerling", "author_email": "benjamin.heinzerling@h-its.org", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# BPEmb\n\nBPEmb is a collection of pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) and trained on Wikipedia. Its intended use is as input for neural models in natural language processing.\n\n[Website](https://nlp.h-its.org/bpemb) \u30fb \n[Usage](#usage) \u30fb \n[Download](#downloads-for-each-language) \u30fb \n[MultiBPEmb](#multibpemb) \u30fb \n[Paper (pdf)](http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf) \u30fb \n[Citing BPEmb](#citing-bpemb)\n\n\n\n## Usage\n\nInstall BPEmb with pip:\n\n```bash\npip install bpemb\n```\n\nEmbeddings and SentencePiece models will be downloaded automatically the first time you use them.\n\n```python\n>>> from bpemb import BPEmb\n# load English BPEmb model with default vocabulary size (10k) and 50-dimensional embeddings\n>>> bpemb_en = BPEmb(lang=\"en\", dim=50)\ndownloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs10000.model\ndownloading https://nlp.h-its.org/bpemb/en/en.wiki.bpe.vs10000.d50.w2v.bin.tar.gz\n```\n\nYou can do two main things with BPEmb. The first is subword segmentation:\n```python\n# apply English BPE subword segmentation model\n>>> bpemb_en.encode(\"Stratford\")\n['\u2581strat', 'ford']\n# load Chinese BPEmb model with vocabulary size 100k and default (100-dim) embeddings\n>>> bpemb_zh = BPEmb(lang=\"zh\", vs=100000)\n# apply Chinese BPE subword segmentation model\n>>> bpemb_zh.encode(\"\u8fd9\u662f\u4e00\u4e2a\u4e2d\u6587\u53e5\u5b50\") # \"This is a Chinese sentence.\"\n['\u2581\u8fd9\u662f\u4e00\u4e2a', '\u4e2d\u6587', '\u53e5\u5b50'] # [\"This is a\", \"Chinese\", \"sentence\"]\n```\n\nIf / how a word gets split depends on the vocabulary size. 
Generally, a smaller vocabulary size will yield a segmentation into many subwords, while a large vocabulary size will result in frequent words not being split:\n\n| vocabulary size | segmentation |\n| --- | --- |\n| 1000 | ['\u2581str', 'at', 'f', 'ord'] |\n| 3000 | ['\u2581str', 'at', 'ford'] |\n| 5000 | ['\u2581str', 'at', 'ford'] |\n| 10000 | ['\u2581strat', 'ford'] |\n| 25000 | ['\u2581stratford'] |\n| 50000 | ['\u2581stratford'] |\n| 100000 | ['\u2581stratford'] |\n| 200000 | ['\u2581stratford'] |\n\n\nThe second purpose of BPEmb is to provide pretrained subword embeddings:\n\n```python\n# Embeddings are wrapped in a gensim KeyedVectors object\n>>> type(bpemb_zh.emb)\ngensim.models.keyedvectors.Word2VecKeyedVectors\n# You can use BPEmb objects like gensim KeyedVectors\n>>> bpemb_en.most_similar(\"ford\")\n[('bury', 0.8745079040527344),\n ('ton', 0.8725000619888306),\n ('well', 0.871537446975708),\n ('ston', 0.8701574206352234),\n ('worth', 0.8672043085098267),\n ('field', 0.859795331954956),\n ('ley', 0.8591548204421997),\n ('ington', 0.8126075267791748),\n ('bridge', 0.8099068999290466),\n ('brook', 0.7979353070259094)]\n>>> type(bpemb_en.vectors)\nnumpy.ndarray\n>>> bpemb_en.vectors.shape\n(10000, 50)\n>>> bpemb_zh.vectors.shape\n(100000, 100)\n```\n\nTo use subword embeddings in your neural network, either encode your input into subword IDs:\n```python\n>>> ids = bpemb_zh.encode_ids(\"\u8fd9\u662f\u4e00\u4e2a\u4e2d\u6587\u53e5\u5b50\")\n>>> ids\n[25950, 695, 20199]\n>>> bpemb_zh.vectors[ids].shape\n(3, 100)\n```\n\nOr use the `embed` method:\n```python\n# apply Chinese subword segmentation and perform embedding lookup\n>>> bpemb_zh.embed(\"\u8fd9\u662f\u4e00\u4e2a\u4e2d\u6587\u53e5\u5b50\").shape\n(3, 100)\n```\n\n## Downloads for each language\n\n[ab (Abkhazian)](http://nlp.h-its.org/bpemb/ab) \u30fb \n[ace (Achinese)](http://nlp.h-its.org/bpemb/ace) \u30fb \n[ady (Adyghe)](http://nlp.h-its.org/bpemb/ady) \u30fb \n[af (Afrikaans)](http://nlp.h-its.org/bpemb/af) 
\u30fb \n[ak (Akan)](http://nlp.h-its.org/bpemb/ak) \u30fb \n[als (Alemannic)](http://nlp.h-its.org/bpemb/als) \u30fb \n[am (Amharic)](http://nlp.h-its.org/bpemb/am) \u30fb \n[an (Aragonese)](http://nlp.h-its.org/bpemb/an) \u30fb \n[ang (Old English)](http://nlp.h-its.org/bpemb/ang) \u30fb \n[ar (Arabic)](http://nlp.h-its.org/bpemb/ar) \u30fb \n[arc (Official Aramaic)](http://nlp.h-its.org/bpemb/arc) \u30fb \n[arz (Egyptian Arabic)](http://nlp.h-its.org/bpemb/arz) \u30fb \n[as (Assamese)](http://nlp.h-its.org/bpemb/as) \u30fb \n[ast (Asturian)](http://nlp.h-its.org/bpemb/ast) \u30fb \n[atj (Atikamekw)](http://nlp.h-its.org/bpemb/atj) \u30fb \n[av (Avaric)](http://nlp.h-its.org/bpemb/av) \u30fb \n[ay (Aymara)](http://nlp.h-its.org/bpemb/ay) \u30fb \n[az (Azerbaijani)](http://nlp.h-its.org/bpemb/az) \u30fb \n[azb (South Azerbaijani)](http://nlp.h-its.org/bpemb/azb)\n\n[ba (Bashkir)](http://nlp.h-its.org/bpemb/ba) \u30fb \n[bar (Bavarian)](http://nlp.h-its.org/bpemb/bar) \u30fb \n[bcl (Central Bikol)](http://nlp.h-its.org/bpemb/bcl) \u30fb \n[be (Belarusian)](http://nlp.h-its.org/bpemb/be) \u30fb \n[bg (Bulgarian)](http://nlp.h-its.org/bpemb/bg) \u30fb \n[bi (Bislama)](http://nlp.h-its.org/bpemb/bi) \u30fb \n[bjn (Banjar)](http://nlp.h-its.org/bpemb/bjn) \u30fb \n[bm (Bambara)](http://nlp.h-its.org/bpemb/bm) \u30fb \n[bn (Bengali)](http://nlp.h-its.org/bpemb/bn) \u30fb \n[bo (Tibetan)](http://nlp.h-its.org/bpemb/bo) \u30fb \n[bpy (Bishnupriya)](http://nlp.h-its.org/bpemb/bpy) \u30fb \n[br (Breton)](http://nlp.h-its.org/bpemb/br) \u30fb \n[bs (Bosnian)](http://nlp.h-its.org/bpemb/bs) \u30fb \n[bug (Buginese)](http://nlp.h-its.org/bpemb/bug) \u30fb \n[bxr (Russia Buriat)](http://nlp.h-its.org/bpemb/bxr)\n\n[ca (Catalan)](http://nlp.h-its.org/bpemb/ca) \u30fb \n[cdo (Min Dong Chinese)](http://nlp.h-its.org/bpemb/cdo) \u30fb \n[ce (Chechen)](http://nlp.h-its.org/bpemb/ce) \u30fb \n[ceb (Cebuano)](http://nlp.h-its.org/bpemb/ceb) \u30fb \n[ch 
(Chamorro)](http://nlp.h-its.org/bpemb/ch) \u30fb \n[chr (Cherokee)](http://nlp.h-its.org/bpemb/chr) \u30fb \n[chy (Cheyenne)](http://nlp.h-its.org/bpemb/chy) \u30fb \n[ckb (Central Kurdish)](http://nlp.h-its.org/bpemb/ckb) \u30fb \n[co (Corsican)](http://nlp.h-its.org/bpemb/co) \u30fb \n[cr (Cree)](http://nlp.h-its.org/bpemb/cr) \u30fb \n[crh (Crimean Tatar)](http://nlp.h-its.org/bpemb/crh) \u30fb \n[cs (Czech)](http://nlp.h-its.org/bpemb/cs) \u30fb \n[csb (Kashubian)](http://nlp.h-its.org/bpemb/csb) \u30fb \n[cu (Church Slavic)](http://nlp.h-its.org/bpemb/cu) \u30fb \n[cv (Chuvash)](http://nlp.h-its.org/bpemb/cv) \u30fb \n[cy (Welsh)](http://nlp.h-its.org/bpemb/cy)\n\n[da (Danish)](http://nlp.h-its.org/bpemb/da) \u30fb \n[de (German)](http://nlp.h-its.org/bpemb/de) \u30fb \n[din (Dinka)](http://nlp.h-its.org/bpemb/din) \u30fb \n[diq (Dimli)](http://nlp.h-its.org/bpemb/diq) \u30fb \n[dsb (Lower Sorbian)](http://nlp.h-its.org/bpemb/dsb) \u30fb \n[dty (Dotyali)](http://nlp.h-its.org/bpemb/dty) \u30fb \n[dv (Dhivehi)](http://nlp.h-its.org/bpemb/dv) \u30fb \n[dz (Dzongkha)](http://nlp.h-its.org/bpemb/dz)\n\n[ee (Ewe)](http://nlp.h-its.org/bpemb/ee) \u30fb \n[el (Modern Greek)](http://nlp.h-its.org/bpemb/el) \u30fb \n[en (English)](http://nlp.h-its.org/bpemb/en) \u30fb \n[eo (Esperanto)](http://nlp.h-its.org/bpemb/eo) \u30fb \n[es (Spanish)](http://nlp.h-its.org/bpemb/es) \u30fb \n[et (Estonian)](http://nlp.h-its.org/bpemb/et) \u30fb \n[eu (Basque)](http://nlp.h-its.org/bpemb/eu) \u30fb \n[ext (Extremaduran)](http://nlp.h-its.org/bpemb/ext)\n\n[fa (Persian)](http://nlp.h-its.org/bpemb/fa) \u30fb \n[ff (Fulah)](http://nlp.h-its.org/bpemb/ff) \u30fb \n[fi (Finnish)](http://nlp.h-its.org/bpemb/fi) \u30fb \n[fj (Fijian)](http://nlp.h-its.org/bpemb/fj) \u30fb \n[fo (Faroese)](http://nlp.h-its.org/bpemb/fo) \u30fb \n[fr (French)](http://nlp.h-its.org/bpemb/fr) \u30fb \n[frp (Arpitan)](http://nlp.h-its.org/bpemb/frp) \u30fb \n[frr (Northern 
Frisian)](http://nlp.h-its.org/bpemb/frr) \u30fb \n[fur (Friulian)](http://nlp.h-its.org/bpemb/fur) \u30fb \n[fy (Western Frisian)](http://nlp.h-its.org/bpemb/fy)\n\n[ga (Irish)](http://nlp.h-its.org/bpemb/ga) \u30fb \n[gag (Gagauz)](http://nlp.h-its.org/bpemb/gag) \u30fb \n[gan (Gan Chinese)](http://nlp.h-its.org/bpemb/gan) \u30fb \n[gd (Scottish Gaelic)](http://nlp.h-its.org/bpemb/gd) \u30fb \n[gl (Galician)](http://nlp.h-its.org/bpemb/gl) \u30fb \n[glk (Gilaki)](http://nlp.h-its.org/bpemb/glk) \u30fb \n[gn (Guarani)](http://nlp.h-its.org/bpemb/gn) \u30fb \n[gom (Goan Konkani)](http://nlp.h-its.org/bpemb/gom) \u30fb \n[got (Gothic)](http://nlp.h-its.org/bpemb/got) \u30fb \n[gu (Gujarati)](http://nlp.h-its.org/bpemb/gu) \u30fb \n[gv (Manx)](http://nlp.h-its.org/bpemb/gv)\n\n[ha (Hausa)](http://nlp.h-its.org/bpemb/ha) \u30fb \n[hak (Hakka Chinese)](http://nlp.h-its.org/bpemb/hak) \u30fb \n[haw (Hawaiian)](http://nlp.h-its.org/bpemb/haw) \u30fb \n[he (Hebrew)](http://nlp.h-its.org/bpemb/he) \u30fb \n[hi (Hindi)](http://nlp.h-its.org/bpemb/hi) \u30fb \n[hif (Fiji Hindi)](http://nlp.h-its.org/bpemb/hif) \u30fb \n[hr (Croatian)](http://nlp.h-its.org/bpemb/hr) \u30fb \n[hsb (Upper Sorbian)](http://nlp.h-its.org/bpemb/hsb) \u30fb \n[ht (Haitian)](http://nlp.h-its.org/bpemb/ht) \u30fb \n[hu (Hungarian)](http://nlp.h-its.org/bpemb/hu) \u30fb \n[hy (Armenian)](http://nlp.h-its.org/bpemb/hy)\n\n[ia (Interlingua)](http://nlp.h-its.org/bpemb/ia) \u30fb \n[id (Indonesian)](http://nlp.h-its.org/bpemb/id) \u30fb \n[ie (Interlingue)](http://nlp.h-its.org/bpemb/ie) \u30fb \n[ig (Igbo)](http://nlp.h-its.org/bpemb/ig) \u30fb \n[ik (Inupiaq)](http://nlp.h-its.org/bpemb/ik) \u30fb \n[ilo (Iloko)](http://nlp.h-its.org/bpemb/ilo) \u30fb \n[io (Ido)](http://nlp.h-its.org/bpemb/io) \u30fb \n[is (Icelandic)](http://nlp.h-its.org/bpemb/is) \u30fb \n[it (Italian)](http://nlp.h-its.org/bpemb/it) \u30fb \n[iu (Inuktitut)](http://nlp.h-its.org/bpemb/iu)\n\n[ja 
(Japanese)](http://nlp.h-its.org/bpemb/ja) \u30fb \n[jam (Jamaican Creole English)](http://nlp.h-its.org/bpemb/jam) \u30fb \n[jbo (Lojban)](http://nlp.h-its.org/bpemb/jbo) \u30fb \n[jv (Javanese)](http://nlp.h-its.org/bpemb/jv)\n\n[ka (Georgian)](http://nlp.h-its.org/bpemb/ka) \u30fb \n[kaa (Kara-Kalpak)](http://nlp.h-its.org/bpemb/kaa) \u30fb \n[kab (Kabyle)](http://nlp.h-its.org/bpemb/kab) \u30fb \n[kbd (Kabardian)](http://nlp.h-its.org/bpemb/kbd) \u30fb \n[kbp (Kabiy\u00e8)](http://nlp.h-its.org/bpemb/kbp) \u30fb \n[kg (Kongo)](http://nlp.h-its.org/bpemb/kg) \u30fb \n[ki (Kikuyu)](http://nlp.h-its.org/bpemb/ki) \u30fb \n[kk (Kazakh)](http://nlp.h-its.org/bpemb/kk) \u30fb \n[kl (Kalaallisut)](http://nlp.h-its.org/bpemb/kl) \u30fb \n[km (Central Khmer)](http://nlp.h-its.org/bpemb/km) \u30fb \n[kn (Kannada)](http://nlp.h-its.org/bpemb/kn) \u30fb \n[ko (Korean)](http://nlp.h-its.org/bpemb/ko) \u30fb \n[koi (Komi-Permyak)](http://nlp.h-its.org/bpemb/koi) \u30fb \n[krc (Karachay-Balkar)](http://nlp.h-its.org/bpemb/krc) \u30fb \n[ks (Kashmiri)](http://nlp.h-its.org/bpemb/ks) \u30fb \n[ksh (K\u00f6lsch)](http://nlp.h-its.org/bpemb/ksh) \u30fb \n[ku (Kurdish)](http://nlp.h-its.org/bpemb/ku) \u30fb \n[kv (Komi)](http://nlp.h-its.org/bpemb/kv) \u30fb \n[kw (Cornish)](http://nlp.h-its.org/bpemb/kw) \u30fb \n[ky (Kirghiz)](http://nlp.h-its.org/bpemb/ky)\n\n[la (Latin)](http://nlp.h-its.org/bpemb/la) \u30fb \n[lad (Ladino)](http://nlp.h-its.org/bpemb/lad) \u30fb \n[lb (Luxembourgish)](http://nlp.h-its.org/bpemb/lb) \u30fb \n[lbe (Lak)](http://nlp.h-its.org/bpemb/lbe) \u30fb \n[lez (Lezghian)](http://nlp.h-its.org/bpemb/lez) \u30fb \n[lg (Ganda)](http://nlp.h-its.org/bpemb/lg) \u30fb \n[li (Limburgan)](http://nlp.h-its.org/bpemb/li) \u30fb \n[lij (Ligurian)](http://nlp.h-its.org/bpemb/lij) \u30fb \n[lmo (Lombard)](http://nlp.h-its.org/bpemb/lmo) \u30fb \n[ln (Lingala)](http://nlp.h-its.org/bpemb/ln) \u30fb \n[lo (Lao)](http://nlp.h-its.org/bpemb/lo) \u30fb \n[lrc (Northern 
Luri)](http://nlp.h-its.org/bpemb/lrc) \u30fb \n[lt (Lithuanian)](http://nlp.h-its.org/bpemb/lt) \u30fb \n[ltg (Latgalian)](http://nlp.h-its.org/bpemb/ltg) \u30fb \n[lv (Latvian)](http://nlp.h-its.org/bpemb/lv)\n\n[mai (Maithili)](http://nlp.h-its.org/bpemb/mai) \u30fb \n[mdf (Moksha)](http://nlp.h-its.org/bpemb/mdf) \u30fb \n[mg (Malagasy)](http://nlp.h-its.org/bpemb/mg) \u30fb \n[mh (Marshallese)](http://nlp.h-its.org/bpemb/mh) \u30fb \n[mhr (Eastern Mari)](http://nlp.h-its.org/bpemb/mhr) \u30fb \n[mi (Maori)](http://nlp.h-its.org/bpemb/mi) \u30fb \n[min (Minangkabau)](http://nlp.h-its.org/bpemb/min) \u30fb \n[mk (Macedonian)](http://nlp.h-its.org/bpemb/mk) \u30fb \n[ml (Malayalam)](http://nlp.h-its.org/bpemb/ml) \u30fb \n[mn (Mongolian)](http://nlp.h-its.org/bpemb/mn) \u30fb \n[mr (Marathi)](http://nlp.h-its.org/bpemb/mr) \u30fb \n[mrj (Western Mari)](http://nlp.h-its.org/bpemb/mrj) \u30fb \n[ms (Malay)](http://nlp.h-its.org/bpemb/ms) \u30fb \n[mt (Maltese)](http://nlp.h-its.org/bpemb/mt) \u30fb \n[mwl (Mirandese)](http://nlp.h-its.org/bpemb/mwl) \u30fb \n[my (Burmese)](http://nlp.h-its.org/bpemb/my) \u30fb \n[myv (Erzya)](http://nlp.h-its.org/bpemb/myv) \u30fb \n[mzn (Mazanderani)](http://nlp.h-its.org/bpemb/mzn)\n\n[na (Nauru)](http://nlp.h-its.org/bpemb/na) \u30fb \n[nap (Neapolitan)](http://nlp.h-its.org/bpemb/nap) \u30fb \n[nds (Low German)](http://nlp.h-its.org/bpemb/nds) \u30fb \n[ne (Nepali)](http://nlp.h-its.org/bpemb/ne) \u30fb \n[new (Newari)](http://nlp.h-its.org/bpemb/new) \u30fb \n[ng (Ndonga)](http://nlp.h-its.org/bpemb/ng) \u30fb \n[nl (Dutch)](http://nlp.h-its.org/bpemb/nl) \u30fb \n[nn (Norwegian Nynorsk)](http://nlp.h-its.org/bpemb/nn) \u30fb \n[no (Norwegian)](http://nlp.h-its.org/bpemb/no) \u30fb \n[nov (Novial)](http://nlp.h-its.org/bpemb/nov) \u30fb \n[nrm (Narom)](http://nlp.h-its.org/bpemb/nrm) \u30fb \n[nso (Pedi)](http://nlp.h-its.org/bpemb/nso) \u30fb \n[nv (Navajo)](http://nlp.h-its.org/bpemb/nv) \u30fb \n[ny 
(Nyanja)](http://nlp.h-its.org/bpemb/ny)\n\n[oc (Occitan)](http://nlp.h-its.org/bpemb/oc) \u30fb \n[olo (Livvi)](http://nlp.h-its.org/bpemb/olo) \u30fb \n[om (Oromo)](http://nlp.h-its.org/bpemb/om) \u30fb \n[or (Oriya)](http://nlp.h-its.org/bpemb/or) \u30fb \n[os (Ossetian)](http://nlp.h-its.org/bpemb/os)\n\n[pa (Panjabi)](http://nlp.h-its.org/bpemb/pa) \u30fb \n[pag (Pangasinan)](http://nlp.h-its.org/bpemb/pag) \u30fb \n[pam (Pampanga)](http://nlp.h-its.org/bpemb/pam) \u30fb \n[pap (Papiamento)](http://nlp.h-its.org/bpemb/pap) \u30fb \n[pcd (Picard)](http://nlp.h-its.org/bpemb/pcd) \u30fb \n[pdc (Pennsylvania German)](http://nlp.h-its.org/bpemb/pdc) \u30fb \n[pfl (Pfaelzisch)](http://nlp.h-its.org/bpemb/pfl) \u30fb \n[pi (Pali)](http://nlp.h-its.org/bpemb/pi) \u30fb \n[pih (Pitcairn-Norfolk)](http://nlp.h-its.org/bpemb/pih) \u30fb \n[pl (Polish)](http://nlp.h-its.org/bpemb/pl) \u30fb \n[pms (Piemontese)](http://nlp.h-its.org/bpemb/pms) \u30fb \n[pnb (Western Panjabi)](http://nlp.h-its.org/bpemb/pnb) \u30fb \n[pnt (Pontic)](http://nlp.h-its.org/bpemb/pnt) \u30fb \n[ps (Pushto)](http://nlp.h-its.org/bpemb/ps) \u30fb \n[pt (Portuguese)](http://nlp.h-its.org/bpemb/pt)\n\n[qu (Quechua)](http://nlp.h-its.org/bpemb/qu)\n\n[rm (Romansh)](http://nlp.h-its.org/bpemb/rm) \u30fb \n[rmy (Vlax Romani)](http://nlp.h-its.org/bpemb/rmy) \u30fb \n[rn (Rundi)](http://nlp.h-its.org/bpemb/rn) \u30fb \n[ro (Romanian)](http://nlp.h-its.org/bpemb/ro) \u30fb \n[ru (Russian)](http://nlp.h-its.org/bpemb/ru) \u30fb \n[rue (Rusyn)](http://nlp.h-its.org/bpemb/rue) \u30fb \n[rw (Kinyarwanda)](http://nlp.h-its.org/bpemb/rw)\n\n[sa (Sanskrit)](http://nlp.h-its.org/bpemb/sa) \u30fb \n[sah (Yakut)](http://nlp.h-its.org/bpemb/sah) \u30fb \n[sc (Sardinian)](http://nlp.h-its.org/bpemb/sc) \u30fb \n[scn (Sicilian)](http://nlp.h-its.org/bpemb/scn) \u30fb \n[sco (Scots)](http://nlp.h-its.org/bpemb/sco) \u30fb \n[sd (Sindhi)](http://nlp.h-its.org/bpemb/sd) \u30fb \n[se (Northern 
Sami)](http://nlp.h-its.org/bpemb/se) \u30fb \n[sg (Sango)](http://nlp.h-its.org/bpemb/sg) \u30fb \n[sh (Serbo-Croatian)](http://nlp.h-its.org/bpemb/sh) \u30fb \n[si (Sinhala)](http://nlp.h-its.org/bpemb/si) \u30fb \n[sk (Slovak)](http://nlp.h-its.org/bpemb/sk) \u30fb \n[sl (Slovenian)](http://nlp.h-its.org/bpemb/sl) \u30fb \n[sm (Samoan)](http://nlp.h-its.org/bpemb/sm) \u30fb \n[sn (Shona)](http://nlp.h-its.org/bpemb/sn) \u30fb \n[so (Somali)](http://nlp.h-its.org/bpemb/so) \u30fb \n[sq (Albanian)](http://nlp.h-its.org/bpemb/sq) \u30fb \n[sr (Serbian)](http://nlp.h-its.org/bpemb/sr) \u30fb \n[srn (Sranan Tongo)](http://nlp.h-its.org/bpemb/srn) \u30fb \n[ss (Swati)](http://nlp.h-its.org/bpemb/ss) \u30fb \n[st (Southern Sotho)](http://nlp.h-its.org/bpemb/st) \u30fb \n[stq (Saterfriesisch)](http://nlp.h-its.org/bpemb/stq) \u30fb \n[su (Sundanese)](http://nlp.h-its.org/bpemb/su) \u30fb \n[sv (Swedish)](http://nlp.h-its.org/bpemb/sv) \u30fb \n[sw (Swahili)](http://nlp.h-its.org/bpemb/sw) \u30fb \n[szl (Silesian)](http://nlp.h-its.org/bpemb/szl)\n\n[ta (Tamil)](http://nlp.h-its.org/bpemb/ta) \u30fb \n[tcy (Tulu)](http://nlp.h-its.org/bpemb/tcy) \u30fb \n[te (Telugu)](http://nlp.h-its.org/bpemb/te) \u30fb \n[tet (Tetum)](http://nlp.h-its.org/bpemb/tet) \u30fb \n[tg (Tajik)](http://nlp.h-its.org/bpemb/tg) \u30fb \n[th (Thai)](http://nlp.h-its.org/bpemb/th) \u30fb \n[ti (Tigrinya)](http://nlp.h-its.org/bpemb/ti) \u30fb \n[tk (Turkmen)](http://nlp.h-its.org/bpemb/tk) \u30fb \n[tl (Tagalog)](http://nlp.h-its.org/bpemb/tl) \u30fb \n[tn (Tswana)](http://nlp.h-its.org/bpemb/tn) \u30fb \n[to (Tonga)](http://nlp.h-its.org/bpemb/to) \u30fb \n[tpi (Tok Pisin)](http://nlp.h-its.org/bpemb/tpi) \u30fb \n[tr (Turkish)](http://nlp.h-its.org/bpemb/tr) \u30fb \n[ts (Tsonga)](http://nlp.h-its.org/bpemb/ts) \u30fb \n[tt (Tatar)](http://nlp.h-its.org/bpemb/tt) \u30fb \n[tum (Tumbuka)](http://nlp.h-its.org/bpemb/tum) \u30fb \n[tw (Twi)](http://nlp.h-its.org/bpemb/tw) \u30fb \n[ty 
(Tahitian)](http://nlp.h-its.org/bpemb/ty) \u30fb \n[tyv (Tuvinian)](http://nlp.h-its.org/bpemb/tyv)\n\n[udm (Udmurt)](http://nlp.h-its.org/bpemb/udm) \u30fb \n[ug (Uighur)](http://nlp.h-its.org/bpemb/ug) \u30fb \n[uk (Ukrainian)](http://nlp.h-its.org/bpemb/uk) \u30fb \n[ur (Urdu)](http://nlp.h-its.org/bpemb/ur) \u30fb \n[uz (Uzbek)](http://nlp.h-its.org/bpemb/uz)\n\n[ve (Venda)](http://nlp.h-its.org/bpemb/ve) \u30fb \n[vec (Venetian)](http://nlp.h-its.org/bpemb/vec) \u30fb \n[vep (Veps)](http://nlp.h-its.org/bpemb/vep) \u30fb \n[vi (Vietnamese)](http://nlp.h-its.org/bpemb/vi) \u30fb \n[vls (Vlaams)](http://nlp.h-its.org/bpemb/vls) \u30fb \n[vo (Volap\u00fck)](http://nlp.h-its.org/bpemb/vo)\n\n[wa (Walloon)](http://nlp.h-its.org/bpemb/wa) \u30fb \n[war (Waray)](http://nlp.h-its.org/bpemb/war) \u30fb \n[wo (Wolof)](http://nlp.h-its.org/bpemb/wo) \u30fb \n[wuu (Wu Chinese)](http://nlp.h-its.org/bpemb/wuu)\n\n[xal (Kalmyk)](http://nlp.h-its.org/bpemb/xal) \u30fb \n[xh (Xhosa)](http://nlp.h-its.org/bpemb/xh) \u30fb \n[xmf (Mingrelian)](http://nlp.h-its.org/bpemb/xmf)\n\n[yi (Yiddish)](http://nlp.h-its.org/bpemb/yi) \u30fb \n[yo (Yoruba)](http://nlp.h-its.org/bpemb/yo)\n\n[za (Zhuang)](http://nlp.h-its.org/bpemb/za) \u30fb \n[zea (Zeeuws)](http://nlp.h-its.org/bpemb/zea) \u30fb \n[zh (Chinese)](http://nlp.h-its.org/bpemb/zh) \u30fb \n[zu (Zulu)](http://nlp.h-its.org/bpemb/zu)\n\n## MultiBPEmb\n\n[multi (multilingual)](http://nlp.h-its.org/bpemb/multi)\n\n## Citing BPEmb\n\nIf you use BPEmb in academic work, please cite:\n\n```\n@InProceedings{heinzerling2018bpemb,\n author = {Benjamin Heinzerling and Michael Strube},\n title = \"{BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages}\",\n booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},\n year = {2018},\n month = {May 7-12, 2018},\n address = {Miyazaki, Japan},\n editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and 
Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and H\u00e9l\u00e8ne Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},\n publisher = {European Language Resources Association (ELRA)},\n isbn = {979-10-95546-00-9},\n language = {english}\n }\n```\n\n\n", "description_content_type": "text/markdown", "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://nlp.h-its.org/bpemb", "keywords": "", "license": "MIT", "maintainer": "", "maintainer_email": "", "name": "bpemb", "package_url": "https://pypi.org/project/bpemb/", "platform": "", "project_url": "https://pypi.org/project/bpemb/", "project_urls": { "Homepage": "https://nlp.h-its.org/bpemb" }, "release_url": "https://pypi.org/project/bpemb/0.3.0/", "requires_dist": [ "gensim", "numpy", "requests", "sentencepiece", "tqdm" ], "requires_python": "", "summary": "Byte-pair embeddings in 275 languages", "version": "0.3.0" }, "last_serial": 5395764, "releases": { "0.2.10": [ { "comment_text": "", "digests": { "md5": "66132fee0277c3361fd8e61beb38671a", "sha256": "32fb94305748b60c4a1bb5dd03ea334e5b99e017715cef812ba2379abd1432c1" }, "downloads": -1, "filename": "bpemb-0.2.10-py3-none-any.whl", "has_sig": false, "md5_digest": "66132fee0277c3361fd8e61beb38671a", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19010, "upload_time": "2019-02-09T13:24:27", "url": "https://files.pythonhosted.org/packages/09/a3/004c3e96c28aeacef3ad3aad1358911526da054074fb0c561e0620ebf13a/bpemb-0.2.10-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "aff6d4d6d87b91a74a6711786b912e49", "sha256": "6896c89e49f1cb3255314fb589c054146653616d5544db631f0070bf09d95f96" }, "downloads": -1, "filename": "bpemb-0.2.10.tar.gz", "has_sig": false, "md5_digest": "aff6d4d6d87b91a74a6711786b912e49", "packagetype": "sdist", 
"python_version": "source", "requires_python": null, "size": 23268, "upload_time": "2019-02-09T13:24:30", "url": "https://files.pythonhosted.org/packages/2a/7b/2d2444a9778510742e77ed3c082b08331cc092c1be12c774ec871bfc0b4d/bpemb-0.2.10.tar.gz" } ], "0.2.11": [ { "comment_text": "", "digests": { "md5": "5a91b45a129d035d86d9aa8403ddc7d2", "sha256": "f410569d4ec3e8fdfe635f479b55ea2b6da40ef0c8c86d31630524f6414e54cb" }, "downloads": -1, "filename": "bpemb-0.2.11-py3-none-any.whl", "has_sig": false, "md5_digest": "5a91b45a129d035d86d9aa8403ddc7d2", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19290, "upload_time": "2019-02-25T22:12:32", "url": "https://files.pythonhosted.org/packages/fe/d5/229f4d1a8de7a08a34d3b205bf82f6487f3b624c665d3de58e797fba2a9f/bpemb-0.2.11-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "d866787c0506b5115b7a4b150efe16a3", "sha256": "aeaed41408c6f5842ad0071d661e040be735f4c1ef7902c0058609edf7f91564" }, "downloads": -1, "filename": "bpemb-0.2.11.tar.gz", "has_sig": false, "md5_digest": "d866787c0506b5115b7a4b150efe16a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23556, "upload_time": "2019-02-25T22:12:35", "url": "https://files.pythonhosted.org/packages/45/5d/c4a00715552a8bc291a41bcf6508d8c9db382c2cef1bbd05613fe71a4628/bpemb-0.2.11.tar.gz" } ], "0.2.12": [ { "comment_text": "", "digests": { "md5": "166ba2b70151af5e315ba1598d896a46", "sha256": "ae6368e0ea315ebd0466833cb9f938fa0874153cd6bd89179bec08f821cc4955" }, "downloads": -1, "filename": "bpemb-0.2.12-py3-none-any.whl", "has_sig": false, "md5_digest": "166ba2b70151af5e315ba1598d896a46", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19403, "upload_time": "2019-04-30T05:35:19", "url": "https://files.pythonhosted.org/packages/57/90/8760eaa97c5a2f676f3f350fd43e79f8d9e4f9c42362c62f733e81e37d33/bpemb-0.2.12-py3-none-any.whl" }, { "comment_text": "", "digests": { 
"md5": "822346fed3ed3a01fbbf761462058623", "sha256": "f7aa00c3008c61956c02d08a9b7c4e133b3869894f9f656d2e852b83119869a7" }, "downloads": -1, "filename": "bpemb-0.2.12.tar.gz", "has_sig": false, "md5_digest": "822346fed3ed3a01fbbf761462058623", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23657, "upload_time": "2019-04-30T05:35:30", "url": "https://files.pythonhosted.org/packages/13/a8/e45f2eac4649e6c433b85442a5db7cfc81c9dc3884a063af1151642c623f/bpemb-0.2.12.tar.gz" } ], "0.2.7": [ { "comment_text": "", "digests": { "md5": "52c9b50b1ceaf5158bf8975a49f5723f", "sha256": "29ecd4d5d26d0f77f2a500877ba363a4bf2d09c99a4401b892a8f1a5d7e39370" }, "downloads": -1, "filename": "bpemb-0.2.7-py3-none-any.whl", "has_sig": false, "md5_digest": "52c9b50b1ceaf5158bf8975a49f5723f", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 23797, "upload_time": "2018-11-19T21:48:03", "url": "https://files.pythonhosted.org/packages/2b/a3/a625dad491ee0565eb725ed6cbeb9b695c2bff07ce0f9058dea6811868f6/bpemb-0.2.7-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "852725e696c16b1b76578f66c8736002", "sha256": "4f125304d1a8c99d8430eb95e71da3bfc4743ff781e2f4307d19332adc2d830b" }, "downloads": -1, "filename": "bpemb-0.2.7.tar.gz", "has_sig": false, "md5_digest": "852725e696c16b1b76578f66c8736002", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23835, "upload_time": "2018-11-19T21:48:04", "url": "https://files.pythonhosted.org/packages/e9/eb/0b13c2e0338dd9914bfdc4e93020bd4945ddb10a489b80416371e348de5a/bpemb-0.2.7.tar.gz" } ], "0.2.8": [ { "comment_text": "", "digests": { "md5": "377b4b8ff27dcb42a8f5bfac3189f8c8", "sha256": "c37987efe5b3067a15770415047add981850d82190e9e2a2f9af56f2e4b8593f" }, "downloads": -1, "filename": "bpemb-0.2.8-py3-none-any.whl", "has_sig": false, "md5_digest": "377b4b8ff27dcb42a8f5bfac3189f8c8", "packagetype": "bdist_wheel", "python_version": "py3", 
"requires_python": null, "size": 18781, "upload_time": "2018-12-01T12:02:22", "url": "https://files.pythonhosted.org/packages/40/18/3716da26d010af677d208f82a4ec9dec1b278fa3263074da93dddac6d562/bpemb-0.2.8-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "5bb1194c554fe4a519d3d2cb5bb22fcd", "sha256": "4db76ca4319d0cb1e15a9bad006e5664c2e7a2f4cdd51f9528da0822607d86f2" }, "downloads": -1, "filename": "bpemb-0.2.8.tar.gz", "has_sig": false, "md5_digest": "5bb1194c554fe4a519d3d2cb5bb22fcd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23726, "upload_time": "2018-12-01T12:02:24", "url": "https://files.pythonhosted.org/packages/bf/2a/1c37de721c4b245a6a3f33d1f80df6e31d0245637ddfe88136bc9bf7abbe/bpemb-0.2.8.tar.gz" } ], "0.2.9": [ { "comment_text": "", "digests": { "md5": "65eaa83ac7460b93497e11c56e5b5c9e", "sha256": "5eb90e7a62509ec4ae9c2954986941ad405db5d2b9f3725be5029fa52cc866ca" }, "downloads": -1, "filename": "bpemb-0.2.9-py3-none-any.whl", "has_sig": false, "md5_digest": "65eaa83ac7460b93497e11c56e5b5c9e", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 18752, "upload_time": "2018-12-01T14:00:44", "url": "https://files.pythonhosted.org/packages/be/52/c7b1062477416c7cfb66ff9b9abb3872105b613c38af2594fb7644a09988/bpemb-0.2.9-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "48abddf6afe4ffff12d429889afdb88a", "sha256": "a6380b22cdb2d3e6ff26fe1a64fffab2bd709cb2764466874cf741d05fec6d31" }, "downloads": -1, "filename": "bpemb-0.2.9.tar.gz", "has_sig": false, "md5_digest": "48abddf6afe4ffff12d429889afdb88a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23689, "upload_time": "2018-12-01T14:00:46", "url": "https://files.pythonhosted.org/packages/24/63/f7432313153924eadc5e4a5c2d2ceca6306d6443aff1a1978f3a5a24b7e4/bpemb-0.2.9.tar.gz" } ], "0.3.0": [ { "comment_text": "", "digests": { "md5": "4e515f6d9681d8c91e8cc33656acf893", "sha256": 
"fc921e287f336fa3144051e79efe8388b91369640b393c40a3c86d0cb98d4f93" }, "downloads": -1, "filename": "bpemb-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "4e515f6d9681d8c91e8cc33656acf893", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19503, "upload_time": "2019-06-13T12:45:27", "url": "https://files.pythonhosted.org/packages/bc/70/468a9652095b370f797ed37ff77e742b11565c6fd79eaeca5f2e50b164a7/bpemb-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac309695c76fc4baa3bc452fb562cd75", "sha256": "dd6ba916560ab1c79df2699f7a3002091a31957e2661a770cb92ac8febfc06ca" }, "downloads": -1, "filename": "bpemb-0.3.0.tar.gz", "has_sig": false, "md5_digest": "ac309695c76fc4baa3bc452fb562cd75", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23799, "upload_time": "2019-06-13T12:45:29", "url": "https://files.pythonhosted.org/packages/8f/c8/221481af07cb44c8e281f2e2e694c12baad3b85226fb9e419e4f9af2793c/bpemb-0.3.0.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "4e515f6d9681d8c91e8cc33656acf893", "sha256": "fc921e287f336fa3144051e79efe8388b91369640b393c40a3c86d0cb98d4f93" }, "downloads": -1, "filename": "bpemb-0.3.0-py3-none-any.whl", "has_sig": false, "md5_digest": "4e515f6d9681d8c91e8cc33656acf893", "packagetype": "bdist_wheel", "python_version": "py3", "requires_python": null, "size": 19503, "upload_time": "2019-06-13T12:45:27", "url": "https://files.pythonhosted.org/packages/bc/70/468a9652095b370f797ed37ff77e742b11565c6fd79eaeca5f2e50b164a7/bpemb-0.3.0-py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "ac309695c76fc4baa3bc452fb562cd75", "sha256": "dd6ba916560ab1c79df2699f7a3002091a31957e2661a770cb92ac8febfc06ca" }, "downloads": -1, "filename": "bpemb-0.3.0.tar.gz", "has_sig": false, "md5_digest": "ac309695c76fc4baa3bc452fb562cd75", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 23799, "upload_time": 
"2019-06-13T12:45:29", "url": "https://files.pythonhosted.org/packages/8f/c8/221481af07cb44c8e281f2e2e694c12baad3b85226fb9e419e4f9af2793c/bpemb-0.3.0.tar.gz" } ] }