{ "info": { "author": "Datos Argentina", "author_email": "datos@modernizacion.gob.ar", "bugtrack_url": null, "classifiers": [ "Development Status :: 2 - Pre-Alpha", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: Spanish", "Programming Language :: Python :: 2", "Programming Language :: Python :: 2.7" ], "description": "# Analizador de Textos\n\n[![Coverage Status](https://coveralls.io/repos/github/datosgobar/textar/badge.svg?branch=master)](https://coveralls.io/github/datosgobar/textar?branch=master)\n[![Build Status](https://travis-ci.org/datosgobar/textar.svg?branch=master)](https://travis-ci.org/datosgobar/textar)\n[![PyPI](https://badge.fury.io/py/textar.svg)](http://badge.fury.io/py/textar)\n[![Stories in Ready](https://badge.waffle.io/datosgobar/textar.png?label=ready&title=Ready)](https://waffle.io/datosgobar/textar)\n[![Documentation Status](http://readthedocs.org/projects/textar/badge/?version=latest)](http://textar.readthedocs.org/en/latest/?badge=latest)\n\nPaquete en python para an\u00e1lisis, clasificaci\u00f3n y recuperaci\u00f3n de textos, utilizado por el equipo de Datos Argentina.\n\n\n\n\n- [Instalaci\u00f3n](#instalaci%C3%B3n)\n - [Dependencias](#dependencias)\n - [Desde pypi](#desde-pypi)\n - [Para desarrollo](#para-desarrollo)\n- [Uso](#uso)\n - [B\u00fasqueda de textos similares](#b%C3%BAsqueda-de-textos-similares)\n - [Clasificaci\u00f3n de textos](#clasificaci%C3%B3n-de-textos)\n- [Tests](#tests)\n- [Cr\u00e9ditos](#cr%C3%A9ditos)\n- [Contacto](#contacto)\n\n\n\n* Licencia: MIT license\n\n## Instalaci\u00f3n\n\n### Dependencias\n\n`textar` usa `pandas`, `numpy`, `scikit-learn` y `scipy`. 
Para que funcionen, se requiere instalar algunas dependencias no pythonicas:\n\n* En Ubuntu: `sudo apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran`\n\n### Desde pypi\n\n`pip install textar`\n\n### Para desarrollo\n\n```\ngit clone https://www.github.com/datosgobar/textar.git\ncd path/to/textar\npip install -e .\n```\n\nCualquier cambio en el c\u00f3digo est\u00e1 disponible en el entorno virtual donde fue instalado de esta manera.\n\n## Uso\n\n### B\u00fasqueda de textos similares\n\n```python\nfrom textar import TextClassifier\n\ntc = TextClassifier(\n texts=[\n \"El \u00e1rbol del edificio moderno tiene manzanas\",\n \"El \u00e1rbol m\u00e1s chico tiene muchas mandarinas naranjas, y est\u00e1 cerca del monumento antiguo\",\n \"El edificio m\u00e1s antiguo tiene muchos cuadros caros porque era de un multimillonario\",\n \"El edificio m\u00e1s moderno tiene muchas programadoras que comen manzanas durante el almuerzo grupal\"\n ],\n ids=map(str, range(4))\n)\n\nids, distancias, palabras_comunes = tc.get_similar(\n example=\"Me encontr\u00e9 muchas manzanas en el edificio\", \n max_similars=4\n)\n\nprint ids\n['0', '3', '2', '1']\n\nprint distancias\n[0.92781458944579009, 1.0595805639371083, 1.1756638126839645, 1.3206413200640157]\n\nprint palabras_comunes\n[[u'edificio', u'manzanas'], [u'edificio', u'muchas', u'manzanas'], [u'edificio', u'muchas'], [u'muchas']]\n```\n\n### Clasificaci\u00f3n de textos\n\n```python\nfrom textar import TextClassifier\n\ntc = TextClassifier(\n texts=[\n \"Para hacer una pizza hace falta harina, tomate, queso y jam\u00f3n\",\n \"Para hacer unas empanadas necesitamos tapas de empanadas, tomate, jam\u00f3n y queso\",\n \"Para hacer un daiquiri necesitamos ron, una fruta y un poco de lim\u00f3n\",\n \"Para hacer un cuba libre necesitamos coca, ron y un poco de lim\u00f3n\",\n \"Para hacer una torta de naranja se necesita harina, huevos, leche, ralladura de naranja y polvo de hornear\",\n \"Para hacer un lemon pie se 
necesita crema, ralladura de lim\u00f3n, huevos, leche y harina\"\n ],\n ids=map(str, range(6))\n)\n\n# entrena un clasificador\ntc.make_classifier(\n name=\"recetas_classifier\",\n ids=map(str, range(6)),\n labels=[\"Comida\", \"Comida\", \"Trago\", \"Trago\", \"Postre\", \"Postre\"]\n)\n\nlabels_considerados, puntajes = tc.classify(\n classifier_name=\"recetas_classifier\", \n examples=[\n \"Para hacer un bizcochuelo de chocolate se necesita harina, huevos, leche y chocolate negro\",\n \"Para hacer un sanguche de miga necesitamos pan, jam\u00f3n y queso\"\n ]\n)\n\nprint labels_considerados\narray(['Comida', 'Postre', 'Trago'], dtype='|S6')\n\nprint puntajes\narray([[-3.52493526, 5.85536809, -6.05497008],\n [ 2.801027 , -6.55619473, -3.39598721]])\n\n# el primer ejemplo es un postre\nprint sorted(zip(puntajes[0], labels_considerados), reverse=True)\n[(5.8553680868184079, 'Postre'),\n (-3.5249352611212568, 'Comida'),\n (-6.0549700786502845, 'Trago')]\n\n# el segundo ejemplo es una comida\nprint sorted(zip(puntajes[1], labels_considerados), reverse=True)\n[(2.8010269985828997, 'Comida'),\n (-3.3959872063363505, 'Trago'),\n (-6.5561947275785393, 'Postre')]\n```\n\n## Tests\n\nLos tests s\u00f3lo se pueden correr habiendo clonado el repo. 
Luego instalar las dependencias de desarrollo:\n\n`pip install -r requirements_dev.txt`\n\ny correr los tests:\n\n`nosetests`\n\n## Cr\u00e9ditos\n\n* [Victor Lavrenko](http://homepages.inf.ed.ac.uk/vlavrenk/) nos ayud\u00f3 a entender el problema con sus explicaciones en youtube: https://www.youtube.com/user/victorlavrenko\n\n## Contacto\n\nTe invitamos a [crearnos un issue](https://github.com/datosgobar/textar/issues/new?title=Encontr\u00e9 un bug en textar) en caso de que encuentres alg\u00fan bug o tengas feedback de alguna parte de `textar`.\n\nPara todo lo dem\u00e1s, pod\u00e9s mandarnos tu comentario o consulta a [datos@modernizacion.gob.ar](mailto:datos@modernizacion.gob.ar).\n\n\nHistory\n===\n\n0.0.6 (2017-09-25)\n------------------\n\n* Arreglo de bugs en las palabras destacadas de los resultados sugeridos.\n\n\n0.0.5 (2017-07-14)\n------------------\n\n* Mejoras en la forma en que se seleccionan las palabras destacadas de la b\u00fasqueda\n* Correcciones a los tests correspondientes\n\n0.0.4 (2016-11-25)\n------------------\n\n* Correcciones a los tests\n* Revisi\u00f3n de la documentaci\u00f3n\n\n0.0.1 (2016-11-22)\n------------------\n\n* First release on PyPI.\n\n\n", "description_content_type": null, "docs_url": null, "download_url": "", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/datosgobar/textar", "keywords": "textar", "license": "MIT license", "maintainer": "", "maintainer_email": "", "name": "textar", "package_url": "https://pypi.org/project/textar/", "platform": "", "project_url": "https://pypi.org/project/textar/", "project_urls": { "Homepage": "https://github.com/datosgobar/textar" }, "release_url": "https://pypi.org/project/textar/0.0.6/", "requires_dist": [ "numpy", "pandas", "scikit-learn", "scipy" ], "requires_python": "", "summary": "Paquete en python para an\u00e1lisis, clasificaci\u00f3n y recuperaci\u00f3n de textos, utilizado por el equipo de Datos Argentina.", "version": "0.0.6" 
}, "last_serial": 3201850, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "3582dc1afc3348f65d5a05f4fcc7469d", "sha256": "fb47c7dd69f267f8b9d3f29a558cb7e1c13b9d8cb7d002439e88567909134c8b" }, "downloads": -1, "filename": "textar-0.0.1-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "3582dc1afc3348f65d5a05f4fcc7469d", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 8749, "upload_time": "2016-11-24T13:16:26", "url": "https://files.pythonhosted.org/packages/5c/86/f3516dc3d95774543f33d2d38b0fdaa64c6d5b278391b5996114584743ac/textar-0.0.1-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "8e63a8192d78aa23dbd2cd0ab9ee7599", "sha256": "703e6590ee1e42f97fce42236f530391c306e40bf0ae663948af42edf1f915a6" }, "downloads": -1, "filename": "textar-0.0.1.tar.gz", "has_sig": false, "md5_digest": "8e63a8192d78aa23dbd2cd0ab9ee7599", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16716, "upload_time": "2016-11-24T13:16:29", "url": "https://files.pythonhosted.org/packages/45/57/ba3a1e56fe4a74d45d1afc477e1131c5e2c5736dad07b2c0b07eec82cf06/textar-0.0.1.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "09afd4b2c3df111b71219a0fbe7179b8", "sha256": "8f59998d955b8735d469cde6a88c2bcdaec607a2d2c2e0cc48f9aeaf353b4e59" }, "downloads": -1, "filename": "textar-0.0.3-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "09afd4b2c3df111b71219a0fbe7179b8", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12197, "upload_time": "2016-11-25T11:50:09", "url": "https://files.pythonhosted.org/packages/3d/ec/539293ff36a30f3acbaca15fbcbfb0ae7da6b8363ca1d49c7dd17d116d59/textar-0.0.3-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "0ff19ff8a753201727fb17abd97b70c1", "sha256": "d5ae4469b7dce9b1a55e1b43dd9fd4517b3446f25a8d4e98c47fbbe9b22e3007" }, "downloads": -1, "filename": "textar-0.0.3.tar.gz", "has_sig": 
false, "md5_digest": "0ff19ff8a753201727fb17abd97b70c1", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 17461, "upload_time": "2016-11-25T11:50:11", "url": "https://files.pythonhosted.org/packages/d1/cd/de329a4399a0d9bbb9fdd3c34f823c05f279162b5b4609e3a1b1ee7a43ef/textar-0.0.3.tar.gz" } ], "0.0.4": [ { "comment_text": "", "digests": { "md5": "228bcb388dff1f5fe070376ced607a33", "sha256": "33541fda461697c17d84dbb051f9232a0a2486cb553572d06ddc4f185b770fde" }, "downloads": -1, "filename": "textar-0.0.4-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "228bcb388dff1f5fe070376ced607a33", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12379, "upload_time": "2016-11-25T15:48:58", "url": "https://files.pythonhosted.org/packages/01/56/a7b37fc9145936b9a6c1f419b9641951ffd080919014d6863d64c3e3b8c4/textar-0.0.4-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "b3da2d41bea90029593a5e6fbe004300", "sha256": "b4523826ae6fe561a219e0f4f4ee0c8199ce95fe93b40d49b9da8ff8c86dbff5" }, "downloads": -1, "filename": "textar-0.0.4.tar.gz", "has_sig": false, "md5_digest": "b3da2d41bea90029593a5e6fbe004300", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16537, "upload_time": "2016-11-25T15:49:00", "url": "https://files.pythonhosted.org/packages/64/88/ee88c450e1b247d6123c890841ed4c45036aae3d12823e3afea05123da86/textar-0.0.4.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "021bfaba6732549b8b2a3e84fa33d681", "sha256": "53bb22655dbc1a5c4e642b0b784623367937b6c49bc34c14be4c972283d647b8" }, "downloads": -1, "filename": "textar-0.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "021bfaba6732549b8b2a3e84fa33d681", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12904, "upload_time": "2017-09-25T19:04:46", "url": 
"https://files.pythonhosted.org/packages/6d/0a/814314fa4a3bcbae7326d85bda7a90dcaf68f87164674828fe8a92b33818/textar-0.0.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fdaf4167835edffe227d8581258cc86d", "sha256": "7391f6ad1a42aec310882a850edbd55095cc31931bc43fe590815641f0ad5257" }, "downloads": -1, "filename": "textar-0.0.6.tar.gz", "has_sig": false, "md5_digest": "fdaf4167835edffe227d8581258cc86d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16819, "upload_time": "2017-09-25T19:04:48", "url": "https://files.pythonhosted.org/packages/73/2a/6179e648e83da90f86dfc90022f3bfe68b9898923e04ac56bcd12c3e242b/textar-0.0.6.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "021bfaba6732549b8b2a3e84fa33d681", "sha256": "53bb22655dbc1a5c4e642b0b784623367937b6c49bc34c14be4c972283d647b8" }, "downloads": -1, "filename": "textar-0.0.6-py2.py3-none-any.whl", "has_sig": false, "md5_digest": "021bfaba6732549b8b2a3e84fa33d681", "packagetype": "bdist_wheel", "python_version": "py2.py3", "requires_python": null, "size": 12904, "upload_time": "2017-09-25T19:04:46", "url": "https://files.pythonhosted.org/packages/6d/0a/814314fa4a3bcbae7326d85bda7a90dcaf68f87164674828fe8a92b33818/textar-0.0.6-py2.py3-none-any.whl" }, { "comment_text": "", "digests": { "md5": "fdaf4167835edffe227d8581258cc86d", "sha256": "7391f6ad1a42aec310882a850edbd55095cc31931bc43fe590815641f0ad5257" }, "downloads": -1, "filename": "textar-0.0.6.tar.gz", "has_sig": false, "md5_digest": "fdaf4167835edffe227d8581258cc86d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 16819, "upload_time": "2017-09-25T19:04:48", "url": "https://files.pythonhosted.org/packages/73/2a/6179e648e83da90f86dfc90022f3bfe68b9898923e04ac56bcd12c3e242b/textar-0.0.6.tar.gz" } ] }