{ "info": { "author": "Daniel Mondaca", "author_email": "daniel@analitic.cl", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "License :: OSI Approved :: Apache Software License", "Operating System :: POSIX :: Linux", "Programming Language :: Python", "Topic :: Software Development" ], "description": "Framework \u201cpyscrap\u201d\n\nInstallation (Debian/Ubuntu):\n\nsudo apt-get install libxml2-dev\n\nsudo apt-get install libxslt1-dev\n\nsudo apt-get install python2.6-dev\n\npip install pyscrap\n\nFeatures:\n\nInspired by \u201cscrapy\u201d, one of the most advanced web scraping frameworks available to date, but simpler and tailored to specific needs, namely:\n*Extraction scripts installable as packages\n**Compatible across Unix-based systems\n*Executable modules\n*Complete separation between the extraction layer and the database layer.\n*Runs as a normal process.\n\nFocused on rapid development: once installed, the \u201cwscrap\u201d command creates a new project with a working demo.\n\n\n*: Unlike scrapy.\n**: Includes Mac, excludes Windows.\n\nThe structure of each \u201cwscrap\u201d project is:\n\nejemplo/\n\u251c\u2500\u2500 ejemplo\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 exampleSpider.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 __init__.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 item.py\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 pipeline.py\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 settings.py\n\u251c\u2500\u2500 MANIFEST.in\n\u251c\u2500\u2500 README.txt\n\u2514\u2500\u2500 setup.py\n\nWhere:\n\u201cexampleSpider.py\u201d is the extraction script.
\n\u201citem.py\u201d defines each item to extract and each set of items, with their respective fields; for example, an item could be a comment containing its text and author, and the set of comments could contain the URL they were extracted from.\n\u201cpipeline.py\u201d contains the functions used to store or process each item or set of items defined in \u201citem.py\u201d, plus a function intended for obtaining URLs (getUrls) that can be called from within the extraction script as a static method, without importing it.\n\u201csettings.py\u201d maps each function in \u201cpipeline.py\u201d to the item or set of items in \u201citem.py\u201d that it handles, and also holds the default headers used when downloading a URL.\n\u201cREADME.txt\u201d can contain a description of the project.\n\u201csetup.py\u201d is the standard Python script for packaging or installing projects; dependencies, among other things, are defined here.\nThe remaining files are needed for packaging and generally do not need to be modified.\n\nA project can be installed with:\n\npython setup.py install\n\nOr installed as a reference, so development can continue without moving it, with:\n\npython setup.py develop\n\n\nHow it works:\n\nEach extraction class must inherit from the framework's \u201cspider\u201d class; each item returned by that class's \u201cparse\u201d function is processed according to the configuration in \u201csettings.py\u201d, using the functions defined in \u201cpipeline.py\u201d. The process is transparent to the programmer and is achieved through the special metaclass that \u201cspider\u201d uses.
\n\nIn practice, this means the extraction script does not need to import anything other than the \u201citem.py\u201d file; that is, the extraction layer is independent of the storage layer.", "description_content_type": null, "docs_url": "https://pythonhosted.org/pyscrap/", "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "http://github.com/Nievous/pyscrap", "keywords": "web scraping", "license": "Apache 2.0 License", "maintainer": null, "maintainer_email": null, "name": "pyscrap", "package_url": "https://pypi.org/project/pyscrap/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/pyscrap/", "project_urls": { "Download": "UNKNOWN", "Homepage": "http://github.com/Nievous/pyscrap" }, "release_url": "https://pypi.org/project/pyscrap/0.0.9/", "requires_dist": null, "requires_python": null, "summary": "micro framework for web scraping", "version": "0.0.9" }, "last_serial": 983598, "releases": { "0.0.1": [ { "comment_text": "", "digests": { "md5": "8ea652e847f75f0ec655d82257ea3b2e", "sha256": "fd39c94651d980267bd95de368bc5909f5940214c9baeedaf33fedd873c7a123" }, "downloads": -1, "filename": "pyscrap-0.0.1.tar.gz", "has_sig": false, "md5_digest": "8ea652e847f75f0ec655d82257ea3b2e", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7757, "upload_time": "2012-07-28T00:03:36", "url": "https://files.pythonhosted.org/packages/c7/26/cdc46a01a27c768fc5760b1c74fc058bc4503d601fdcc3c8b064865bf887/pyscrap-0.0.1.tar.gz" } ], "0.0.2": [ { "comment_text": "", "digests": { "md5": "a19261c5a94ae37e352b433545fddccd", "sha256": "935a6ef2e05059265530cf6efcb56e00c3cdfb1372821c3663f3039ffb1aca39" }, "downloads": -1, "filename": "pyscrap-0.0.2.tar.gz", "has_sig": false, "md5_digest": "a19261c5a94ae37e352b433545fddccd", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7811, "upload_time": "2012-07-31T21:56:27", "url": 
"https://files.pythonhosted.org/packages/5b/d3/583b33763bdea6dbf0f81cb4fbe6fd970d2c6c249c2f1ddf4ba51841e430/pyscrap-0.0.2.tar.gz" } ], "0.0.3": [ { "comment_text": "", "digests": { "md5": "934d4bd39897d6302dfbe91227299a6d", "sha256": "0bed350fccfb21fb3f47fc08087fca494ee93cb9be53f5a3c14191abf526ca93" }, "downloads": -1, "filename": "pyscrap-0.0.3.tar.gz", "has_sig": false, "md5_digest": "934d4bd39897d6302dfbe91227299a6d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8534, "upload_time": "2012-08-01T00:25:09", "url": "https://files.pythonhosted.org/packages/50/08/9733850d145158391c626989a5686417f3b029266833bb4b31f07625a78a/pyscrap-0.0.3.tar.gz" } ], "0.0.4": [], "0.0.5": [ { "comment_text": "", "digests": { "md5": "30b6dac79da35847bf46fd40a3b387a3", "sha256": "4c87115501a97adc82570f1ffed53dd5446bcd2730ba0e4971055e4e11357f12" }, "downloads": -1, "filename": "pyscrap-0.0.5.tar.gz", "has_sig": false, "md5_digest": "30b6dac79da35847bf46fd40a3b387a3", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8335, "upload_time": "2013-05-17T16:30:29", "url": "https://files.pythonhosted.org/packages/ba/b7/776a76f67bef3a795580f9ad11c09d424e484efab8b6d99a684bc2bb0b7f/pyscrap-0.0.5.tar.gz" } ], "0.0.6": [ { "comment_text": "", "digests": { "md5": "b6e83228e7870bb0fe6554fc9fbec71d", "sha256": "31c3643aab566dac3a0724c6bd77592e3efd08a6328d426da427cb0c7186b287" }, "downloads": -1, "filename": "pyscrap-0.0.6.tar.gz", "has_sig": false, "md5_digest": "b6e83228e7870bb0fe6554fc9fbec71d", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 8340, "upload_time": "2013-05-17T21:17:08", "url": "https://files.pythonhosted.org/packages/a0/5c/f3f6a36098b4a4f3110e4521bdbe7da9d400e832432f5d78d1bb1b28724c/pyscrap-0.0.6.tar.gz" } ], "0.0.7": [ { "comment_text": "", "digests": { "md5": "e4a481c9973381907416327e5dd6572a", "sha256": "f48d704d56a7f1426168979e199b6febe4e540340f40b0414576727abddfce31" }, 
"downloads": -1, "filename": "pyscrap-0.0.7.tar.gz", "has_sig": false, "md5_digest": "e4a481c9973381907416327e5dd6572a", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7871, "upload_time": "2013-12-06T14:15:02", "url": "https://files.pythonhosted.org/packages/21/41/d36cecb2f1b42c8774585d6e73b7860ccceb640fdc373971e750535b23b8/pyscrap-0.0.7.tar.gz" } ], "0.0.8": [ { "comment_text": "", "digests": { "md5": "44606e510bbfbbe1f6cdd326d6355a80", "sha256": "fdb8162592c088791aa1840c3b3610fe63d058e597c50482a1e30d57146a598e" }, "downloads": -1, "filename": "pyscrap-0.0.8.tar.gz", "has_sig": false, "md5_digest": "44606e510bbfbbe1f6cdd326d6355a80", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7874, "upload_time": "2013-12-06T14:50:59", "url": "https://files.pythonhosted.org/packages/49/7f/730543905b64daac9d6928270be4093d4af50115831acf45ee8d5f7cb880/pyscrap-0.0.8.tar.gz" } ], "0.0.9": [ { "comment_text": "", "digests": { "md5": "9b4ec62d2383e199724903b580b6ea4b", "sha256": "b15d33ea688381ffccb281bb2a620f13db3a0cb1407544345dca4b80de4e8273" }, "downloads": -1, "filename": "pyscrap-0.0.9.tar.gz", "has_sig": false, "md5_digest": "9b4ec62d2383e199724903b580b6ea4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7849, "upload_time": "2014-01-28T01:37:30", "url": "https://files.pythonhosted.org/packages/aa/cb/07ac450abb5c2eb7cdf52ceb394c8a35e4cca440e59e964a029d0696f181/pyscrap-0.0.9.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "9b4ec62d2383e199724903b580b6ea4b", "sha256": "b15d33ea688381ffccb281bb2a620f13db3a0cb1407544345dca4b80de4e8273" }, "downloads": -1, "filename": "pyscrap-0.0.9.tar.gz", "has_sig": false, "md5_digest": "9b4ec62d2383e199724903b580b6ea4b", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 7849, "upload_time": "2014-01-28T01:37:30", "url": 
"https://files.pythonhosted.org/packages/aa/cb/07ac450abb5c2eb7cdf52ceb394c8a35e4cca440e59e964a029d0696f181/pyscrap-0.0.9.tar.gz" } ] }