{ "info": { "author": "Jakub Jirutka", "author_email": "jakub@jirutka.cz", "bugtrack_url": null, "classifiers": [ "Development Status :: 4 - Beta", "Environment :: Web Environment", "Intended Audience :: Developers", "License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)", "Operating System :: OS Independent", "Programming Language :: Python", "Topic :: Software Development :: Libraries :: Python Modules" ], "description": "# \u010cSFD Parser\n\nParser for movie pages and search on [\u010cSFD](http://www.csfd.cz).\n\n\n## Notes\n\n[\u010cSFD](http://www.csfd.cz) is the best-known Czech-Slovak film database. It contains information on more than a quarter of a million Czech and foreign films, as well as on actors, directors, and so on. Unfortunately, it offers no public API for machine access to its data and no web services. The only way to get data out of it is to parse its web pages, and that is exactly what this code was written for.\n\nThe \u010cSFD web pages declare an XHTML+RDFa doctype, yet I found no RDFa markup in them, and they are not even _well-formed_. They therefore cannot be processed directly as XML. Every parser I have found for \u010cSFD (one of which I once wrote myself) parses the pages with a set of complex regular expressions. Devising them can be a fairly entertaining mental workout for long winter evenings, but maintaining and adjusting them whenever the page design changes is extremely laborious. Such a solution is also not particularly fast. I therefore wanted to try a slightly different approach.\n\nThis parser is not built on regular expressions. Instead, it uses an [HTML parser](http://lxml.de/) that builds an XML DOM from the page, which is then queried with XPath. Thanks to XPath, locating the required elements on a page is remarkably easy and the resulting code is pleasantly readable. You can compare it yourself with, for example, [this](http://www.phpclasses.org/browse/file/33086.html) code.\n\n\n## Requirements\n\n* Python 2.7 / 3.1\n* [lxml](http://lxml.de/) (tested with 2.3.1)\n\n\n## Notice\n\nUse this code for personal purposes only; do not abuse it to scrape the \u010cSFD database wholesale!\n\n\n## License\n\nThis project is released under the [LGPL version 3](http://www.gnu.org/licenses/lgpl.txt) license.\n\n## Contributors\n\n[Alex Rembish](http://github.com/rembish)", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/jirutka/CSFD-parser", "keywords": null, "license": "LGPL version 3", "maintainer": null, "maintainer_email": null, "name": "csfd-parser", "package_url": "https://pypi.org/project/csfd-parser/", "platform": "UNKNOWN", "project_url": 
"https://pypi.org/project/csfd-parser/", "project_urls": { "Download": "UNKNOWN", "Homepage": "https://github.com/jirutka/CSFD-parser" }, "release_url": "https://pypi.org/project/csfd-parser/1.0.5/", "requires_dist": null, "requires_python": null, "summary": "Parser for movie pages and search on CSFD.cz", "version": "1.0.5" }, "last_serial": 788570, "releases": { "1.0.5": [ { "comment_text": "", "digests": { "md5": "a40d9d44fc6d805e9fbb0892b97c3f89", "sha256": "5e4e124d575942cf42b2526b53b9304d40dcf394da93b0bade846c44c1bcf9ff" }, "downloads": -1, "filename": "csfd-parser-1.0.5.tar.gz", "has_sig": false, "md5_digest": "a40d9d44fc6d805e9fbb0892b97c3f89", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6513, "upload_time": "2013-04-01T11:42:51", "url": "https://files.pythonhosted.org/packages/7f/6c/2555f91bc9eac3edbf41aed42a66fa7ea1d901baf6d0b2f3808ab0574e00/csfd-parser-1.0.5.tar.gz" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "a40d9d44fc6d805e9fbb0892b97c3f89", "sha256": "5e4e124d575942cf42b2526b53b9304d40dcf394da93b0bade846c44c1bcf9ff" }, "downloads": -1, "filename": "csfd-parser-1.0.5.tar.gz", "has_sig": false, "md5_digest": "a40d9d44fc6d805e9fbb0892b97c3f89", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 6513, "upload_time": "2013-04-01T11:42:51", "url": "https://files.pythonhosted.org/packages/7f/6c/2555f91bc9eac3edbf41aed42a66fa7ea1d901baf6d0b2f3808ab0574e00/csfd-parser-1.0.5.tar.gz" } ] }