{ "info": { "author": "Alex Clark", "author_email": "aclark@aclark.net", "bugtrack_url": null, "classifiers": [ "Framework :: Buildout", "Framework :: Plone" ], "description": ".. contents:: :depth: 2\n\nmr.importer\n===========\n\nEasily import static websites on the file system into Plone via a command\nlike::\n\n $ bin/plone run bin/import /path/to/files\n\nIntroduction\n------------\n\n``mr.importer`` is a Buildout recipe that creates a script for you to easily\nget content from static HTML websites on the file system into Plone.\n\nWarning\n-------\n\nThis is a **Buildout** recipe for use with **Plone**; by itself it does nothing. If you\ndon't know what Plone is, please see: http://plone.org. If you don't know\nwhat Buildout is, please see: http://www.buildout.org/.\n\nGetting started\n---------------\n\nFirst, a couple caveats:\n\n* A Plone site object must exist in the Zope2 instance database. By default in mr.importer,\n that site object is assumed to be named \"Plone\".\n\n* An admin user must exist in the Zope2 instance database (or Plone site). By\n default in mr.importer that user is assumed to be named \"admin\".\n\nAnd because it drives the author nuts whenever he has to dig for a recipe's options,\nhere are this recipe's options with sample values::\n\n [import]\n recipe = mr.importer\n\n # core features\n path = /Plone\n user = admin\n illegal_chars = _ . +\n illegal_words =\n illegal_expressions =\n html_extensions = html\n image_extensions = png\n file_extensions = mp3\n target_tags = p\n \n # additional features\n force = false\n publish = false\n collapse = false\n create_spreadsheet = false\n replacetypes =\n rename =\n match =\n paths =\n\n.. Note::\n The parameters listed above are configured with their default values. Edit these\n values if you would like to change the default behavior; they are (mostly)\n self-explanatory. Now you can just cut and paste to get started or keep reading if\n you would like to know more.\n\n.. 
Note::\n This recipe creates a script that is **not** intended to be run directly.\n Due to technical limitations, the author was not able to implement a\n user-friendly error message. So if you run ``bin/import`` directly you will see\n this::\n\n $ bin/import\n Traceback (most recent call last):\n File \"bin/import\", line 116, in <module>\n mr.importer.main(app=app, path='/Plone', illegal_chars='_,.',\n illegal_words='id,start', illegal_expressions='[0-9]', html_extensions='html',\n image_extensions='gif,jpg,jpeg,png', file_extensions='mp3,xls',\n target_tags='a,div,font,h1,h2,p', force=True, publish=False, collapse=False,\n rename=None, replacetypes=None, match=None, create_spreadsheet=True)\n NameError: name 'app' is not defined\n\n To avoid this, run the script as intended::\n\n $ bin/plone run bin/import /path/to/files\n\n See the `execution`_ section below for more information.\n\nInstallation\n------------\n\nYou can install ``mr.importer`` by editing your ``buildout.cfg`` file like\nso. First add an ``import`` section::\n\n [import]\n recipe = mr.importer\n\nThen add the ``import`` section to the list of parts::\n\n [buildout]\n ...\n parts =\n ...\n import\n\nNow run ``bin/buildout`` as usual.\n\n.. Note::\n The section name ``import`` is arbitrary; you can call it whatever you\n want. Just keep in mind that the section name corresponds directly to the\n script name. In other words, whatever you name the section is what\n the script will be called.\n\n\nExecution\n---------\n\nNow you can run ``mr.importer`` like this::\n\n $ bin/plone run bin/import /path/to/files\n\n.. Note:: \n In the example above and examples below, ``bin/plone`` refers to a *Zope 2\n instance* script created by `plone.recipe.zope2instance`_.\n\n Your ``bin/plone`` script may be called ``bin/instance`` or\n ``bin/client``, etc. instead.\n\n.. 
_`plone.recipe.zope2instance`: http://pypi.python.org/pypi/plone.recipe.zope2instance\n\nExample\n-------\n\nIf you have a site in /var/www/html that contains the following::\n\n /var/www/html/index.html\n /var/www/html/about/index.html\n\nYou should run::\n\n $ bin/plone run bin/import /var/www/html\n\nAnd the following will be created:\n\n* http://localhost:8080/Plone/index.html\n* http://localhost:8080/Plone/about/index.html\n\nCustomization\n-------------\n\nModifying the default behavior of ``mr.importer`` is easy; just use the command\nline options or add parameters to your ``buildout.cfg`` file. Both approaches\nallow customization of the exact same set of options, but the command line\narguments will trump any settings found in your ``buildout.cfg`` file.\n\nBuildout options\n~~~~~~~~~~~~~~~~\n\nYou can configure the following parameters in your ``buildout.cfg`` file in\nthe ``mr.importer`` recipe section.\n\nOptions\n'''''''\n+----------------------+------------+----------------------------------------+\n| **Parameter** |**Default** | **Description** |\n| |**value** | |\n+----------------------+------------+----------------------------------------+\n| ``path`` |/Plone | Specify an alternate location in the |\n| | | database for the import to occur. |\n+----------------------+------------+----------------------------------------+\n| ``user`` |admin | Specify an alternate user to import |\n| | | content with. |\n+----------------------+------------+----------------------------------------+\n| ``illegal_chars`` |_ . | Specify illegal characters. |\n| | | ``mr.importer`` will ignore files that |\n| | | contain these characters. |\n+----------------------+------------+----------------------------------------+\n| ``html_extensions`` |html | Specify HTML file extensions. 
|\n| | | ``mr.importer`` will import HTML files |\n| | | with these extensions. |\n+----------------------+------------+----------------------------------------+\n| ``image_extensions`` |png, gif, | Specify image file extensions. |\n| |jpg, jpeg, | ``mr.importer`` will import image files|\n| | | with these extensions. |\n+----------------------+------------+----------------------------------------+\n| ``file_extensions`` |mp3, xls | Specify generic file extensions. |\n| | | ``mr.importer`` will import files |\n| | | with these extensions as files in Plone|\n| | | (unless you configure |\n| | | create_spreadsheet=true, see below) |\n+----------------------+------------+----------------------------------------+\n| ``target_tags`` |a h1 h2 p | Specify target tags. ``mr.importer`` |\n| | | will parse the contents of HTML tags |\n| | | listed. If any tag is provided as an |\n| | | XPath expression (any expression |\n| | | beginning with /) the matching elements|\n| | | will first be extracted from the root |\n| | | document. Selections for the contents |\n| | | of other tags will then be performed |\n| | | only on the document subset. |\n| | | If only XPath expressions are given, |\n| | | then the entire subtree of the matched |\n| | | elements is returned (including HTML). |\n+----------------------+------------+----------------------------------------+\n| ``force`` |false | Force create folders that do not exist.|\n| | | For example, if you do |\n| | | --path=/Plone/foo and foo does not |\n| | | exist, you will get an error message. |\n| | | Use --force to tell ``mr.importer`` to |\n| | | create it. |\n+----------------------+------------+----------------------------------------+\n| ``publish`` |false | Publish newly created content. |\n+----------------------+------------+----------------------------------------+\n| ``collapse`` |false | \"collapse\" content. 
(see |\n| | | collapse_parts() in mr.importer.py) |\n+----------------------+------------+----------------------------------------+\n| ``rename`` | | Rename content. (see rename_parts() |\n| | | in mr.importer.py) |\n+----------------------+------------+----------------------------------------+\n| ``replacetypes`` | | Use custom types. (see replace_types())|\n+----------------------+------------+----------------------------------------+\n| ``match`` | | Match files. (see match_files()) |\n+----------------------+------------+----------------------------------------+\n| ``paths`` | | Specify a series of locations on the |\n| | | filesystem, with corresponding |\n| | | locations in the database for imports, |\n| | | with syntax: |\n| | | --paths=import_dirs:object_paths |\n| | | (--path will be ignored) |\n+----------------------+------------+----------------------------------------+\n|``create_spreadsheet``| false | Create \"spreadsheets\". (see |\n| | | create_spreadsheet() in mr.importer.py)|\n+----------------------+------------+----------------------------------------+\n\nExample\n'''''''\n\nInstead of accepting the default ``mr.importer`` behaviour, in your\n``buildout.cfg`` file you may specify the following::\n\n [import]\n recipe = mr.importer\n path = /Plone/foo\n html_extensions = htm\n image_extensions = png\n target_tags = p\n\nThis will configure ``mr.importer`` to (only) import content from:\n\n* Images ending in ``.png``\n* HTML files ending in ``.htm``\n* Text within ``p`` tags\n\n*to*: \n\n* A folder named ``/Plone/foo``.\n\nCommand line options\n~~~~~~~~~~~~~~~~~~~~\n\nThe following ``mr.importer`` command line options are supported.\n\nOptions\n'''''''\n\n``'--path'``, ``'-p'``\n**********************\n\nYou can specify an alternate import path ('/Plone' by default)\nwith ``--path`` or ``-p``::\n\n $ bin/plone run bin/import /path/to/files --path=/Plone/foo\n\n``'--html-extensions'``\n***********************\n\nYou can specify HTML file extensions 
with the ``--html-extensions`` option::\n\n $ bin/plone run bin/import /path/to/files --html-extensions=htm\n\n``'--image-extensions'``\n************************\n\nYou can specify image file extensions with the ``--image-extensions`` option::\n\n $ bin/plone run bin/import /path/to/files --image-extensions=png\n\n``'--file-extensions'``\n***********************\n\nYou can specify generic file extensions with the ``--file-extensions`` option::\n\n $ bin/plone run bin/import /path/to/files --file-extensions=pdf\n\n``'--target-tags'``\n*******************\n\nYou can specify the target tags to parse with the ``--target-tags`` option::\n\n $ bin/plone run bin/import /path/to/files --target-tags=p\n\n``'--force'``\n*************\n\nForce create folders that do not exist.\n\n``'--publish'``\n***************\n\nPublish newly created content.\n\n``'--collapse'``\n****************\n\n\"collapse\" content (see collapse_parts() in mr.importer.py).\n\n``'--rename'``\n***************\n\nRename content (see rename_files()).\n\n``'--replacetypes'``\n********************\n\nCustomize types (see replace_types() in mr.importer.py).\n\n``'--match'``\n****************\n\nMatch files (see match_files() in mr.importer.py).\n\n``'--paths'``\n*************\n\nYou can specify a series of import paths and corresponding object paths::\n\n $ bin/plone run bin/import --paths=sample:Plone/sample,sample2:Plone/sample2\n\n``'--create-spreadsheet'``\n**************************\n\nYou can optionally tell ``mr.importer`` to try to import the contents of any\nspreadsheets it finds, by doing this::\n\n $ bin/plone run bin/import --create-spreadsheet /var/www/html\n\nIf /var/www/html/foo.xls exists and has content, then\nhttp://localhost:8080/Plone/foo will be created as a page, with the contents\nof the spreadsheet in an HTML table.\n\n``'--help'``\n************\n\nAnd lastly, you can always ask ``mr.importer`` to tell you about its available options with\nthe ``--help`` or ``-h`` option::\n\n $ 
bin/plone run bin/import -h\n\nExample\n'''''''\n\nInstead of accepting the default ``mr.importer`` behaviour, on the command line you\nmay specify the following::\n\n $ bin/plone run bin/import /path/to/files -p /Plone/foo --html-extensions=htm \\\n --image-extensions=png --target-tags=p\n\nThis will configure ``mr.importer`` to (only) import content from:\n\n* Images ending in ``.png``\n* HTML files ending in ``.htm``\n* Text within ``p`` tags\n\n*to*: \n\n* A Plone site folder named ``/Plone/foo``.\n\nTroubleshooting\n---------------\n\nHere are some troubleshooting tips.\n\nCompiling lxml\n~~~~~~~~~~~~~~\n\n``mr.importer`` requires ``lxml`` which in turn requires ``libxml2`` and\n``libxslt``. If you do not have ``lxml`` installed \"globally\" (i.e. in your\nsystem Python's site-packages directory) then Buildout will try to install it\nfor you. At this point ``lxml`` will look for the libxml2/libxslt development\nlibraries to build against, and if you don't have them installed on your system\nalready *your mileage may vary* (i.e. Buildout will fail).\n\nDatabase access\n~~~~~~~~~~~~~~~\n\nBefore running ``mr.importer``, you must either stop your Plone site or\nuse ZEO. Otherwise ``mr.importer`` will not be able to access the\ndatabase.\n\nContact\n-------\n\nQuestions, comments, or concerns? 
Please e-mail: aclark@aclark.net.\n\nCredits\n-------\n\nDevelopment sponsored by Radio Free Asia \n\nHistory\n-------\n\n1.0a5 (02/05/2011)\n~~~~~~~~~~~~~~~~~~\n\n* Rename ``parse2plone`` to ``mr.importer``\n\n * Repackage as needed\n\n* Switch to kwargs in main()\n\n * Better _SETTINGS handling\n\n* Add support for illegal_expressions check\n* Add \"Keep going!\" feature (to ignore errors)\n* Add all HTML4 tags to target_tags\n\n1.0a4 (01/12/2011)\n~~~~~~~~~~~~~~~~~~\n\n* Remove Plone dep\n\n1.0a3 (11/17/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Bug fix: TypeError: join() takes exactly one argument (2 given) related to \n specifying import dir on the command line (as args[0]) fixed\n* Fix tests\n\n1.0a2 (11/17/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Add spreadsheet import feature\n* Fix docs\n\n1.0a1 (11/17/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Moved development to the (experimental) collective on Github\n\n0.9.9 (11/16/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Added a large number of tests; performed associated refactoring; 50% test coverage\n\n0.9.8 (11/12/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Add \"paths\" feature to allow multi-import dirs (on the\n file system), and corresponding object paths (in Plone)\n to be specified.\n\n0.9.7 (11/08/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Fix import error\n* Add file handler to logger; saves output to a file called \"parse2plone.log\"\n\n0.9.6 (11/08/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Fixes to \"match\" feature\n* Combine all modules into one\n* Remove a stray pdb (!)\n* Add tests (we're at 20% test coverage people!)\n* Update docs\n\n0.9.5 (11/08/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Add match feature\n* Add more project justifications to the docs\n\n0.9.4 (11/06/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Remove ``bin/import`` script whenever recipe is uninstalled [aclark4life]\n* Add support for XPath syntax in target_tags [derek]\n* Add \"typeswap\" feature [aclark4life]\n* Update docs [aclark4life]\n\n0.9.3 (11/04/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Add Plone 2.5 compat\n* Bug fixes\n\n * Better 
handling of file system path; better base dir calculation\n\n0.9.2 (11/03/2010)\n~~~~~~~~~~~~~~~~~~\n\n* More doc fixes\n\n0.9.1 (11/03/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Doc fixes\n\n0.9.0 (11/03/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Fix regressions introduced (or unresolved as of) 0.8.2. Thanks Derek\n Broughton for the bug report(s)\n\n * Many fixes to convert_parameter_values() method which converts\n recipe parameters to arguments passed to main()\n * Fix \"slugify\" feature\n\n0.8.2 (11/02/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Add rename feature\n* Fix regressions introduced in 0.8.1\n\n0.8.1 (10/29/2010)\n~~~~~~~~~~~~~~~~~~\n\n* Refactor options/parameters functionality to universally support _SETTINGS dict\n* Add \"slugify\" feature\n* Doc fixes\n* Add support to optionally publish content after creation\n* Add support for generic file import\n\n0.8 (10/27/2010)\n~~~~~~~~~~~~~~~~\n\n* Support the importing of content to folders within the Plone site object\n\n0.7 (10/25/2010)\n~~~~~~~~~~~~~~~~\n\n* Documentation fixes\n\n0.6 (10/25/2010)\n~~~~~~~~~~~~~~~~\n\n* Support customization via recipe parameters and command line arguments\n\n0.5 (10/22/2010)\n~~~~~~~~~~~~~~~~\n\n* Revert 'Add Plone to install_requires'\n\n0.4 (10/22/2010)\n~~~~~~~~~~~~~~~~\n\n* Add 'Plone' to install_requires\n\n0.3 (10/22/2010)\n~~~~~~~~~~~~~~~~\n\n* Another setuptools fix\n\n0.2 (10/22/2010)\n~~~~~~~~~~~~~~~~\n\n* Setuptools fix\n\n0.1 (10/21/2010)\n~~~~~~~~~~~~~~~~\n\n* Initial release", "description_content_type": null, "docs_url": null, "download_url": "UNKNOWN", "downloads": { "last_day": -1, "last_month": -1, "last_week": -1 }, "home_page": "https://github.com/collective/parse2plone", "keywords": null, "license": "UNKNOWN", "maintainer": null, "maintainer_email": null, "name": "mr.importer", "package_url": "https://pypi.org/project/mr.importer/", "platform": "UNKNOWN", "project_url": "https://pypi.org/project/mr.importer/", "project_urls": { "Download": "UNKNOWN", "Homepage": 
"https://github.com/collective/parse2plone" }, "release_url": "https://pypi.org/project/mr.importer/1.0a5/", "requires_dist": null, "requires_python": null, "summary": "Easily import static HTML websites on the file system into Plone", "version": "1.0a5" }, "last_serial": 795043, "releases": { "1.0a5": [ { "comment_text": "", "digests": { "md5": "03b31f87cf14a88a11774f83f027cf68", "sha256": "c5c8ea97f61019a76590d740e07ead9ecf880f08f862364e118636141ca05103" }, "downloads": -1, "filename": "mr.importer-1.0a5.zip", "has_sig": false, "md5_digest": "03b31f87cf14a88a11774f83f027cf68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 81528, "upload_time": "2011-02-06T05:21:19", "url": "https://files.pythonhosted.org/packages/b0/23/94724ed0866ac65e61ce1c9ad3c6d855e780f125f9cf87caf1b235e497f2/mr.importer-1.0a5.zip" } ] }, "urls": [ { "comment_text": "", "digests": { "md5": "03b31f87cf14a88a11774f83f027cf68", "sha256": "c5c8ea97f61019a76590d740e07ead9ecf880f08f862364e118636141ca05103" }, "downloads": -1, "filename": "mr.importer-1.0a5.zip", "has_sig": false, "md5_digest": "03b31f87cf14a88a11774f83f027cf68", "packagetype": "sdist", "python_version": "source", "requires_python": null, "size": 81528, "upload_time": "2011-02-06T05:21:19", "url": "https://files.pythonhosted.org/packages/b0/23/94724ed0866ac65e61ce1c9ad3c6d855e780f125f9cf87caf1b235e497f2/mr.importer-1.0a5.zip" } ] }