Metadata-Version: 1.0
Name: transmogrify.siteanalyser
Version: 1.1
Summary: transmogrifier source blueprints for crawling html
Home-page: http://github.com/djay/transmogrify.siteanalyser
Author: Dylan Jay
Author-email: software@pretaweb.com
License: GPL
Description: Introduction
        ============
        
        Transmogrifier blueprints that look at how html items are linked to gather metadata
        about items. They can help you restructure your content.
        
        
        transmogrify.siteanalyser.urltidy
        =================================
        Normalises ids in URLs so that they are suitable for adding to Plone.
        
        The following will tidy up the URLs based on a TALES expression ::
        
        $> bin/funnelweb --urltidy:link_expr="python:item['_path'].endswith('.html') and item['_path'][:-5] or item['_path']"
        
        If you'd like to move content around before it's uploaded you can use the urltidy step as well e.g. ::
        
        $> bin/funnelweb --urltidy:link_expr="python:item['_path'].startswith('/news') and '/otn/news'+item['_path'][5:] or item['_path']"
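        
        The expressions above rely on the ``cond and a or b`` idiom. Purely as an illustration,
        roughly the same logic can be written as ordinary Python functions. The function names
        are hypothetical and are not part of this package.
        
        ```python
        def strip_html_suffix(item):
            """Mirror of the first link_expr: drop a trailing '.html' from _path."""
            path = item['_path']
            return path[:-5] if path.endswith('.html') else path
        
        
        def move_news(item):
            """Mirror of the second link_expr: move /news... under /otn/news..."""
            path = item['_path']
            # '/news' is 5 characters long, so path[5:] is everything after it.
            return '/otn/news' + path[5:] if path.startswith('/news') else path
        ```
        
        With these, ``/about/index.html`` becomes ``/about/index`` and ``/news/2012/launch``
        becomes ``/otn/news/2012/launch``.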
        
        
        Options
        -------
        
        condition
        TALES expression; the transform is applied only to items for which it is true.
        
        locale
        TALES expression returning the locale used for id normalisation, e.g. 'string:en'.
        
        link_expr
        TALES expression returning the new value for the item's '_path'.
        
        use_title
        TALES condition; when true, the last path element is replaced by a normalised version of item['_title'].
        
        
        
        transmogrify.siteanalyser.attach
        ================================
        Find items and move them if they are tightly linked to a single page. For example if an image
        is located in an images folder, but is only referenced from a single img element on a page in
        /page then the image will be 'merged' with the page.
        How the merge occurs depends on the 'fields' setting. Merging can either be moving the content
        of the subitem into a field of the parent item, or it can be via containment.
        
        
        
        The following will only move attachments that are images and use ``index-html`` as the new
        name for the default page of the newly created folder ::
        
        [funnelweb]
        recipe = funnelweb
        attachmentguess-condition = python: subitem.get('_type') in ['Image']
        attachmentguess-defaultpage = index-html
        
        Options
        -------
        
        fields
        TALES expression returning a dictionary of changes to ``item``. It can use the ``item``, ``subitem`` and ``i`` variables,
        e.g. python:{'attachment':subitem['text']}. It is called for every subitem; the subitems are then deleted.
        
        condition
        TALES expression; the transform is applied only when it is true.
        (default='python:True')
        
        defaultpage
        Name used for the default page of the newly created folder.
        (default='index-html')
        
        
        
        transmogrify.siteanalyser.title
        ===============================
        
        This blueprint takes the _backlinks of an item generated by webcrawler and,
        if the item has no Title field, attempts to guess the title
        from the text of the links that point to the document.
        Use the 'ignore' option to list titles that should never be used.
        
        If it can't guess a title from the backlinks, it falls back to the file name,
        cleaned up somewhat.
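        
        As a rough sketch of the guessing described above (not the blueprint's actual code):
        the name ``guess_title``, the rule that the most frequent link text wins, and the way
        the file name is tidied are all assumptions made for illustration. The sketch relies
        only on ``_backlinks`` being a list of ``(url, link text)`` tuples, as in the test
        output elsewhere in this document.
        
        ```python
        import posixpath
        from collections import Counter
        
        
        def guess_title(item, ignore=('next', 'previous')):
            """Guess a title from backlink texts, else from the file name."""
            ignored = {name.lower() for name in ignore}
            # _backlinks is assumed to be a list of (url, link_text) tuples.
            names = [text.strip() for _url, text in item.get('_backlinks', [])]
            names = [n for n in names if n and n.lower() not in ignored]
            if names:
                # Assumption: the most frequently used link text wins.
                return Counter(names).most_common(1)[0][0]
            # Fall back to the last path element, tidied up a little.
            stem = posixpath.splitext(posixpath.basename(item['_path']))[0]
            return stem.replace('-', ' ').replace('_', ' ').strip().title()
        ```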
        
        Options
        -------
        
        condition
        TAL Expression to apply transform
        
        ignore
        Newline-separated list of strings which won't be used as titles. Defaults to 'next','previous'
        
        
        transmogrify.siteanalyser.sitemapper
        ====================================
        Rearrange content based on snippets of html arranged as a navigation tree or sitemap.
        A navigation tree is a set of href links arranged in nested html.
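        
        To make the idea of a navigation tree concrete, the following minimal sketch collects
        each link together with its nesting depth. It assumes the sitemap is built from nested
        ``<ul>`` lists; ``SitemapParser`` and ``parse_sitemap`` are hypothetical names, and the
        blueprint itself may parse sitemaps differently.
        
        ```python
        from html.parser import HTMLParser
        
        
        class SitemapParser(HTMLParser):
            """Collect (depth, href) pairs from links nested inside <ul> lists."""
        
            def __init__(self):
                super().__init__()
                self.depth = 0
                self.links = []
        
            def handle_starttag(self, tag, attrs):
                if tag == 'ul':
                    self.depth += 1
                elif tag == 'a':
                    href = dict(attrs).get('href')
                    if href:
                        self.links.append((self.depth, href))
        
            def handle_endtag(self, tag):
                if tag == 'ul':
                    self.depth -= 1
        
        
        def parse_sitemap(html):
            """Return the links of a nested-list sitemap in document order."""
            parser = SitemapParser()
            parser.feed(html)
            parser.close()
            return parser.links
        ```
        
        A link at depth 2 that follows a link at depth 1 can then be treated as a child of
        that link when rearranging content.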
        
        Options
        -------
        
        field
        Name of a field from item which contains a sitemap
        
        field_expr
        Expression to determine the field which contains a sitemap
        
        condition
        Don't move this item
        
        transmogrify.siteanalyser.hidefromnav
        =====================================
        
        This blueprint will guess which folders should be hidden from the navigation tree.
        It does this using one of three rules:
        
        1. Gather all links in the _template html left over after content extraction
        and assume anything linked from outside the content should have its folder shown and
        anything else should be hidden. #TODO
        2. Any folders with content found only via img links will also be hidden. #TODO
        3. Any item for which the ``condition`` option is true will be hidden.
        
        Options
        -------
        
        key
        Default is '_exclude-from-navigation'.
        
        condition
        Default is 'python:False'
        
        template_key
        #TODO
        Default is '_template'
        
        hide_img_folders
        #TODO
        Default is 'True'
        
        
        transmogrify.siteanalyser.defaultpage
        =====================================
        Determines whether an item is the default page for a container (it has many links
        to items in that container, even if it is not itself in that folder) and, if so, moves
        it to that folder.
        
        Options
        -------
        
        mode
        'links' or 'path' (default=links).
        'links' mode looks at the links in an item
        to determine whether it is the defaultpage of a subtree.
        'path' mode uses the parent_path expression to
        determine whether an item is the defaultpage of that parent.
        
        min_links
        If a page has at least this number of links that point to content in a folder
        then move it there and make it the defaultpage. (default=2)
        
        max_uplinks
        If a page has more than max_uplinks it won't be moved. (default=2)
        
        parent_path
        Expression with access to the item, transmogrifier, name, options and
        modules variables. The returned value is used to look up a possible parent
        item by path. If one is found, the item is moved into that parent, the
        parent's _defaultpage key is set accordingly, and processing continues
        with the next item in the pipeline. If parent_path returns the same parent
        for more than one item, the first of those items in the pipeline takes
        precedence.
        
        condition
        default=python:True
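        
        The 'links' mode can be pictured with the following sketch. It is an illustration under
        assumptions, not the blueprint's implementation: ``likely_index_folder`` is a
        hypothetical helper, and the handling of ``max_uplinks`` is left out.
        
        ```python
        import posixpath
        from collections import Counter
        
        
        def likely_index_folder(path, hrefs, min_links=2):
            """Return the folder most of the page's links point into, or None."""
            base = posixpath.dirname(path)
            folders = Counter()
            for href in hrefs:
                # Resolve the relative link against the page's own folder.
                target = posixpath.normpath(posixpath.join(base, href))
                folders[posixpath.dirname(target)] += 1
            if not folders:
                return None
            folder, count = folders.most_common(1)[0]
            # Only treat the page as an index if enough links agree.
            return folder if count >= min_links else None
        ```
        
        For a page at ``content`` whose links are ``f1/blah1`` and ``f1/blah2`` this returns
        ``f1``, which matches the first IsIndex test in this document, where the page ends up
        at ``f1/content``.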
        
        
        transmogrify.siteanalyser.relinker
        ==================================
        Help restructure your content.
        If you'd like to move content from one path to another, adjust the '_path'
        to the new path in a previous blueprint. Create a new field
        called '_origin' and put the old path into that. Once you pass it through
        the relinker all href, img tags etc will be changed in any html content where they
        pointed to content that has since moved. All '_origin' fields will be removed
        after relinking.
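        
        A minimal sketch of the relinking idea, assuming a dictionary ``moved`` that maps each
        old path ('_origin') to its new '_path'. The helper name ``relink`` and the
        regular-expression approach are illustrative assumptions; the blueprint itself is more
        thorough, for example in how it copes with quoted URLs.
        
        ```python
        import posixpath
        import re
        
        
        def relink(html, origin, path, moved):
            """Rewrite href/src values in ``html`` that point at moved content.
        
            ``origin`` and ``path`` are the page's old and new paths; ``moved``
            maps old paths to new paths for every item that has moved.
            """
            old_dir = posixpath.dirname(origin)
            new_dir = posixpath.dirname(path) or '.'
        
            def fix(match):
                attr, link = match.group(1), match.group(2)
                # Leave empty, absolute, external and fragment-only links untouched.
                if not link or '://' in link or link.startswith(('/', '#', 'mailto:')):
                    return match.group(0)
                # Resolve the link as it was on the old site.
                target = posixpath.normpath(posixpath.join(old_dir, link))
                # Look up where the target lives now (unchanged if it never moved).
                target = moved.get(target, target)
                # Express the new location relative to the page's new folder.
                return '%s="%s"' % (attr, posixpath.relpath(target, new_dir))
        
            return re.sub(r'\b(href|src)="([^"]*)"', fix, html)
        ```
        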
        
        IsIndex
        =======
        
        IsIndex attempts to guess whether an html file is really an index that should
        be the default page of a folder. It does this by looking at the links in
        the content. If it contains many links all pointing to objects in a
        certain folder then it will make the file the index of that folder.
        If several files are indexes of the same folder then only one will win.
        If the file is not in the folder for which it's an index, this will
        adjust the path to put it inside the folder.
        
        The strategy used is as follows:
        
        - get all the potential indexes and determine what each is most likely to be the index of
        
        - rank them on the depth of that dir
        
        - pick the deepest dir and move all indexes that point to it into there
        
        - choose one of those to be the index
        
        - loop (this moves indexes that point to indexes)
        
        
        
        >>> from collective.transmogrifier.tests import registerConfig
        >>> from collective.transmogrifier.transmogrifier import Transmogrifier
        >>> transmogrifier = Transmogrifier(plone)
        
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     isindex
        ...     printer
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... content=<a href="f1/blah1"></a><a href="f1/blah2"></a>
        ... f1/blah1=blah1
        ... f1/blah2=blah2
        ...
        ... [isindex]
        ... blueprint = transmogrify.webcrawler.isindex
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'test1', config)
        >>> transmogrifier(u'test1')
        {'_mimetype': 'text/html',
        '_origin': 'content',
        '_path': 'f1/content',
        '_site_url': 'http://test.com/',
        'text': '<a href="f1/blah1"></a><a href="f1/blah2"></a>'}
        {'_backlinks': [('http://test.com/content', '')],
        '_mimetype': 'text/html',
        '_path': 'f1/blah1',
        '_site_url': 'http://test.com/',
        'text': 'blah1'}
        {'_backlinks': [('http://test.com/content', '')],
        '_mimetype': 'text/html',
        '_path': 'f1/blah2',
        '_site_url': 'http://test.com/',
        'text': 'blah2'}
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     isindex
        ...     printer
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... f1/content=<a href="blah1"></a><a href="blah2"></a>
        ... f1/blah1=blah1
        ... f1/blah2=blah2
        ...
        ... [isindex]
        ... blueprint = transmogrify.webcrawler.isindex
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        
        >>> registerConfig(u'test2', config)
        >>> transmogrifier(u'test2')
        {'_mimetype': 'text/html',
        '_path': 'f1/content',
        '_site_url': 'http://test.com/',
        'text': '<a href="blah1"></a><a href="blah2"></a>'}
        {'_backlinks': [('http://test.com/f1/content', '')],
        '_mimetype': 'text/html',
        '_path': 'f1/blah1',
        '_site_url': 'http://test.com/',
        'text': 'blah1'}
        {'_backlinks': [('http://test.com/f1/content', '')],
        '_mimetype': 'text/html',
        '_path': 'f1/blah2',
        '_site_url': 'http://test.com/',
        'text': 'blah2'}
        
        Relinker
        ========
        
        >>> from collective.transmogrifier.tests import registerConfig
        >>> from collective.transmogrifier.transmogrifier import Transmogrifier
        >>> transmogrifier = Transmogrifier(plone)
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     webcrawler
        ...     relinker
        ...     printer
        ...
        ... [webcrawler]
        ... blueprint = transmogrify.webcrawler.test.htmlsource
        ... level3/index=<a href="../level2/index">Level 2</a>
        ... level2/index=<a href="../level3/index">Level 3</a><img src="+&image%20blah">
        ... level2/+&image%20blah=<h1>content</h1>
        ...
        ... [relinker]
        ... blueprint = transmogrify.webcrawler.relinker
        ... link_expr = python:item['_path']+'/image_web'
        ...
        ... [moves]
        ... blueprint = transmogrify.webcrawler.pathmover
        ... moves =
        ... 	level2	level3
        ... 	level3	level2
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        
        >>> registerConfig(u'test', config)
        >>> transmogrifier = Transmogrifier(plone)
        >>> transmogrifier(u'test')
        {'_mimetype': 'text/html',
        '_path': 'level3/index',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <a href="../level2/index/image_web">Level 2</a>\n</html>\n'}
        {'_mimetype': 'text/html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <a href="../level3/index/image_web">Level 3</a>\n  <img src="image-blah/image_web"/>\n</html>\n'}
        {'_mimetype': 'text/html',
        '_path': 'level2/image-blah',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <h1>content</h1>\n</html>\n'}
        
        It is designed to cope with any combination of quoting in URLs.
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     webcrawler
        ...     relinker
        ...     printer
        ...
        ... [webcrawler]
        ... blueprint = transmogrify.webcrawler.test.htmlsource
        ... one%20two's+strange1=<a href="one two+is+strange2">Level 2</a>
        ... one%20two%20is+strange2=<a href="one two's%20strange1">Level 2</a>
        ...
        ... [relinker]
        ... blueprint = transmogrify.webcrawler.relinker
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ...
        ... """
        >>> registerConfig(u'test2', config)
        >>> transmogrifier(u'test2')
        {'_mimetype': 'text/html',
        '_path': 'one-twos-strange1',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <a href="one-two-is-strange2">Level 2</a>\n</html>\n'}
        {'_mimetype': 'text/html',
        '_path': 'one-two-is-strange2',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <a href="one-twos-strange1">Level 2</a>\n</html>\n'}
        
        It will deal with moving many parts at the same time
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     moves
        ...     relinker
        ...     treeserializer
        ...     printer
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... a/img=blah
        ... a/content1=<a href="img">
        ...
        ... [moves]
        ... blueprint = transmogrify.webcrawler.pathmover
        ... moves =
        ...    a	b
        ...
        ... [relinker]
        ... blueprint = transmogrify.webcrawler.relinker
        ...
        ... [treeserializer]
        ... blueprint = transmogrify.webcrawler.treeserializer
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'test3', config)
        >>> transmogrifier(u'test3')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'b'}
        {'_mimetype': 'text/html',
        '_path': 'b/content1',
        '_site_url': 'http://test.com/',
        'text': '<html>\n  <a href="img"/>\n</html>\n'}
        {'_backlinks': [('http://test.com/b/content1', '')],
        '_mimetype': 'text/html',
        '_path': 'b/img',
        '_site_url': 'http://test.com/',
        'text': '<html>blah</html>\n'}
        
        MakeAttachments
        ===============
        
        Looks for items that are linked from just one place and have no
        links out of their own. These 'deadends' are then moved 'into' the linking item.
        
        If the fields option is set to a list of tuples then these indicate the changes
        to make to the item in order to merge in the subitem. The head of the list is used as
        the filename to relink any html links to.
        
        If no fields are set then a folder is created, the item is set as its default
        view and any subitems are moved into that folder.
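        
        The search for 'deadends' might be sketched as follows. This is an assumption-laden
        illustration rather than the blueprint's code: ``find_deadends`` is a hypothetical
        name, outgoing links are detected with a crude regular expression, and backlink URLs
        are assumed to start with the item's ``_site_url``.
        
        ```python
        import re
        
        
        def find_deadends(items):
            """Map each linking page's path to the paths of its 'deadend' subitems."""
            attachments = {}
            for item in items:
                site = item['_site_url']
                # _backlinks is a list of (url, link_text) tuples.
                linkers = {url for url, _text in item.get('_backlinks', [])}
                # Crude test for outgoing links: any href= or src= attribute.
                links_out = re.search(r'\b(?:href|src)=', item.get('text', ''))
                if len(linkers) == 1 and not links_out:
                    url = linkers.pop()
                    # Assumes backlink URLs start with the item's _site_url.
                    parent = url[len(site):] if url.startswith(site) else url
                    attachments.setdefault(parent, []).append(item['_path'])
            return attachments
        ```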
        
        
        The ``i>=0`` test in the fields expression below is true for every subitem, so the merge happens even when there is only one subitem.
        
        >>> from collective.transmogrifier.tests import registerConfig
        >>> from collective.transmogrifier.transmogrifier import Transmogrifier
        >>> transmogrifier = Transmogrifier(plone)
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     makeattachments
        ...     treeserializer
        ...     printer
        ...
        ... [source]
        ... blueprint = transmogrify.htmltesting.htmlbacklinksource
        ... level3/index=<a href="../level2/index">Level 2</a>
        ... level2/index=<a href="../level3/index">Level 3</a><img src="+&image%20blah">
        ... level2/+&image%20blah=<h1>content</h1>
        ...
        ... [makeattachments]
        ... blueprint = transmogrify.webcrawler.makeattachments
        ... fields = python:i>=0 and (('attachment'+str(i+1)+'Image', subitem['text']),('attachment'+str(i+1)+'Title', 'blah'), )
        ...
        ... [treeserializer]
        ... blueprint = transmogrify.webcrawler.treeserializer
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        
        Add two more subitems and then we get attachments
        
        >>> registerConfig(u'test', config)
        >>> transmogrifier(u'test')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level2'}
        {'_backlinks': [('http://test.com/level3/index', 'Level 2')],
        '_mimetype': 'text/html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        'attachment1Image': '<h1>content</h1>',
        'attachment1Title': 'blah',
        'text': '<a href="../level3/index">Level 3</a><img src="+&image%20blah">'}
        {'_origin': 'level2/+&image%20blah',
        '_path': 'level2/index/attachment1Image',
        '_site_url': 'http://test.com/'}
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level3'}
        {'_backlinks': [('http://test.com/level2/index', 'Level 3')],
        '_mimetype': 'text/html',
        '_path': 'level3/index',
        '_site_url': 'http://test.com/',
        'text': '<a href="../level2/index">Level 2</a>'}
        
        >>> config = """
        ... [transmogrifier]
        ... include = test
        ...
        ... [source]
        ... level3/index=<a href="../level2/index">Level 2</a>
        ... level2/index=<a href="../level3/index">Level 3</a><img src="+&image%20blah"><img src="pdf">
        ... level2/+&image%20blah=<h1>content</h1>
        ... level2/pdf=<img src="pdf2">
        ... level2/pdf2=pdf2
        ...
        ... """
        >>> registerConfig(u'test2', config)
        >>> transmogrifier(u'test2')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level2'}
        {'_backlinks': [('http://test.com/level3/index', 'Level 2')],
        '_mimetype': 'text/html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        'attachment1Image': '<h1>content</h1>',
        'attachment1Title': 'blah',
        'text': '<a href="../level3/index">Level 3</a><img src="+&image%20blah"><img src="pdf">'}
        {'_origin': 'level2/+&image%20blah',
        '_path': 'level2/index/attachment1Image',
        '_site_url': 'http://test.com/'}
        {'_backlinks': [('http://test.com/level2/index', '')],
        '_mimetype': 'text/html',
        '_path': 'level2/pdf',
        '_site_url': 'http://test.com/',
        'attachment1Image': 'pdf2',
        'attachment1Title': 'blah',
        'text': '<img src="pdf2">'}
        {'_origin': 'level2/pdf2',
        '_path': 'level2/pdf/attachment1Image',
        '_site_url': 'http://test.com/'}
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level3'}
        {'_backlinks': [('http://test.com/level2/index', 'Level 3')],
        '_mimetype': 'text/html',
        '_path': 'level3/index',
        '_site_url': 'http://test.com/',
        'text': '<a href="../level2/index">Level 2</a>'}
        
        >>> config = """
        ... [transmogrifier]
        ... include = test2
        ...
        ... [makeattachments]
        ... blueprint = transmogrify.webcrawler.makeattachments
        ... condition = python:subitem['_path'].count('pdf') and i>=0
        ...
        ... """
        >>> registerConfig(u'test3', config)
        >>> transmogrifier(u'test3')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level2'}
        {'_backlinks': [('http://test.com/level2/index', '')],
        '_mimetype': 'text/html',
        '_path': 'level2/+&image%20blah',
        '_site_url': 'http://test.com/',
        'text': '<h1>content</h1>'}
        {'_backlinks': [('http://test.com/level3/index', 'Level 2')],
        '_mimetype': 'text/html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        'text': '<a href="../level3/index">Level 3</a><img src="+&image%20blah"><img src="pdf">'}
        {'_backlinks': [('http://test.com/level2/index', '')],
        '_mimetype': 'text/html',
        '_path': 'level2/pdf',
        '_site_url': 'http://test.com/',
        'attachment1Image': 'pdf2',
        'attachment1Title': 'blah',
        'text': '<img src="pdf2">'}
        {'_origin': 'level2/pdf2',
        '_path': 'level2/pdf/attachment1Image',
        '_site_url': 'http://test.com/'}
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level3'}
        {'_backlinks': [('http://test.com/level2/index', 'Level 3')],
        '_mimetype': 'text/html',
        '_path': 'level3/index',
        '_site_url': 'http://test.com/',
        'text': '<a href="../level2/index">Level 2</a>'}
        
        It is possible to not use fields for attachments but rather use a folder with a
        default view. Just set fields to False (default).
        
        >>> config = """
        ... [transmogrifier]
        ... include = test
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... level3/index=<a href="level3"
        ... level2/index=<a href="../level3/index">Level 3</a><img src="+&image%20blah">
        ... level2/+&image%20blah=<h1>content</h1>
        ...
        ... """
        
        >>> registerConfig(u'test4', config)
        >>> transmogrifier(u'test4')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level2'}
        {'_mimetype': 'text/html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        'attachment1Image': '<a href="level3"',
        'attachment1Title': 'blah',
        'attachment2Image': '<h1>content</h1>',
        'attachment2Title': 'blah',
        'text': '<a href="../level3/index">Level 3</a><img src="+&image%20blah">'}
        {'_origin': 'level3/index',
        '_path': 'level2/index/attachment1Image',
        '_site_url': 'http://test.com/'}
        {'_origin': 'level2/+&image%20blah',
        '_path': 'level2/index/attachment2Image',
        '_site_url': 'http://test.com/'}
        
        >>> config = """
        ... [transmogrifier]
        ... include = test
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... level3/index=<a href="level3"
        ... level2/index=<a href="../level3/index">Level 3</a><img src="+&image%20blah">
        ... level2/+&image%20blah=<h1>content</h1>
        ...
        ... [makeattachments]
        ... fields = python:False
        ...
        ... """
        >>> registerConfig(u'test5', config)
        >>> transmogrifier(u'test5')
        {'_type': 'Folder', '_site_url': 'http://test.com/', '_path': 'level2'}
        {'_defaultpage': 'index-html',
        '_path': 'level2/index',
        '_site_url': 'http://test.com/',
        '_type': 'Folder'}
        {'_backlinks': [('http://test.com/level2/index', '')],
        '_mimetype': 'text/html',
        '_origin': 'level2/+&image%20blah',
        '_path': 'level2/index/+&image%20blah',
        '_site_url': 'http://test.com/',
        'text': '<h1>content</h1>'}
        {'_backlinks': [('http://test.com/level2/index', 'Level 3')],
        '_mimetype': 'text/html',
        '_origin': 'level3/index',
        '_path': 'level2/index/index',
        '_site_url': 'http://test.com/',
        'text': '<a href="level3"'}
        {'_mimetype': 'text/html',
        '_origin': 'level2/index',
        '_path': 'level2/index/index-html',
        '_site_url': 'http://test.com/',
        'text': '<a href="../level3/index">Level 3</a><img src="+&image%20blah">'}
        
        Content that isn't linked to anything is still passed through.
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     makeattachments
        ...     treeserializer
        ...     printer
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... blah1=blah1
        ... blah2=blah2
        ...
        ... [makeattachments]
        ... blueprint = transmogrify.webcrawler.makeattachments
        ...
        ... [treeserializer]
        ... blueprint = transmogrify.webcrawler.treeserializer
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'test5.5', config)
        >>> transmogrifier(u'test5.5')
        {'_mimetype': 'text/html',
        '_path': 'blah1',
        '_site_url': 'http://test.com/',
        'text': 'blah1'}
        {'_mimetype': 'text/html',
        '_path': 'blah2',
        '_site_url': 'http://test.com/',
        'text': 'blah2'}
        
        You can use a combination of folder and field attachments
        
        >>> config = """
        ... [transmogrifier]
        ... pipeline =
        ...     source
        ...     makeattachments
        ...     treeserializer
        ...     printer
        ...
        ... [source]
        ... blueprint = transmogrify.webcrawler.test.htmlbacklinksource
        ... content=<img src="blah1"><img src="blah2">
        ... blah1=blah1
        ... blah2=blah2
        ...
        ... [makeattachments]
        ... blueprint = transmogrify.webcrawler.makeattachments
        ... fields = python:i<1 and [('attach%i'%i,subitem['text'])]
        ...
        ... [treeserializer]
        ... blueprint = transmogrify.webcrawler.treeserializer
        ...
        ... [printer]
        ... blueprint = collective.transmogrifier.sections.tests.pprinter
        ... """
        >>> registerConfig(u'test6', config)
        >>> transmogrifier(u'test6')
        {'_defaultpage': 'index-html',
        '_path': 'content',
        '_site_url': 'http://test.com/',
        '_type': 'Folder'}
        {'_backlinks': [('http://test.com/content', '')],
        '_mimetype': 'text/html',
        '_origin': 'blah2',
        '_path': 'content/blah2',
        '_site_url': 'http://test.com/',
        'text': 'blah2'}
        {'_mimetype': 'text/html',
        '_origin': 'content',
        '_path': 'content/index-html',
        '_site_url': 'http://test.com/',
        'attach0': 'blah1',
        'text': '<img src="blah1"><img src="blah2">'}
        {'_origin': 'blah1',
        '_path': 'content/index-html/attach0',
        '_site_url': 'http://test.com/'}
        
        
        Changelog
        =========
        
        1.1 (2012-04-18)
        ----------------
        
        - added transmogrify.siteanalyser.sitemapper [djay]
        - split transmogrify.siteanalyser.urltidy out of relinker [djay]
        - ensure urltidy always creates unique urls [djay]
        - Added ability to take id from title to urltidy [djay]
        - improved logging [djay]
        - fixed bug in attach where two items can end up with same path [djay]
        
        
        1.0 (2011-06-29)
        ----------------
        
        - 1.0 release
        
        1.0b8 (2011-02-12)
        ------------------
        - more robust parsing of html
        
        1.0b7 (2011-02-06)
        ------------------
        
        - show error if text is None
        - fix bug with bad chars in rewritten links
        - fix bug in losing items
        - add hidefromnav blueprint. does manual hiding
        
        
        1.0b6 (2010-12-15)
        ------------------
        
        - remove nulls from links which cause lxml errors
        - summarise info in log to single entry
        
        1.0b5 (2010-12-13)
        ------------------
        
        - condition was in the wrong place. resulted in dropping items
        - improve logging
        - handle default pages that don't exist
        
        1.0b4 (2010-11-11)
        ------------------
        
        - fix bug where _defaultpage wasn't being relinked
        
        1.0b3 (2010-11-09)
        ------------------
        
        - fix bug in quoting links in relinker
        
        
        1.0b2 (2010-11-08)
        ------------------
        
        - Add conditions to site analyser blueprints
        
Keywords: transmogrifier blueprint funnelweb source plone import conversion microsoft office
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules
