Steven Looman
19 August 2013

Handling history during a migration

Recently we blogged about the migration of our support system from Plone 3 to Plone 4. As part of that migration, we are moving over 100,000 objects to Dexterity.

Our support site includes trackers with tickets, but also documents, instructions, and other types. Some of these types can have history. For several reasons, such as ISO 9001 certification, we needed to keep the history of these objects.

The exporter was changed to export all versions of an object, which turned out to be fairly simple. The following listing shows the part of the exporter that yields the objects in the site.

if item.__class__.__name__ in VERSIONED_TYPES:
    portal_repository = item.portal_repository
    # get all versions in order of oldest to newest
    history = portal_repository.getHistory(item, oldestFirst=True)
    if history:
        for hist in history:
            yield hist.object
    else:
        # versioned type, but no history stored for this object
        yield item
else:
    # type is not versioned
    yield item

VERSIONED_TYPES is a list of the types that we want to export together with their history. If the current object has history, we retrieve it (from oldest to newest) and yield all those versions. Otherwise, we simply yield the object itself.
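The export logic can be tried outside Plone with a few stand-in objects. This is only a sketch: FakeRepository, VersionData, Document, Image and export_versions are hypothetical names made up for the illustration, and the fake repository mimics nothing more than the getHistory call used above.

```python
from collections import namedtuple

# Hypothetical stand-ins for the real Plone objects, for illustration only.
VersionData = namedtuple('VersionData', ['object'])

VERSIONED_TYPES = ['Document']


class FakeRepository(object):
    """Mimics only the getHistory call used by the exporter."""

    def __init__(self, versions):
        self._versions = versions  # stored oldest first

    def getHistory(self, item, oldestFirst=False):
        ordered = list(self._versions)
        if not oldestFirst:
            ordered.reverse()
        return [VersionData(v) for v in ordered]


class Document(object):
    def __init__(self, versions):
        self.portal_repository = FakeRepository(versions)


class Image(object):
    pass


def export_versions(item):
    """Yield every stored version of item, or item itself if it has none."""
    if item.__class__.__name__ in VERSIONED_TYPES:
        history = item.portal_repository.getHistory(item, oldestFirst=True)
        if history:
            for hist in history:
                yield hist.object
        else:
            yield item
    else:
        yield item
```

A Document with three stored versions yields those three versions, oldest first; a Document without history and an Image both yield just themselves.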

Ultimately, the history information is saved in a dictionary under ‘_history’ which is used later on.

In the importer, the original constructor section already handles our case properly: if an object already exists, it does not create a new one; otherwise it creates one. No changes were required for the constructor section.
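The create-only-if-missing behaviour is what makes multiple versions of one object work: the first version creates the object and every later version reuses it. A minimal sketch of the idea, assuming a plain dict as a stand-in for the site and the item keys '_path' and '_type' (the function name, the dict and the factories mapping are hypothetical and not the real constructor section):

```python
def constructor(previous, container, factories):
    """Create an object for each item unless one already exists at its path."""
    for item in previous:
        path = item['_path']
        if path not in container:
            # first version of this object: create it
            container[path] = factories[item['_type']]()
        # later versions of the same path reuse the existing object
        yield item


class Ticket(object):
    """Hypothetical content class used only for this sketch."""


site = {}
versions = [
    {'_path': '/tracker/1', '_type': 'Ticket', 'title': 'first draft'},
    {'_path': '/tracker/1', '_type': 'Ticket', 'title': 'second draft'},
]
processed = list(constructor(versions, site, {'Ticket': Ticket}))
```

Both items pass through the section, but the site ends up with a single Ticket at /tracker/1.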

Later on in the pipeline, the current object (a specific version of the real object) needs to be saved. To handle this, we created a new section: save_version. This section explicitly calls the save method of portal_repository. A bit of patching was required, as we wanted to keep the original timestamp and the original principal, and the original save method in portal_repository does not support this. This is the patched version:

import transaction
from Products.CMFEditions.Permissions import SaveNewVersion
from Products.CMFEditions.interfaces.IModifier import ModifierException


def CopyModifyMergeRepositoryTool__save(self, obj, comment='', metadata={}, timestamp=None, principal=None):
    """See ICopyModifyMergeRepository.
    """
    self._assertAuthorized(obj, SaveNewVersion, 'save')
    sp = transaction.savepoint(optimistic=True)
    try:
        sys_metadata = self._prepareSysMetadata(comment)
        # overwrite timestamp if param is given
        if timestamp:
            sys_metadata['timestamp'] = timestamp

        self._recursiveSave(obj, metadata, sys_metadata, autoapply=self.autoapply)

        # overwrite principal if param is given
        if principal:
            sys_metadata['principal'] = principal

    except ModifierException:
        # modifiers can abort save operations under certain conditions
        sp.rollback()
        raise

A part of the save_version section:

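from Products.CMFCore.utils import getToolByName


class SaveVersion(object):

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous

    def __iter__(self):
        for item in self.previous:
            history = item.get('_history', None)
            if history:
                obj = self.getObject(item)
                portal_repository = getToolByName(obj, 'portal_repository')
                # comment, metadata, timestamp and principal come from the exported history (not shown)
                portal_repository.save(obj, comment=comment, metadata=metadata, timestamp=timestamp, principal=principal)

            yield item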

Pretty simple! Versions are kept intact, with correct timestamps and owners.
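To see the flow of this section without a Plone site, here is a minimal sketch. RecordingRepository is a hypothetical stand-in for portal_repository that only records what it is asked to save, and the layout of the '_history' dictionary (keys 'comment', 'metadata', 'timestamp', 'principal') is an assumption made for the example, not the actual export format.

```python
class RecordingRepository(object):
    """Hypothetical stand-in for portal_repository with the patched save signature."""

    def __init__(self):
        self.saved = []

    def save(self, obj, comment='', metadata=None, timestamp=None, principal=None):
        # record the call instead of storing a real version
        self.saved.append({
            'obj': obj,
            'comment': comment,
            'metadata': metadata or {},
            'timestamp': timestamp,
            'principal': principal,
        })


def save_versions(previous, repository, get_object):
    """Save one version for every item that carries '_history' data."""
    for item in previous:
        history = item.get('_history')
        if history:
            repository.save(
                get_object(item),
                comment=history.get('comment', ''),
                metadata=history.get('metadata', {}),
                # assumed keys; the original timestamp and principal are passed through
                timestamp=history.get('timestamp'),
                principal=history.get('principal'),
            )
        yield item


repo = RecordingRepository()
items = [
    {'_path': '/a', '_history': {'comment': 'first', 'timestamp': 1376900000, 'principal': 'steven'}},
    {'_path': '/b'},
]
out = list(save_versions(items, repo, lambda item: item['_path']))
```

Only the item with history triggers a save, and the recorded call carries the original timestamp and principal rather than the time and user of the import run.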

When I started looking at this problem, I looked at the internals of Products.CMFEditions. The architecture.pdf file in its documentation directory contains some information. CMFEditions generates and keeps IDs (version_id, history_id) for the different versions of the objects that are created. I was getting worried that the IDs from the old site might collide with the IDs from the new site. My head started spinning with possible situations and ways to handle them. It turned out there was no need to worry at all!
