Handling history during a migration

By Steven Looman | On Aug 19, 2013
Recently we blogged about the migration of our support system from Plone 3 to Plone 4. We are migrating over 100,000 objects to Dexterity.

Our support site includes trackers with tickets, but also documents, instructions, and other types. Some of these types can have history. For several reasons, such as ISO 9001 certification, we needed to keep the history of the objects.

The exporter was changed to export all versions of each versioned object. It turns out this is fairly simple. The following code listing shows the part of the exporter which yields all objects in the site.

    if item.__class__.__name__ in VERSIONED_TYPES:
        portal_repository = item.portal_repository
        # get all versions in order of oldest to newest
        history = portal_repository.getHistory(item, oldestFirst=True)
        if history:
            for hist in history:
                yield hist.object
        else:
            yield item
    else:
        yield item

VERSIONED_TYPES is a list of the types we want to export together with their history. If the current object is of such a type and has history, the exporter retrieves that history (from oldest to newest) and yields every version. Otherwise, it simply yields the object itself.
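
To make the ordering concrete, here is a minimal, self-contained sketch of the same generator. It does not use Plone at all: FakeRepository, Document, Image, save_snapshot and export_objects are hypothetical stand-ins invented for this illustration, and only the getHistory(item, oldestFirst=True) call mirrors the listing above.

```python
from collections import namedtuple

# Hypothetical stand-in for a history entry; only the `.object`
# attribute used by the exporter is modelled.
VersionData = namedtuple("VersionData", ["object"])


class FakeRepository(object):
    """Hypothetical stand-in for portal_repository (not the real tool)."""

    def __init__(self):
        self._versions = {}

    def save_snapshot(self, obj_id, snapshot):
        # Versions are stored in the order they were saved (oldest first).
        self._versions.setdefault(obj_id, []).append(VersionData(snapshot))

    def getHistory(self, obj, oldestFirst=False):
        versions = list(self._versions.get(obj.id, []))
        return versions if oldestFirst else versions[::-1]


class Document(object):
    def __init__(self, obj_id, text, repository):
        self.id = obj_id
        self.text = text
        self.portal_repository = repository


class Image(object):
    def __init__(self, obj_id):
        self.id = obj_id


VERSIONED_TYPES = ["Document"]


def export_objects(items):
    """Yield every version of versioned items, other items as they are."""
    for item in items:
        if item.__class__.__name__ in VERSIONED_TYPES:
            history = item.portal_repository.getHistory(item, oldestFirst=True)
            if history:
                for hist in history:
                    yield hist.object
            else:
                yield item
        else:
            yield item


repo = FakeRepository()
doc = Document("doc-1", "third draft", repo)
repo.save_snapshot("doc-1", Document("doc-1", "first draft", repo))
repo.save_snapshot("doc-1", Document("doc-1", "second draft", repo))
repo.save_snapshot("doc-1", doc)

print([version.text for version in export_objects([doc])])
# → ['first draft', 'second draft', 'third draft']
```

Because the versions come out oldest first, the importer can replay them in the same order in which they were originally saved.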

Ultimately, the history information of each version is saved in the exported item's dictionary under the '_history' key, which is used later on by the importer.

In the importer, the original constructor section already handles our case properly: if an object already exists, it does not create a new one; otherwise, it creates one. The constructor section needed no changes.
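
The reason this works for versions is that every version of an object arrives with the same path. A rough sketch of that "create only if missing" behaviour, with a plain dictionary standing in for the site, could look like this. The function name construct and the dictionary container are assumptions made for the illustration; this is not the constructor section's actual code.

```python
def construct(items, container):
    """Create an entry for each new path; pass every item on unchanged.

    `container` is a plain dict standing in for the site (an assumption
    for this sketch). Items follow the usual pipeline convention of
    carrying '_path' and '_type' keys.
    """
    for item in items:
        path = item["_path"]
        if path not in container:
            # First version seen for this path: create the object.
            container[path] = {"type": item["_type"]}
        # Later versions reuse the object that already exists.
        yield item


site = {}
versions = [
    {"_path": "/docs/manual", "_type": "Document", "text": "v1"},
    {"_path": "/docs/manual", "_type": "Document", "text": "v2"},
    {"_path": "/docs/faq", "_type": "Document", "text": "v1"},
]
passed_on = list(construct(versions, site))
```

All three items continue down the pipeline, but only two objects are created, so later sections can update and save the same object once per version.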

Later on in the pipeline, the current object (a specific version of the real object) needs to be saved. To handle this, we created a new section: save_version. This section explicitly calls the portal_repository.save method. A bit of patching was required, as we wanted to keep the original timestamp and the original principal. The original save method in the portal_repository does not support this. This is the patched version:

    def CopyModifyMergeRepositoryTool__save(self, obj, comment='', metadata={}, timestamp=None, principal=None):
        """See ICopyModifyMergeRepository.
        """
        self._assertAuthorized(obj, SaveNewVersion, 'save')
        sp = transaction.savepoint(optimistic=True)
        try:
            sys_metadata = self._prepareSysMetadata(comment)
            # overwrite timestamp if param is given
            if timestamp:
                sys_metadata['timestamp'] = timestamp

            self._recursiveSave(obj, metadata, sys_metadata, autoapply=self.autoapply)

            if principal:
                sys_metadata['principal'] = principal

        except ModifierException:
            # modifiers can abort save operations under certain conditions
            sp.rollback()
            raise
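
For readers unfamiliar with this kind of patching, the general pattern is sketched below on a tiny stand-in class, so it runs without Plone. RepositoryTool, its saved list and patched_save are hypothetical names invented for this example; the real tool's internals (authorization, savepoints, recursive save) are deliberately left out. The point is only how optional timestamp and principal arguments can override the generated system metadata, and how the replacement method is assigned onto the class.

```python
import time


class RepositoryTool(object):
    """Hypothetical stand-in for the repository tool (not the real class)."""

    def __init__(self):
        self.saved = []

    def _prepareSysMetadata(self, comment):
        return {"comment": comment, "timestamp": int(time.time())}

    def save(self, obj, comment=""):
        self.saved.append((obj, self._prepareSysMetadata(comment)))


def patched_save(self, obj, comment="", timestamp=None, principal=None):
    """Like save(), but lets the caller supply timestamp and principal."""
    sys_metadata = self._prepareSysMetadata(comment)
    if timestamp is not None:
        sys_metadata["timestamp"] = timestamp
    if principal is not None:
        sys_metadata["principal"] = principal
    self.saved.append((obj, sys_metadata))


# Apply the monkey patch: every instance now uses the new method.
RepositoryTool.save = patched_save

tool = RepositoryTool()
tool.save("ticket-1", comment="Initial revision",
          timestamp=1376900000, principal="editor1")
```

Existing callers that pass only obj and comment keep working, because the two new arguments are optional and default to the original behaviour.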

A part of the save_version section:

    class SaveVersion(object):

        def __init__(self, transmogrifier, name, options, previous):
            # remaining initialisation omitted in this excerpt
            self.previous = previous

        def __iter__(self):
            for item in self.previous:
                history = item.get('_history', None)
                if history:
                    obj = self.getObject(item)
                    # comment, metadata, timestamp and principal are set
                    # here (omitted in this excerpt)

                    portal_repository = getToolByName(obj, 'portal_repository')
                    portal_repository.save(obj, comment=comment, metadata=metadata, timestamp=timestamp, principal=principal)

                yield item
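
Since the excerpt leaves out where the save arguments come from, here is one possible shape of such a section as a runnable sketch. Everything specific in it is an assumption: the keys inside '_history' ('comment', 'metadata', 'timestamp', 'principal'), the RecordingRepository stand-in, and the simplified constructor are invented for illustration and are not taken from our actual section.

```python
class RecordingRepository(object):
    """Hypothetical stand-in for portal_repository; records save() calls."""

    def __init__(self):
        self.calls = []

    def save(self, obj, comment="", metadata=None, timestamp=None,
             principal=None):
        self.calls.append({
            "obj": obj,
            "comment": comment,
            "metadata": metadata or {},
            "timestamp": timestamp,
            "principal": principal,
        })


class SaveVersionSketch(object):
    """Saves a version for every item that carries '_history'."""

    def __init__(self, previous, objects, repository):
        self.previous = previous
        self.objects = objects        # maps '_path' to a constructed object
        self.repository = repository

    def __iter__(self):
        for item in self.previous:
            history = item.get("_history")
            if history:
                obj = self.objects[item["_path"]]
                # The key names below are assumptions for this sketch.
                self.repository.save(
                    obj,
                    comment=history.get("comment", ""),
                    metadata=history.get("metadata", {}),
                    timestamp=history.get("timestamp"),
                    principal=history.get("principal"),
                )
            # Every item continues down the pipeline, versioned or not.
            yield item


repository = RecordingRepository()
items = [
    {"_path": "/a", "_history": {"comment": "first", "timestamp": 100,
                                 "principal": "alice"}},
    {"_path": "/b"},
]
section = SaveVersionSketch(items, {"/a": "obj-a", "/b": "obj-b"}, repository)
processed = list(section)
```

Note that the yield sits outside the if, so items without history still reach the sections that follow.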

Pretty simple! Versions are kept intact, with correct timestamps and owners.

When I started looking at this problem, I looked at the internals of Products.CMFEditions. The architecture.pdf file in the documentation directory contains some information. CMFEditions generates and keeps IDs (version_id, history_id) for the different versions of objects that are created. I was getting worried, as the IDs from the old site might collide with the IDs from the new site. My head started spinning with possible situations and ways to handle this. It turns out there was no need to worry at all!