Converting Plone data to Django
Getting data out of a Plone ZODB and into something else, like PostgreSQL.
Use case
The old application was written on Plone. We re-created the application in Django. The main objective of this operation was to create a lighter, more easily scalable application.
Statistics
We converted 6,468 Member objects, 22,104 Content objects, 97,433 custom ZODB objects, and 21,575 blobs. Converting the non-blob data took about 2 hours. The blobs needed another 3 hours, mainly because of image processing.
Database storage requirements went down from a Data.fs of more than 2 gigabytes to a 31 megabyte .sql file, or 125 megabytes in PostgreSQL.
Blob storage went down from 92GB (including scaled images) to 27GB (no scaled images). Apart from the scaled images there is not much difference here, but we now have the option to remove scales older than three months using file system tools only.
Response times went down from roughly 1s to under 400ms uncached. Blob response times are now under 100ms (under 50ms on our fibre connection).
Instead of 12 Zope instances we now run 4 Django processes on uwsgi.
TL;DR
- ZODB.broken.Broken is awesome!
- patch zope.interface to allow unpickling data with unknown interfaces
- Password hashing
- Moving blobs around
- No. automatic. solution.
The meat of the matter
The ZODB ships with ZODB.broken. This allows us to read the data out of a ZODB without needing the actual classes. If you have ever seen an error containing 'persistent broken [some.dotted.name]', that was ZODB.broken at work: the class could not be imported (because it was removed, moved, or raised an exception on import).
We need to patch zope.interface a bit, so it doesn't crash while trying to unpickle Interface classes that do not exist.
import zope.interface.declarations  # noqa
def _normalizeargs(sequence, output=None):
    """
    Normalize declaration arguments.
    Normalization arguments might contain Declarations, tuples, or single
    interfaces.
    Anything but individual interfaces or implements specs will be expanded.
    """
    if output is None:
        output = []
    cls = sequence.__class__
    if zope.interface.declarations.InterfaceClass in cls.__mro__ or \
            zope.interface.declarations.Implements in cls.__mro__:
        output.append(sequence)
    elif type(sequence) is not type:  # -- THIS IS THE PATCH. It prevents TypeError.
        for v in sequence:
            _normalizeargs(v, output)
    return output
zope.interface.declarations._normalizeargs = _normalizeargs
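To see why the extra `elif` matters, here is a minimal stdlib-only sketch. It assumes, as the patch above suggests, that an interface which cannot be imported ends up as a plain placeholder class inside the declaration. `KnownSpec`, `MissingInterface` and `flatten` are hypothetical names for illustration only; they are not part of zope.interface.

```python
class KnownSpec(object):
    """Hypothetical stand-in for a real interface or implements spec."""


class MissingInterface(object):
    """Hypothetical stand-in for the placeholder of a missing interface."""


def flatten(sequence, output=None, patched=True):
    """Simplified imitation of the normalization loop, not the real function."""
    if output is None:
        output = []
    if isinstance(sequence, KnownSpec):
        output.append(sequence)
    elif not patched or type(sequence) is not type:
        # Unpatched: a bare class reaches this loop and cannot be iterated.
        for item in sequence:
            flatten(item, output, patched)
    return output


spec = KnownSpec()
declaration = (spec, (MissingInterface,))

# Patched behaviour: the placeholder class is silently skipped.
patched_result = flatten(declaration)

# Unpatched behaviour: iterating over a class raises TypeError.
try:
    flatten(declaration, patched=False)
    unpatched_error = None
except TypeError as exc:
    unpatched_error = exc
```

The patched run keeps only the known spec, while the unpatched run stops with a TypeError as soon as it tries to loop over the placeholder class.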
We're completely ignoring the classes from Plone, CMF and Zope2, so we do not install them. We're using our FileStorage and BlobStorage directly here. You could use any of the other available storages, e.g. ClientStorage or RelStorage.
Even though we're not actually modifying the objects, it is still a bad idea to connect to a production database. Just saying.
import os
import ZODB
import ZODB.FileStorage
DBROOT = '...'  # Set this to where your Data.fs and blobs are stored on disk.
storage = ZODB.FileStorage.FileStorage(
    os.path.join(DBROOT, 'Data.fs'),
    blob_dir=os.path.join(DBROOT, 'blobstorage'))
db = ZODB.DB(storage)
connection = db.open()
root = connection.root()  # ZODB root object.
app = root['Application']  # Zope 2 app object
site = app.__Broken_state__['Plone']  # Plone object
folder = site.__Broken_state__['project']
# <persistent broken client.contenttypes.content.projectfolder.Projectfolder instance '\x00\x00\x00\x00\x00\x005u'>
# ProjectFolder is a BTree based folder, which stores its children in the _tree attribute.
children = folder.__Broken_state__['_tree']
# Grab the first child from that tree
zope_project = next(iter(children.values()))
# <persistent broken client.contenttypes.content.project.Project instance '\x00\x00\x00\x00\x02\x1c\xc5\xc3'>
zope_project.__Broken_state__['id']
# 'my-first-project'
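The chains of `__Broken_state__` lookups get long quickly. A small helper can walk such a path for you. This is only a sketch: `broken_get` and `FakeBroken` are hypothetical names, and `FakeBroken` merely imitates a Broken instance so the example runs without a ZODB.

```python
def broken_get(obj, *keys):
    """Follow keys through nested objects, unwrapping __Broken_state__ on the way."""
    current = obj
    for key in keys:
        # Broken objects keep their pickled state in __Broken_state__;
        # plain mappings (dicts, BTrees) are indexed directly.
        state = getattr(current, '__Broken_state__', current)
        current = state[key]
    return current


class FakeBroken(object):
    """Hypothetical stand-in for a ZODB.broken.Broken instance."""

    def __init__(self, **state):
        self.__Broken_state__ = state


project = FakeBroken(id='my-first-project', title='My first project')
folder = FakeBroken(_tree={'my-first-project': project})
site = FakeBroken(project=folder)
app = FakeBroken(Plone=site)

project_id = broken_get(app, 'Plone', 'project', '_tree', 'my-first-project', 'id')
```

With a real database you would pass the `app` object from above instead of the fake one.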
Now we can make new Django objects
from myproject.models import Project
# Convert the Zope data to Django data, YMMV here.
target_state = {'title': zope_project.__Broken_state__['title']}
# I'm assuming you created Django models which use varchar for the id field.
# By passing the expected state as 'defaults', we can run this script multiple times and update the target DB.
django_project, created = Project.objects.update_or_create(
    id=zope_project.__Broken_state__['id'], defaults=target_state)
Converting blobs
This converts a ZODB BlobImage.
from django.core.files.images import ImageFile
from myproject.models import ProjectImage
# z_image is a BlobImage object.
fname = z_image.__Broken_state__['image'].__Broken_state__['filename']
blob = z_image.__Broken_state__['image'].__Broken_state__['_blob']
with blob.open() as f:
    defaults = {'image': ImageFile(f, name=fname)}
    try:
        fname = str(fname)
    except UnicodeError:
        # Fall back to the object id when the filename cannot be converted to str.
        fname = z_image.__Broken_state__['id']
    # Save while the blob file is still open, so Django can read its contents.
    image, created = ProjectImage.objects.update_or_create(
        project=django_project, ref_filename=fname, defaults=defaults)
Exporting blobs
This exports blobs to a folder on disk.
import os
import shutil
z_image_state = z_image.__Broken_state__
fname = z_image_state['id']
blob = z_image_state['image'].__Broken_state__['_blob']
target_dir = '/some/path/'
target_fname = os.path.join(target_dir, fname)
if not os.path.exists(target_dir):
    os.makedirs(target_dir)
with blob.open() as f:
    # Copies the actual blob (f.name) to our target location (target_fname)
    shutil.copyfile(f.name, target_fname)
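Since the blobs are the slow part of the conversion, it can help to make the export safe to re-run. The sketch below skips files that were already copied. Comparing file sizes is a cheap heuristic we picked for illustration, not a content check, and `export_file` is a hypothetical helper. A temporary file stands in for the blob file (`f.name` above).

```python
import os
import shutil
import tempfile


def export_file(source_path, target_dir, fname):
    """Copy source_path to target_dir/fname unless an equally sized copy exists.

    Returns True when the file was copied and False when it was skipped.
    """
    target_fname = os.path.join(target_dir, fname)
    if not os.path.exists(target_dir):
        os.makedirs(target_dir)
    if (os.path.exists(target_fname)
            and os.path.getsize(target_fname) == os.path.getsize(source_path)):
        return False
    shutil.copyfile(source_path, target_fname)
    return True


# Demonstration with a temporary file standing in for a blob file.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, 'blob.bin')
with open(source, 'wb') as f:
    f.write(b'image bytes')

target_dir = os.path.join(work_dir, 'export')
first_run = export_file(source, target_dir, 'my-image')
second_run = export_file(source, target_dir, 'my-image')
```

The first call copies the file, the second call finds a copy of the same size and skips it.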
Converting RelationValues
def rv_to_object(plone_site, rv):
    # This takes a Plone site object and a RelationValue.
    # It returns the target object. The target object may be Broken!
    _components = plone_site.__Broken_state__['_components']
    intids = _components.__Broken_state__['intids']
    refs = intids.__Broken_state__['refs']
    _ref = refs[rv.__Broken_state__['to_id']]
    return _ref.__Broken_state__['object']
Converting Zope DateTime objects
import datetime as dt
import pytz
def to_datetime(zope_dt):
    # Make a datetime.datetime from a Broken DateTime object.
    micros, timezone_naive, tz = zope_dt.__Broken_state__
    return dt.datetime.fromtimestamp(micros)
def to_timezone(obj):
    # Make a timezone aware datetime object from a datetime object.
    return pytz.timezone('Europe/Amsterdam').localize(obj).astimezone(pytz.UTC)
# For example, convert the 'creation_date' of a Member object to 'date_joined'.
# z_state is the __Broken_state__ of that Member object.
date_joined = to_timezone(to_datetime(z_state['creation_date']))
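The helpers above read the timestamp in the local time of the machine that runs the conversion and then label it as Europe/Amsterdam, so they only agree when that machine is set to that timezone. As an alternative sketch, assuming the first item of the DateTime state is a POSIX timestamp in seconds (which is what `fromtimestamp()` expects) and that you run Python 3, the standard library can produce an aware UTC datetime directly. `to_utc_datetime` is a hypothetical name.

```python
import datetime as dt


def to_utc_datetime(seconds):
    """Turn a POSIX timestamp (seconds since the epoch) into an aware UTC datetime."""
    return dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)


epoch = to_utc_datetime(0)
example = to_utc_datetime(1400000000.5)
```

Because the result is already timezone aware, it can be stored in a Django DateTimeField without the pytz step.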
Converting Plone passwords
The Plone SHA1 passwords aren't Django compatible. Plone hashes 'password' + 'salt', Django hashes 'salt' + 'password'.
Put this in hashers.py
import base64
import hashlib
from collections import OrderedDict
from django.contrib.auth.hashers import SHA1PasswordHasher, mask_hash
from django.utils.crypto import constant_time_compare
from django.utils.encoding import force_bytes
from django.utils.translation import ugettext_lazy as _
class PloneSHA1PasswordHasher(SHA1PasswordHasher):
    """
    The SHA1 password hashing algorithm used by Plone.
    Plone uses `password + salt`, Django has `salt + password`.
    """
    algorithm = "plonesha1"
    _prefix = '{SSHA}'
    def encode(self, password, salt):
        """Encode a plain text password into a plonesha1 style hash."""
        assert password is not None
        assert salt
        password = force_bytes(password)
        salt = force_bytes(salt)
        hashed = base64.b64encode(hashlib.sha1(password + salt).digest() + salt)
        return "%s$%s%s" % (self.algorithm, self._prefix, hashed)
    def verify(self, password, encoded):
        """Verify the given password against the encoded string."""
        algorithm, data = encoded.split('$', 1)
        assert algorithm == self.algorithm
        # throw away the prefix
        if data.startswith(self._prefix):
            data = data[len(self._prefix):]
        # extract salt from encoded data
        intermediate = base64.b64decode(data)
        salt = intermediate[20:].strip()
        password_encoded = self.encode(password, salt)
        return constant_time_compare(password_encoded, encoded)
    def safe_summary(self, encoded):
        algorithm, hash = encoded.split('$', 1)
        assert algorithm == self.algorithm
        return OrderedDict([
            (_('algorithm'), algorithm),
            (_('hash'), mask_hash(hash)),
        ])
and add to settings.py:
PASSWORD_HASHERS = [
    # ... insert the current set here
    'myproject.hashers.PloneSHA1PasswordHasher',
]
and set the Django member's password using:
from myproject.hashers import PloneSHA1PasswordHasher
PASSWORD_PREFIX = PloneSHA1PasswordHasher.algorithm + '$'
pwd = PASSWORD_PREFIX + z_member.__Broken_state__['password']
member.password = pwd
member.save()
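If you want to check the hash format outside of Django first, the scheme can be reproduced with the standard library. This is a sketch under the assumptions stated in the hasher above: the stored value is `{SSHA}` followed by base64 of the 20-byte SHA-1 digest of `password + salt`, with the salt appended. `ssha_encode` and `ssha_verify` are hypothetical helper names and the example is written for Python 3.

```python
import base64
import hashlib
import hmac


def ssha_encode(password, salt):
    """Return '{SSHA}' + base64(sha1(password + salt) + salt) as text."""
    digest = hashlib.sha1(password + salt).digest()
    return '{SSHA}' + base64.b64encode(digest + salt).decode('ascii')


def ssha_verify(password, stored):
    """Check a password (bytes) against a stored '{SSHA}...' string."""
    if not stored.startswith('{SSHA}'):
        return False
    raw = base64.b64decode(stored[len('{SSHA}'):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes long
    expected = hashlib.sha1(password + salt).digest()
    return hmac.compare_digest(digest, expected)


stored = ssha_encode(b'secret', b'NaCl')
ok = ssha_verify(b'secret', stored)
wrong = ssha_verify(b'not-secret', stored)
```

Running `ssha_verify` against a password you know, with the value taken from a real member object, is a quick way to confirm the assumption before migrating all members.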
Getting group members from acl_users
This returns a list of user ids from the 'Administrators' group. We used this for setting member.is_staff.
# site is a Plone site object.
acl_users = site.__Broken_state__['acl_users']
source_groups = acl_users.__Broken_state__['source_groups']
adminusers = list(source_groups.__Broken_state__['_group_principal_map']['Administrators'])
Getting an object's current review state
If you need an object's current workflow state:
# Replace workflow_id with the id of your workflow.
review_state = obj.__Broken_state__['workflow_history'].__Broken_state__['data'][workflow_id][-1]['review_state']
Reading a folder's children in order
This loops over an ordered folder.
for key in z_folder.__Broken_state__['__annotations__']['plone.folder.ordered.order']:
    z_child = z_folder.__Broken_state__['_tree'][key]
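The same loop can be wrapped in a generator, which is handy when several folder types need converting. This is a sketch: `ordered_children` and `FakeBroken` are hypothetical names, the fake object only imitates a Broken folder, and skipping ids that are missing from the tree is our own assumption about how to handle inconsistent data.

```python
def ordered_children(z_folder):
    """Yield (key, child) pairs of an ordered folder in their stored order."""
    state = z_folder.__Broken_state__
    order = state['__annotations__']['plone.folder.ordered.order']
    tree = state['_tree']
    for key in order:
        # Skip ids that are listed in the order but missing from the tree.
        if key in tree:
            yield key, tree[key]


class FakeBroken(object):
    """Hypothetical stand-in for a ZODB.broken.Broken instance."""

    def __init__(self, **state):
        self.__Broken_state__ = state


z_folder = FakeBroken(
    __annotations__={'plone.folder.ordered.order': ['b', 'a', 'gone']},
    _tree={'a': 'child a', 'b': 'child b'},
)
children = list(ordered_children(z_folder))
```

The children come back in the order stored in the annotation, not in the key order of the tree.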
Uniquely identifying a ZODB object
Use the Plone uuid:
uuid = obj.__Broken_state__['_plone.uuid']