Martijn Jacobs

HA cluster setup for Plone and Zope

Nowadays there are many options for setting up your server configuration, and many choices of software to use: virtual machines, data replication, HA clusters, which OS, what kind of servers. What is the best choice for a Plone and Zope hosting provider like us? After a week of investigation and testing we have found a setup that satisfies our needs.

Our requirements

Our goal was to build an HA (High Availability) server setup. In our case: if one physical machine fails, another should take over its tasks automatically. However, it's a waste to have one machine doing nothing but waiting to take over, so it would be nice to have a combination of HA and load balancing, and a way to extend this setup with extra capacity if necessary.


Our first thought went to Ganeti, a tool developed by Google to manage virtual machines in a cluster of one or more physical machines. However, it doesn't support automatic failover yet, and the technology is very new, so it would be risky to use this software for our setup. It's pretty neat software though; check out this PDF file for more information about Ganeti.

Traditional HA

A traditional setup is perfectly described here by the guys of Goldmund, Wyldebeast & Wunderliebe, and it is an HA setup that has proven to work. However, we wanted to extend this setup a little further, as we really liked the idea and flexibility of using virtual machines, as Ganeti does. It's easier to migrate this setup to a new machine, and you can move or copy virtual machines to another physical machine if necessary. You can also manage and configure resources more easily, for example giving the ZEO client VM more memory and CPU power than the other VMs.

Our setup

The basis of our setup is almost the same as the HA cluster howto on plone.org: a master and a slave node. We added DRBD for data synchronization and Xen to have a virtual system for each layer in the setup. On each server we have three (Debian) virtual machines: one for the Apache frontend (and caching if you want to), one for running the ZEO clients, and one for the ZEO backend (and MySQL in our case). We mount an NFS disk for the actual data storage, but to keep it simple for now we assume the ZEO backend virtual machine stores the data. See the following image:
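To give an idea of what one of these Xen guests looks like, here is a minimal sketch of a domU configuration for the ZEO backend VM. The file name, kernel paths, memory size, IP address and device names are all examples, not our actual values:

```
# /etc/xen/zeo-backend.cfg -- sketch of a Xen domU config for the ZEO backend VM
kernel  = '/boot/vmlinuz-2.6-xen'
ramdisk = '/boot/initrd.img-2.6-xen'
memory  = 1024
name    = 'zeo-backend'
vif     = ['ip=192.168.1.20']
# the VM's disks live on DRBD devices, so every write is mirrored to the slave node
disk    = ['phy:/dev/drbd0,xvda1,w', 'phy:/dev/drbd1,xvda2,w']
root    = '/dev/xvda1 ro'
```

Pointing the `disk` lines at DRBD devices rather than plain LVM volumes is what makes the VM itself replicated, as described in the next section.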

How it works

Under normal conditions (when both nodes are active) the Apache VM and the ZEO backend VM run on the master node. The Apache VM has the public IP address (or floating address, as mentioned in the HA cluster howto) used to access the Plone websites. The ZEO backend VM and the Apache VM are installed on DRBD partitions, which means these VMs are continuously synchronized to the slave node. On both nodes there is a ZEO client VM which runs the ZEO clients. These are not installed on a DRBD partition (and not automatically synced), as they each need a different IP address (for the load balancer), and you may want to start a ZEO client in debug mode (for testing purposes, for example). You can also add new physical machines running a ZEO client VM, so you can extend capacity easily this way.
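The DRBD side of this can be sketched as one resource per replicated VM disk. The hostnames, backing LVM volumes, and replication addresses below are examples; a real setup would have one such resource per DRBD device:

```
# /etc/drbd.conf -- sketch of a single DRBD resource backing the ZEO backend VM
resource zeo-backend {
  protocol C;                          # fully synchronous replication
  on master-node {
    device    /dev/drbd0;
    disk      /dev/vg0/zeo-backend;    # local LVM volume holding the VM image
    address   10.0.0.1:7788;           # dedicated replication link
    meta-disk internal;
  }
  on slave-node {
    device    /dev/drbd0;
    disk      /dev/vg0/zeo-backend;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

With protocol C a write is only acknowledged once it has reached both nodes, which is what guarantees the slave's copy of the VM is identical at failover time.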

The main idea is actually the same: Heartbeat is configured on the slave node to monitor the master node. If the master server goes down, it starts the Apache VM and the ZEO backend VM on the second node. As these VMs are installed on DRBD partitions, they are identical to the VMs of the first node. The Apache VM on the second node has the same IP address, so it takes over all requests, and the ZEO clients reconnect to the 'same' ZEO server, which is now active on the second node.
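A Heartbeat (v1-style) configuration for this could look roughly as follows. Node names, the interface used for heartbeats, and the resource script names are assumptions for illustration; `drbddisk` is the standard script that promotes a DRBD resource to primary, and the last entry stands for whatever script starts the Xen domains on the surviving node:

```
# /etc/ha.d/ha.cf -- sketch; interface and node names are examples
bcast     eth1
keepalive 2
deadtime  30
auto_failback on
node master-node slave-node

# /etc/ha.d/haresources -- on failover: make the DRBD resources primary,
# then start the Xen VMs that live on them
master-node drbddisk::zeo-backend drbddisk::apache xendomains
```

Note that no floating IP appears here: the public address belongs to the Apache VM itself, so bringing that VM up on the slave node is what moves the address.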

In Part II we will describe in more detail how we configured this setup, including the configuration files we used.

