Tuesday 31 March 2009

ZFS Replication: HA backup

A small introduction to ZFS: ZFS is a transactional file system, which means that the file system state is always consistent on disk. With a transactional file system, data is managed using copy-on-write semantics. Data is never overwritten, and any sequence of operations is either entirely committed or entirely ignored. In addition, synchronous data (written using the O_DSYNC flag) is always guaranteed to be written before returning, so it is never lost.
http://docs.sun.com/app/docs/doc/819-5461/zfsover-2?a=view

Steps taken to build the ZFS replication:

1. Create a ZFS pool to store the data directory in. The command here will vary; there are several methods to create a pool. You can use a file, a partition, a slice or a whole disk. The only resource I had available was a file, so I created the file first:


shell> mkfile 1G /export/home/data


2. Then create a ZFS pool from the file:



shell> zpool create datadir /export/home/data


Now there is a folder in the root named 'datadir'. You can create a pool from a partition, slice or whole drive in a similar manner by giving the device name as it is listed in /dev/dsk, for example:
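Here the device name c1t1d0 is just a placeholder, substitute whatever disk you actually have available:

shell> zpool create datadir c1t1d0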


3. Once the 'datadir' pool is created we can copy our data directory to the datadir pool. Shut down the MySQL server and check the error log to ensure a clean shutdown before copying.
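For example, assuming the existing data directory is /var/lib/mysql and the server runs as the 'mysql' user (adjust both to match your installation):

shell> cp -rp /var/lib/mysql/* /datadir/
shell> chown -R mysql:mysql /datadir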

4. Add a datadir entry to your my.cnf file that points to the new folder:



in my.cnf

[mysqld]

datadir = /datadir

Now we have our data directory inside a ZFS pool. Next we need to send it to another pool on another machine for HA backup. Use steps 1-3 to create another pool on another machine; for this example we will have a 'SlaveA' machine with a pool named 'slavepool'.
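On SlaveA that could look something like this (again using a file as the backing store; the file path here is just an example):

shell> mkfile 1G /export/home/slavedata
shell> zpool create slavepool /export/home/slavedata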


5. We will need to snapshot the data directory in ZFS to be able to send it to SlaveA.



shell> zfs snapshot datadir@snap1



This creates snapshot 'snap1' of the datadir pool.



You can verify the snapshots with the list command: 'zfs list -t snapshot'
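The output will look something like this (the sizes shown are only illustrative):

NAME            USED  AVAIL  REFER  MOUNTPOINT
datadir@snap1      0      -   150M  -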

6. Next we will need to send the snapshot to the SlaveA server and apply it to the 'slavepool' pool:



shell> zfs send datadir@snap1 | ssh user_name@SlaveA pfexec zfs recv -F slavepool



This 'send's the snapshot 'datadir@snap1' over ssh to the SlaveA server, which then uses the stream of data to 'recv' (receive) it into the 'slavepool' pool.



7. At this point we now have a snapshot of the data directory in the slavepool on SlaveA. To allow us to keep applying updates to it from the master, we need to make the slavepool read-only so that no data or metadata can be changed locally; any local change would prevent later incremental snapshots from being received cleanly.



On SlaveA server:



shell> zfs set readonly=on slavepool


8. Start the MySQL server back up and verify that it is using the new data directory by checking the global variables:



mysql> SHOW GLOBAL VARIABLES LIKE '%datadir%';

+---------------+-----------+
| Variable_name | Value     |
+---------------+-----------+
| datadir       | /datadir/ |
+---------------+-----------+

At this point you will need to create a shell script that will snapshot the datadir and send the incremental snapshots to the slave.



Our original snapshot is datadir@snap1. Now we take another snapshot and call it datadir@snap2. We can then apply the file system changes to SlaveA by sending the incremental changes to its slavepool:



Take the snapshot:



shell> zfs snapshot datadir@snap2



Then send the incremental (-i) change to the slave:



shell> zfs send -i datadir@snap1 datadir@snap2 | ssh user_name@SlaveA pfexec zfs recv slavepool


The above two steps will need to be handled in a scheduled script that keeps the SlaveA server's snapshot current; a minimal sketch of such a script is shown below. You will need to test the frequency to find a good balance between load and how current your snapshots are.
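Here is a minimal sketch of such a script, assuming the pool, host and user names used in this post. The state-file path and the timestamp-based snapshot naming are my own assumptions, and the very first 'last snapshot' name is expected to be seeded by hand (for example with datadir@snap1):

#!/bin/sh
# Take a new snapshot of the datadir pool and send the changes since the
# last one to the slavepool on SlaveA. The name of the last snapshot sent
# is kept in a small state file between runs.

POOL=datadir
SLAVE=user_name@SlaveA
SLAVE_POOL=slavepool
STATE=/var/tmp/last_snap_sent    # assumed location, seed it with datadir@snap1

LAST_SNAP=`cat $STATE`
NEW_SNAP=$POOL@`date '+%Y%m%d%H%M%S'`

# Take the new snapshot on the master
zfs snapshot $NEW_SNAP

# Send only the changes between the last snapshot and the new one
zfs send -i $LAST_SNAP $NEW_SNAP | ssh $SLAVE pfexec zfs recv $SLAVE_POOL

# Remember what we sent so the next run can be incremental
echo $NEW_SNAP > $STATE

A real script would also want error checking (for example, only updating the state file if the send succeeded), but this shows the shape of it.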







If you need to switch over to the slave server, remember that you will need to disable the read-only setting we applied earlier:



shell> zfs set readonly=off slavepool


Then you can start your SlaveA MySQL server. It should go through crash recovery on the InnoDB tablespace. You may need to perform a REPAIR TABLE on MyISAM tables. It is best to use an engine with automatic crash recovery, like InnoDB, for this configuration.
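For example, once the slave is up you could check and repair all MyISAM tables in one pass with mysqlcheck (connection options will depend on your setup):

shell> mysqlcheck -u root -p --all-databases --auto-repair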
