Section 3: Backups

Why To Backup

Hopefully you already understand the general importance of maintaining regular backups. For example, if a server fails and all data on that server is lost, you can create a new server just like the old one by restoring a backup. If someone runs a bad query and accidentally deletes data from your database, you can restore the lost data from a backup. If you make a change to your website and later find that it was buggy, you can roll back to the previous version of your website from a backup. If you're setting up multiple web servers, you can build the second server from a backup of the first. And when you need to test changes before deploying them to a live website, you can create a copy of your actual website on a development server by restoring a backup.

What To Backup

Generally speaking, it is important to back up anything that you can't afford to lose and can't easily recreate. For example, you will certainly want to make regular backups of your database. If you have written custom themes and modules, they too should be backed up. If you have written custom patches for Drupal unique to your website, back them up. Any customized configuration files on your servers should also be backed up. If your users upload files such as pictures or sounds, this data should be backed up as well.

Backups are an inexpensive insurance policy for when things go wrong, as well as a useful tool for duplicating servers. When backups are combined with a revision control system, they can also be useful for reviewing changes over time, and for understanding how changes have affected your website. Oftentimes data loss is not immediately detected, in which case it is important to have multiple generations of backups to fall back on.

The following list suggests data that you should consider backing up. When deciding what from this list you will be backing up, ask yourself, "What happens if I lose this data?"

Data to include in your backups

  • Database
  • Database configuration file(s)
  • Web server configuration file(s)
  • PHP configuration file(s)
  • User uploaded content
  • Custom modules and themes
  • Custom patches

What You May Not Want To Backup

While it is possible to back up your entire server, including the underlying operating system, this is often not necessary. The underlying operating system can be re-installed on a new server with minimal fuss. Then, the various customized configuration changes can be restored from backups. Furthermore, backing up your entire server will require significantly more storage space, which becomes more and more of a concern as you add additional servers to your infrastructure. Finally, a backup of one server may not easily restore to another server with different hardware, such as different network cards or a hard drive of another size.

When backing up your database, you may choose to skip certain tables. For example, you don't have to back up Drupal 6's four search tables, as they can be regenerated if they are lost. The many cache tables also do not have to be backed up. As the watchdog and access log tables are already automatically flushed after a certain amount of time, they are also good candidates to skip if you are trying to minimize the size of your backups. If you decide to skip certain tables when making your backups, be aware that this can complicate the restoration process. If you are building a new server from backups, in addition to restoring your backup you will also have to manually create any tables that weren't included in your backup.
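
For example, the following mysqldump command skips these tables. This is only a sketch: it assumes a database named 'drupal' and placeholder credentials, and your installation may have additional cache tables (such as cache_block, cache_form, and cache_update, or tables added by contributed modules), so verify the table names against your own database:

mysqldump -uusername -ppassword \
          --single-transaction --add-drop-table \
          --ignore-table=drupal.accesslog \
          --ignore-table=drupal.watchdog \
          --ignore-table=drupal.cache \
          --ignore-table=drupal.cache_filter \
          --ignore-table=drupal.cache_menu \
          --ignore-table=drupal.cache_page \
          --ignore-table=drupal.search_dataset \
          --ignore-table=drupal.search_index \
          --ignore-table=drupal.search_node_links \
          --ignore-table=drupal.search_total \
          drupal > drupal.sql

If you do skip tables, consider also saving their definitions with mysqldump's --no-data option so you can recreate them empty when restoring.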

Redundancy vs. Backups

You may have set up redundant systems, and expect this to take the place of backups. For example, you may have two databases, with one replicating to the other. Or, your data may be stored on a high-end RAID system, mirrored onto multiple physical drives. However, remember that you're not only trying to protect yourself from system failures. One of the most common causes of data loss is human error. If you accidentally run a query that deletes half your users, this errant query will also run on your database slave, deleting your users in both places. Or, if you accidentally delete a directory containing user-contributed content, again this change will also be made on the mirrored drives. For this reason, it's important not to assume that redundancy replaces the need for regular backups.

When To Backup

A single backup of the above data from all your servers is a good start. But most websites are constantly changing, with new content being posted, old content being updated, and new users signing up all the time. Any changes made between the time of your last backup and when something goes wrong will be lost. Thus, it is important to make regular backups.

In the first section of this chapter, one of the discussed goals asked you to define how much data you can afford to lose. Can you afford to lose an hour of data? Can you afford to lose 24 hours of data? Can you afford to lose a week of data? Obviously you would prefer not to lose any data, but at the end of the day it comes down to a question of practicality and budget. Set realistic goals for yourself, and then figure out how you can meet those goals. If you can afford to lose a week of data, your backup strategy can obviously be much simpler than that of someone who can't afford to lose more than an hour of data.

Also note that different types of data change with different frequency. For example, your database is likely to be constantly changing, while your custom themes and modules rarely change. Thus, different data can be backed up at different frequencies.

Backup Schedules

Now that you've defined how much data you can afford to lose in the event of a catastrophic failure, it's time to set up a regular backup schedule that meets your requirements. Your backup schedule needs to take into account two significant questions:

  1. How often does the backed up data change?
  2. How much data can you afford to lose?

If the data being backed up never or very rarely changes, you can update your backup each time you make a change. If your data changes all the time, you'll instead need to automate regular backups that happen at least as frequently as your needs dictate. For example, if you can only afford to lose 6 hours of data should your database fail, set up your backup scripts to back up your database once every 6 hours.

Examples

Tracking Multiple Text Database Backups With Git

The following script is a simple yet powerful example of how you could efficiently store multiple backups of your database within a revision control system. In this example we are using 'git', however you could easily replace git with your favorite source control system. Note that git is designed for storing lots of small files, not for storing one large file, so it may not be the best choice of tools for maintaining backups of a growing database. Our use of the "--single-transaction" flag for mysqldump assumes that you are using MySQL's InnoDB storage engine.

To use this script, you should edit the configuration section as appropriate for your system. You then need to create an empty directory at the path defined by the script's BACKUP_DIRECTORY variable. Next, create a new git repository by moving into this directory and typing 'git init'. With the repository initialized, manually run the mysqldump command to generate the first copy of your database. Add this text backup to the repository using 'git add', and check it in using 'git commit -a'.
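
Assuming the configuration values shown in the script below, this one-time setup might look like the following sketch (substitute your own paths and credentials):

$ mkdir -p /var/backup/mysql.git
$ cd /var/backup/mysql.git
$ git init
$ mysqldump -uusername -ppassword --single-transaction \
            --add-drop-table database_name > database_name.sql
$ git add database_name.sql
$ git commit -a -m "Initial backup of database."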

The steps described in the previous paragraph could have been automated, but my goal was to keep the script as simple as possible. Furthermore, you may end up deciding to use a different revision control system than 'git', in which case you will need to set things up differently.

The actual backup script follows:

#!/bin/sh

# Configuration:
BACKUP_DIRECTORY="/var/backup/mysql.git"
DATABASE="database_name"
DATABASE_USERNAME="username"
DATABASE_PASSWORD="password"
# End of configuration.

export PATH="/usr/bin:/usr/local/bin:$PATH"

cd "$BACKUP_DIRECTORY" || exit 1

START=`date +'%m-%d-%Y %H:%M:%S'`

mysqldump -u$DATABASE_USERNAME -p$DATABASE_PASSWORD \
           --single-transaction --add-drop-table \
           $DATABASE > $DATABASE.sql

END=`date +'%m-%d-%Y %H:%M:%S'`
CHANGES=`git diff --stat`
SIZE=`ls -lh $DATABASE.sql | awk '{print $5}'`

# Commit the updated dump, logging timing, size, and a diffstat.
git commit -m "Started:  $START
Finished: $END
File size: $SIZE
$CHANGES" $DATABASE.sql

Each time you run the above script, it will generate a current backup of your database and check in the difference between this backup and the previous backup. The script should be called from a regular cronjob, causing your database to be backed up every few hours or every day, depending on your needs.
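
For example, assuming you saved the script as /usr/local/bin/database-backup.sh (a hypothetical path) and made it executable, the following crontab entry would back up your database once every 6 hours:

0 */6 * * * /usr/local/bin/database-backup.sh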

Using 'git log', you can review the versions of your database that have been checked in, and you can see the information that is logged each time you make a backup:

Author: Jeremy Andrews
Date:   Sun Jul 20 15:14:09 2008 -0400

    Started:  07-20-2008 15:13:01
    Finished: 07-20-2008 15:14:02
    File size: 14M
     database.sql |   44 ++++++++++++++++++++++----------------------
     1 files changed, 22 insertions(+), 22 deletions(-)

There are many simple improvements you could make to increase the usefulness of this script, including:

  • Occasionally run 'git gc' to compress all the older copies of your database stored in your git repository.
  • Replace 'git' with your favorite source control system.
  • Push a copy of your repository to a remote server, so the backups don't live only on the same server as your database. It is important that you can access the backups if your database server fails.
  • Generate an email each time the backup is completed, sending a brief status report.
  • Redirect stdout and stderr to a log file so you can see any errors that happen when running the script from crontab.
  • Minimize the size of the changes between each backup by making two backups of your database, as sketched below. One backup should only include your table definitions using the --no-data option to mysqldump, and one backup should only include your data using the --no-create-info option.
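
For example, the last suggestion might look like the following sketch, again assuming a database named 'drupal' and placeholder credentials:

mysqldump -uusername -ppassword --no-data \
          drupal > drupal-schema.sql
mysqldump -uusername -ppassword --single-transaction \
          --no-create-info drupal > drupal-data.sql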

Backing Up Your Website With Git

Git provides a very simple method for backing up your website. It offers much more than a backup, but that's all we're concerned about in this section. In preparation, first create an empty Git repository on your backup server. If you have multiple servers or web directories you wish to back up, you should create an empty Git repository for each. By using the "--bare" flag, we reduce the size of our backup as it won't maintain an uncompressed copy of the latest version of the files:

$ mkdir backup.git
$ cd backup.git
$ git --bare init
Initialized empty Git repository in /home/user/backup.git/

Next, on the web server that you are backing up, "initialize" a repository in your web directory. Add your website files to this repository, and then "push" it to the empty repository on the backup server. It is safe to initialize a Git repository on your live server and check files into it as this does not modify your files in any way. Instead, it creates a ".git" subdirectory where the local repository is stored. In this example, we'll assume that your backup server has an IP address of 10.10.10.10:

$ cd /var/www/html
$ git init
Initialized empty Git repository in .git/
$ git add .
$ git commit -a -m "Backup all files in website."
$ git remote add backup-server user@10.10.10.10:backup.git
$ git push backup-server master

Now, as you add new files to your web server, add them to your git repository by running "git add". Commit these new files and any changed files by running "git commit -a". And finally, push these updates to the backup server by running "git push backup-server master".
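
These steps are simple to automate. The following sketch, using the paths and remote name from the examples above, could be run from a regular cronjob; if nothing has changed, the commit will simply do nothing and the push will report that everything is up to date:

#!/bin/sh
# Back up the live website to the remote git repository.
cd /var/www/html || exit 1
# Stage any new files, then commit new and changed files.
git add .
git commit -a -m "Automated website backup: `date`"
# Push the new commit (if any) to the backup server.
git push backup-server master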

You will learn more about using Git in the next section of this chapter.

Testing Backups

Simply making backups of your data is only half of the job. It's also critical that you regularly validate your backups, ensuring that they are not corrupt and that they contain everything you need to rebuild your websites.

One way to test your backups is to restore them to your development server, building an up-to-date development environment. Doing this one time is not enough: while it does validate your general backup strategy, it doesn't regularly validate the integrity of each backup. You should instead update your development environment from backups on a regular schedule, such as once a week. The process can be automated through simple scripts.
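
For example, a weekly cron script on your development server might look something like the following sketch. The paths, credentials, and hostnames are placeholders, and it assumes your database backup is the plain mysqldump file maintained by the earlier example:

#!/bin/sh
# Copy the latest database backup from the backup server.
scp user@10.10.10.10:/var/backup/mysql.git/database_name.sql /tmp/database_name.sql || exit 1
# Load the dump; its --add-drop-table statements replace any existing tables.
mysql -uusername -ppassword development_database < /tmp/database_name.sql || exit 1
# Simple sanity check: confirm a core table actually contains data.
mysql -uusername -ppassword development_database -e "SELECT COUNT(*) FROM users;"

A failed copy or restore will cause the script to exit with an error, which cron can report to you by email.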