Sunday, August 17, 2003

Automating Website Backups

Proper backups are vitally important and websites are no exception. The following article provides a blueprint for an automated backup procedure of a website running on Linux or other Unix-like operating systems. The scripts will likely function under cygwin on Windows. The procedures are aimed at a shared hosting environment without shell (SSH) access, although they can be extended to dedicated or co-located servers. It is assumed that the reader is familiar with Linux system administration.

In the typical hosting environment, there are three classes of data to be backed up: files, databases, and control panel configuration.

File Backup

Backing up files is generally a straightforward task, since all the relevant files can typically be accessed by ftp. The samples below use the Perl mirror utility, but there are many other tools that could be used instead (like ftpcopy or wget). After mirror is installed, the automated file backup requires a mirror package file and a crontab entry.

Here is a sample mirror package file:

# arbitrary package name
      package=www.mysite.com
      comment=www.mysite.com website backup

# remote ftp server and login information
      site=ftp.mysite.com
      remote_user=ftp-user
      remote_password=ftp-password

# remote starting directory path
      remote_dir=/
# local backup directory
      local_dir=/home/myself/www.mysite.com

# files you do not want to mirror
      #exclude_patt+|^logs|

# delete files from local copy if they're gone from the ftp site
# see also max_delete_files and max_delete_dirs
      do_deletes=true

# email notification
      mail_to=myself@somewhere.domain

Since this file stores the ftp login and password in clear text, it must be heavily protected.
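As a minimal sketch of what "heavily protected" could mean in practice, the helper below restricts the package file to its owner and then checks that the change took effect. The function name protect_package_file is hypothetical and not part of mirror; the path is whatever you pass in.

```shell
#!/bin/sh
# Restrict a credentials file to its owner and verify the result.
# protect_package_file is a hypothetical helper name.
protect_package_file () {
	file=$1
	# owner read/write only; no access for group or others
	chmod 600 "$file" || return 1
	# the first ten characters of "ls -l" are the mode string
	mode=$(ls -l "$file" | cut -c1-10)
	[ "$mode" = "-rw-------" ]
}
```

It could be called once after creating the file, for example as protect_package_file /path/to/package-file, and again from the cron script to catch accidental permission changes.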

A sample script to run from cron:

#!/bin/sh
PATH=/bin:/usr/bin
export PATH
mirror -d /path/to/package-file

# Change Management - rotating set of 4 tarballs
cd /backup/directory
mv mysite-2.tar.gz mysite-3.tar.gz
mv mysite-1.tar.gz mysite-2.tar.gz
mv mysite-0.tar.gz mysite-1.tar.gz
tar cfz mysite-0.tar.gz www.mysite.com

# Change Management - CVS
# (requires a CVS repository for www.mysite.com and a
#  sandbox for it at /backup/directory/www.mysite.com)
cd www.mysite.com
	# fixme: escape funny characters
find . -print | grep -v /CVS | xargs cvs add -kb -m "NEW"
cvs commit -m "AUTOMAGIC"

And finally the crontab entry:

0 1 * * * /path/to/file-backup-script 
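The tarball rotation in the cron script could also be written as a loop, which makes the number of generations a parameter and skips generations that do not exist yet (as on the first few runs). This is only a sketch; rotate_tarballs is a hypothetical helper and the file naming follows the mysite-N.tar.gz pattern used above.

```shell
#!/bin/sh
# Shift name-0.tar.gz .. name-(keep-2).tar.gz up by one generation.
# The oldest generation, name-(keep-1).tar.gz, is overwritten.
# rotate_tarballs is a hypothetical helper name.
rotate_tarballs () {
	name=$1
	keep=$2
	i=$((keep - 1))
	while [ "$i" -gt 0 ]; do
		prev=$((i - 1))
		# skip generations that have not been created yet
		if [ -f "$name-$prev.tar.gz" ]; then
			mv -f "$name-$prev.tar.gz" "$name-$i.tar.gz"
		fi
		i=$prev
	done
}
```

In the script it would be used as: cd /backup/directory, then rotate_tarballs mysite 4, then the tar command that writes the new mysite-0.tar.gz.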

As a backup tool of last resort, the following CGI script can be customized as needed. Please note that this script should be secured by a password or other means; see the database backup CGI below for a sample .htaccess configuration.

/cgi-bin/filebackup.cgi

#!/bin/sh
# top of directories to back up
BACKUPDIR="/path/to/user/files"
export BACKUPDIR
# modify search PATH as necessary
PATH="/bin:/usr/bin"
export PATH

# stop before sending any output if the directory is missing
cd "$BACKUPDIR" || exit 1
echo "Content-type: application/x-gzip"
echo
tar cf - . | gzip -c
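On the backup machine, the output of filebackup.cgi could be retrieved with wget and kept only if it is a valid gzip stream. The following is a sketch under assumptions: fetch_file_backup is a hypothetical helper, and the URL, user name, and password are placeholders. The --http-passwd option is the spelling used elsewhere in this article; newer wget releases call it --http-password.

```shell
#!/bin/sh
# Download the tarball produced by filebackup.cgi and keep it only
# if it passes a gzip integrity test.
# fetch_file_backup is a hypothetical helper name.
fetch_file_backup () {
	url=$1
	user=$2
	pass=$3
	out=$4
	# download to a temporary name so that a failed or truncated
	# transfer never replaces the previous good backup
	if wget -q -O "$out.tmp" --http-user="$user" \
		--http-passwd="$pass" "$url" &&
		gzip -t < "$out.tmp" 2>/dev/null
	then
		mv -f "$out.tmp" "$out"
	else
		rm -f "$out.tmp"
		return 1
	fi
}
```

A call might look like fetch_file_backup http://www.mysite.com/cgi-bin/filebackup.cgi AUTHNAME AUTHPASS mysite-0.tar.gz, run after the rotation step.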

Database Backup

Backing up databases is more complex than backing up files. The following discussion is limited to MySQL as the most common database used by hosting providers, but similar arrangements can be made for PostgreSQL and other databases.

If remote access to the database is possible (by direct connection or tunneling), backups can be done by simply running mysqldump from a cron job. However, there are security ramifications to allowing remote access to the database.
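Where remote access is available, the cron job might look roughly like the sketch below. dump_remote_db is a hypothetical helper, and the host, user, password, and database name are placeholders to be supplied by the caller. Note that a password given on the command line can be visible to other users of the backup machine.

```shell
#!/bin/sh
# Dump a remote MySQL database into a local file.
# dump_remote_db is a hypothetical helper name.
dump_remote_db () {
	host=$1
	user=$2
	pass=$3
	db=$4
	out=$5
	# dump to a temporary file first so that a failed dump
	# never overwrites the previous good one
	if mysqldump --opt --host="$host" --user="$user" \
		--password="$pass" "$db" > "$out.tmp"
	then
		mv -f "$out.tmp" "$out"
	else
		rm -f "$out.tmp"
		return 1
	fi
}
```

For example, dump_remote_db db.mysite.com DBUSER DBPASS DBNAME DBNAME.sql, where db.mysite.com stands in for whatever host name the provider assigns to the database server.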

Another way to back up the database is to install a CGI or PHP script on the server that will return the database dump. Here is a very simple CGI script for this purpose:

/cgi-bin/dbdump.cgi

#!/bin/sh
PATH="/bin:/usr/bin"
export PATH
echo "Content-type: text/plain"
echo
mysqldump -a -c -e --opt \
  --host=localhost \
  --user=DBUSER \
  --password=DBPASS \
  DBNAME

The script can be modified to query for the database user credentials, but since these are almost certainly stored on the server already there is little incremental risk to storing them in a second location. Even so, the scripts should be protected by at least basic authentication:

/cgi-bin/.htaccess

AuthUserFile /path/to/htpasswd
AuthName MySiteBackup
AuthType Basic

<Files "dbdump.cgi">
require valid-user
</Files>

The command-line options to mysqldump should be carefully reviewed. If the database grows too large, the script can be modified to compress the database dump on the fly.
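Compressing on the fly could be as simple as piping the dump through gzip and changing the content type. The sketch below wraps this in a function, send_compressed_dump, which is a hypothetical name; DBUSER, DBPASS, and DBNAME are the same placeholders as in dbdump.cgi.

```shell
#!/bin/sh
# Variant of dbdump.cgi that gzips the dump as it is produced.
# send_compressed_dump is a hypothetical helper name.
send_compressed_dump () {
	# CGI header, followed by the blank line that ends the headers
	echo "Content-type: application/x-gzip"
	echo
	mysqldump --opt --host=localhost --user=DBUSER \
		--password=DBPASS DBNAME | gzip -c
}
```

The CGI script would end with a call to send_compressed_dump. The client then receives gzip data rather than plain SQL, so it needs to decompress the download (for instance with gzip -dc) before putting it under revision control.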

Automated backups can be performed by a script like the one below:

#!/bin/sh
PATH="/bin:/usr/bin"
export PATH

BACKUPDIR="/path/to/backup/directory"

backup () {
	site=$1
	url="http://$site/cgi-bin/dbdump.cgi"
	sql="$site.sql"

	rm -f "$sql"
	wget -q -O - --http-user="$2" --http-passwd="$3" "$url" > "$sql"
	# use RCS for revision control
	# (can be replaced with CVS or something else)
	rcs -l $sql
	ci -u -m"CRON `date`" $sql < /dev/null
}

cd $BACKUPDIR
backup www.mysite.com AUTHNAME AUTHPASS
#backup www.mysite2.com AUTH2NAME AUTH2PASS
#...

Finally, a sample crontab entry:

   0 2 * * * /path/to/database-backup-script

Control Panel Configuration

The control panel configuration is the hardest to back up. The configuration data may not be exportable or even accessible to the user, and where it is, there are portability issues. Since the control panel configuration usually changes infrequently, the easy way out is to save screen dumps or web pages.

Windows

As mentioned before, all the scripts should run under cygwin with no more than trivial modifications, using either the cygwin port of cron or possibly the Windows Scheduler.

Posted by markus in • Generic Geekery