Automating Website Backups
Proper backups are vitally important, and websites are no exception. The following article provides a blueprint for an automated backup procedure for a website running on Linux or another Unix-like operating system; the scripts should also work under Cygwin on Windows. The procedures are aimed at a shared hosting environment without shell (SSH) access, although they can be extended to dedicated or co-located servers. It is assumed that the reader is familiar with Linux system administration.
In a typical hosting environment, there are three classes of data to back up: files, databases, and the control panel configuration.
File Backup
Backing up files is generally straightforward, since all the relevant files can usually be accessed by FTP. The samples below use the Perl mirror utility, but many other tools could be used instead (such as ftpcopy or wget). Once mirror is installed, the automated file backup needs a mirror package file, a script that runs it, and a crontab entry.
Here is a sample mirror package file:
# arbitrary package name
package=www.mysite.com
comment=www.mysite.com website backup
# remote ftp server and login information
site=ftp.mysite.com
remote_user=ftp-user
remote_password=ftp-password
# remote starting directory path
remote_dir=/
# local backup directory
local_dir=/home/myself/www.mysite.com
# files you do not want to mirror
#exclude_patt+|^logs|
# delete files from local copy if they're gone from the ftp site
# see also max_delete_files and max_delete_dirs
do_deletes=true
# email notification
mail_to=myself@somewhere.domain
Since this file stores the FTP login and password in clear text, it must be heavily protected: it should be readable only by the account that runs the backup (for example, mode 600).
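One way to do this, sketched below as a pair of hypothetical helper functions, is to create the file under a restrictive umask and then confirm that no group or other permission bits are set. The check parses `ls -ld` output, which assumes the usual ten-character mode string at the start of the line.

```shell
#!/bin/sh
# Hypothetical helpers for keeping the mirror package file private.

# make_private FILE: create FILE if needed and restrict it to its owner.
make_private () {
    # umask 077 inside a subshell only affects files created there
    ( umask 077 && touch "$1" ) || return 1
    # chmod also tightens a file that already existed with looser bits
    chmod 600 "$1"
}

# is_private FILE: succeed only if group and others have no permissions.
is_private () {
    # characters 5-10 of the ls -ld mode string are the group/other bits
    perms=$(ls -ld "$1" 2>/dev/null | cut -c5-10)
    [ "$perms" = "------" ]
}
```

For example, `make_private /path/to/package-file` before filling in the password, and `is_private /path/to/package-file` from the cron script as a sanity check.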
A sample script to run from cron:
#!/bin/sh
PATH=/bin:/usr/bin
export PATH

mirror -d /path/to/package-file

# Change Management - rotating set of 4 tarballs
cd /backup/directory || exit 1
mv mysite-2.tar.gz mysite-3.tar.gz
mv mysite-1.tar.gz mysite-2.tar.gz
mv mysite-0.tar.gz mysite-1.tar.gz
tar cfz mysite-0.tar.gz www.mysite.com

# Change Management - CVS
# (requires a CVS repository for www.mysite.com and a
# sandbox for it at /backup/directory/www.mysite.com)
cd www.mysite.com || exit 1
# add new files, skipping CVS metadata; -print0/-0 copes with
# spaces and other unusual characters in file names
find . -name CVS -prune -o ! -name . -print0 | xargs -0 cvs add -kb -m "NEW"
cvs commit -m "AUTOMAGIC"
And finally the crontab entry:
0 1 * * * /path/to/file-backup-script
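The three `mv` commands in the script print errors on the first runs, before all four generations exist, and must be edited by hand to keep a different number of copies. A small loop avoids both. The function name `rotate` is hypothetical, and the sketch assumes the same `NAME-N.tar.gz` naming as the script.

```shell
#!/bin/sh
# rotate BASE KEEP: shift BASE-(KEEP-2).tar.gz -> BASE-(KEEP-1).tar.gz, ...,
# BASE-0.tar.gz -> BASE-1.tar.gz, overwriting the oldest copy.
# Missing generations are skipped, so early runs produce no errors.
rotate () {
    base=$1
    keep=$2
    i=$((keep - 1))
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$base-$prev.tar.gz" ]; then
            mv -f "$base-$prev.tar.gz" "$base-$i.tar.gz"
        fi
        i=$prev
    done
}
```

In the cron script, `rotate mysite 4` would take the place of the three `mv` lines, immediately before `tar` creates the new `mysite-0.tar.gz`.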
As a backup tool of last resort, the following CGI script returns a gzipped tar archive of the user's files and can be customized as needed. It must be protected by a password or other means, since anyone who can request it can download the entire site; see the database backup CGI below for a sample .htaccess configuration.
/cgi-bin/filebackup.cgi
#!/bin/sh
# top of directories to back up
BACKUPDIR="/path/to/user/files"
export BACKUPDIR
# modify search PATH as necessary
PATH="/bin:/usr/bin"
export PATH
# CGI header, then the blank line that ends the headers
echo "Content-type: application/x-gzip"
echo
cd "$BACKUPDIR" || exit 1
tar cf - . | gzip -c
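If this archive is downloaded automatically (for example with wget and HTTP authentication), it is worth confirming that the download really is a complete archive before it replaces an older copy: a failed login or a dropped connection can leave an empty or truncated file. A minimal check, written as a hypothetical helper:

```shell
#!/bin/sh
# verify_archive FILE: succeed only if FILE is a non-empty, readable .tar.gz.
verify_archive () {
    # an empty or missing file usually means the download failed
    [ -s "$1" ] || return 1
    # gzip -t checks the compressed stream, including its CRC and length
    gzip -t "$1" 2>/dev/null || return 1
    # listing the contents confirms that tar can read the archive to the end
    tar -tzf "$1" >/dev/null 2>&1
}
```

A fetch script could keep the previous copy whenever `verify_archive` fails.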
Database Backup
Backing up databases is more complex than the file backup. The following discussion is limited to MySQL as the most common database used by hosting providers, but similar arrangements can be made for PostgreSQL and other databases.
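If remote access to the database is possible (by direct connection or tunneling), backups can be done by simply running mysqldump from a cron job. However, allowing remote access has security implications: the database port has to be reachable from outside the server, and unless the connection is tunneled or encrypted, the credentials and data cross the network in the clear.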
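A sketch of such a cron job, assuming the database server is reachable from the backup machine. It keeps the credentials in a MySQL option file passed with `--defaults-extra-file` rather than on the command line, where other local users could see them in the process list. The function name, the option-file contents and the output naming are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical cron helper: dump one database over a direct or tunneled
# connection into a dated, gzipped file.
#
# The option file (mode 600) holds the connection details, e.g.:
#   [client]
#   host=db.example.com
#   user=backup
#   password=secret
#
# dump_remote OPTION_FILE DATABASE OUTPUT_DIR
dump_remote () {
    optfile=$1
    db=$2
    outdir=$3
    tmp="$outdir/$db.sql.tmp"
    out="$outdir/$db-$(date +%Y%m%d).sql.gz"
    # --defaults-extra-file must be the first option given to mysqldump
    if ! mysqldump --defaults-extra-file="$optfile" --opt "$db" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # compress only after mysqldump has succeeded, so its failure is not
    # hidden behind gzip's exit status in a pipeline
    gzip -c "$tmp" > "$out" && rm -f "$tmp"
}
```

A cron script would define the function and then call, for example, `dump_remote /path/to/option-file DBNAME /path/to/backup/directory`.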
Another way to back up the database is to install a CGI or PHP script on the server that will return the database dump. Here is a very simple CGI script for this purpose:
/cgi-bin/dbdump.cgi
#!/bin/sh
PATH="/bin:/usr/bin"
export PATH
# CGI header, then the blank line that ends the headers
echo "Content-type: text/plain"
echo
mysqldump -a -c -e --opt \
    --host=localhost \
    --user=DBUSER \
    --password=DBPASS \
    DBNAME
The script could be modified to ask the client for the database credentials, but since these are almost certainly stored on the server already, there is little additional risk in storing them in a second location. Even so, the scripts should be protected by at least HTTP basic authentication:
/cgi-bin/.htaccess
AuthUserFile /path/to/htpasswd
AuthName MySiteBackup
AuthType Basic
<Files "dbdump.cgi">
    require valid-user
</Files>
The command-line options to mysqldump should be carefully reviewed. If the database grows too large, the script can be modified to compress the database dump on the fly.
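A sketch of that compressed variant, reusing the original's mysqldump options and its DBUSER/DBPASS/DBNAME placeholders (set PATH as in the original if the server needs it). The function name is hypothetical, and the guard on `GATEWAY_INTERFACE`, which web servers set for CGI programs, is an addition so that the function can also be loaded and tested from a shell. The client would save the response as a `.sql.gz` file.

```shell
#!/bin/sh
# Hypothetical dbdump.cgi variant that gzips the dump as it is produced.

dbdump_gz () {
    # CGI header followed by the blank line that ends the headers
    echo "Content-type: application/x-gzip"
    echo
    # same options and placeholders as the uncompressed script
    mysqldump -a -c -e --opt \
        --host=localhost \
        --user=DBUSER \
        --password=DBPASS \
        DBNAME | gzip -c
}

# only produce output when run by the web server as a CGI program
if [ -n "$GATEWAY_INTERFACE" ]; then
    dbdump_gz
fi
```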
Automated backups can be performed by a script like the one below:
#!/bin/sh
PATH="/bin:/usr/bin"
export PATH
BACKUPDIR="/path/to/backup/directory"
# $1 = host name (without http://), $2/$3 = HTTP auth user/password
backup () {
    site=$1
    url="http://$site/cgi-bin/dbdump.cgi"
    sql="$site.sql"
    rm -f "$sql"
    wget -q -O - --http-user="$2" --http-password="$3" "$url" > "$sql"
    # use RCS for revision control
    # (can be replaced with CVS or something else)
    rcs -l "$sql"
    ci -u -m"CRON $(date)" "$sql" < /dev/null
}
cd "$BACKUPDIR" || exit 1
backup www.mysite.com AUTHNAME AUTHPASS
#backup www.mysite2.com AUTH2NAME AUTH2PASS
#...
Finally, a sample crontab entry:
0 2 * * * /path/to/database-backup-script
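Because wget writes whatever it receives, a failed login or an interrupted transfer can leave an empty or truncated dump, which the script would then check in as a new revision. With comments enabled (the mysqldump default), a complete dump ends with a line beginning `-- Dump completed`, so a hypothetical helper like the following could be called inside `backup()` just before the RCS check-in.

```shell
#!/bin/sh
# dump_is_complete FILE: succeed only if FILE is non-empty and ends with
# mysqldump's "-- Dump completed" trailer (present when comments are on).
dump_is_complete () {
    [ -s "$1" ] || return 1
    # look at the last few lines in case of trailing blank lines
    tail -n 5 "$1" | grep -q '^-- Dump completed'
}
```

For example, after the `wget` line: `dump_is_complete "$sql" || { echo "incomplete dump for $site" >&2; return 1; }`. The previous revision stays safe in RCS when the check fails.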
Control Panel Configuration
The control panel configuration is the hardest to back up. The configuration data may not be exportable or even accessible to the user, and even when it is, it may not be portable to another provider's control panel. Since the control panel configuration usually changes infrequently, the easy way out is to save screenshots or copies of the relevant web pages.
Windows
As mentioned above, all the scripts should run under Cygwin with no or only trivial modifications, scheduled with either the Cygwin port of cron or the Windows Task Scheduler.
