Automate Your Web Site Backup! 05/05/08
Over the weekend (Saturday), UbuntuLinuxHelp was down for almost 12 hours. Fortunately the hosting provider had data backups and there was no data loss. I also keep my own backups, so the added redundancy helps to protect the content. Up to now, the server has been configured to create a daily backup of the databases and certain directories, and those backups (.gz files) have been downloaded manually to another location later.
But what if there were no backups? What if your hosting provider cannot restore data at their end? To be blunt, you’d be back to square one, developing a whole new site or blog from the beginning! That’s a chilling thought: to lose everything and start again.
For peace of mind, and to protect your data (intellectual property), today’s post will highlight some of the steps we’ve taken to fully automate the backup process. Hopefully this will help many of you who may encounter the same issues, or are simply looking for a proactive, automated backup system for your web sites, blogs, ecommerce sites, etc.
We’ll need 5 things to ensure this system works:
- The remote host (your web hosting server).
- The local host (your Ubuntu or other Linux based desktop).
- The open source Rsync package.
- SSH (OpenSSH), to carry the transfer securely.
- Cron, to run the job on a schedule.
Let’s start with our desktop, which is the ‘localhost’. In my case the desktop is Ubuntu Linux 7.10, but this can be any Linux based system. This could also be another Linux server, if you tweak this a bit more.
I know ‘cron’ is enabled on my Linux desktop (because it’s part of the default installation). I also know SSH is installed (because it’s installed by default and I’ve used it), but I’m not sure if ‘rsync’ is there and if it works over SSH.
Side note: For those not familiar with Rsync, “rsync is an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently being maintained by Wayne Davison.” Source: http://samba.anu.edu.au/rsync/
To see if rsync is installed, use the following terminal command (look for a version number, rather than “(none)”, on the “Installed:” line):
apt-cache policy rsync
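As a quick cross-check, here is a minimal sketch that looks for all three command-line tools this setup relies on. The function name check_tool is my own, purely for illustration.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

# Check the three tools used in this post and count the missing ones.
missing=0
for tool in rsync ssh crontab; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

On Ubuntu, a missing tool can normally be installed with sudo apt-get install followed by the package name (rsync, openssh-client or cron).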
If you see it’s installed, to determine if rsync works over SSH, open a terminal and type the following command (substituting your correct information):
rsync -avz -e ssh REMOTE_USER@REMOTE_HOST:/remote/dir /local/dir/
Here is what the switches mean:
a: Use ‘archive’ mode (recurse into directories and preserve permissions, timestamps and symlinks).
v: Use ‘verbose’ output.
z: Use ‘compression’ during file transfer.
e: Specify the remote shell to use for the connection. In this case SSH.
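Before copying anything for real, it can help to preview the transfer. The sketch below adds rsync’s -n (dry run) switch to the same set of switches. The login remoteuser@example.com and the function name preview_backup are placeholders of mine, not values from this site.

```shell
#!/bin/sh
# Placeholders (assumptions): substitute your own login, host and directories.
REMOTE="remoteuser@example.com:/backupdir/daily"
LOCAL="$HOME/sitebackups/"

# -n (--dry-run) makes rsync list what it would transfer without copying anything.
preview_backup() {
    rsync -avzn -e ssh "$REMOTE" "$LOCAL"
}

# Show the exact command; call preview_backup when you are ready to contact the server.
echo "rsync -avzn -e ssh $REMOTE $LOCAL"
```

Note that a trailing slash on the remote directory changes the result: /backupdir/daily copies the daily directory itself, while /backupdir/daily/ copies only its contents.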
In my case the command could look something like this:
rsync -avz -e ssh email@example.com:/backupdir/daily /home/ubplay/sitebackups
After entering the above command, I’m prompted to enter the password and the file transfer begins.
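A password prompt is fine for a manual test, but a cron job has nobody to type it, so unattended runs normally rely on SSH key authentication instead. The following is a minimal sketch of that setup, not necessarily what this site uses: the login remoteuser@example.com and the key file name are placeholders, and it assumes your host allows key-based logins.

```shell
#!/bin/sh
# Placeholders (assumptions): substitute your own login and key file name.
REMOTE_LOGIN="remoteuser@example.com"
KEY_FILE="$HOME/.ssh/id_rsa_backup"

# One-time setup, to be run by hand.
setup_backup_key() {
    # Create a key pair with an empty passphrase so it can be used unattended.
    ssh-keygen -t rsa -b 4096 -N "" -f "$KEY_FILE" &&
    # Install the public key on the server (asks for the password one last time).
    ssh-copy-id -i "$KEY_FILE.pub" "$REMOTE_LOGIN"
}

echo "Run setup_backup_key once, then test with: ssh -i $KEY_FILE $REMOTE_LOGIN"
```

A key with an empty passphrase gives anyone who can read the file access to the server account, so keep its permissions tight. To make rsync use the key, change the -e option to -e "ssh -i $HOME/.ssh/id_rsa_backup".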
In my case this is simple because the hosting provider runs the industry standard software (Linux) with the standard applications: openssh, rsync, cron, etc. My local Linux system also already had the tools installed. Now that I’ve determined it works, cron can automate the system. However, before moving to cron, make sure your server is configured to back up the files and databases on a daily (or other) schedule.
If you’re using an industry standard hosting service, you’ll most likely be on a Linux box using cPanel. Personally, I’ve tried several others, including Plesk, ISPConfig, etc.; however, in my opinion they don’t have the flexibility or range of options that cPanel does. On a LAN, in my opinion, nothing beats Webmin, which has the greatest flexibility and options. However, I’m going off topic here, so back to the subject at hand!
Log into your hosting control panel and use the interface to schedule your backups to run during low-traffic periods. Make a note of the directory the backups are saved to. WHM/cPanel is great for this, as it’s configured via a simple GUI and is easy to use. In my case the server backs up the web site files and databases and stores them in /backupdir (so that my cron job can download any files in this directory later). For privacy reasons, I’m not going to post the script, as it contains a username and password among other “exposures”.
Before moving to cron itself, I needed to configure a script that will rsync over the SSH connection. Here are some examples I found on the rsync site: Rsync Examples. Another great resource I found is here: resync-incr. On this site you’ll see another methodology and example scripts. And finally, another great backup scripting resource is here: Backup Script. I’m sure some of you have other great sites and resources; please comment below and add them.
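Since the real script is not posted, here is a minimal sketch of what such a script might look like. It is not the one used on this site: the login remoteuser@example.com, the log file and the function name run_backup are placeholders I made up, and it assumes SSH can log in without a password prompt (for example via SSH keys), because cron cannot answer one.

```shell
#!/bin/sh
# Illustrative sketch only; every value below is a placeholder to adapt.
REMOTE="${REMOTE:-remoteuser@example.com:/backupdir/daily}"
LOCAL_DIR="${LOCAL_DIR:-$HOME/sitebackups}"
LOG_FILE="${LOG_FILE:-$HOME/sitebackups.log}"

# Pull the server's backup files and record the outcome in the log.
run_backup() {
    mkdir -p "$LOCAL_DIR" || return 1
    if rsync -az -e ssh "$REMOTE" "$LOCAL_DIR/" >>"$LOG_FILE" 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup OK" >>"$LOG_FILE"
    else
        status=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup FAILED (rsync exit $status)" >>"$LOG_FILE"
        return "$status"
    fi
}

# Contact the server only when called with the argument "run".
if [ "${1:-}" = "run" ]; then
    run_backup
fi
```

With this guard in place, the script would be called with the extra argument run (from a terminal or from cron). If you prefer a script that runs without arguments, drop the guard and call run_backup unconditionally.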
After you’ve set up your script, however you want it (there are hundreds of ways!), use cron to run it. Setting up the cron job is not very difficult:
0 2 * * * /home/ubplay/cron/rsync-ubuntulinuxhelp
This (above) runs the script that downloads the backup at 2am every day. Make sure your server has finished creating its backup by this time; otherwise you won’t be downloading the files you expect. In my case I used nano to create a file called “rsync-ubuntulinuxhelp” in the …/cron directory; this file contains the actual bash script. The file must be executable (chmod +x) for cron to run it. To create the cron job itself (which calls the script), complete the following in a terminal:
sudo crontab -e
and use the following parameters:
* * * * * /path/to/script-or-command
- - - - -
| | | | |
| | | | ----- Day of week (0 - 7; both 0 and 7 mean Sunday)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
(‘*’ means ‘every’).
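To make the field order concrete, here is a small sketch that builds the 2am entry from the five time fields plus the command and, optionally, adds it to the current user’s crontab without opening an editor. The variable and function names are my own, and the install step is only defined, not run.

```shell
#!/bin/sh
# The five time fields, in crontab order, followed by the command.
MINUTE="0"
HOUR="2"
DAY_OF_MONTH="*"
MONTH="*"
DAY_OF_WEEK="*"
COMMAND="/home/ubplay/cron/rsync-ubuntulinuxhelp"

CRON_LINE="$MINUTE $HOUR $DAY_OF_MONTH $MONTH $DAY_OF_WEEK $COMMAND"

# Append the entry to the current user's crontab, keeping any existing jobs.
install_cron_job() {
    { crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE"; } | crontab -
}

# Print the finished crontab line for review.
printf '%s\n' "$CRON_LINE"
```

Calling install_cron_job adds the line to your own user’s crontab, so the job runs as you rather than as root. Calling it twice adds the entry twice, so check with crontab -l afterwards.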
Side note: to view your existing cron jobs, in a terminal, type:
sudo crontab -l
To delete all of your cron jobs (to remove just one, delete its line with sudo crontab -e instead):
sudo crontab -r
As usual, I hope this helps some of you!