(WARNING: Implementing this system requires a moderate amount of knowledge of UNIX, OS X, the RsyncX utility, and various other Macintosh support and administration concepts. If you have trouble understanding any part of this document, I recommend not following these instructions until your knowledge has reached a point where you DO grasp all the information presented here. Otherwise, you could be putting your data at significant risk!)
Implementing an Automated Backup System on OS X Using RsyncX and OS X Server
A while ago, I was tasked with the job of providing a production-quality environment for our graphic designers, based on OS X. One of the first tasks I needed to tackle was that of backup and recovery. If you can't restore a "production" system to service quickly, you are causing a lot of unnecessary downtime for the users.
In my OS 9 days, I used Dantz Retrospect to handle backup of desktop systems. In OS 9, I had a number of problems with Retrospect which I won't bore you with here. Bottom line is that it wasn't a very good solution for us, but it was the best available. I tried Retrospect with OS X and found it no better for what I wanted. I also benchmarked its use of network bandwidth in FTP mode and found it significantly lacking. A typical OS X system spent about 10 seconds transmitting data to the FTP server, then 30 seconds "thinking about it" and not transmitting. That meant it was using at most 25% of the available bandwidth during that 40-second window. The problem only got worse when I tried a 100 megabit connection. It then became 1 second of transmission followed by 30 seconds of waiting, meaning that about 97% of the available bandwidth was going unused. I needed something better, especially since the new systems we were building for the designers would have 500GB+ of disk space.
I tried every Mac OS X backup utility I could find. I was about ready to give up when I found RsyncX. I spent quite a while pulling my hair out just learning how rsync's command syntax worked, how to specify names containing spaces, what needed to be backed up and what didn't, etc. In the end, I was able to create a script that would back up all the local hard disk drives on a system to a Mac OS X Server as efficiently as I thought it could be done. The script below is the result of all that research and effort. I'm not saying it's perfect, but it's the best my skills can currently provide. As much as my memory will permit, I'm going to try to share with you the details of how to implement this system for yourself.
First, the Assumptions...
Aside from assuming that you have a moderate to advanced level of Mac OS X support and administration skill, I am also going to assume that you have at least two systems to set this up on. One should be running Mac OS X Server (preferably 10.3.x, since that's what I'm using and tested all of this with). The other should be running Mac OS X "not Server". (There can be lots more, but I'm talking about testing this out first before you jump in with both feet!) The two should be on the same TCP/IP network. You should have administrator access to both systems. You should have at least some basic shell scripting knowledge. You should already have RsyncX 2.1 or later installed on both the server and the non-server machines. You should set up a directory on the non-server (hereafter "desktop") system called "/Library/Admin" where you'll be putting the scripts and files used herein. The server should have as much free space on it as you would need to store all the files that are on the desktop system. And there might be other assumptions I'm forgetting, but I think these are the basics.
What we're going to be doing...
When this system is fully in place and working, the server will be running RsyncX at all times in "daemon" mode. This means that it's essentially just going to be sitting there humming along, waiting for desktop systems to connect to back up their data. When one of them connects, it will authenticate them and start assisting them in performing the backup. The backup on the server will be a replica of the files on the desktop at the time of the last backup. There won't be any snapshots like Retrospect had (but if you see my "archive" article you'll see a way to provide comparable functionality), just one current backup.
The script we're going to be using will attempt to detect all the writable drives attached to our desktop Mac and determine which ones aren't network volumes (e.g., mounted file shares on a server). It will then back up everything that is writable and non-networked. That will include removable disks like Jaz disks or Rev disks. (If you don't want to include removables in your backups, I'll leave it as an exercise for you to figure out a way to adjust the script to ignore those.)
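The selection rule described above (local device AND writable) can be sketched in a few lines. This is written in plain POSIX sh rather than the csh used by the backup script, and the function name should_back_up is made up for illustration; it takes the device name that df reports for a volume plus the volume's mount point.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a mounted volume qualifies for backup.
# $1 = the device/filesystem name df reports for the volume
# $2 = the volume's mount point
should_back_up() {
    case "$1" in
        /dev/*) ;;            # local disks show up as /dev/...
        *) return 1 ;;        # afp_..., //..., ftp://... are network volumes
    esac
    [ -w "$2" ]               # read-only media (CD/DVD) fail this test
}
```

One way to feed it, assuming your df supports the -P (single-line POSIX output) option: should_back_up "$(df -P "/Volumes/$d" | awk 'NR==2 {print $1}')" "/Volumes/$d". Keep in mind that root can usually write anywhere on a writable filesystem, so the -w test mostly filters out read-only media.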
To maximize our use of network bandwidth, we will instruct RsyncX to back up (after the initial backup, anyway) only changed files, only the changed parts of those files, and to compress that data before pushing it across the network. To be "good citizens" on the network, we're going to tell RsyncX that it can only use 8Mbits of a 10Mbit LAN connection. (Note that rsync's --bwlimit option counts kilobytes per second, not kilobits, so check the value in the script against your own network.) (If you want different options, learn RsyncX's command line options and adjust the script accordingly.)
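To put a number on that bandwidth cap, here is a small conversion sketch in plain POSIX sh. It assumes, as the rsync documentation I know of states, that a bare --bwlimit value is in units of 1024 bytes per second. The function name mbit_to_bwlimit is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: turn a budget in megabits per second into a
# value for rsync's --bwlimit (kilobytes per second, 1 KB = 1024 bytes).
mbit_to_bwlimit() {
    # $1 = megabits per second (whole number)
    # bits -> bytes (divide by 8) -> kilobytes (divide by 1024)
    echo $(( $1 * 1000000 / 8 / 1024 ))
}

mbit_to_bwlimit 8    # prints 976
```

So a cap of 8 Mbit/s is a --bwlimit of a little under 1000, while --bwlimit=8000 allows roughly 64 Mbit/s, which is no cap at all on a 10Mbit LAN.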
Setting up the Server...
As mentioned earlier, I'm assuming your server is configured, connected to the LAN, has enough available disk space somewhere, and that you have administrator access to it. I'm also assuming RsyncX 2.1 or later is installed.
Log in to the server as administrator. Create a folder called "backups" somewhere on the server, preferably where you have lots of disk space. Launch RsyncX from the /Applications/Utilities directory. Run the RsyncX Server Setup Assistant from the Assistants menu of RsyncX. Set the server to be "read-write". Set the module name to "backup" (if you use something else, you'll need to modify the script below). Enter the full path of the "backups" folder in the "Mount Point" box. Enter the user name "backup" in the user name box. On the Security page of the wizard/assistant, enter "backup" as the UserName and choose a password to control access to the backup server. Make a note of this password, because you'll need it later.
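For orientation, those assistant answers map onto a stock rsync daemon configuration roughly like the one below. This is a sketch built from standard rsyncd.conf parameters, not a copy of what the RsyncX assistant actually writes (I haven't verified that); the path /Volumes/BigDisk/backups and the secrets file location are placeholders.

```ini
# /etc/rsyncd.conf (sketch; both paths are placeholders)
[backup]
    path = /Volumes/BigDisk/backups
    read only = no
    auth users = backup
    secrets file = /etc/rsyncd.secrets
    uid = root
    gid = wheel
```

In a stock setup the secrets file holds one "user:password" line per account (here, a line starting with "backup:") and must not be readable by other users.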
On the RsyncX Server Monitor window, click "Start" and then "Enabled" for "At Next Reboot". Quit RsyncX.
This is going to sound stupid, but if you read the RsyncX documentation and/or forums, you'll eventually find out I'm right. When you reboot the server someday, and you will, RsyncX is going to stop working. The developers don't know why this happens or how to fix it yet. I know how to temporarily get around it, but it requires manual intervention for some reason. Basically, any time you find that RsyncX is getting an error connecting to the server from the desktop, you need to restart RsyncX on the server manually. (Don't try to script it. It won't work. Trust me.)
To restart RsyncX, log in as root (preferably) and issue the following commands from a Terminal window:
killall rsync
rsync --daemon
This kills the errant rsync processes that may be running and starts a new one in daemon mode. The desktops should be able to connect now.
Inside your "backups" directory, create a directory whose name matches the name you've given your test desktop Mac. (You'll later do this for each new Mac you add to the backup system.) This directory has to be in place before the client will connect properly.
Setting up a Desktop...
Log in as root. Copy the script below into your /Library/Admin directory. Copy the two exclude files, "rsyncallexc.txt" and "rsyncsysexc.txt" (their contents are listed after the script), into the same directory. Create a new text file called "rspw.txt" that contains a single line with the password you set up for the "backup" UserName on the RsyncX Server. Change permissions on that file so that only "root" has access to it.
Check the "macname" variable in the backup script. It must match the name you used for the directory on the server (i.e., not "backups" but the name of the folder for the test desktop system). The script as listed sets it from the output of "hostname -s", so either name the server directory to match that or set the variable by hand.
Modify the backup script so that the "rsyncserver" variable is set to "rsync://" plus the name of the backup account you set up on the server (i.e., "backup"), followed by "@", followed by the name or IP address of the backup server.
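Creating the password file and locking it down can be done in one go. The following is a minimal sketch in plain POSIX sh; write_pwfile is a made-up name, and the path and password in the usage note are examples.

```shell
#!/bin/sh
# Hypothetical helper: write a one-line password file that only its
# owner can read or write.
# $1 = path of the file to create, $2 = the password
write_pwfile() {
    (
        umask 077                    # new files get no group/other access
        printf '%s\n' "$2" > "$1"
    )
    chmod 600 "$1"                   # also tightens a file that already existed
}
```

Run as root, that would be something like: write_pwfile /Library/Admin/rspw.txt 'your-password-here'. Since root creates the file, root owns it, and mode 600 keeps everyone else out.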
Modify the "rsyncmodule" variable to match the name of the module you created in the Server setup assistant, if you didn't use the "backup" name I recommended.
Save the changed script.
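To see how those three variables combine, here is a sketch (plain POSIX sh, with an invented function name backup_dest) of the destination the script ends up handing to rsync for each volume. The IP address, machine name, and volume name are examples only.

```shell
#!/bin/sh
# Hypothetical helper: assemble the rsync destination for one volume.
# $1 = account, $2 = server name or IP, $3 = module,
# $4 = machine name (the directory on the server), $5 = volume name
backup_dest() {
    printf 'rsync://%s@%s/%s/%s/%s/\n' "$1" "$2" "$3" "$4" "$5"
}

backup_dest backup 192.0.2.10 backup designer01 "Macintosh HD"
# prints: rsync://backup@192.0.2.10/backup/designer01/Macintosh HD/
```

If a backup lands in the wrong place on the server, or nowhere, printing this string is a quick way to spot a mistyped account, server, or module name.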
Bring up a terminal window. At the command line, navigate ("cd") to the /Library/Admin directory. Use chmod to make the script an executable file.
Issue the command "csh newbackup" (where "newbackup" is the name of the backup script file). The script should begin executing.
If the script seems to run very, very fast (i.e., it runs and is done in a couple of seconds when you know there is a lot of data to back up), suspect a problem on the server end. Either I've left out an important setup step, or you need to reboot the server, and/or you need to restart the daemon as described in the server section. If that doesn't work, get on the RsyncX forums and ask some questions. Someone will be able to sort it out for you. I'm probably not going to be much help because I've probably forgotten the critical bit of info needed here.
If the script gives an error message indicating that it can't seem to find rsync, suspect an install problem on the desktop. Try reinstalling RsyncX.
If the script gives an error message indicating that it doesn't understand the option "--eahfs" then you probably still have a configuration issue. The rsync software bundled in OS X doesn't understand this option (which is critical for getting a usable backup!), but the third-party RsyncX should. I recommend getting on the RsyncX forums and looking for some help.
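A quick way to tell which rsync you are actually running is to look for the same marker string the backup script greps for in the help output. This sketch is plain POSIX sh; the function name has_hfs_support is invented, and it reads the help text on standard input so it can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the rsync help text on stdin
# advertises HFS+ support (the marker string the backup script uses).
has_hfs_support() {
    grep -qF "HFS+ filesystem support for OSX"
}
```

For example: /usr/local/bin/rsync -h 2>&1 | has_hfs_support && echo "RsyncX found". Running the same check against plain "rsync -h" shows whether the first rsync on your PATH is the RsyncX one or the bundled one.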
If anything else goes wrong, I'd again try the RsyncX forums.
Assuming the first backup attempt worked...
Once you've managed to get your first successful backup done, congratulations! It's a lot easier from here on. What I do is set up a cron task to run this script (as root - that's critical) each night at some pre-determined time. I try to stagger the times I use for each desktop so that no two should normally be starting a backup at the same time. That will ensure better response for all the desktops and make your network people happier because RsyncX isn't hogging all the available bandwidth.
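As an illustration, a crontab entry for root along these lines would run the backup at 1:30 each night. The time and the log file path are examples I picked, not something the script requires; a second desktop might use "15 2" in the first two fields to stagger the start.

```text
# root's crontab on one desktop (sketch; time and log path are examples)
# minute hour day-of-month month day-of-week command
30 1 * * * /bin/csh /Library/Admin/newbackup >> /Library/Admin/newbackup.log 2>&1
```

Redirecting the output to a file gives you a record of what each run did, which helps when a backup silently stops working.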
On an ongoing basis, I recommend looking in the "backups" directory on the server to verify that the script is still running. The easiest way to do this is to look for the log file that the script generates when it runs. If the modification date on that file is more than a day or two old, odds are something is wrong. Suspect the server first and the client second. RsyncX is generally pretty reliable, but more times than not that server daemon has gotten "confused" and needs restarting.
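That date check is easy to automate. Here is a sketch in plain POSIX sh; log_is_stale is an invented name, and the log file you point it at depends on your own setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file is missing or was last
# modified at least two full days ago (that is what find's "-mtime +1"
# works out to).
log_is_stale() {
    # $1 = path to the log file
    [ ! -e "$1" ] && return 0
    [ -n "$(find "$1" -prune -mtime +1)" ]
}
```

Something like: log_is_stale /path/to/backups/designer01.log && echo "Check the backup for designer01" could then run from a daily cron job on the server (the path here is a placeholder).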
As with all my scripts and advice, this is provided "as is" without warranty. If you choose to attempt to use it, you assume all responsibility and liability for the results (good or bad). This script has been tested on several machines under Mac OS X 10.3.x and 10.4 and appears to work, but your mileage may vary.
This is the main backup script, which I name "newbackup" on my system:
#! /bin/csh
#
# This script will back up all Volumes on the system, unless it
# determines that the volume in question is a CD-ROM or other
# read-only device.
#
# ------------------------------------------------------------
# Files used with this script:
#   /Library/Admin/rspw.txt         (Password file for rsync server)
#   /Library/Admin/rsyncsysexc.txt  (Excludes for system disks)
#   /Library/Admin/rsyncallexc.txt  (Excludes for non-system disks)
#
# ------------------------------------------------------------
#
# Updated: July 29, 2005
# By: Michael Salsbury
#
# Set up the machine-specific variables we will use later.
#
# macname is the name we'll use to refer to this machine
#
echo " "
echo "Beginning backup script..."
set macname = "`hostname -s`"
echo " "
echo "Backing up: $macname"
#
#
# Set the working directory.
#
set workdir = "/Library/Admin"
if (! -d "$workdir") then
    echo " "
    echo "*** PROBLEM! ***"
    echo " "
    echo "The directory you specified in the 'workdir' variable"
    echo "does not appear to exist on this system."
    echo " "
    exit 1
else
    echo "Working directory ($workdir) found."
endif
#
# Specify the name of the file containing files to be excluded
# on disks where the operating system (OS X) is installed.
#
set sysexclude = "rsyncsysexc.txt"
if (! -e "$workdir/$sysexclude") then
    echo "*** PROBLEM! ***"
    echo " "
    echo "Can't find the exclude file: $workdir/$sysexclude"
    echo " "
    exit 1
else
    echo "Found the OSX-disk-exclude file: $workdir/$sysexclude"
endif
#
# Specify the name of the file containing file/folder names to
# be excluded from non-operating-system disks.
#
set allexclude = "rsyncallexc.txt"
if (! -e "$workdir/$allexclude") then
    echo "*** PROBLEM! ***"
    echo " "
    echo "Can't find the exclude file: $workdir/$allexclude"
    echo " "
    exit 1
else
    echo "Found the non-OSX-disk-exclude file: $workdir/$allexclude"
endif
#
# Specify the name of the file containing the password to the
# rsync account you'll be using on the server.
#
set pwfile = "rspw.txt"
if (! -e "$workdir/$pwfile") then
    echo "*** PROBLEM! ***"
    echo " "
    echo "Can't find the password file: $workdir/$pwfile"
    echo " "
    exit 1
else
    echo "Found the password file: $workdir/$pwfile"
endif
#
# rsyncserver is the rsync URL for the backup server
#
# EDIT THIS: the part after "rsync://" is missing from this listing.
# Complete it as described in the desktop setup steps: the backup
# account name, then "@", then the server's name or IP address.
#
set rsyncserver = "rsync://"
echo " "
echo "Backing up to server: $rsyncserver"
#
# rsyncmodule is the name of the backup module on the server
#
set rsyncmodule = "backup"
echo " "
echo "Backing up to module: $rsyncmodule"
#
# Text needed to detect the right version of RsyncX on the Mac.
#
set detmsg = "HFS+ filesystem support for OSX"
#
# Verify that the installed rsync version supports HFS+
#
/usr/local/bin/rsync -h > "$workdir/rsyncout.txt"
set rsynchfsok = `grep -c "$detmsg" "$workdir/rsyncout.txt"`
if ($rsynchfsok == 1) then
    echo " "
    echo "Found HFS+ aware version of RsyncX."
    echo " "
    rm -f "$workdir/rsyncout.txt"
else
    echo " "
    echo "*** PROBLEM! ***"
    echo " "
    echo "Could not find RsyncX with HFS+ support."
    echo " "
    echo "This could be because you've installed RsyncX in a non-standard"
    echo "location. It could also be that you don't have RsyncX 2.1 or"
    echo "later installed."
    echo " "
    rm -f "$workdir/rsyncout.txt"
    exit 1
endif
#
# Change to the /Volumes directory...
#
echo " "
date
cd /Volumes
#
# For each subdirectory, we're going to see if we need to back
# it up to the server.
#
foreach d ( * )
    #
    # Change to the disk volume we just found...
    #
    cd "/Volumes/$d"
    #
    # Echo the name to the terminal window...
    #
    echo " "
    echo Considering the volume: $d
    #
    # Check to see if this volume is a network volume.
    # Network volumes will need to be skipped.
    #
    # The UNIX "df" command will tell you about the free space left on
    # a mounted volume. Local disk volumes will be shown as
    #    /dev/...
    #
    # Network volumes will show as:
    #    afp_...
    #    //...
    #    ftp://
    #
    # So below I'm setting a variable to the outcome of a grep of the
    # df command's output for the current volume, looking for the "/dev/"
    # which indicates we've got a local disk drive and not a network
    # volume. If I find it, $status will be 0. If I don't, $status
    # will be set to 1. ($status is a system variable that indicates if
    # the last command completed successfully. It would only complete
    # successfully if it finds "/dev/" in the df command's output.)
    #
    set mpt = `df "/Volumes/$d" | grep /dev/`
    if ($status == 0) then
        echo Directory $d is a local disk device.
        #
        # Check to see if the directory is writable...
        #
        if (! -w .) then
            #
            # The directory is not writable, so it's probably a CD/DVD or
            # something else that we don't want to back up.
            #
            # Tell the user we're not backing it up.
            #
            echo Directory $d is not writable...
            echo Directory $d will NOT be backed up...
            echo " "
        else
            #
            # Directory is writable, so we back it up...
            #
            # Tell the user we're going to back it up.
            #
            echo Directory $d is writable...
            #
            # Now that we know we need to back up this directory, we need
            # to determine if it's a boot disk or not. If it's a boot disk,
            # we need to treat it special because it has a "/Volumes"
            # directory on it. If we try to back that up, we'll go into an
            # infinite loop until we eventually crash.
            #
            # An OS X boot disk will have the file:
            #    /System/Library/CoreServices/BootX
            # on it. A non-boot-disk will not have this file on it.
            #
            # Note on the rsync calls below: --bwlimit is given in
            # kilobytes per second, so 8000 is roughly 64 Mbit/s. Use a
            # value near 1000 to cap rsync at about 8 Mbit/s.
            #
            if (-e "/Volumes/$d/System/Library/CoreServices/BootX") then
                #
                # We found the boot file, so we know this is a boot disk.
                #
                echo Directory $d is a system disk.
                #
                # Call rsync with the usual options, but tell it to exclude
                # the kinds of directories we find on a boot disk that we
                # should not or do not want to back up.
                #
                echo "Backing up this volume as an OS X system disk..."
                time rsync -c -r -l -H -p -o -g -D -t -z --bwlimit=8000 --rsync-path="/usr/local/bin/rsync" --eahfs --password-file="$workdir/$pwfile" --exclude-from="$workdir/$sysexclude" . "$rsyncserver/$rsyncmodule/$macname/$d/"
                echo "RsyncX completed execution."
                echo " "
            else if (-e "/Volumes/$d/System Folder/Finder") then
                #
                # If we see the "/System Folder/Finder" file, then this is an
                # OS 9 (Classic) system disk and we should back it up in a
                # slightly different manner to be sure we get everything.
                #
                echo "Backing up this volume as an OS 9 Classic boot disk..."
                time rsync -c -r -l -H -t --delete -z --eahfs --bwlimit=8000 --rsync-path="/usr/local/bin/rsync" --exclude-from="$workdir/$allexclude" --password-file="$workdir/$pwfile" . "$rsyncserver/$rsyncmodule/$macname/$d"
                echo "RsyncX completed execution."
                echo " "
            else
                #
                # The disk is not a boot disk, so we can back it up almost
                # completely without worrying about "/Volumes". We'll call
                # rsync with a less-thorough list of directories to exclude.
                #
                # Tell the user this disk isn't a system disk.
                #
                echo Directory $d is not a system disk.
                #
                # Call rsync
                #
                echo "Backing up this volume as a non-system disk..."
                time rsync -c -r -l -H -p -o -g -D -t -z --eahfs --bwlimit=8000 --rsync-path="/usr/local/bin/rsync" --password-file="$workdir/$pwfile" --exclude-from="$workdir/$allexclude" . "$rsyncserver/$rsyncmodule/$macname/$d/"
                echo "RsyncX completed execution."
                echo " "
            endif
        endif
    else
        #
        # If we drop down to this "else" then we've found a drive that
        # isn't local to the machine and therefore shouldn't be backed
        # up by the system.
        #
        echo Directory $d is a network directory and will not be backed up.
    endif
    #
    # After dealing with a volume, drop back up to the "/Volumes" level.
    #
    cd /Volumes
    #
    # Done with the loop.
    #
end
echo " "
#
# At this point, we will have backed up every directory on the system
# that isn't a read-only directory and isn't a network directory.
#
echo "Finished backing up the system..."
date
Contents of rsyncallexc.txt:
#
# Directories that rsync should exclude from the backup
# of a non-system disk on a Mac OS X 10.3 system
#
/Trash
/.Trashes
/.Trash
/Cache
/Caches
#
# Exclude any restore disk images to minimize backup time.
# Images should be available on the server if we need them.
#
*.dmg
.DS_Store
.localized
*Read Me*
*.log
*_log
Contents of rsyncsysexc.txt:
#
# Directories that rsync should exclude from the backup
# of a boot disk on a MacOS X 10.3 system
#
/tmp/*
/proc
/Network/*
/var/vm/swapfile*
/Volumes/*
/cores/*
*/.Trash
/dev/*
/afs/*
/automount/*
/private/tmp/*
/private/var/run/*
/private/var/spool/postfix/*
#
# Exclude any restore disk images to minimize backup time.
# Images should be available on the server if we need them.
#
*_asr.dmg
*.dmg
#
# Exclude specific files
#
com.apple.ATS.System*.fcache
com.apple.ATSServer.FODB_*System
fontTablesAnnex*
Extensions.kextcache
.DS_Store
.localized
*Read Me*
*.log
*_log
*.cache
The file "rspw.txt" should be saved in the same directory as this script and contains a single line of text in it containing the password used by the RsyncX backup module on the server.