Monday, January 02, 2006

Simple Backups with rdiff-backup

I admit I am not vigilant enough when it comes to backing up personal files. For the new year I decided to at least keep a copy of important directories on a backup server, and keep those copies fairly fresh. rsync initially seemed attractive, thanks to its ability to transfer only incremental changes to files. Upon further reflection it didn't meet my needs. For example, if I delete a file from a directory and then rsync it with --delete to keep the mirror exact, the file is deleted from the backup server as well, and by default rsync keeps no older versions to fall back on.

Today I learned of rdiff-backup, in the ports tree as sysutils/rdiff-backup. This program makes incremental backups of files, but it also supports restoring old versions of files and even deleted files.

In the following example, I want to back up selected directories in my /data partition on laptop orr. Here are the contents of /data:

orr:/data$ ls
Recycled iso misc vmware zip
code lpc tgz work
documents media tmp writing

I invoke rdiff-backup by telling it to exclude certain directories, include everything else in /data, and send backups to server janney in the /home/richard/rdiff-backup_orr directory:

orr:/data$ rdiff-backup --exclude /data/Recycled/ --exclude /data/iso/ --exclude /data/media/audio/ \
    --exclude /data/media/video/ --exclude /data/misc --exclude /data/tgz --exclude /data/tmp/ \
    --exclude /data/vmware --exclude /data/zip /data janney::/home/richard/rdiff-backup_orr

I get a password prompt, which is an indication that rdiff-backup is using SSH to copy files from orr to janney.

When this is done, I have the following on janney:

janney:/home/richard/rdiff-backup_orr$ ls
code media writing
documents rdiff-backup-data
lpc work

Those are the directories I wanted backed up. A look into the rdiff-backup-data directory gives some insight into rdiff-backup's workings.

janney:/home/richard/rdiff-backup_orr$ ls -al rdiff-backup-data/
total 94
drwx------ 3 richard richard 512 Jan 2 13:36 .
drwxrwxrwx 9 richard richard 512 Jan 1 1980 ..
-rw------- 1 richard richard 0 Jan 2 11:34 backup.log
-rw------- 1 richard richard 0 Jan 2 11:34 chars_to_quote
-rw------- 1 richard richard 10 Jan 2 13:36 current_mirror.2006-01-02T11:34:22-05:00.data
-rw------- 1 richard richard 110 Jan 2 13:36 error_log.2006-01-02T11:34:22-05:00.data.gz
-rw------- 1 richard richard 34536 Jan 2 13:36 file_statistics.2006-01-02T11:34:22-05:00.data.gz
drwx------ 2 richard richard 512 Jan 2 11:34 increments
-rw------- 1 richard richard 48537 Jan 2 13:36 mirror_metadata.2006-01-02T11:34:22-05:00.snapshot.gz
-rw------- 1 richard richard 516 Jan 2 13:36 session_statistics.2006-01-02T11:34:22-05:00.data

The session_statistics file has information on the last backup.

janney:/home/richard/rdiff-backup_orr/rdiff-backup-data$ cat session_statistics.2006-01-02T11:34:22-05:00.data
StartTime 1136219662.00 (Mon Jan 2 11:34:22 2006)
EndTime 1136226986.54 (Mon Jan 2 13:36:26 2006)
ElapsedTime 7324.54 (2 hours 2 minutes 4.54 seconds)
SourceFiles 3023
SourceFileSize 3312777703 (3.09 GB)
MirrorFiles 1
MirrorFileSize 0 (0 bytes)
NewFiles 3022
NewFileSize 3312777703 (3.09 GB)
DeletedFiles 0
DeletedFileSize 0 (0 bytes)
ChangedFiles 1
ChangedSourceSize 0 (0 bytes)
ChangedMirrorSize 0 (0 bytes)
IncrementFiles 0
IncrementFileSize 0 (0 bytes)
TotalDestinationSizeChange 3312777703 (3.09 GB)
Errors 0
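Since each line of the statistics file is a field name followed by a number, a single value is easy to pull out with awk. The following is a minimal sketch, not part of rdiff-backup itself: stat_value is a hypothetical helper name, and it assumes the "Name value (human-readable)" layout shown above.

```shell
#!/bin/sh
# stat_value FILE KEY: print the numeric value recorded for KEY.
# Hypothetical helper; assumes one "Name value (description)" entry per line.
stat_value() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Example, run from the rdiff-backup-data directory on janney:
#   stat_value session_statistics.2006-01-02T11:34:22-05:00.data ElapsedTime
```

Something like this could be handy for checking the Errors field after each run.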

Let's say I now return to laptop orr. These are the contents of the lpc directory.

orr:/data/lpc$ ls
aggtap-hub.lpc httprint.lpc
bgp-dos.taosecurity.lpc ldp-dos.taosecurity.lpc

For purposes of demonstration, I delete the httprint.lpc file.

orr:/data/lpc$ rm httprint.lpc
orr:/data/lpc$ ls
aggtap-hub.lpc bgp-dos.taosecurity.lpc ldp-dos.taosecurity.lpc

I then invoke the same command line to run rdiff-backup (shown previously). It only takes rdiff-backup about 30 seconds to realize only one change has been made to orr -- the deletion of /data/lpc/httprint.lpc. A look in the backup directory on janney shows httprint.lpc has disappeared.

janney:/home/richard/rdiff-backup_orr$ ls lpc/
aggtap-hub.lpc bgp-dos.taosecurity.lpc ldp-dos.taosecurity.lpc

A look at the new rdiff-backup statistics shows how fast the update happened.

janney:/home/richard/rdiff-backup_orr/rdiff-backup-data$ cat session_statistics.2006-01-02T13:41:37-05:00.data
StartTime 1136227297.00 (Mon Jan 2 13:41:37 2006)
EndTime 1136227314.64 (Mon Jan 2 13:41:54 2006)
ElapsedTime 17.64 (17.64 seconds)
SourceFiles 3022
SourceFileSize 3312723832 (3.09 GB)
MirrorFiles 3023
MirrorFileSize 3312777703 (3.09 GB)
NewFiles 0
NewFileSize 0 (0 bytes)
DeletedFiles 1
DeletedFileSize 53871 (52.6 KB)
ChangedFiles 2
ChangedSourceSize 0 (0 bytes)
ChangedMirrorSize 0 (0 bytes)
IncrementFiles 3
IncrementFileSize 8978 (8.77 KB)
TotalDestinationSizeChange -44893 (-43.8 KB)
Errors 0
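The numbers in the two statistics files are consistent with each other, which a little shell arithmetic can confirm. This is only a sanity check on the figures above. Reading TotalDestinationSizeChange as "increments added minus data deleted" is an interpretation inferred from these values, not something taken from the rdiff-backup documentation.

```shell
#!/bin/sh
# Values copied from the two session_statistics files above.
first_source=3312777703    # SourceFileSize, first run
second_source=3312723832   # SourceFileSize, second run
deleted=53871              # DeletedFileSize, second run
increment=8978             # IncrementFileSize, second run

# The source shrank by exactly the size of the deleted file.
echo $((first_source - second_source))   # 53871

# The destination lost the file but gained the compressed increment.
echo $((increment - deleted))            # -44893
```

The second result matches the TotalDestinationSizeChange of -44893 reported for the second run.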

Can I recover httprint.lpc? Yes, the file remains in the increments directory kept by rdiff-backup.

janney:/home/richard/rdiff-backup_orr/rdiff-backup-data/increments/lpc$ ls -al
total 14
drwx------ 2 richard richard 512 Jan 2 13:41 .
drwx------ 3 richard richard 512 Jan 2 13:41 ..
-rwxrwxrwx 1 richard richard 8978 Dec 29 16:22 httprint.lpc.2006-01-02T11:34:22-05:00.snapshot.gz
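The increment's file name carries the timestamp of the backup session it belongs to. As a small sketch, the timestamp can be pulled out of the name with sed. The function name increment_time is hypothetical, and the pattern assumes the naming seen in the listing above (original name, timestamp with a numeric UTC offset, an increment type such as snapshot, and an optional .gz).

```shell
#!/bin/sh
# increment_time NAME: print the timestamp embedded in an increment file name.
# Hypothetical helper; prints nothing if NAME does not match the expected pattern.
increment_time() {
    printf '%s\n' "$1" | sed -nE \
        's/^.*\.([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]{8}[-+][0-9]{2}:[0-9]{2})\.(snapshot|diff|missing|dir)(\.gz)?$/\1/p'
}

increment_time httprint.lpc.2006-01-02T11:34:22-05:00.snapshot.gz
```

As far as I know, rdiff-backup's -r option also accepts a full timestamp in this form, so the extracted value could be used to ask for the file exactly as it stood at that session.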

This means I can restore it. I tell rdiff-backup to restore httprint.lpc as it existed one day ago (-r 1D), which for my purposes will find the file I need.

orr:/data/lpc$ rdiff-backup -r 1D janney::/home/richard/rdiff-backup_orr/lpc/httprint.lpc /tmp/httprint.lpc

That's it. When finished, the file is restored.

orr:/data/lpc$ ls -al /tmp/httprint.lpc
-rwxrwxrwx 1 richard wheel 53871 Dec 29 16:22 /tmp/httprint.lpc
orr:/data/lpc$ file /tmp/httprint.lpc
/tmp/httprint.lpc: tcpdump capture file (little-endian) - version 2.4 (Ethernet, capture length 1515)

Note that I had trouble restoring the file to its original location, so I put it in /tmp for this demonstration.

rdiff-backup can also be run in the other direction, with the backup server pulling files from the laptop.

This is only scratching the surface of the backup issue, but I now have a simple yet thorough backup solution to run at the end of every day.
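For a daily run, the long command line could live in a small wrapper script so the exclude list is kept in one place. The following is a minimal sketch under stated assumptions: build_backup_cmd and the variable names are hypothetical, the paths and host are the ones used in this post, and directory names are assumed to contain no spaces. It only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper around the backup command shown earlier in this post.
SRC=/data
DEST=janney::/home/richard/rdiff-backup_orr
# Paths relative to $SRC, written exactly as in the original command line.
EXCLUDES="Recycled/ iso/ media/audio/ media/video/ misc tgz tmp/ vmware zip"

# Assemble the full command line, one --exclude per skipped directory.
# Word splitting on $EXCLUDES is intentional, so names must not contain spaces.
build_backup_cmd() {
    cmd="rdiff-backup"
    for dir in $EXCLUDES; do
        cmd="$cmd --exclude $SRC/$dir"
    done
    echo "$cmd $SRC $DEST"
}

# Print the command for review.
build_backup_cmd
```

Once the printed command looks right, the script's output can be piped into sh to run it. Running it unattended would also need SSH key authentication in place of the password prompt.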

6 comments:

rkt said...

I had written about something similar using "cp" and "rsync" commands.

It's interesting how complex backup and recovery can be.

Jonathan said...

I use rsnapshot, a Perl wrapper around rsync. It is very easy to configure and uses hardlinks and scheduled backups for filesystem snapshots.

I've written a tutorial here.

Anonymous said...

It's interesting that people seem to prefer rsync and such, compared to old school rdump, find/cpio, etc.

Richard Bejtlich said...

Anonymous,

It's the ability to do incremental backups securely over the network that I like. I couldn't imagine doing it another way.

Anonymous said...

Richard:

If you want an incremental backup done securely, you can simply do a level 1 dump piped into ssh. No imagination needed (that's what I use!). Similarly for the find/cpio combination. I think the Solaris man page for cpio even has an example.

Dominic White said...

I've been using rdiff-backup for the last couple of months for my thesis. Its flexibility has allowed me to recover from all sorts of slip-ups.