September 2, 2008
In this article I’m going to explain a backup plan I recently implemented for my company.
The backup plan covers a single Linux client machine that sends backup data to a backup server, but it can easily be applied to an arbitrary number of client machines. After a fairly extensive search on the net, I decided to implement the backup plan using dar (Disk ARchive) to actually make the backups and rsync to ship the data to the remote backup storage server.
I chose dar as the backup software because I found it simple enough to deploy and, at the same time, rich in features; and I chose rsync to transfer the data because of its smart file transfer: rsync uses the "rsync algorithm", which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
The backup strategy
dar is executed:
- every night, to make differential backups;
- on the first day of the month, to make a full backup.
After a successful backup, rsync is invoked to transfer the data to the backup storage server.
Sysadmin notifications: backups are scheduled with cron, so I rely on cron’s built-in feature that mails the output of the executed commands.
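The schedule above (a full backup on the 1st of the month, a differential backup on the other days, and rsync only after a successful backup) could be driven by a single nightly cron job. Below is a minimal bash sketch of that logic, not the author's actual script: the function names backup_kind and backup_then_sync, the script path and the MAILTO address in the comments are hypothetical. Because cron mails whatever the job writes to stdout and stderr, a failure message ends up in the sysadmin's inbox.

```shell
#!/usr/bin/env bash
# Hypothetical nightly wrapper logic; names and paths are illustrative only.
#
# Example crontab entry (assumed script path and address):
#   MAILTO=sysadmin@example.com
#   15 1 * * * /usr/local/sbin/nightly_backup.sh
# cron mails any output of the job to the MAILTO address.

# Print "full" on the first day of the month, "diff" on any other day.
# $1: day of month as two digits; defaults to today's (date +%d is zero-padded).
backup_kind() {
    local day="${1:-$(date +%d)}"
    if [ "$day" = "01" ]; then
        echo full
    else
        echo diff
    fi
}

# Run the backup command, then run the transfer command only if the backup succeeded.
# $1: backup command (e.g. a function wrapping dar)
# $2: transfer command (e.g. a function wrapping rsync)
backup_then_sync() {
    local backup_cmd="$1" sync_cmd="$2"
    if "$backup_cmd"; then
        "$sync_cmd"
    else
        echo "backup failed, skipping rsync" >&2
        return 1
    fi
}
```

For example, a script could call backup_then_sync run_dar run_rsync, where run_dar and run_rsync are your own (hypothetical) functions wrapping the dar and rsync command lines described in this article.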
Requirements
Client machine:
- Linux OS
- dar (Disk ARchive): a shell command that backs up directory trees and files. It has been tested under Linux, Windows, Solaris, FreeBSD, NetBSD, MacOS X and several other systems, and it is released under the GNU General Public License (GPL).
- rsync: an open source utility that provides fast incremental file transfer. rsync is freely available under the GNU General Public License and is currently maintained by Wayne Davison.
Server machine:
- Linux OS (or Windows using Cygwin)
- rsync running as a server (daemon)
Client machine(s) configuration
Dar installation
I installed dar by building it from source, but packages are available for various distributions, including Red Hat, SUSE, Gentoo and Ubuntu. If you are installing from source, check the configure script output for the line "Libbz2 compression (bzip2) : YES": if it says NO, install the bzip2-devel package and re-run configure to enable bzip2 compression.
To install from source:
wget <dar_source_package_url>
tar zxvf dar-<release_version>.tar.gz
cd dar-<release_version>
./configure [--enable-mode=64]
make
make install-strip
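The bzip2 check on the configure output can also be scripted. A minimal sketch, assuming the summary line has the form quoted in this article (the exact spacing may vary between dar releases, so whitespace around the colon is matched loosely); has_bzip2_support is a hypothetical helper name, not part of dar:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the configure output read on stdin
# reports "Libbz2 compression (bzip2) : YES" (whitespace matched loosely).
has_bzip2_support() {
    grep -Eq 'Libbz2 compression \(bzip2\)[[:space:]]*:[[:space:]]*YES'
}

# Example use from inside the dar source directory:
#   ./configure > configure.log 2>&1
#   has_bzip2_support < configure.log || echo "no bzip2: install bzip2-devel and re-run ./configure"
```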
Rsync installation
rsync is probably already installed on your Linux system; if not, you can install it with yum or apt-get (for example, yum install rsync or apt-get install rsync).
Dar configuration
dar has many configuration options; in this article I’m going to explain only the ones I’ve used (which should be the most common ones). Check the complete documentation and the good mini-howto for further details and explanations.
The main dar command line I’ve used (running as root) is:
dar -m 256 -y -s 600M -D -R / -c `date -I`_data \
    -Z "*.gz" -Z "*.zip" .... \
    -X "<file_exclusion_pattern_1>" \
    -X ... \
    -g <include_dir_1> \
    -g ... \
    -P <exclude_dir_1> \
    -P ... \
    [-A previous_backup]
As you can see, the command line is split into several lines for easier reading and editing; this is possible by ending each line with a backslash (\), which must be the very last character on the line. This way you can add or remove file exclusion lines, include/exclude path lines, etc. simply by adding or removing lines in the syntax shown.
Let’s look at what each switch means:
- -m 256
  Files smaller than 256 bytes are not compressed (by default, files of 100 bytes or less are not compressed).
- -y [level]
  This option activates bzip2 archive compression, which is turned off by default. You can also specify a numeric compression level, from 0 (no compression) to 9 (best compression, slowest processing). I don’t specify a compression level, because the default suits me.
- -s 600M
  Here comes dar’s slice feature. The specified size of 600 megabytes is the maximum file size dar will create. If your backup is bigger, you will end up with several backup files (slices), each with a slice number before the file extension, so you can save each one to a different unit of your backup media (floppies, Zip disks, CD-ROMs, etc.).
- -D
  Stores directories excluded by the -P option, or absent from the command-line path list, as empty directories. This is helpful when you are restoring a backup from scratch, so you don’t have to manually create all the excluded directories.
- -R /
  Specifies the root directory for saving or restoring files. By default this points to the current working directory. We are doing a system backup here, so it will be the root directory.
- -c `date -I`_data
  This switch tells dar to create a backup archive; its argument is the archive base name. `date -I` prints the current date in YYYY-MM-DD format, so the archive name embeds its creation date (for example 2008-09-01_data).
- -Z file_pattern
  Using normal file name globbing, you can specify patterns of files you want to store in your archive without compression. This only makes sense if you use the -y switch: compressing already-compressed files only yields bigger files and wasted CPU time.
- -X mask
  The mask is a string with wildcards (like * and ?) which is applied to filenames which are not directories. If a given file matches the mask, it is excluded from the operation. By default (no -X on the command line), no file is excluded from the operation. -X may be present several times on the command line; in that case a file is not considered for the given operation if it matches at least one -X mask.
- -g path
  Files or directories to take into account exclusively. -g may be present several times on the command line. By default all files under the -R directory are considered; if one or more -g options are given, only those paths are selected (provided they do not match any -P option). All paths given this way must be relative to the -R directory. This is equivalent to giving <path> outside of any option. Warning: the -g option does not accept wildcards; they would not be interpreted.
- -P relative_path
  With this switch you tell dar which paths you don’t want to store in your backup archive. Note that the paths you specify must be relative to the path specified by the -R switch.
- -A base_name
  Specifies the archive to use as reference (mandatory with -C). By default (which is only possible with the -c option), no archive is used and all files are saved. This switch is used to make a differential backup.
Basically, the command lines for making a full backup and a differential backup differ only in the final -A switch.
For example, to make a full backup, you can use:
/usr/local/bin/dar -m 256 -y -s 600M -D -R / -c backup_data \
    -g dir1/ \
    -g dir2/
This results in one or more backup_data.xxx.dar archives (where xxx is the slice number); usually you will get only backup_data.1.dar.
To make a differential backup instead, you can use:
/usr/local/bin/dar -m 256 -y -s 600M -D -R / -c backup_diff \
    -g dir1/ \
    -g dir2/ \
    -A backup_data
This makes a differential backup using the backup_data archive as reference (note that the slice number and the .dar extension are omitted from the -A argument), resulting in one or more backup_diff.xxx.dar archives.
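The full and differential command lines differ only in the -c name and the -A reference, so a script can pick the reference automatically. Below is a minimal bash sketch, assuming the naming scheme of the directory listing in this article (full backups named YYYY-MM-DD_data, differentials YYYY-MM-DD_diff) and a single backup directory; latest_full_backup, build_dar_args and DAR_ARGS are hypothetical names. It collects the options in a bash array instead of backslash-continued lines, which keeps the quoted -Z patterns intact and avoids the trailing-whitespace pitfall of line continuation.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the naming scheme matches the listing in this article:
#   full backups:         YYYY-MM-DD_data.N.dar
#   differential backups: YYYY-MM-DD_diff.N.dar

# Print the dar base name (path without ".1.dar") of the newest full backup in $1.
# ISO dates sort in chronological order, so the last glob match is the newest.
latest_full_backup() {
    local dir="$1" f last=""
    for f in "$dir"/*_data.1.dar; do
        [ -e "$f" ] || continue   # no match: the pattern stays unexpanded
        last="$f"
    done
    [ -n "$last" ] || return 1
    echo "${last%.1.dar}"
}

# Fill the global array DAR_ARGS with the options used in this article.
# $1: "full" or "diff"; $2: backup directory; remaining arguments: -g paths,
# relative to the -R directory (/).
build_dar_args() {
    local kind="$1" dir="$2" today ref p
    shift 2
    today=$(date -I)
    DAR_ARGS=(-m 256 -y -s 600M -D -R / -Z "*.gz" -Z "*.zip")
    if [ "$kind" = full ]; then
        DAR_ARGS+=(-c "$dir/${today}_data")
    else
        ref=$(latest_full_backup "$dir") || { echo "no full backup in $dir" >&2; return 1; }
        DAR_ARGS+=(-c "$dir/${today}_diff" -A "$ref")
    fi
    for p in "$@"; do
        DAR_ARGS+=(-g "$p")
    done
}
```

Usage, with /backups as an assumed backup directory: build_dar_args diff /backups etc/ home/ && /usr/local/bin/dar "${DAR_ARGS[@]}". Quoting "${DAR_ARGS[@]}" passes each element as a separate argument, so "*.gz" reaches dar unexpanded.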
After running regular backups, your backup folder will be populated with dar archive files:
# ls -al
total 709884
drwxr-xr-x 2 root root 4096 Sep 2 01:15 .
drwxr-xr-x 29 root root 4096 Aug 28 17:48 ..
-rw-r--r-- 1 root root 344959943 Aug 29 18:25 2008-08-29_data.1.dar # <-- full backup
-rw-r--r-- 1 root root 3344515 Aug 29 18:59 2008-08-29_diff.1.dar # <-- differential backup
-rw-r--r-- 1 root root 3608510 Aug 30 01:15 2008-08-30_diff.1.dar # <-- differential backup
-rw-r--r-- 1 root root 14285626 Aug 31 15:20 2008-08-31_diff.1.dar # <-- differential backup
-rw-r--r-- 1 root root 345299771 Sep 1 01:23 2008-09-01_data.1.dar # <-- full backup
-rw-r--r-- 1 root root 14666260 Sep 2 01:15 2008-09-02_diff.1.dar # <-- differential backup
Rsync configuration
Once you have created your backups, it is recommended to store them on a remote backup server, and this is where rsync comes into play. On the client machine, the configuration boils down to a single command that does the job, and the command line is quite simple:
rsync --verbose --progress --stats --compress \
--recursive --times --perms --links --delete \
--password-file <password_file> \
<backups_directory> <user>@<remote_host>::<module_name>
Let’s look at what each switch means:
- --verbose
  Turn on verbose output.
- --progress
  Show progress during transfer.
- --stats
  Give some file-transfer stats.
- --compress
  With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted.
- --recursive
  Recurse into directories.
- --times
  Preserve modification times.
- --perms
  Preserve permissions.
- --links
  Copy symlinks as symlinks.
- --delete
  Delete extraneous files from destination directories.
- --password-file <password_file>
  This option allows you to provide a password in a file for accessing an rsync daemon. The file must not be world-readable, and it should contain just the password on a single line. For example:
  #su
  #echo MyPassword > secretFile
  #chmod 600 secretFile
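- <backups_directory>
  Specifies the directory that contains the files to be transferred, for example /backups/*. Note that --delete only removes extraneous files inside directories that rsync is asked to synchronize: with a shell wildcard such as /backups/*, the shell expands it and rsync receives a list of individual files, so archives deleted locally are not removed from the server. Passing the directory itself with a trailing slash (for example /backups/) avoids this.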
- <user>@<remote_host>::<module_name>
  Specifies the user name, the remote host and the rsync server module name to use, for example john@www.mybackupserver.com::backups_storage
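Since the password file must not be readable by other users (rsync refuses to use it otherwise), it may be worth checking its permissions before the nightly transfer. A minimal sketch, assuming GNU stat, whose -c %a format prints the octal mode (for example 600); check_password_file is a hypothetical helper, not part of rsync:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if the rsync password file is missing or if
# "other" users have any permission on it. Requires GNU stat (-c %a).
check_password_file() {
    local file="$1" mode
    [ -f "$file" ] || { echo "missing password file: $file" >&2; return 1; }
    mode=$(stat -c %a "$file")
    # The last octal digit holds the permission bits for "other" users.
    if [ "${mode: -1}" != "0" ]; then
        echo "$file is accessible by other users (mode $mode); run: chmod 600 $file" >&2
        return 1
    fi
}
```

For example, check_password_file /root/secretFile && rsync ... runs the transfer only when the file's permissions are safe (the path is illustrative).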
Here is a sample output from rsync:
building file list ...
0 files...
6 files to consider
2008-09-02_diff.1.dar
32768 0% 0.00kB/s 0:00:00
849356 5% 791.12kB/s 0:00:17
1493682 10% 707.68kB/s 0:00:18
2116274 14% 670.40kB/s 0:00:18
2804402 18% 666.67kB/s 0:00:17
3492530 23% 627.42kB/s 0:00:17
14776414 100% 2.42MB/s 0:00:05 (xfer#1, to-check=0/6)
Number of files: 6
Number of files transferred: 1
Total file size: 726274779 bytes
Total transferred file size: 14776414 bytes
Literal data: 3771070 bytes
Matched data: 11005344 bytes
File list size: 156
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3731735
Total bytes received: 23000
sent 3731735 bytes received 23000 bytes 326498.70 bytes/sec
total size is 726274779 speedup is 193.43
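The speedup figure in the last line is the total size of the source files divided by the bytes actually sent and received over the link. Likewise, Literal data (sent as-is) plus Matched data (rebuilt by the receiver from data it already had) adds up to the Total transferred file size. A quick check of both with the numbers above; rsync_speedup is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: recompute rsync's "speedup" from its --stats output,
# i.e. total size / (bytes sent + bytes received), rounded to two decimals.
rsync_speedup() {
    local total_size="$1" sent="$2" received="$3"
    awk -v t="$total_size" -v s="$sent" -v r="$received" \
        'BEGIN { printf "%.2f\n", t / (s + r) }'
}

rsync_speedup 726274779 3731735 23000   # prints 193.43, as in the sample output
echo $((3771070 + 11005344))            # prints 14776414 (literal + matched data)
```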