Synthetic backups are created from a full backup of a file and its subsequent incremental backups. An incremental backup captures only the information in the file that has changed. So a file's backups could look like this:

- Full backup - 10 January 2005 - 1000 blocks
- Incremental backup - 12 January 2005 - 100 blocks
- Incremental backup - 15 January 2005 - 120 blocks

A synthetic backup is needed if the backed-up file needs copying as a single backup entity, or if it needs restoring. The contents of a synthetic backup differ depending upon the date chosen. In our example two synthetic backups could be produced: one for 12 January 2005 and one for 15 January 2005. The data they contain would be different; the 12 January 2005 synthetic backup would not contain any changes made to the file after that date, whereas the 15 January 2005 synthetic backup would.
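As a rough illustration of how the chosen date changes the result, here is a minimal Python sketch. It assumes a simplified model in which a backup is just a mapping from block number to block contents and the file never changes size; the `Backup` class, the `synthesize` function and the block ranges are hypothetical and are not taken from any backup product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backup:
    taken: date
    blocks: dict[int, bytes]  # block number -> block contents (simplified model)


def synthesize(full: Backup, incrementals: list[Backup], as_of: date) -> dict[int, bytes]:
    """Merge the full backup with every incremental taken on or before as_of."""
    image = dict(full.blocks)
    for inc in sorted(incrementals, key=lambda b: b.taken):
        if inc.taken <= as_of:
            image.update(inc.blocks)  # newer blocks overwrite older ones
    return image


# Hypothetical data matching the example sizes: 1000, 100 and 120 blocks.
full = Backup(date(2005, 1, 10), {n: b"v0" for n in range(1000)})
inc_12 = Backup(date(2005, 1, 12), {n: b"v1" for n in range(100)})
inc_15 = Backup(date(2005, 1, 15), {n: b"v2" for n in range(50, 170)})

synth_12 = synthesize(full, [inc_12, inc_15], date(2005, 1, 12))
synth_15 = synthesize(full, [inc_12, inc_15], date(2005, 1, 15))
```

Block 60 was changed by both incrementals in this made-up data, so `synth_12[60]` holds the 12 January contents while `synth_15[60]` holds the 15 January contents. Both images are still 1000 blocks long.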

Backup file metadata
Synthetic backups don't work unless incremental backup data is time-stamped and the file locations it refers to are known.

For example, the 12 Jan 2005 incremental backup could refer to paragraph 5 in a 200-paragraph Word document. It could also add a new paragraph 6, making the original paragraph 6 now paragraph 7, and so on throughout the document.

The 15 Jan 2005 incremental backup could refer to paragraph 5 again and also paragraph 7. The synthetic backup will only work if it has, in effect, a map of the components of the original backup file and keeps track of how they are altered. For example, the 15 Jan 2005 synthetic backup would need to know that paragraph 7 referred to paragraph 7 in the 12 Jan 2005 version of the document and not the 10 Jan 2005 version. The synthetic backup application needs to create and maintain metadata for the backups it deals with so as to create fully accurate synthetic backups.
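One simple way to picture this metadata is a catalogue in which every incremental backup records which earlier version of the file its changes refer to. The Python sketch below is an assumption-laden illustration, not how any particular product stores its catalogue; `CatalogEntry` and `chain_for` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    taken: date
    base: Optional[date]  # version the changes refer to; None marks a full backup


def chain_for(catalog: list[CatalogEntry], target: date) -> list[CatalogEntry]:
    """Return the backups to merge, oldest first, to rebuild the file as of target."""
    by_date = {entry.taken: entry for entry in catalog}
    chain = []
    current = by_date.get(target)
    while current is not None:
        chain.append(current)
        if current.base is None:  # reached the full backup, so the chain is complete
            return list(reversed(chain))
        current = by_date.get(current.base)
    raise LookupError(f"no unbroken chain of backups for {target}")


catalog = [
    CatalogEntry(date(2005, 1, 10), None),               # full backup
    CatalogEntry(date(2005, 1, 12), date(2005, 1, 10)),  # changes relative to 10 Jan
    CatalogEntry(date(2005, 1, 15), date(2005, 1, 12)),  # changes relative to 12 Jan
]

order = [entry.taken for entry in chain_for(catalog, date(2005, 1, 15))]
```

Here `order` lists 10, 12 and 15 January, which tells the merge that the 15 Jan changes must be applied on top of the 12 Jan version. A missing link in the chain raises an error rather than silently producing an inaccurate synthetic backup.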

What can be done is to create a fresh full backup periodically by combining the current full backup file and subsequent incremental backups. The two kinds of files are merged by the backup application to create a new full but synthetic backup. It is synthetic because it has not been copied directly from the original data.

Where is the synthetic backup run?
Synthetic backups should be run on a media server dedicated to fronting the backup and restore media resources. If the work is carried out on general application servers, they can lose significant numbers of CPU cycles and consume significant network bandwidth during a synthetic backup run.

Why run a synthetic backup scheme?
This question really means: why do incremental backups? There are two reasons. One is to reduce the resources needed for a backup. A full backup will require a set of server CPU cycles, some network bandwidth and backup media space. Each time you do a full backup you would need the same level of these three resources.

By running an incremental backup you need only a fraction of the server CPU cycles, network bandwidth and additional backup media space. That reduces the backup window, the time needed to run the backup, considerably.

But it increases the time needed for a restore because now a restore would need to return to the user the original backup file and all subsequent incremental backup files, and then the restored file needs re-building to merge in each incremental backup file's data. It takes more time and uses more server cycles and network bandwidth.

Consolidating the original backup file and incremental backup files into a single synthetic backup makes restores faster because just one file now needs streaming to the user. The server CPU cycles needed for the restore are also cut back.
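Using the block counts from the earlier example, the difference can be put in rough numbers. The Python sketch below assumes the changed blocks overwrite existing ones, so the synthetic full is the same size as the original full; `restore_cost` is a hypothetical helper, and real figures depend on the backup application.

```python
def restore_cost(full_blocks: int, incremental_blocks: list[int], synthetic: bool) -> tuple[int, int]:
    """Return (files streamed, blocks streamed) for one restore."""
    if synthetic:
        # Assumption: the merged image is the same size as the original full.
        return 1, full_blocks
    return 1 + len(incremental_blocks), full_blocks + sum(incremental_blocks)


chain = restore_cost(1000, [100, 120], synthetic=False)
merged = restore_cost(1000, [100, 120], synthetic=True)
print(chain)   # (3, 1220)
print(merged)  # (1, 1000)
```

Restoring from the chain streams three files and 1220 blocks and then has to merge them, whereas the synthetic full streams one file of 1000 blocks that needs no merging.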

Another benefit is disaster recovery. If you wish to store full backup files in an offsite vault, then storing synthetic fulls lets you keep the backup window advantages of incremental backups and avoid the disadvantage of having to merge full and incremental backups on the user's server systems when a restore from the vault is needed.

Most mainstream backup application suppliers offer synthetic backup facilities: Veritas, Legato, IBM and CA. Sepaton, a virtual tape system supplier (Sepaton = 'no tapes' backwards), uses a point-in-time snapshot as its source file. It could take a point-in-time snapshot of a Veritas NetBackup file and, because it 'understands' that file's format, can build a set of pointers to create a new synthetic full backup. This is done in a minute or two rather than the hours needed for a tape-based synthetic backup to be created.

However, there may be limitations. A backup application may impose limits on the number or size of incremental backups it supports. Also, incremental backup schemes may not be efficient for files which have a high change rate: over 10 per cent, say. It's always a good idea to check with your backup vendor for such restrictions.