Beginner’s Guide to Backups

Backups are something most people never think about until it’s too late. Computers can be finicky, and if you value your digital data then you’ll want to have a backup solution in place. This post explores two aspects of backups: the various types of backups, and everyday tools for performing those backups.

Disclaimer: Parts of this guide include instructions that, if misused, could result in data loss. Never run a command without being 100% sure of the outcome!

The 3-2-1 Rule of Backups

The de facto rule of thumb for managing backups is the “3-2-1” rule. This rule is meant to cover the most common scenarios that could result in data loss. What exactly does it stand for? Let’s break it down:

  • 3 copies of the same data. Backups can become corrupt, and having multiple copies means you can use one copy to restore another in case of a failure.
  • 2 different storage mediums. Don’t just store your backups on one computer. If that computer breaks or gets stolen then all of your data goes with it. Store a separate copy on an external hard drive, an SD card, a DVD, or some other storage medium.
  • 1 copy stored in a separate location. As much as we don’t like to think about it, a fire or flood is all it takes to eradicate everything we own. Store at least one copy outside of your home or office, preferably with a third-party service.

Backup Types

There are three main types of backups: full, incremental, and differential.

A full backup creates a complete copy of a set of files. The copy contains all of the same data as the original, meaning you can use it exactly as you would the original. The downside is that each full backup requires as much space as the original set. In other words, the total space required is the size of the original set multiplied by the number of full backups you keep.

On the other hand, incremental backups only copy the files that have changed since the last backup of any kind, whether that was a full backup or another incremental one. This adds a bit of complexity, since you need a way of tracking which files have changed and which ones haven’t. Almost all incremental backup software tracks these changes automatically. Incremental backups also vastly reduce the amount of storage space required, since each backup only copies the files that have been modified. The trade-off is that a restore needs the last full backup plus every incremental backup made after it.

Differential backups are similar to incremental backups, except that each one copies all files that have changed since the last full backup, not just those changed since the previous backup. For instance, if you perform a full backup on Monday and a differential backup on Tuesday, the differential backup will copy files modified between Monday and Tuesday. If you perform another differential backup on Wednesday, then that backup will copy files modified between Monday and Wednesday. Differential backups take less space than repeated full backups, but are not quite as space-friendly as incremental backups. In exchange, a restore only needs the last full backup and the most recent differential.
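To make the difference concrete, here is a minimal sketch that uses find to list which files each backup type would pick up on Wednesday. The folder, file names, and “stamp” files are invented for illustration; real backup software tracks changes in its own way, so treat this as a model of the idea rather than how any particular tool works.

```shell
#!/bin/sh
# Sketch: which files would an incremental vs. a differential backup copy?
# All paths and file names are hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/data"

# Monday 09:00: three files exist. Monday 12:00: the full backup runs.
touch -t 202401010900 "$work/data/a.txt" "$work/data/b.txt" "$work/data/c.txt"
touch -t 202401011200 "$work/full.stamp"

# Tuesday 10:00: a.txt changes. Tuesday 12:00: a backup runs.
touch -t 202401021000 "$work/data/a.txt"
touch -t 202401021200 "$work/tuesday.stamp"

# Wednesday 10:00: b.txt changes.
touch -t 202401031000 "$work/data/b.txt"

# Incremental on Wednesday: files newer than the previous backup (Tuesday).
incremental=$(cd "$work/data" && find . -type f -newer ../tuesday.stamp | sort | paste -sd ' ' -)

# Differential on Wednesday: files newer than the last full backup (Monday).
differential=$(cd "$work/data" && find . -type f -newer ../full.stamp | sort | paste -sd ' ' -)

echo "incremental:  $incremental"
echo "differential: $differential"
rm -rf "$work"
```

The incremental list contains only b.txt, while the differential list contains both a.txt and b.txt, because a.txt also changed after Monday’s full backup.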

Backup Strategies

Now that we’ve gotten terminology out of the way, it’s time to look into practical strategies for backing up data. There is a wide variety of methods and utilities for backing up data, many of which seamlessly handle full and incremental backups. I’ve split these methods into four tiers based on accessibility, ease of use, and completeness. The first tier is easy to use and widely available, but manages a small or highly specialized set of data. The last tier, on the other hand, is much more difficult to use, but provides the greatest flexibility, control, and comprehensiveness.

1. Cloud Applications

Cloud applications often provide automatic and seamless backups. If you use Google Docs, for example, your documents are already backed up in Google’s cloud services. If you use iTunes or another online music management service, it doesn’t matter if your computer is lost or destroyed, because you can simply restore your media from the service. Cloud applications take the headache out of managing backups, but the trade-offs are often limited functionality, reliance on a third-party service, or a higher price. They’re also not a true backup service, and the nature (or even the contents) of the data can change based on the provider.

2. File Synchronization

File synchronization services such as Dropbox, iCloud, and OneDrive store individual files at a remote location. Many of these providers offer generous amounts of free storage space, in some cases offering unlimited free storage when combined with cloud applications. The benefit of file synchronization services is that you can back up any kind of file on your computer, whether it’s a document, a song, or a program. Many services also store historical copies of your files, allowing you to revert to older versions if necessary.

ownCloud

ownCloud is a file synchronization service similar to Dropbox. ownCloud performs file synchronization and sharing across multiple devices using a desktop client, but it can also maintain backup copies and file revisions. It can access data from multiple locations including external hard drives, network drives, and even external services such as Amazon S3 and Dropbox. ownCloud is typically self-hosted, requiring you to manage storage yourself, but there are hosting providers who will manage your data for you. For more information, see part 5 of the private server guide.

BitTorrent Sync

BitTorrent Sync takes a unique approach to online file storage. Sync uses the BitTorrent protocol, which relies on a distributed network of devices to share data. When you install Sync on a device, that device becomes a peer. The device then shares data with other peers on the network. This way, each device contributes to the performance of the service: the more peers share a folder, the faster the folder is synchronized. The downside is that the speed of your backup depends entirely on the number of peers carrying your data. Each peer also needs to store a copy of the data, making full backups difficult.

3. Backup Services

Stepping up from file synchronization services, backup services provide a more comprehensive approach to backups. Not only can these services back up your personal files, but they can also be used to back up your computer’s configuration files. In the event of a system failure, this can help you restore your computer to the way it was before the problem occurred.

CrashPlan

CrashPlan is a comprehensive backup solution for multiple computers. CrashPlan runs as a background service and automatically synchronizes files with a destination, whether it’s a local folder, an external drive, a CrashPlan installation on another computer, or CrashPlan’s cloud storage. For the privacy-minded, CrashPlan encrypts files before sending them to their remote destination, preventing anyone without the password from opening your files. A monthly subscription grants you unlimited storage and bandwidth for transferring backups to CrashPlan’s cloud, letting you access your files from anywhere with an Internet connection.

rsync

rsync is a versatile command-line utility for copying files to a local directory or a remote destination. It’s designed to minimize the size of file transfers by using a delta-transfer algorithm, which only sends the parts of a file that have changed. rsync is commonly used as a backend for other backup services. It can even be used to perform full system backups as an alternative to disk cloning. DigitalOcean provides a guide to synchronizing local and remote folders using rsync.
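As a rough illustration, the sketch below mirrors one folder into another with rsync. Temporary folders stand in for your real source and backup locations, and the remote host in the final comment is hypothetical. The flags used (-a, --delete, --dry-run, --itemize-changes) are standard rsync options, but check the manual for your version before relying on them, since --delete removes files from the destination.

```shell
#!/bin/sh
# Sketch: mirror one folder into another with rsync.
# Temporary folders stand in for real source and backup locations.
set -e
mirror_status=skipped
if command -v rsync >/dev/null 2>&1; then
    src=$(mktemp -d)
    dest=$(mktemp -d)
    echo "first draft" > "$src/notes.txt"
    mkdir "$src/photos"
    echo "pretend image" > "$src/photos/cat.jpg"

    # Preview: --dry-run lists what would change without copying anything.
    rsync -a --delete --dry-run --itemize-changes "$src/" "$dest/"

    # Real run: -a preserves permissions and timestamps; --delete removes
    # files from the destination that no longer exist in the source.
    rsync -a --delete "$src/" "$dest/"

    # Change one file and sync again; only the changed file is transferred.
    echo "second draft" > "$src/notes.txt"
    rsync -a --delete --itemize-changes "$src/" "$dest/"

    # Confirm that the destination now matches the source.
    if diff -r "$src" "$dest" >/dev/null; then
        mirror_status=identical
    else
        mirror_status=different
    fi
    rm -rf "$src" "$dest"
fi
echo "mirror is $mirror_status"

# A remote destination uses the same flags (hypothetical host and path):
#   rsync -a --delete ~/Documents/ user@backup.example.com:backups/documents/
```

Note the trailing slash on the source path: it tells rsync to copy the folder’s contents rather than the folder itself. Running with --dry-run first is a cheap way to catch a mistyped path before anything is deleted.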

4. Disk Cloning

As the name implies, disk cloning makes an identical clone of your computer’s hard drive. Disk cloning is the most comprehensive form of backing up, but it’s also the most costly in terms of resources.

With disk cloning, each individual bit on a storage device is copied to a separate location, resulting in a completely identical copy. The benefit is that you can restore the drive to its exact state at the time of the backup. The drawback is that the process is slow, requires a drive that’s at least as large as the original drive, and requires you to either boot the target computer from another medium, such as a CD or USB drive, or move the original drive to another computer.

Parted Magic

Parted Magic contains a suite of applications for managing disks. Parted Magic runs as a stand-alone environment, allowing you to view and manage storage devices without having to boot into your computer’s native operating system. For disk cloning, Parted Magic provides Clonezilla for cloning drives (a more comprehensive intro to Clonezilla is available here).

dd

dd is a simple but powerful command-line tool for copying data from a source to a destination. dd can be used to copy files, partitions, or entire drives. For example, if we wanted to clone one hard drive (listed as /dev/sda on a Linux machine) to another hard drive (listed as /dev/sdb), we could run the following command as root:

# dd if=/dev/sda of=/dev/sdb bs=1M conv=sync,noerror

  • if=/dev/sda specifies that the input “file” is the device at /dev/sda, which in this case is the original drive.
  • of=/dev/sdb specifies that the output “file” is the device at /dev/sdb, which in this case is the backup drive.
  • bs=1M specifies the block size, or the number of bytes that are copied at a time (here, 1 MiB).
  • conv=sync,noerror tells dd to keep going if it encounters a read error (noerror) and to pad any short or unreadable block with zeros (sync), so that the rest of the data stays at the correct offset.

dd can take hours to back up several hundred gigabytes, and any changes made to the drive during that time can corrupt the final backup. You can avoid this by using a live environment to boot your computer from a CD, DVD or USB stick. Be extremely careful when entering commands with dd: if you swap if and of, you will overwrite the very drive you intended to back up!
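If you’d like to get comfortable with dd before pointing it at a real drive, one option is to practice on ordinary files. The sketch below uses the same flags as the command above, but the “drives” are throwaway image files in a temporary folder. It assumes GNU dd (for the bs=1M spelling) and hypothetical file names.

```shell
#!/bin/sh
# Sketch: practice the dd clone on image files instead of real drives.
# File names are hypothetical; assumes GNU dd as found on most Linux systems.
set -e
work=$(mktemp -d)

# A 4 MiB file of random data stands in for the original drive. Its size is
# an exact multiple of the block size, so conv=sync adds no padding.
head -c 4194304 /dev/urandom > "$work/source.img"

# Same flags as the real command, with file paths in place of devices.
dd if="$work/source.img" of="$work/clone.img" bs=1M conv=sync,noerror 2>/dev/null

# Verify the clone byte for byte before trusting it.
if cmp -s "$work/source.img" "$work/clone.img"; then
    clone_status=identical
else
    clone_status=different
fi
echo "clone is $clone_status"
rm -rf "$work"

# On a real system, list the attached drives first (for example with lsblk)
# and double-check which device is the source and which is the target.
```

The same cmp check works on real devices once the clone finishes, although it has to read both drives in full and so takes about as long as the copy itself.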

For a more in-depth tutorial on using dd, see the Disk Cloning page in the Arch Linux Wiki.

In Case of Emergency…

Having a backup solution provides excellent peace of mind, but what about when it’s time to put it to the test? One of the worst feelings in the tech world is putting tons of resources into backing up your data, only to discover that your backups are incomplete or corrupt. Before you finalize your backup solution, run some restoration tests to ensure your data is being successfully backed up. This could mean a few hours – or even a few days – of testing, but it beats losing years of data to a faulty drive.
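One simple way to run a restoration test is to record a checksum for every original file, restore the backup to a scratch folder, and check the restored files against that list. The sketch below does this with sha256sum from GNU coreutils. The folders and file names are made up, and a plain cp stands in for whatever restore step your backup tool provides.

```shell
#!/bin/sh
# Sketch: verify a restored backup against checksums of the originals.
# Folders and files are hypothetical; assumes GNU coreutils (sha256sum).
set -e
work=$(mktemp -d)
mkdir "$work/original" "$work/restored"
echo "tax return" > "$work/original/taxes.txt"
echo "holiday photo" > "$work/original/photo.jpg"

# 1. Record a checksum for every original file.
(cd "$work/original" && find . -type f -exec sha256sum {} + > "$work/manifest.sha256")

# 2. Stand-in for the restore step of your backup tool.
cp -a "$work/original/." "$work/restored/"

# 3. Check the restored files against the recorded checksums.
if (cd "$work/restored" && sha256sum --quiet -c "$work/manifest.sha256"); then
    restore_status=ok
else
    restore_status=failed
fi
echo "clean restore: $restore_status"

# 4. Simulate a corrupt backup to confirm the check catches it.
echo "bit rot" >> "$work/restored/photo.jpg"
if (cd "$work/restored" && sha256sum --quiet -c "$work/manifest.sha256" >/dev/null 2>&1); then
    corrupt_status=ok
else
    corrupt_status=failed
fi
echo "corrupted restore: $corrupt_status"
rm -rf "$work"
```

Keeping the checksum list alongside each backup lets you repeat the check later without needing the original files.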

If you’ve lost data without backing up first, you’re not totally out of luck. If the data is valuable, your first step should be to shut off the computer and find a professional. Failing that, you can try extracting the deleted files yourself. First, use one of the disk cloning utilities mentioned above to create a copy of the drive. Work from the copy to avoid accidentally overwriting the original data. You can use a tool such as SystemRescueCD to try extracting deleted files from the copy. For more information, see the Data Recovery page in the Ubuntu Wiki.

In the event of a catastrophic failure, don’t panic! As long as you follow the 3-2-1 rule, you should still have a copy of your data available somewhere. Get your device up and running again, fetch your nearest backup, perform a restoration, and restart your backup regimen. You’ll thank yourself later!
