Index
Historically…
How can PC users be motivated to protect their valuable data?
Data Storage Models
Unstructured
Full System
Full + Incremental/Differential
Continuous Data Protection
Storage Media
Magnetic Tape
Hard Disk
Optical Disk
Solid State Storage
Remote Storage
General Advice and Conclusion
Backup Glossary
Historically…
Jacquard card from a rug-making loom, circa 1810
In the beginning, electronic computers were primarily bulky number crunchers for military and intelligence applications. They required initial instructions loaded by hand and a set of variables that would be processed until the end of the program. Evolving from technology that had existed since the first quarter of the 1700s, punch cards became indispensable in the 19th century for applications such as the control of textile looms and data storage for the 1890 U.S. Census. The nascent Computing-Tabulating-Recording Company (CTR), later renamed International Business Machines (IBM), quickly adapted this technology to mechanical and electronic processing units. From the moment data was punched on cardboard cards, the need for backup duplication was understood and addressed. Because paper at first, and later the magnetic media used to store digital information, were ephemeral and fragile at best, the only reliable method of storing information for archival and retrieval was analog microfilm and microfiche. This conversion into a miniaturized optical analog format came at the huge expense of re-digitizing the data in case of an emergency restore procedure.
1-inch magnetic data tape machines (1971)
In the 1970s punched cards were replaced by magnetic tape, which by that time had evolved and matured enough to be considered a reliable medium. Governments and the financial sector installed huge data centres where kilometres of tape spooled every night with data streamed from their mainframe computers. At the consumer level, smaller tape units were created that could be connected to mini and micro computers, providing easy backup facilities and immediate access to data stored in personal tape libraries.
Personal computers (PCs) started with floppy disks as their main storage and control units. Data storage requirements for PCs were not critical, as users’ data consisted only of text and very efficiently packed information, owing to the limitations of their word processors and spreadsheet applications. Hard drives were introduced, and better controllers allowed the migration of some tape units to this new platform. For businesses, tape drives were indispensable, as the value of their data far exceeded the cost of a new computer and labour.
Tapes were great not only for backups but also for transferring information to other systems. Think of a parts inventory that had to be distributed to various branches nationwide.
Higher capacity ZIP drives became the medium of choice for several years because of their price and re-usability, in contrast to higher capacity optical media like CD-ROMs and WORM disks. The CD-RW format, despite its blend of high capacity and re-usability, was never widely accepted because incompatible implementations limited its use to similar drives and supporting software.
Dissected hard drive
Today, the backup technology industry has expanded to embrace affordable external hard drives and online backup solutions.
It has become so easy for us to keep our data safe; however, are we really backing up?
Mirroring our hard disks is a very simple approach. It is a good step in the right direction for recovering from a catastrophic event; there are, however, issues that this solution cannot address, like:
- I need a previous version of my spreadsheet…
- My office flooded, the external drive is now an expensive doorstopper… AND, my computer is not turning on!
- I would like my data current and accessible from my various computers…
- A virus has taken control over my PC; I need a clean image of my hard disk made last week!
- I would like an easy backup that works all the time, transparently and unattended.
To deal with these issues, a combination of solutions may be needed, guided by a plan derived from specific needs and requirements. Fortunately, access to cost-effective storage and affordable broadband plans brings “big-industry” solutions to the PC user.
A 2009 survey of more than 4200 computer users in 129 countries reveals that:
- 82% of home PC users do not perform regular backups
- 66% have lost pictures and files; 42% within the last year
- 71% are most worried about losing their pictures
How can PC users be motivated to protect their valuable data?
The clear answer to this question lies in promoting an understanding of the technology, not only in a generic way but in the context of the needs and expectations of the end user. Apple, looking at the problem from the consumer’s point of view, takes a blanket approach in Mac OS X with Time Machine. Just connect an external high-capacity drive, or configure a server as a repository, and the operating system does the rest. Its design makes it possible to restore the whole system, multiple files or an individual file.
There are many different solutions for Windows-based computers. Some use storage devices directly attached to the computer, while others back up the data to remote servers located in secure data centres. The backup strategy has to meet the particular user’s requirements, especially when it deals with client information and legally mandated data retention.
When searching for backup solutions, we find that many suppliers drive us away with complicated jargon; others leave important questions unanswered, particularly when dealing with privacy and confidentiality.
To approach the backup problem, a few concepts are important to understand:
Data Storage Models
All backup storage starts with the concept of a data repository model. The data should be kept in some storage medium and it should be organized to a certain degree. If it is a simple drive duplication, the file system contains the structure information; for other media, such as magnetic tape or online solutions, the structure is kept in a local database containing an index of files and their location on the physical tape or a reference to a server on the Internet. There are several data storage models:
Unstructured
A collection of media such as CD/DVD discs or flash drives, with minimal information about the backup. Data is very difficult to recover.
Full System
It is a mirror of the local hard drive, and its function is to restore the computer to the exact state it was in when the full system backup was made. It is effective; however, some data loss is expected, because changes made after the backup are not captured. Depending on the amount of data, it can take a long time to perform.
Full + Incremental/Differential
It starts with a full system backup. Periodically, the backup software analyses the hard drive and stores, in separate targets, the changes made to the system. The initial full backup can take a long time; subsequent incremental images can be very fast, depending on the number of changes since the last run. Restoring involves applying the full backup, followed by the incremental/differential snapshots. The system can be restored up to the last snapshot, reducing the chance of data loss.
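The restore order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Backup record, its kind and taken_at fields, and the restore_chain function are hypothetical names, and the sketch assumes a backup set uses either incremental or differential snapshots, not both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "incremental" or "differential"
    taken_at: int  # any increasing timestamp, e.g. a day number


def restore_chain(backups, target_time):
    """Return the backups to apply, in order, to restore the state at target_time."""
    usable = sorted(
        (b for b in backups if b.taken_at <= target_time),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available at or before target_time")

    base = fulls[-1]  # the newest full backup is the starting point
    later = [b for b in usable if b.taken_at > base.taken_at]
    kinds = {b.kind for b in later}

    if kinds <= {"incremental"}:
        # Each incremental holds only the changes since the previous backup,
        # so every one of them is needed (an empty list is fine too).
        return [base] + later
    if kinds == {"differential"}:
        # Each differential holds all changes since the full backup,
        # so only the newest one is needed.
        return [base, later[-1]]
    raise ValueError("mixed incremental/differential sets are not handled in this sketch")
```

For a week of daily incrementals, restore_chain returns the full backup plus every incremental up to the chosen day; for differentials it returns just two entries, which is why differential restores tend to be quicker.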
Continuous Data Protection
Continuous backups run for as long as the computer is running. The operating system notifies the backup program when a file has been added or changed; working together, they make a copy of the file (even if it is open in an application), which is stored as another version in the backup medium. This strategy takes advantage of advanced operating systems, faster computers and networks. It is the preferred method for online backup systems. The initial image can take a long time to be created; however, it offers the most up-to-date version of all files.
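The versioning idea can be sketched as follows. This is a simplified illustration, not how any particular product works: instead of receiving operating system notifications, the hypothetical store_new_version function is meant to be called whenever a change is suspected, and it compares content digests to decide whether a new version is needed. The ".vN" naming scheme and the last_digests dictionary are assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def store_new_version(source, store_dir, last_digests):
    """Copy source into store_dir as a new numbered version if its content changed.

    last_digests maps a source path to the digest of its newest stored version.
    Returns the path of the new version, or None when nothing changed.
    """
    source = Path(source)
    digest = file_digest(source)
    if last_digests.get(str(source)) == digest:
        return None  # content is identical to the newest stored version

    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    # Older versions are never overwritten; the next free number is used.
    version = len(list(store_dir.glob(source.name + ".v*"))) + 1
    target = store_dir / f"{source.name}.v{version}"
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    last_digests[str(source)] = digest
    return target
```

Keeping every version, rather than overwriting a single copy, is what allows a request such as "I need a previous version of my spreadsheet" to be met.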
Storage Media
Independently of the storage model, there has to be a storage medium where the data is kept for safekeeping. This is a quick list of the most commonly used media:
Magnetic Tape
Magnetic tape is one of the oldest media formats and, thanks to its reliability and wide range of implementations, one of the most commonly used media for data storage, backup and archival. Tape is very slow for random access (looking for a particular file, for example) but it can be even faster than hard drives for sequential read/write operations, which makes it ideal for full backup and restore procedures. Large data centres rely heavily on tape drives served by robotic arms that label, index, load and unload tapes automatically.
Hard Disk
Improvements in pricing, manufacturing and engineering have increased hard drive capacity year after year. Currently, many backup operations use hard drives in one way or another. Their extremely fast access to any sector and data segment makes them ideal for handling individual files. Mass manufacturing and commoditization of hard drive technology have brought a wide variety of configurations and features to the consumer. External hard drives are becoming a popular medium for backup storage; however, they are mainly used for full backup methods, which limit the chances of a successful recovery after a virus infection or data corruption.
Optical Disk
CD-ROM, CD-RW, WORM, DVD and Blu-ray discs are the most popular examples of this media type. Optical disk technology has become a less popular choice for automated backups as the amount of data held by the typical consumer grows every day. Digital cameras and high-capacity iPods are filling up hard drives rapidly. The only exception is Blu-ray discs, which can still be considered a good backup storage option because of their high capacity. Another important aspect to take into account is the poorly understood lifespan of optical media: originally thought to last hundreds of years, in the real world it has proven otherwise.
Solid State Storage
Solid state media is represented by flash drives, CompactFlash (CF), Secure Digital (SD) cards, Memory Stick, etc. Its ease of use and small footprint make it ideal for storing individual files, and it has been used primarily in digital cameras, mp3 players and netbooks. In very particular applications it can be used safely as a backup unit.
Remote Storage
High-speed Internet access is now within reach of most consumers, which has opened the door for the backup industry to extend its services to small businesses and home users. Originally, dedicated data lines linked enterprises such as banks and insurance companies to backup data centres; nowadays, a wide variety of high-speed options give users the same level of connectivity from their offices or homes. Remote storage is becoming the most popular medium for backup technologies; by installing small dedicated applications, systems perform real-time backups (Live Data) that keep the user’s data protected at all times. Successful implementations will address issues like privacy, encryption and data access from outside the original source system. Lately, it has spawned discussions that go beyond national borders, where privacy laws differ and the data has to be off-limits to foreign governments.
General Advice and Conclusion
- The more important the data stored on the computer, the greater the need to back it up.
- A backup is only as useful as its associated restore strategy. For critical systems and data, the restoration process must be tested.
- Storing the copy near the original is unwise, since many disasters such as fire, flood, theft and electrical surges are likely to damage or destroy the original and the backup medium at the same time.
- Automated backup and scheduling should be considered, as manual backups can be affected by human error.
- Backups will fail for a wide variety of reasons. A verification or monitoring strategy is an important part of a successful backup plan.
- Multiple backups on different media, stored in different locations, must be used for all critical information.
- Backed up archives should be stored in open and standard formats, especially when the goal is long-term archiving. Recovery software and processes may have changed, and software may not be available to restore data saved in proprietary formats.
- If you already have a tape backup system, a second backup program may still be worthwhile: an additional, automated backup to an external hard disk doubles your data security, and the backed-up files on the external disk are easy to check.
- Perform a 5-level rotation to keep data safe from corruption and virus infection.
- Secure-erase and physically destroy discarded backup media.
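The advice on verification can be illustrated with a small Python sketch that compares checksums of the original files against their backup copies. It assumes the backup is a plain directory mirror of the source; verify_backup and sha256_of are hypothetical helper names, and real backup products typically offer their own verification features.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(str(rel))
    return problems
```

An empty result means every source file has an identical copy in the backup. It says nothing about whether a full restore would succeed, which is why the restore process itself should still be tested.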
Backup Glossary
backup policy - an organisation’s procedures and rules for ensuring that adequate amounts and types of backups are made, including suitably frequent testing of the process for restoring the original production system from the backup copies.
backup rotation scheme - a method for effectively backing up data where multiple media are systematically moved from storage to usage in the backup process and back to storage. There are several different schemes. Each takes a different approach to balance the need for a long retention period with frequently backing up changes. Some schemes are more complicated than others.
backup site - a place where business can continue after a data loss event. Such a site may have ready access to the backups or possibly even a continuously updated mirror.
backup software - computer software applications that are used for performing the backing up of data, i.e., the systematic generation of backup copies.
backup window - the period of time that a system is available to perform a backup procedure.
Backup procedures can have detrimental effects to system and network performance, sometimes requiring the primary use of the system to be suspended. These effects can be mitigated by arranging a backup window with the users or owners of the system(s).
copy backup - backs up the selected files, but does not mark the files as backed up (that is, it does not reset the archive bit). This option is found in the backup utility of Windows Server 2003.
data salvage - the process of recovering data from storage devices when the normal operational methods are impossible. This process is typically performed by specialists in controlled environments with special tools. For example, a crashed hard disk may still have data on it even though it doesn’t work properly. A data salvage specialist might be able to recover much of the original data by opening it up in a clean room and tinkering with the internal parts.
differential backup - a cumulative backup of all changes made since the last full backup. The advantage to this is the quicker recovery time, requiring only a full backup and the latest differential backup to restore the system. The disadvantage is that for each day elapsed since the last full backup, more data needs to be backed up, especially if a majority of the data has been changed.
disaster recovery - the process of recovering after a business disaster and restoring or recreating data. One of the main purposes of creating backups is to facilitate a successful disaster recovery. For maximum effectiveness, this process should be planned in advance and audited.
disk image - a method of backing up a whole disk or filesystem in a single image. Since the underlying data structures are what is actually backed up, this method does not allow for file level control over what is selected for backup or restore.
full backup - a backup of all (selected) files on the system. In contrast to a drive image, this does not include the file allocation tables, partition structure and boot sectors.
hot backup - a backup of a database that is still running, and so changes may be made to the data while it is being backed up. Some database engines keep a record of all entries changed, including the complete new value. This can be used to resolve changes made during the backup.
incremental backup - a backup that only contains the files that have changed since the most recent backup (either full or incremental). The advantage of this is quicker backup times, as only changed files need to be saved. The disadvantage is longer recovery times, as the latest full backup, and all incremental backups up to the date of data loss need to be restored.
media spanning - sometimes a backup job is larger than a single destination storage medium. In this case, the job must be broken up into fragments that can be distributed across multiple storage media.
multiplexing - the practice of combining multiple backup data streams into a single stream that can be written to a single storage device. For example, backing up 4 PCs to a single tape drive at once.
multistreaming - the practice of creating multiple backup data streams from a single system to multiple storage devices. For example, backing up a single database to 4 tape drives at once.
near store - provisionally backing up data to a local staging backup device, possibly for later archival backup to a remote store device.
open file backup - the ability to back up a file while it is in use by another application. See File locking.
remote store - backing up data to an offsite permanent backup facility, either directly from the live data source or else from an intermediate near store device.
restore time - the amount of time required to bring a desired data set back from the backup media.
retention time - the amount of time in which a given set of data will remain available for restore. Some backup products rely on daily copies of data and measure retention in terms of days. Others retain a number of copies of data changes regardless of the amount of time.
site-to-site backup - backup, over the internet, to an offsite location under the user’s control. Similar to remote backup except that the owner of the data maintains control of the storage location.
tape library - a storage device which contains tape drives, slots to hold tape cartridges, a barcode reader to identify tape cartridges and an automated method for physically moving tapes within the device. These devices can store immense amounts of data.
trusted paper key - a machine-readable print of a cryptographic key.
virtual tape library (VTL) - a storage device that appears to be a tape library to backup software, but actually stores data by some other means. A VTL can be configured as a temporary storage location before data is actually sent to real tapes or it can be the final storage location itself.
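As a closing illustration of the incremental backup and differential backup entries in this glossary, the short Python sketch below shows that the two differ only in the reference point used to decide which files have changed. The function name, its parameters and the use of plain numbers as modification times are assumptions made for the example.

```python
def files_to_back_up(mtimes, last_full, last_backup, mode):
    """Pick the files a backup run should copy.

    mtimes      maps a file name to its last modification time
    last_full   time of the most recent full backup
    last_backup time of the most recent backup of any kind
    mode        "incremental" or "differential"
    """
    if mode == "differential":
        reference = last_full    # everything changed since the last full backup
    elif mode == "incremental":
        reference = last_backup  # only what changed since the previous backup
    else:
        raise ValueError("mode must be 'incremental' or 'differential'")
    return sorted(name for name, mtime in mtimes.items() if mtime > reference)
```

With a full backup at time 3 and an incremental at time 7, a file modified at time 5 is picked up again by a differential run but skipped by an incremental run, which is why differentials grow with each day elapsed since the last full backup.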