Backup

In information technology, a backup, or the process of backing up, refers to making copies of data so that these copies may be used to restore the original after a data loss event. Backups have two distinct purposes. The primary purpose is to recover data after its loss, whether by deletion or corruption. The secondary purpose is to recover data from an earlier time, according to a user-defined data retention policy, typically configured within a backup application, which determines how long copies of data are kept.

Though backups popularly represent a simple form of disaster recovery, and should be part of a disaster recovery plan, backups by themselves should not be considered disaster recovery. Not all backup systems or backup applications are able to reconstitute a computer system, or other complex configurations such as a computer cluster, Active Directory servers, or a database server, by restoring only data from a backup.

Various software products and technologies are available for backing up live production data. Later in this document we will see EMC2 Legato Networker at work and how it is used for taking remote backups. Backups are often large in volume and of high importance, so they must be kept in a safe place from which they can be retrieved easily and with full integrity and authenticity. For this purpose, different storage media are used; some of them are described briefly below.

Storage media

Regardless of the repository model that is used, the data has to be stored on some data storage medium.

Magnetic tape

Magnetic tape has long been the most commonly used medium for bulk data storage, backup, archiving, and interchange. Tape has typically had an order of magnitude better capacity/price ratio than hard disk. Tape is a sequential access medium, so even though access times may be poor, the rate of continuously writing or reading data can be very fast; some new tape drives are even faster than modern hard disks. A principal advantage of tape is that it has been used for this purpose for decades, much longer than any alternative, so its characteristics are well understood.

Hard disk

The capacity/price ratio of hard disk has been improving rapidly for many years, making it increasingly competitive with magnetic tape as a bulk storage medium.

The main advantages of hard disk storage are low access times, availability, capacity and ease of use.[7] External disks can be connected via local interfaces like SCSI, USB, FireWire, or SATA, or via longer-distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup systems, such as virtual tape libraries, support data deduplication, which can dramatically reduce the amount of disk storage capacity consumed by daily and weekly backup data. The main disadvantages of hard disk backups are that they are easily damaged, especially while being transported (e.g., for off-site backups), and that their stability over periods of years is a relative unknown.

Optical storage

Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and generally have low media unit costs. However, the capacities and speeds of these and other optical discs are typically an order of magnitude lower than hard disk or tape. Many optical disc formats are WORM (write once, read many) type, which makes them useful for archival purposes since the data cannot be changed. The use of an autochanger or jukebox can make optical discs a feasible option for larger-scale backup systems.

Floppy disk

During the 1980s and early 1990s, many personal/home computer users associated backing up mostly with copying to floppy disks. However, the data capacity of floppy disks failed to keep up with growing demands, rendering them unpopular and obsolete.[8]

Solid state storage

Also known as flash memory, this category includes thumb drives, USB flash drives, CompactFlash, SmartMedia, Memory Stick, and Secure Digital cards; these devices are relatively expensive for their low capacity. Unlike its magnetic counterpart, a solid state drive contains no moving parts and can deliver throughput on the order of 500 MB/s to 1 GB/s. SSDs are now available in capacities ranging from hundreds of gigabytes to several terabytes.

Remote backup service

As broadband internet access becomes more widespread, remote backup services are gaining in popularity. Backing up via the internet to a remote location can protect against some worst-case scenarios, such as fires, floods, or earthquakes, which would destroy any backups in the immediate vicinity along with everything else. There are, however, a number of drawbacks to remote backup services. First, internet connections are usually slower than local data storage devices. Residential broadband is especially problematic, as routine backups must use an upstream link that is usually much slower than the downstream link used only occasionally to retrieve a file from backup. This tends to limit the use of such services to relatively small amounts of high-value data. Second, users must trust a third-party service provider to maintain the privacy and integrity of their data, although confidentiality can be assured by encrypting the data before transmission to the backup service with an encryption key known only to the user. Ultimately, the backup service must itself use one of the above methods, so this could be seen as a more complex way of doing traditional backups.
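As a minimal sketch of the client-side encryption just described, the following Python fragment encrypts a backup archive locally before it is uploaded, so the remote provider only ever sees ciphertext. It assumes the third-party cryptography package is installed; the file names are illustrative.

```python
# Sketch: encrypt a backup archive locally before sending it to a remote
# backup service, so the provider never sees plaintext. Assumes the
# third-party "cryptography" package (pip install cryptography); file
# names are illustrative. Whole-file reads are fine for a sketch, but a
# real tool would stream large archives in chunks.
from cryptography.fernet import Fernet

# Generate once and store safely offline; losing the key loses the backup.
key = Fernet.generate_key()

with open("backup-2024-05-01.tar.gz", "rb") as f:
    plaintext = f.read()

ciphertext = Fernet(key).encrypt(plaintext)

with open("backup-2024-05-01.tar.gz.enc", "wb") as f:
    f.write(ciphertext)
# Only the .enc file is uploaded; decryption requires the locally held key.
```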

Managing the data repository

Regardless of the data repository model or data storage media used for backups, a balance needs to be struck between accessibility, security and cost. These media management methods are not mutually exclusive and are frequently combined to meet the needs of the situation; using on-line disks for staging data before it is sent to a near-line tape library is a common example.

On-line

On-line backup storage is typically the most accessible type of data storage, and a restore from it can begin within milliseconds. A good example would be an internal hard disk or a disk array (perhaps connected to a SAN). This type of storage is very convenient and speedy, but relatively expensive. On-line storage is quite vulnerable to being deleted or overwritten, whether by accident, by intentional malevolent action, or in the wake of a data-deleting virus payload.

Near-line

Near-line storage is typically less accessible and less expensive than on-line storage, but still useful for backup data storage. A good example would be a tape library with restore times ranging from seconds to a few minutes. A mechanical device is usually involved in moving media units from storage into a drive where the data can be read or written. Its safety properties are generally similar to those of on-line storage.

Off-line

Off-line storage requires some direct human action to make access to the storage media physically possible, typically inserting a tape into a tape drive or plugging in a cable that allows a device to be accessed. Because the data is not accessible via any computer except during limited periods in which it is written or read back, it is largely immune to a whole class of on-line backup failure modes. Access time will vary depending on whether the media is on-site or off-site.

Off-site data protection

To protect against a disaster or other site-specific problem, many people choose to send backup media to an off-site vault. The vault can be as simple as a system administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled, high-security bunker with facilities for backup media storage. Importantly, a data replica can be off-site but also on-line; such a replica has fairly limited value as a backup and should not be confused with an off-line backup.

Backup site or disaster recovery centre (DR centre)

In the event of a disaster, the data on backup media alone will not be sufficient to recover; computer systems onto which the data can be restored, and properly configured networks, are necessary too. Some organizations have their own data recovery centres equipped for this scenario, while others contract this out to a third-party recovery centre. Because a DR site is itself a huge investment, backing up is very rarely considered the preferred method of moving data to a DR site. A more typical approach is remote disk mirroring, which keeps the DR data as up to date as possible.

Selection and extraction of data

A successful backup job starts with selecting and extracting coherent units of data. Most data on modern computer systems is stored in discrete units known as files, which are organized into file systems. Files that are actively being updated can be thought of as "live" and present a challenge to back up. It is also useful to save metadata that describes the computer or the filesystem being backed up.

Deciding what to back up at any given time is harder than it seems. Backing up too much redundant data fills the data repository too quickly, while backing up an insufficient amount of data can eventually lead to the loss of critical information. Making copies of files is the simplest and most common way to perform a backup, and a means to perform this basic function is included in all backup software and all operating systems.

Partial file copying

Instead of copying whole files, one can limit the backup to only the blocks or bytes within a file that have changed in a given period of time. This technique can use substantially less storage space on the backup medium, but requires a high level of sophistication to reconstruct files in a restore situation, and some implementations require integration with the source file system. When backing up over a network, the rsync utility automatically transmits a minimum set of changes to bring an earlier version of a file at the destination up to date with the current version at the source. Rsync can dramatically reduce the network traffic needed to maintain a remote mirror of a large set of files undergoing small, frequent changes.
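To make the idea of partial file copying concrete, here is a simplified Python sketch that hashes fixed-size blocks and sends only the ones that changed. This is a deliberate simplification of what rsync does (rsync uses rolling checksums so it can also detect data that has shifted position); the paths and block size are illustrative.

```python
# Sketch of block-level partial copying: hash fixed-size blocks of the
# current file and compare them with hashes saved from the previous
# backup, transferring only the blocks that differ. Simplified relative
# to rsync, which also handles shifted data via rolling checksums.
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KiB blocks; illustrative choice

def block_hashes(path):
    """Return a list of SHA-256 digests, one per block of the file."""
    hashes = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def changed_blocks(path, previous_hashes):
    """Yield (block_index, data) for blocks that differ from the last backup."""
    with open(path, "rb") as f:
        index = 0
        while block := f.read(BLOCK_SIZE):
            digest = hashlib.sha256(block).hexdigest()
            if index >= len(previous_hashes) or previous_hashes[index] != digest:
                yield index, block  # only these blocks need to be stored/sent
            index += 1
```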

File systems

File system dump

Instead of copying files within a file system, a copy of the whole file system itself can be made. This is also known as a raw partition backup and is related to disk imaging. The process usually involves unmounting the file system and running a program like dd (Unix). Because the disk is read sequentially and with large buffers, this type of backup can be much faster than reading every file normally, especially when the file system contains many small files, is highly fragmented, or is nearly full. But because this method also reads the free disk blocks that contain no useful data, it can also be slower than conventional reading, especially when the filesystem is nearly empty. Some filesystems, such as XFS, provide a "dump" utility that reads the disk sequentially for high performance while skipping unused sections. The corresponding restore utility can selectively restore individual files or the entire volume, at the operator's choice.

Identification of changes

Some file systems have an archive bit for each file that indicates it was recently changed. Other backup software looks at the file's modification date and compares it with the time of the last backup to determine whether the file was changed.

Live Data

If a computer system is in use while it is being backed up, there is a real possibility of files being open for reading or writing. If a file is open, its contents on disk may not correctly represent what the owner of the file intends. This is especially true for database files of all kinds. The term fuzzy backup describes a backup of live data that looks like it ran correctly but does not represent the state of the data at any single point in time, because the data being backed up changed between the time the backup started and the time it finished. For databases in particular, fuzzy backups are worthless.

Snapshot backup

A snapshot is an instantaneous function of some storage systems that presents a copy of the file system as if it were frozen at a specific point in time, often by a copy-on-write mechanism. An effective way to back up live data is to temporarily quiesce it (e.g. close all files), take a snapshot, and then resume live operations; at that point the snapshot can be backed up through normal methods.[10] While a snapshot is very handy for viewing a filesystem as it was at a different point in time, it is hardly an effective backup mechanism by itself.

Open file backup

Many backup software packages feature the ability to handle open files in backup operations; some simply check for openness and try again later. File locking is useful for regulating access to open files. When considering the logistics of backing up open files, note that the backup process could take several minutes to back up a large file such as a database. To back up a file that is in use, the entire backup must represent a single-moment snapshot of the file, rather than a simple copy of a read-through. This is a challenge when the file is constantly changing: either the file must be locked to prevent changes, or a method must be implemented to ensure that the original snapshot is preserved long enough to be copied, all while changes are being preserved. Backing up a file while it is being changed, so that the first part of the backup reflects data from before a change and later parts reflect data from after it, produces a corrupted, unusable file, as most large files contain internal references between their various parts that must remain consistent throughout the file.
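The "check and try again later" approach mentioned above can be modeled in a few lines. The following Python sketch copies a file and then verifies that the source did not change while the copy was running; if it did, the copy is treated as a fuzzy backup and retried. This is an illustration of the idea, not how any particular backup product implements it; names and retry counts are illustrative.

```python
# Sketch of the "try again later" approach to open files: copy the file,
# then check that it did not change while the copy was running; if it
# did, the result is a fuzzy backup and we retry after a pause.
import os
import shutil
import time

def stable_copy(src, dst, retries=5, wait=10):
    for _ in range(retries):
        before = (os.stat(src).st_mtime_ns, os.stat(src).st_size)
        shutil.copy2(src, dst)          # copy contents and metadata
        after = (os.stat(src).st_mtime_ns, os.stat(src).st_size)
        if before == after:
            return True                 # source was quiet for the whole copy
        time.sleep(wait)                # file was being written; try again
    return False                        # still changing; needs a snapshot or
                                        # a hot backup mechanism instead
```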

Cold database backup

During a cold backup, the database is closed or locked and not available to users. The data files do not change during the backup process, so the database is in a consistent state when it is returned to normal operation.

Hot database backup

Some database management systems offer a means to generate a backup image of the database while it is online and usable ("hot"). This usually consists of an inconsistent image of the data files plus a log of changes made while the procedure is running; upon restore, the changes in the log files are reapplied to bring the database in sync.

Metadata

Not all information stored on the computer is stored in files, and accurately recovering a complete system from scratch requires keeping track of this non-file data too. System specifications, for example, are needed to procure an exact replacement after a disaster.

Boot sector

The boot sector can sometimes be recreated more easily than it can be saved. Still, it usually isn't a normal file, and the system won't boot without it.

Partition layout

The layout of the original disk, as well as partition tables and filesystem settings, is needed to properly recreate the original system.

File metadata

Each file's permissions, owner, group, ACLs, and any other metadata need to be backed up for a restore to properly recreate the original environment. Different operating systems have different ways of storing configuration information; Microsoft Windows, for example, keeps a registry of system information that is more difficult to restore than a typical file.

Manipulation of Data and Dataset Optimization

It is frequently useful or required to manipulate the data being backed up to optimize the backup process. These manipulations can provide many benefits, including improved backup speed, restore speed, data security and media usage, and reduced bandwidth requirements.

Compression

Various schemes can be employed to shrink the size of the source data so that it uses less storage space. Compression is frequently a built-in feature of tape drive hardware.

Deduplication

When multiple similar systems are backed up to the same destination storage device, there is the potential for much redundancy within the backed-up data. For example, if 20 Windows workstations are backed up to the same data repository, they might share a common set of system files; the repository only needs to store one copy of those files to be able to restore any one of the workstations. This technique can be applied at the file level or even on raw blocks of data, potentially resulting in a massive reduction in required storage space. Deduplication can occur on a server before any data moves to backup media, sometimes referred to as source or client-side deduplication; this approach also reduces the bandwidth required to send backup data to its target media. The process can also occur at the target storage device, sometimes referred to as inline or back-end deduplication.
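A minimal sketch of file-level deduplication follows: each unique file content is stored once, keyed by its SHA-256 digest, and each machine's backup index records only digests. The store path and function names are illustrative, not any product's actual layout.

```python
# Sketch of file-level deduplication: store each unique file content
# once, keyed by its SHA-256 digest; per-machine indexes map paths to
# digests. Whole-file reads keep the sketch short; a real tool would
# hash in chunks.
import hashlib
import os
import shutil

STORE = "/backup/store"  # one copy of each unique content (illustrative)

def dedup_store(path):
    """Copy the file into the store only if its content is not already there."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    target = os.path.join(STORE, digest)
    if not os.path.exists(target):   # first time this content is seen
        shutil.copy2(path, target)
    return digest                    # recorded in the machine's index
```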

Duplication

Sometimes backup jobs are duplicated to a second set of storage media. This can be done to rearrange the backup images to optimize restore speed, or to have a second copy at a different location or on a different storage medium.

Encryption

High-capacity removable storage media such as backup tapes present a data security risk if they are lost or stolen.[13] Encrypting the data on these media can mitigate this problem, but encryption presents new problems of its own: it is a CPU-intensive process that can slow down backup speeds, and the security of the encrypted backups is only as good as the security of the key management policy.

Multiplexing

When there are many more computers to be backed up than there are destination storage devices, the ability to use a single storage device for several simultaneous backups can be useful.

Refactoring

The process of rearranging the backup sets in a data repository is known as refactoring. For example, if a backup system uses a single tape each day to store the incremental backups for all the protected computers, restoring one of the computers could require many tapes. Refactoring could be used to consolidate all the backups for a single computer onto a single tape. This is especially useful for backup systems that do incrementals-forever style backups.

Staging

Sometimes backup jobs are copied to a staging disk before being copied to tape. This process is sometimes referred to as D2D2T, an acronym for Disk to Disk to Tape. It can be useful when there is a problem matching the speed of the final destination device with the source device, as is frequently the case in network-based backup systems. It can also serve as a centralized location for applying other data manipulation techniques.
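As a rough illustration of the D2D2T flow, the Python sketch below copies source data to a fast staging disk first and then streams the staged copy into an archive file standing in for tape. All paths are illustrative, and a real system would also track and expire staged copies.

```python
# Sketch of D2D2T staging: back up to fast disk first, then migrate the
# staged copy to an archive (standing in for tape here). Paths are
# illustrative.
import shutil
import tarfile

STAGING = "/backup/staging/ws01"

# Stage 1: disk-to-disk - a fast copy that fits inside the backup window.
shutil.copytree("/home/ws01/data", STAGING, dirs_exist_ok=True)

# Stage 2: disk-to-tape - stream the staged copy out at the device's pace.
with tarfile.open("/backup/tape-out/ws01.tar", "w") as tar:
    tar.add(STAGING, arcname="ws01")
```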

Managing the backup process

It is important to understand that backing up is a process. As long as new data is being created and changes are being made, backups will need to be updated. Individuals and organizations with anything from one computer to thousands (or even millions) of computer systems all have requirements for protecting data. While the scale differs, the objectives and limitations are essentially the same, and those who perform backups need to know to what extent they were successful, regardless of scale.

Objectives

Recovery point objective (RPO)

The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back that will be experienced as a result of the recovery. The most desirable RPO would be the point just prior to the data loss event. Making a more recent recovery point achievable requires increasing the frequency of synchronization between the source data and the backup repository. For example, if backups run nightly at 02:00 and data is lost at 13:00, roughly eleven hours of changes are lost.

Recovery time objective (RTO)

The amount of time elapsed between disaster and restoration of business functions.[15]

In addition to preserving access to data for its owners, data must be restricted from unauthorized access. Backups must be performed in a manner that does not compromise the original owner's undertaking. This can be achieved with data encryption and proper media handling policies.

An effective backup scheme will take into consideration the limitations of the situation.

Backup window

The period of time when backups are permitted to run on a system is called the backup window. This is typically the time when the system sees the least usage, so the backup process interferes least with normal operations. The backup window is usually planned with users' convenience in mind. If a backup extends past the defined backup window, a decision is made whether to abort the backup or to lengthen the window.

Performance impact

All backup schemes have some performance impact on the system being backed up. For example, while a computer system is being backed up, the hard drive is busy reading files for the backup, and its full bandwidth is no longer available for other tasks. Such impacts should be analyzed.

Costs of hardware, software, labour

All types of storage media have a finite capacity with a real cost. Matching the correct amount of storage capacity (over time) with the backup needs is an important part of the design of a backup scheme. Any backup scheme has some labour requirement, but complicated schemes have considerably higher labour requirements. The cost of commercial backup software can also be considerable.

Logging

In addition to the history of computer-generated reports, activity and change logs are useful for monitoring backup system events.

Validation

Many backup programs use checksums or hashes to validate that the data was accurately copied. These offer several advantages. First, they allow data integrity to be verified without reference to the original file: if the file as stored on the backup medium has the same checksum as the saved value, then it is very probably correct. Second, some backup programs can use checksums to avoid making redundant copies of files and thereby improve backup speed; this is particularly useful for the deduplication process.
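A short Python sketch of checksum validation: record a digest when the backup is written, then verify the stored copy against it without touching the original. The file names are illustrative.

```python
# Sketch of checksum validation: compute a SHA-256 digest at backup time,
# then verify the stored copy against the saved value at verify time,
# without needing the original file.
import hashlib

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

saved_digest = sha256_of("/data/payroll.db")                 # at backup time
backup_ok = sha256_of("/backup/payroll.db") == saved_digest  # at verify time
```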

Types of Backup (Overview)

Incremental Backups

An incremental backup stores all files changed since the last full, differential or incremental backup. The advantage of an incremental backup is that it takes the least time to complete; the disadvantage is that during a restore operation each increment must be processed, which can result in a lengthy restore job.

Incremental backup provides a faster method of backing up data than repeatedly running full backups. During an incremental backup, only files changed since the most recent backup are included; that is where it gets its name, as each backup is an increment on a previous backup. For example, a backup job running four times would produce one full backup on the first run, followed by three increments, each storing only the files changed since the run before it.

The time it takes to execute an incremental backup may be a fraction of the time it takes to perform a full backup. Backup4all, one backup program that supports incremental backup, uses the information recorded in its catalog file (.bkc) to determine whether each file has changed since the most recent backup.

The advantage of lower backup times comes with a price: increased restore time. When restoring from incremental backup, you need the most recent full backup as well as every incremental backup made since the last full backup. For example, assume you did a full backup on Friday and incremental backups on Monday, Tuesday and Wednesday. To restore your data on Thursday morning, you would need all four backup container files: Friday's full backup plus the incremental backups for Monday, Tuesday and Wednesday. By comparison, if you had run differential backups on Monday, Tuesday and Wednesday, then to restore on Thursday morning you would have needed only Friday's full backup plus Wednesday's differential.

Advantages of incremental backups:

1. It is the fastest backup type, since it only backs up increments
2. It saves storage space compared to other types
3. Each backup increment can store a different version of a file/folder

Disadvantages of this backup type:

1. A full restore is slow compared to other backup types (you need the first full backup and all increments since then)
2. To restore the latest version of an individual file, the increment that contains it must be found first

Differential Backups

A differential backup contains all files that have changed since the last FULL backup. The advantage of a differential backup is that it shortens restore time compared to an incremental backup. However, if you run the differential backup too many times between full backups, the size of the differential backup might grow to be larger than the baseline full backup.

There is a significant, but sometimes confusing, distinction between differential backup and incremental backup. Whereas an incremental backup stores the files modified since the last full, differential or incremental backup, a differential backup stores all the files that have changed since the last full backup. That is where it gets its name: it backs up everything that is different since the last full backup. For a backup job that runs four times, this means one full backup on the first run followed by three differentials, each containing every change made since that initial full backup.

Restoring a differential backup is a faster process than restoring an incremental backup because only two backup container files are needed: the latest full backup and the latest differential. Use differential backup if you have a reasonable amount of time in which to perform backups. The upside is that only two backup container files are needed to perform a complete restore. The downside is that if you run multiple differential backups after your full backup, you are probably including some files in each differential backup that were already included in earlier differential backups but have not been modified since.

Advantages:

1. Restore is faster than restoring from incremental backup
2. Backing up is faster than a full backup
3. The storage space requirements are lower than for a full backup

Disadvantages:

1. Restore is slower than restoring from a full backup
2. Backing up is slower than an incremental backup
3. The storage space requirements are higher than for an incremental backup
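The selection rules behind the two types reduce to a one-line difference, shown in the Python sketch below: an incremental backup takes files modified since the most recent backup of any type, while a differential takes files modified since the last full backup. The timestamps and paths are illustrative.

```python
# Sketch of how incremental and differential backups differ in what they
# select: incremental = changed since the most recent backup of any type;
# differential = changed since the last FULL backup.
import os

def modified_since(root, since):
    """Return files under root whose mtime is newer than `since` (epoch seconds)."""
    selected = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                selected.append(path)
    return selected

last_full = 1714521600   # e.g. Friday's full backup (illustrative epoch)
last_any = 1714780800    # e.g. Monday's incremental

differential_set = modified_since("/data", last_full)  # grows all week
incremental_set = modified_since("/data", last_any)    # only since Monday
```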

Full backup

A full backup is the starting point for all other backups and contains all the data in the folders and files that are selected to be backed up (the source data). Because it stores all files and folders, frequent full backups result in faster and simpler restore operations than other backup types. It would be ideal to make full backups all the time, because they are the most comprehensive and are self-contained; however, the amount of time it takes to run a full backup often prevents this. Full backups are therefore often restricted to a weekly or monthly schedule, although the increasing speed and capacity of backup media is making overnight full backups a more realistic proposition. Full backups offer the best protection for data, and given that a backup can be scheduled to run automatically, they require little intervention relative to the benefits. A single full backup provides the ability to completely restore all backed-up files and folders.

However, you should be aware of a significant security issue: each full backup contains an entire copy of the data. If the backup media were to be illegally accessed, stolen or lost, the entire copy of your data could end up in the hands of unauthorized persons. For this reason, when choosing a backup program for full backups, make sure it supports encryption to protect the backed-up data.

Advantages:

1. Restore is the fastest
2. The entire backed-up data set is stored in a single file (better storage management)

Disadvantages:

1. Backing up is the slowest compared to other backup types
2. The storage space requirements are the highest (compared to incremental or differential backup), although considering how cheap storage devices now are, this is a low-impact disadvantage
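To illustrate the self-contained nature of a full backup, the following minimal Python sketch archives an entire source tree into a single compressed file, which alone suffices for a complete restore. The paths and file names are illustrative; both functions are from the standard library.

```python
# Sketch of a self-contained full backup: archive the whole source tree
# into one compressed container file, and restore from that single file.
import shutil

# Produces backup-full-2024-05-03.tar.gz containing everything under /data.
shutil.make_archive("backup-full-2024-05-03", "gztar", root_dir="/data")

# A complete restore needs only this one container file.
shutil.unpack_archive("backup-full-2024-05-03.tar.gz", "/restore")
```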

As a recommendation, even though full backup offers the most protection, it is good to have a backup strategy in which full backups are performed weekly and faster backup types (such as incremental) are executed daily, with scheduled backups combined with cleanup operations.

Differences between Data Backup and Data Archiving

Data backups usually copy data to a sequential-access recovery medium, while data archiving moves data to a lower-tier random-access medium, possibly leaving a stub behind. In addition, backups typically retain multiple copies of the data being protected, while archives retain a single copy of the data being stored, either through data deduplication or content-addressable storage (CAS) technologies. Another major difference is that data archiving provides indexing and search capabilities within files, whereas data backups offer at most a keyword search based on policy or backup image. Finally, data backups are generally designed for short-term storage for recovery purposes, as opposed to data archives, which are designed for long-term storage for regulatory and legal compliance. Remember: backups are not archives.

Backup: a backup is a copy of data that may be used to restore the original in the event the latter is lost or damaged beyond repair. It is a safeguard for data that is being used.

Archive: an archive is a single historical record, or a collection of them, specifically selected for long-term retention and future reference. It is usually data that is no longer actively used.

One difference worth noting in the above descriptions is that a backup is always a copy, while an archive should be the original that was removed from its initial location and sent elsewhere for long-term retention.

We will now see in detail how backups of remote servers are actually taken, and the hardware and software technology in use for it. The tool used is EMC2 Legato Networker; the many other technologies involved will be highlighted later on.
