OReilly UNIX Backup And Recovery Dec 1999 ISBN 1565926420 pdf


Unix Backup and Recovery

  W. Curtis Preston

  

Beijing • Cambridge • Farnham • Köln • Paris • Sebastopol • Taipei • Tokyo


  Disclaimer: This netLibrary eBook does not include data from the CD-ROM that was part of the original hard copy book.

  Unix Backup and Recovery

  by W. Curtis Preston Copyright (c) 1999 O'Reilly & Associates, Inc. All rights reserved.

  Printed in the United States of America. Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

  Editor:

  Gigi Estabrook

  Production Editor:

  Clairemarie Fisher O'Leary

  Printing History: November 1999: First Edition.

  Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of an Indian gavial and the topic of Unix backup and recovery is a trademark of O'Reilly & Associates, Inc.

  While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

  This book is printed on acid-free paper with 85% recycled content, 15% post-consumer waste. O'Reilly & Associates is committed to using paper with the highest recycled content available consistent with high quality.

  ISBN: 1-56592-642-0


  This book is dedicated to my lovely wife Celynn, my beautiful daughters Nina and Marissa, and to God, for continuing to bless my life with gifts such as these.

  - W. Curtis Preston

TABLE OF CONTENTS

  Preface

  I. Introduction

  1. Preparing for the Worst
     My Dad Was Right
     Developing a Disaster Recovery Plan
     Step 1: Define (Un)acceptable Loss
     Step 2: Back Up Everything
     Step 3: Organize Everything
     Step 4: Protect Against Disasters
     Step 5: Document What You Have Done
     Step 6: Test, Test, Test
     Put It All Together

  2. Backing It All Up
     Don't Skip This Chapter!
     Why Should You Read This Book?
     How Serious Is Your Company About Backups?
     You Can Find a Balance
     Deciding What to Back Up
     Deciding When to Back Up
     Deciding How to Back Up
     Storing Your Backups
     Testing Your Backups
     Monitoring Your Backups
     Following Proper Development Procedures
     Unrelated Miscellanea
     Good Luck

  II. Freely Available Filesystem Backup & Recovery Utilities

  3. Native Backup & Recovery Utilities
     An Overview
     Backing Up with the dump Utility
     Restoring with the restore Utility
     Limitations of dump and restore
     Features to Check For
     Backing Up and Restoring with the cpio Utility
     Backing Up and Restoring with the tar Utility
     Backing Up and Restoring with the dd Utility
     Comparing tar, cpio, and dump
     How Do I Read This Volume?

  4. Free Backup Utilities
     The hostdump.sh Utility
     The infback.sh, oraback.sh, and syback.sh Utilities
     A Really Fast tar Utility: star
     Recording Configuration Data: The SysAudit Utility
     Displaying Host Information: The SysInfo Utility
     Performing Remote Detections: The queso Utility
     Mapping Your Network: The nmap Utility
     AMANDA

  III. Commercial Filesystem Backup & Recovery Utilities

  5. Commercial Backup Utilities
     What to Look For
     Full Support of Your Platforms
     Backup of Raw Partitions
     Backup of Very Large Filesystems and Files
     Simultaneous Backup of Many Clients to One Drive
     Simultaneous Backup of One Client to Many Drives
     Data Requiring Special Treatment
     Storage Management Features
     Reduction in Network Traffic
     Support of a Standard or Custom Backup Format
     Ease of Administration
     Security
     Ease of Recovery
     Protection of the Backup Index
     Robustness
     Automation
     Volume Verification
     Cost
     Vendor
     Conclusions

  6. High Availability
     What Is High Availability?
     HA Building Blocks
     Commercial HA Solutions
     The Impact of an HA Solution

  IV. Bare-Metal Backup & Recovery Methods

  7. SunOS/Solaris
     What About Fire?
     Homegrown Bare-Metal Recovery
     Recovering a SunOS/Solaris System

  8. Linux
     How It Works
     A Sample Bare-Metal Recovery

  9. Compaq True-64 Unix
     Compaq's btcreate Utility
     Homegrown Bare-Metal Recovery

  10. HP-UX
     HP's make_recovery Utility
     The copyutil Utility
     Using dump and restore

  11. IRIX
     SGI's Backup and Restore Utilities
     System Recovery with Backup Tape
     Homegrown Bare-Metal Recovery

  12. AIX
     IBM's mksysb Utility
     IBM's Sysback/6000 Utility
     System Cloning

  V. Database Backup & Recovery

  13. Backing Up Databases
     Can It Be Done?
     Confusion: The Mysteries of Database Architecture
     The Muck Stops Here: Databases in Plain English
     What's the Big Deal?
     Database Structure
     An Overview of a Page Change
     What Can Happen to an RDBMS?
     Backing Up an RDBMS
     Restoring an RDBMS
     Documentation and Testing
     Unique Database Requirements

  14. Informix Backup & Recovery
     Informix Architecture
     Automating Informix Startup: The dbstart.informix.sh Script
     Protect the Physical Log, Logical Log, and sysmaster
     Which Backup Utility Should I Use?
     Physical Backups Without a Storage Manager: ontape
     Physical Backups with a Storage Manager: onbar
     Recovering Informix
     Logical Backups

  15. Oracle Backup & Recovery
     Oracle Architecture
     Physical Backups Without a Storage Manager
     Physical Backups with a Storage Manager
     Managing the Archived Redologs
     Recovering Oracle
     Logical Backups
     A Broken Record

  16. Sybase Backup & Recovery
     Sybase Architecture
     Physical Backups Without a Storage Manager
     Physical Backups with a Storage Manager
     Recovering Sybase
     Logical Backups
     An Ounce of Prevention

  VI. Backup & Recovery Potpourri

  17. ClearCase Backup & Recovery
     ClearCase Architecture
     VOB Backup and Recovery Procedures
     View Backup and Recovery Procedures
     Summary

  18. Backup Hardware
     Choosing on a Backup Drive
     Using Backup Hardware
     Tape Drives
     Optical Drives
     Automated Backup Hardware
     Vendors
     Hardware Comparison

  19. Miscellanea
     Volatile Filesystems
     Demystifying dump
     Gigabit Ethernet
     Disk Recovery Companies
     Yesterday
     Trust Me About the Backups

  Index


PREFACE

  Like many people, I had to learn backups the hard way. I worked at a large company where I was responsible for backing up Unix SVr3/4, Ultrix, HP-UX 8-10, AIX 3, Solaris 2.3, Informix, Oracle, and Sybase. In those days I barely understood how Unix worked, and I really didn't understand how databases worked, yet it was my responsibility to back it all up. I did what any normal person would do. I went to the biggest bookstore I could find and looked for a book on the subject. There weren't any books on the shelf, so I went to the counter where they could search the Books in Print database. Searching on the word "backup" brought up one book on how to back up Macintoshes.

  Disillusioned, I did what many other people did: I read the backup chapters in several system and database administration books. Even the best books covered the subject at only a cursory level, and none of them told me how to automate the backups of 200 Unix machines that ran eight different flavors of Unix and three different database products. Another common problem with these chapters was that they dedicated 90 percent or more of their space to backup and less than 10 percent to recovery. So my company did what many others had done before us: we reinvented the wheel and wrote our own homegrown utilities and procedures.

  Then one day I realized that our backup/recovery needs had outgrown our homegrown utilities, which meant that we needed to look at purchasing a commercial utility. Again, there were no resources to help explain the differences between the various backup utilities that were available at that time, so we did what most people do: we talked to the vendors. Since most of the vendors just bashed one another, our job was to try to figure out who was telling the truth and who wasn't. We then wrote a Request For Information (RFI) and a Request For Proposal (RFP) and sent them to the vendors we were considering, whose quotes ranged from $16,000 to $150,000. Believe it or not, the least expensive product also did the best on the RFI, and we bought and installed our first commercial backup utility.

  The day came for me to leave my first backup utility behind, as I was hired by a company that would one day become Collective Technologies. Finally, a chance to get out of backups and become a real system administrator! Interestingly enough, one of my first clients had been performing backups only sporadically, but I discovered that they had a valid license for the commercial product with which I was already familiar. (Imagine the luck.) While rolling out that product, they asked me also to look at how they were backing up their Oracle databases. The next thing I knew, I had ported my favorite Oracle backup script and published it. The response to that article was amazing. People around the world wrote me and thanked me for sharing it, and I caught the publishing bug. One of Collective Technologies' mottos is, "If something is broken, fix it!" Normally, we're talking about problems within our own company, but I applied it to the backup and recovery industry ... and the dream of this book was born.

I Wish I Had This Book

  My dream was to write a book that would make sure that no one ever had to start from scratch again, and I believe that my coauthors and I have done just that. It contains every backup tool that I wish I had had when I first entered the Unix business and every lesson and trick that I've learned along the way. It covers how to back up and recover everything from a basic Unix workstation to a complicated Informix, Oracle, or Sybase database. Whether your budget barely stretches to cover the cost of the backup media or allows you to buy a silo bigger than your house, this book has something for you. Whether your task is to figure out how to back up, with no commercial utilities, an environment such as the one I first encountered or to choose from among more than 50 commercial backup utilities, this book will tell you how to do it. With that in mind, let me mention a few things about this book that are unique.

Only the Recovery Matters

  As a friend of mine used to tell me, "No one cares if you can back up, only if you can recover." Yet how many backup chapters have you read that dedicate less than 10 percent to recovery? You won't find that in this book. I have tried very hard to ensure that recovery is given treatment equal to that of backups. In fact, many times it is given greater treatment; the Oracle chapter has more than twice as much space dedicated to recovery as it does to backups!


Products Change

  Some people may be surprised that there are no product names mentioned in the commercial backup section. I did this for several reasons, the main one being that products change constantly. It would be impossible to keep this book up to date with the 50 different backup products that are available for Unix. In fact, the book would be out of date by the time it hit the shelves. Instead, this book explains the concepts of commercial backup and recovery software, allowing you to apply those concepts to the claims that the vendors are currently making. Up-to-date information about specific products has been placed on

Backing Up Databases Is Not That Hard

  If you're a database administrator (DBA), you may not be familiar with the Unix backup commands necessary to back up your database. If you're a system administrator (SA), you may not be familiar with the architecture of your particular database platform. Both of these concepts are explained in detail in this book. I explain the backup utilities in plain language so that any DBA can understand them, and I explain database architecture in such a way that an SA, even one who has never before seen a database, can understand it.

Bare-Metal Recovery Is Not That Hard

  One of these days you will lose the operating system disk for an important system, and you will need to recover it. This is called a "bare-metal recovery." The standard recovery method described in many backup products' documentation is to install a minimal operating system and restore on top of it. This is the worst possible way to do a bare-metal recovery of a Unix system; among other problems, you end up overwriting some of the system files while the system is running from the very disk to which you are trying to restore. The best ways to do bare-metal recoveries for six different versions of Unix are covered in detail in this book.

The Scripts in This Book Actually Work

  Nothing bugs me more than to read a book in which the author talks about a really neat program, only to find out that the program is so full of bugs it won't work.

  Most of the programs in this book are already running at hundreds of sites around the world. With all the typical "unsupported" disclaimers in place, I do my best to ensure that they continue to work for the people who use them. If you're


   provide updates as they become available.

How This Book is Organized

  This book is divided into six parts:

  Part I, Introduction

  This part of the book contains just enough information to whet your backup and recovery appetite.

  Chapter 1, Preparing for the Worst, contains the six steps that you must go through to create and maintain a disaster recovery plan, one part of which will be a good backup and recovery system.

  Chapter 2, Backing It All Up, goes into detail about the essential elements of a good backup and recovery system.

  Part II, Freely Available Filesystem Backup & Recovery Utilities

  This section covers the freely available utilities that you can use to back up your systems if you can't afford a commercial backup package.

  Chapter 3, Native Backup & Recovery Utilities, covers Unix's native backup and recovery utilities in detail, including dump, tar, GNU tar, cpio, GNU cpio, and dd.

  Chapter 4, Free Backup Utilities, starts with some simple tools to assist you in your backups, and contains a complete overview of the popular AMANDA utility, which is used to back up many small to medium-sized Unix installations around the world.

  Part III, Commercial Filesystem Backup & Recovery Utilities

  If you have outgrown the capabilities of free utilities, or would just like to take advantage of new backup and recovery technologies, you'll need to look at a commercial product.

  Chapter 5, Commercial Backup Utilities, is your guide to the hundreds of features available in the more than 50 commercial backup products on the market today, allowing you to make an educated purchase decision.

  Chapter 6, High Availability, details how, when backups just aren't fast enough, a high availability system is designed to keep you from ever needing to use your backups.

  Part IV, Bare-Metal Backup & Recovery Methods

  A bare-metal recovery is the fastest way to bring a dead system back to life, even if its root drive is completely destroyed.

  Chapter 7, SunOS/Solaris, contains an in-depth description of the "homegrown" bare-metal recovery procedure that can also be used to back up Linux, Compaq, HP-UX, and IRIX, as well as a detailed Solaris-based example of bare-metal recovery.

  Chapter 8, Linux, details how you can perform a bare-metal recovery of a Linux system with a floppy, a backup device, pax, and lilo.

  Chapter 9, Compaq True-64 Unix, covers both Compaq True-64 Unix's bare-metal recovery tool and the Compaq version of the homegrown procedure covered in Chapter 7.

  Chapter 10, HP-UX, covers the make_recovery tool, which now comes with HP-UX to perform bare-metal recoveries, along with the HP version of the homegrown procedure.

  Chapter 11, IRIX, explains how the different versions of IRIX's Backup and Restore scripts work, as well as the IRIX version of the homegrown procedure.

  Chapter 12, AIX, discusses AIX, which does not support the homegrown procedure discussed in Chapter 7 but does use mksysb, probably one of the oldest and best-known bare-metal recovery tools.

  Part V, Database Backup & Recovery

  This section explains in plain language an area that presents some of the greatest backup and recovery challenges that a system administrator or database administrator will face: backing up and recovering databases.

  Chapter 13, Backing Up Databases, is a chapter that will be your friend if you're an SA who's afraid of databases or a DBA learning a new database. It explains database architecture in plain language, while relating each architectural element to the appropriate term in Informix, Oracle, and Sybase.

  Chapter 14, Informix Backup & Recovery, explains both the older ontape and the newer onbar, after which it provides a logically flowcharted recovery procedure that can be used with either utility.

  Chapter 15, Oracle Backup & Recovery, explains how to perform Oracle hot backups whether you are using Oracle's native utilities, EBU, or RMAN, and then provides a detailed flowchart guiding you through even a difficult recovery.

  Chapter 16, Sybase Backup & Recovery, shows exactly how to use the Backup Server utility, including another flowchart to guide you through Sybase recoveries.

  Part VI, Backup & Recovery Potpourri

  The information contained in this part of the book is by no means unimportant; it simply wouldn't fit anywhere else!

  Chapter 17, ClearCase Backup & Recovery, explains in detail the unique backup and recovery challenges presented by ClearCase.

  Chapter 18, Backup Hardware, explains the many different types of backup hardware available today, and provides criteria that you may use to decide which type of backup drive is right for you.

  Chapter 19, Miscellanea, covers everything from the oft-debated "live filesystem dumps" question to a few jokes that I found about backup and recovery!

Conventions

  The following typographical conventions are used in this book:

  Constant width
    Is used to indicate command-line computer output, computer-generated messages, and code examples. It is also used when referring to parameters in text.

  Constant width italic
    Is used to indicate variables in examples and text, and comments in examples.

  Constant width bold
    Is used to indicate user input in examples.

  Italic
    Is used to introduce new terms and to indicate URLs, variables or files and directories, commands, file extensions, filenames, and directory names.

How to Contact Us

  We have tested and verified all the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing to:


  O'Reilly & Associates
  101 Morris Street
  Sebastopol, CA 95472
  1-800-998-9938 (in the U.S. or Canada)
  1-707-829-0515 (international/local)
  1-707-829-0104 (fax)

  You can also send messages electronically. To be put on our mailing list or to request a catalog, send email to:

  nuts@oreilly.com

  To ask technical questions or comment on the book, send email to:

  bookquestions@oreilly.com

This Book Was a Team Effort

  I have never worked with a group of people like the ones I work with at Collective Technologies. Over the past three years, they have answered question after question about the various ways to back up and recover just about everything under the sun. Thanks to them, there is information in this book that would never have been here otherwise. They sent me manpages and verified syntax for commands on versions of Unix that I've never even seen. They entered into technical debates about how to compare the architectures of Informix, Oracle, and Sybase. They tested the programs that are included in this book and even wrote a few of them.

  By far the greatest contribution that other people made to this book is that several of the chapters were written by experts in a particular field. I realized about a year ago that I would never finish this book if I didn't ask some of my friends to help. The result was that more than 20 percent of the final book ended up being written by people other than me. Their expertise in a particular area made their chapters far better than anything I could have written on my own. Having said that, please allow me to formally thank all of my coauthors:

  AIX bare-metal recovery

  Charles Gagnon and Brian Jensen of Collective Technologies

  AMANDA

  John R. Jackson and Alexandre Oliva from the AMANDA Core Development Team

  ClearCase backup and recovery

  Bob Fulwiler of Seattle, Washington

  Compaq/Digital Unix bare-metal recovery

  Matthew Huff of Collective Technologies

  Dump internals

  David Young of Collective Technologies

  High-availability systems

  Josh Newcomb and Gustavo Vegas of Collective Technologies

  HP-UX bare-metal recovery

  Steve Ferguson of Collective Technologies

  IRIX bare-metal recovery

  Blayne Puklich of Collective Technologies

  Sybase backup and recovery

  Bryn Smith of Collective Technologies

  Without these folks, either the book would never have been completed or it would contain substantially less data than the book you see today.

  Another group of people that I must thank is my technical reviewers. If every book's author had the team of technical reviewers I had, the world would contain far less misinformation. This book was actually reviewed on an ongoing basis by a number of Collective Technologies people. I set up an RCS system that allowed a team of about 30 reviewers to actually check out my chapters and edit them. They constantly kept me in check, identifying parts of the book that were inaccurate or that needed clarification. You can't imagine the benefit of having such a great team looking over your shoulder. This special ongoing technical review team consisted of:

  Scott Aschenbach    Michael Clark     Norman Hill      Jason Perkins
  Rusty Atkins        Nancy Cortez      Todd Holloway    Stephen Potter
  Ed Bailey           Jim Donnelan      Bill Huff        Jason Stege
  David Bajot         William Duffy     Paul Iadonisi    Vince Taluskie
  Mike Bush           Steve Ferguson    Brian Jensen     Gustavo Vegas
  Enrico Cantu        Henry Ferrara     Eric Jones       Bryce Wade
  Paul Chalker        Charles Gagnon    Cliff Nadler     Asim Zuberi

  I would like to give a special thank you to every one of you!

  Once the final draft of the book was completed, an entirely different set of people did a complete technical review. These people were brutal! I can tell you that this incredibly humbling experience made this book far more technically accurate than it would have been otherwise. All of the technical reviewers did a wonderful job, but I'd like to thank two of them in particular. Gordon Galligher did an extensive technical review of the entire book, even though he got the review copy late and has a newborn baby! Art Kagel, of comp.databases.informix fame, reviewed and re-reviewed the Informix chapter until it was right. I even got email at 3:00 A.M. once in which he revealed he'd finally found the answer to a question that had been bugging both of us. The readers owe a big thank you to all of the following people:

  Those who reviewed the entire book:

  Brian Epstein, Gordon C. Galligher, Mike O'Connor

  Those who reviewed selected chapters:

  Clem Akins, Mark A. Alestra, Scott Aschenbach, Greg Bourgoin, Jeffrey Dykzeul, Norm Eisenberg, Lee Gould, Brian Jensen, Art S. Kagel, Cliff Nadler, Daniel T. Pigg, Rodney Rutherford, Liza Weissler

  Wow! That's more than 40 technical reviewers! That means that if you find something in this book that's not technically correct, I've got 40 other people to point the finger at! Again, I would like to send a virtual high five to every one of these folks. Whether you helped me with the syntax of one or two commands or reviewed the whole book, I couldn't have done it without you!

  I Don't Know It All

  If there's one thing I learned while writing this book, it's that I do not know everything there is to know about backups. If you have a better way to do anything listed in this book, have learned any special tricks, or have written any neat utilities that you think would help other people do backups and recoveries, let me know. Email me at curtis@backupcentral.com.

  How Can I Say Thanks?

  How can I begin to thank the hundreds of people who helped me?

  To God: May any praise for this book go to You alone.

  To my wife, Celynn: I say "thank you" for the many nights you spent alone while I pounded away at my keyboard somewhere around the globe. You're a special woman who never gave up on me or my dream. I love you. Can we finally take a vacation that doesn't involve a laptop?

  To my older daughter, Nina: I say "Yes! It's finally done!" I know you've spent the last three years wondering when you were ever going to get your daddy back. Well, I'm done. Come give me a hug.

  To my baby daughter, Marissa: Maybe you, Nina, Mom, and I can finally spend some time together now!

  To my parents: What can I say? You always believed in me. You always used to tell me, "I don't care if you're a ditchdigger. Just be the best darn ditchdigger in the world." Well, being a backup guy is as close as you can get to being a ditchdigger in the computer business, and I "wrote the book" on that.

  To my wife's family: Thank you for raising such a wonderful lady. Thank you for treating me as one of your own and supporting us on our quest. Pahingi ng sinagong?

  To all the teachers who kept trying to get me to live up to my potential: You finally got through.

  To Collective Technologies: I never could have done this if it hadn't been for you folks. You truly are a special group of people, and I'm proud to be known as one of you.

  To Ed Taylor, Gordon Galligher, Curt Vincent, and anyone else who made the call to bring me on board at CT: What can I say? I'd probably still be swapping tapes if it wasn't for you. (Wait! I am still swapping tapes!)

  To Jeff Rochlin: How could I forget the guy who taught me how to use my own RFI? Thanks, dude. I hope Mickey's treating you really nice.

  To all my SA friends: Thank you for supporting me during this project. As I visited your hometowns in my travels, you welcomed me as one of your own. Only you truly understand what it's like trying to do something like this, and I couldn't have done it without you.

  To O'Reilly & Associates: Thank you for the opportunity to bring this much-needed book to market. (Sorry it took me two and a half years longer than it should have!)

  To Gigi Estabrook, my editor: We'll have to actually meet one of these days! I don't know how you do this, reading the same book over and over, without letting your eyes just glaze over. You're a great editor, and I could really tell that you put your all into this project. Thank you, thank you, and thank you. (Now don't edit that sentence, OK?)

  To the reader: Thank you for purchasing this book. I hope you learn as much reading it as I did writing it.

  To everyone else: Stop asking me if the book's done yet, all right? It's done!


I INTRODUCTION

  Part I consists of the following two chapters:

  • Chapter 1, Preparing for the Worst, describes the elements that should be part of an overall disaster recovery plan.

  • Chapter 2, Backing It All Up, provides an overview of the backup and recovery process.


  One of the simplest rules of systems administration is that disks and systems fail. If you haven't already lost a system or at least a disk drive, consider yourself extremely lucky. You also might consider the statistical possibility that your time is coming really soon. Maybe it's just me, but I lost four laptop disk drives while trying to write this book! (Yes, I had them backed up.) This chapter talks about developing an overall disaster recovery plan, of which your backup and recovery system will be just a part.

  My Dad Was Right

  My father used to tell me, "There are two types of motorcycle owners. Those who have fallen, and those who will fall." The same rule applies to system administrators. There are those who have lost a disk drive and those who will lose a disk drive. (I'm sure my dad was just trying to keep me from buying a motorcycle, but the logic still applies. That's not bad for a guy who got his first computer last year, don't you think?)

  Whenever I speak about my favorite subject at conferences, I always ask questions like, "Who has ever lost a disk drive?" or "Who has lost an entire system?" Actually, this chapter was written while at a conference. When I asked those questions there, someone raised his hand and said, "My computer room just got struck by lightning." That sure made for an interesting discussion! If you haven't lost a system, look around you ... one of your friends has.

  Speaking of old adages, the one that says "It'll never happen to me" applies here as well. Ask anyone who's been mugged if they thought it would happen to them. Ask anyone who's been in a car accident if they ever thought it would happen to them. Ask the guy whose computer room was struck by lightning if he thought it would ever happen to him. The answer is always "No."

  While the title of this book is Unix Backup & Recovery, the whole reason you are making these backups is so that you will be able to recover from some level of disaster. Whether it's a user who has accidentally or maliciously damaged something or a tornado that has taken out your entire server room, the only way you are going to recover is by having a good, complete disaster recovery plan that is based on a solid backup and recovery system. Neither can exist completely without the other. If you have a great backup system but aren't storing your media off-site, you'll be sorry when that tornado hits. You may have the most well organized, well protected set of backup volumes,* but they won't be of any help if your backup and recovery system hasn't properly stored the data on those volumes. Getting good backups may be an early step in your disaster recovery plan, but the rest of that plan (organizing and protecting those backups against a disaster) should follow soon after. Although the task may seem daunting, it's not impossible.

  Developing a Disaster Recovery Plan

  Devising a good disaster recovery plan is hard work. You need to build it from the ground up, and it can take months or even years to perfect. Since computer environments are changing constantly, you continually have to test your plan to make sure it still works with your changing environment. This chapter is not meant to be a comprehensive guide to disaster recovery planning. There are books dedicated to just that topic, and before you attempt to design your own disaster recovery plan, I strongly advise you to research this topic further. This chapter gives an overview of the steps necessary to complete such a plan, as well as discusses a few details that are typically left out of other books. It provides a frame of reference upon which the rest of the book will be based. There are essentially six steps to designing a complete disaster recovery plan. While you may work on several steps simultaneously, the order listed here is very important. Don't jump into the design stage before understanding what level of risk your company is willing to take or what types of disasters the plan needs to address. Likewise, what good does it do to have a well-documented, well-organized disaster recovery plan based on a backup system that doesn't work? The six steps are as follows:

  * This book will use the term volume instead of tape whenever appropriate. See the section "Why the Word 'Volume' Instead of 'Tape'?" in Chapter 2, Backing It All Up, for an explanation.

  1. Define (un)acceptable loss.

  Before you develop a disaster recovery plan, decide how much you will lose if you don't. That will help you decide how much time, effort, and money to spend on a disaster recovery plan.

  2. Back up everything.

  You have to make sure that everything is backed up, including data, metadata, and the instructions you'll need to get them back.

  3. Organize everything.

  You have everything on backup volumes. But can you find the volume you need when disaster strikes? The key to being able to find your backups is organization.

  4. Protect against disasters.

  Most people think only about natural disasters when creating a disaster recovery plan. There are nine other types of disasters, and you have to protect against all of them. (The 10 types of disasters are covered in Chapter 2.)

  5. Document what you have done.

  You need to document your plan in such a way that anyone can follow your steps after or during a disaster.

  6. Test, test, test.

  A disaster recovery plan that has not been tested is not a plan; it's a proposal. You don't want to be in the middle of a disaster and discover that you have forgotten some critical steps.

  Step 1: Define (Un)acceptable Loss

  A disaster recovery plan is an insurance policy. If you've ever read anything about backups, you've heard that before. I would like to extend that analogy. Consider your car insurance policy. All insurance policies in the United States start with PIP, or personal injury protection. That way if you hit someone and get sued, you are protected. You can then add coverage for collision, personal property, emergency roadside assistance, and rental car coverage. These additional layers of coverage are called riders. Just like your car insurance policy, disaster recovery plans may include optional riders. You simply need to decide the types of riders that your company needs, or can afford. How do you do this? You have to look at the potential losses that your company will suffer if a disaster occurs and decide which ones are acceptable or unacceptable, as the case may be. You then select the riders that will protect you against the losses that you have decided are unacceptable. (This analogy is discussed in further detail in Chapter 2, Backing It All Up.)

  You need to make the same kind of decisions on behalf of your company. If it is unacceptable to lose a single day's worth of data when a disaster happens, then you need to send your volumes to an off-site storage vendor every single day. You must decide what kind of losses your company is not willing to accept, and then insure against those losses with your disaster recovery plan. You cannot design a disaster recovery plan without this step. Every decision that you must make will be based on the information you discover during this analysis. Doing otherwise might cause you to purchase riders that you don't need or to leave out ones that you do need.

  Classify Your Data

  What is considered an acceptable loss for office automation data may not be considered acceptable for your customer database. Some data can be re-created with effort, while other data is irreplaceable. Look at each type of data that you have and decide whether it can be re-created.

  There are several types of re-creatable data. Suppose you are a company that sells a software product. You have hundreds of developers working around the clock on a very important product. If disaster hits, they would hate it, but they could re-create their work. The schedule will slip, but with enough time, you could replace the enhancements that they made to the code. As a rule, if data is being created by a single person or group of people, without interaction from anyone outside your company, then that data is probably replaceable. This is not to say that this data should not be backed up. It means that you might decide not to send volumes off-site for this type of data every single day, since both the volumes and the storage vendor cost money. You might decide to send them off-site only once a week. On the other hand, the cost of re-creating that data must be taken into account, and you may not want to explain to a group of 200 developers why they have to re-create everything they did last week. If that is the case, then you have defined that losing more than one day's worth of anyone's work is unacceptable. Great! That's the purpose of this step.

  There are types of data that are always irreplaceable. Suppose that you work in a hospital where patients come in to have MRIs and CAT scans performed in preparation for surgery or medical treatments. These images are stored digitally; there are no films. The doctors and surgeons use these images to plan critical operations or delicate treatments. What if a failure occurred that destroyed these images? These scans are often a picture of a progressing illness at a particular point in time. The loss of these images not only would expose the hospital and doctors to possible lawsuits but also could cost someone her life.

  There are also financial institutions and brokerage firms that process hundreds of thousands of transactions each day. These transactions can total millions of dollars. A loss of a single transaction could be devastating. Would you want your bank to lose the direct deposit of your paycheck? Would you want your brokerage firm to lose your buy request for that hot new Internet IPO stock?

  Examples of irreplaceable data do not have to be so devastating. Suppose a customer asks to have his address changed. You update the system and then you suffer a disaster. Do you even remember which customers called you last week, let alone what they asked for? Probably not. Your customer will sit at his new address awaiting his statement or product while you ship it to the old address. The result is that your credibility is destroyed in the customer's eyes. In today's world, you may end up on 20/20 or Dateline NBC.

  In some instances, sending your backup volumes off-site daily (or hourly) is sufficient. However, there are situations in which the data is so critical and irreplaceable that it must be duplicated and sent off-site immediately.

  Assign a Monetary Value to Your Data

  It is not possible to assign a monetary value to all types of data. How do you decide what an angry customer will cost you? (A truly angry customer can cripple your business, especially if she sues you.) With other types of data, though, it is very easy. If you have five people who will have to redo a week's worth of their work, then the cost is a week's worth of their salaries, plus overhead. There are other things that are more difficult to calculate, such as the loss of productivity due to a drop in morale.
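  The salary arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name rework_cost, the idea of expressing overhead as a fraction of salary, and the dollar figures are all assumptions made for the example, not numbers from any real company.

```python
def rework_cost(people, weekly_salary, weeks_lost=1.0, overhead_rate=0.0):
    """Estimate the cost of redoing lost work.

    overhead_rate is a fraction of salary (0.3 means 30% on top). That is an
    assumed way of modelling "plus overhead", not a figure from the text.
    """
    if people < 0 or weekly_salary < 0 or weeks_lost < 0 or overhead_rate < 0:
        raise ValueError("inputs must be non-negative")
    salary_cost = people * weekly_salary * weeks_lost
    return salary_cost * (1.0 + overhead_rate)


if __name__ == "__main__":
    # Hypothetical numbers: five people, $1,000 per week each, 30% overhead.
    print(rework_cost(people=5, weekly_salary=1000, overhead_rate=0.3))
```

  A figure like this covers only the part of the loss that is easy to count. Morale, missed deadlines, and angry customers still have to be weighed by judgment.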

  Weigh the Cost

  You should not just blindly spend money on a disaster recovery plan that is more expensive than a disaster would be. This sounds like a given, but it can happen if you are not careful. It is possible that there are certain types of losses that you feel are unacceptable, no matter what the cost is to insure against them; that is fine, but make sure that you are insuring against them deliberately, and for all the right reasons.

  Step 2: Back Up Everything

  This sounds like a given, right? It's not. Certain types of data typically are excluded or forgotten. Many companies cut corners by omitting certain types of data from their backups. For example, by excluding the operating system from your backups, you may save a little media. However, if you find yourself in need of the old /etc/fstab, you will be out of luck. You may save some money, but you also may be putting your company at risk. It's easier and safer just to back up everything.

  There also may be types of data that are forgotten completely. The most common mistake is to back up the data on a system but not to get a "picture" of what the system itself looks like in case you have to rebuild it.
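  One way to capture such a "picture" is to dump a few configuration files and command outputs into a single text file that is then backed up with everything else. The sketch below is a minimal illustration, not a complete inventory: the function name snapshot_system, the output location, and the particular files and commands chosen are assumptions, and the right list differs from one Unix flavor to another.

```python
import os
import subprocess
import tempfile


def snapshot_system(files, commands, out_path):
    """Write the contents of `files` and the output of `commands` to out_path.

    Missing files or commands are noted in the snapshot rather than raising,
    so one failure does not stop the rest of the picture from being written.
    """
    with open(out_path, "w") as out:
        for path in files:
            out.write("==== file: %s ====\n" % path)
            try:
                with open(path, "r", errors="replace") as f:
                    out.write(f.read())
            except OSError as exc:
                out.write("(could not read: %s)\n" % exc)
            out.write("\n")
        for argv in commands:
            out.write("==== command: %s ====\n" % " ".join(argv))
            try:
                result = subprocess.run(
                    argv,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                    timeout=60,
                )
                out.write(result.stdout)
            except (OSError, subprocess.SubprocessError) as exc:
                out.write("(could not run: %s)\n" % exc)
            out.write("\n")
    return out_path


if __name__ == "__main__":
    # Illustrative choices only; adjust the lists for your own systems.
    files = ["/etc/fstab"]
    commands = [["df", "-k"], ["uname", "-a"]]
    target = os.path.join(tempfile.gettempdir(), "system-picture.txt")
    print(snapshot_system(files, commands, target))
```

  The snapshot is useful only if it survives the disaster, so the output file should live somewhere that is itself backed up and sent off-site.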

  Exclude Lists Good, Include Lists Bad

  It is best to have a system that automatically backs up everything, except for a few explicit exceptions specified on an exclude list. If your backup system requires you to update an include list every time a new filesystem is added, you may forget or you may add it incorrectly; the result is that the filesystem does not get backed up. In a disaster, this means the data never comes back. This is why I prefer backup products that automatically back up all filesystems. (The concept of include and exclude lists is covered in Chapter 2.)
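  The difference between the two approaches shows up as soon as a new filesystem appears. The following sketch is a toy model rather than the behavior of any particular backup product; the function names and mount points are hypothetical.

```python
def select_with_exclude_list(mounted, exclude):
    """Back up every mounted filesystem except those explicitly excluded."""
    excluded = set(exclude)
    return [fs for fs in mounted if fs not in excluded]


def select_with_include_list(mounted, include):
    """Back up only the filesystems that someone remembered to list."""
    included = set(include)
    return [fs for fs in mounted if fs in included]


if __name__ == "__main__":
    # Hypothetical mount points; /data2 was added after the lists were written.
    mounted = ["/", "/usr", "/home", "/tmp", "/data2"]
    print(select_with_exclude_list(mounted, exclude=["/tmp"]))
    print(select_with_include_list(mounted, include=["/", "/usr", "/home"]))
```

  With the exclude list, the new /data2 filesystem is picked up without anyone touching the configuration. With the include list, it is silently skipped until someone remembers to add it.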

  Databases

  Backing up a database requires more work than backing up a normal filesystem. (Actual database backup procedures are covered in Part V of this book.) Theoretically, if you are backing up everything in your filesystems and you are backing up your databases in some manner, you should be able to recover from disaster. Unfortunately, there are scenarios in which you might leave out an essential piece of the disaster recovery puzzle. The only way to ensure that you are prepared to recover your databases in case of a disaster is to test recovering them onto another machine.

  In fact, a previous version of my Oracle backup script (see Chapter 15, Oracle Backup & Recovery) did not back up the online redologs during a hot backup. All my backup and recovery tests worked fine, until I attempted to restore the database to a different system. We were able to restore all the database files, but the database needed the redologs in order to complete the recovery. Since we had not backed up the redologs, we did not have them to restore. You see, when I was recovering the database to the same system, the redologs were always there. (Of course, I immediately changed the script to address this problem.)
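  A gap like this is easier to catch when the restore goes into an empty location and every file the database needs is checked off explicitly. The sketch below assumes you can write down that list yourself; the function name missing_after_restore and the file names are hypothetical stand-ins, and this is not the script from Chapter 15.

```python
import os
import tempfile


def missing_after_restore(restore_root, required_paths):
    """Return the required files that are absent under restore_root.

    required_paths are relative paths the application needs in order to
    start. restore_root should be an empty directory (or a clean machine)
    so that files left over from the original system cannot mask a gap.
    """
    missing = []
    for rel in required_paths:
        if not os.path.exists(os.path.join(restore_root, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Hypothetical file names standing in for datafiles and redologs.
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "users01.dbf"), "w").close()  # pretend restore
    print(missing_after_restore(scratch, ["users01.dbf", "redo01.log"]))
```

  Run against the original system, a check like this would have passed, because the redologs were still sitting on disk. Only the clean target exposes what the backup left out.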

  Backups of Your Backups

  Whether you are using a homegrown solution that creates flat file indexes of your volumes or a commercial backup product that has a btree index, you need to be able to recover that index easily. Think about it. Even if your commercial backup system makes volumes that can be read by native backup utilities, without the database that identifies what's where, you have no idea what system is on what volume. That means that this database has now become the most important database in your company. You need to make sure that it is backed up, and its recovery