Chapter 9: Creating Fault Tolerance

Overview

Security means more than just keeping hackers out of your computers. It really means keeping your data safe from loss of any kind, including accidental loss due to user error, bugs in software, and hardware failure. Systems that can tolerate hardware and software failure without losing data are said to be fault tolerant. The term is usually applied to systems that can remain functional when hardware or software errors occur, but the concept of fault tolerance can also include backup and archiving systems that keep redundant copies of information to ensure that the information isn't lost if the hardware it is stored on fails.

Fault tolerance theory is simple: Duplicate every component that could be subject to failure. From this simple theory spring very complex solutions, such as backup systems that duplicate all the data stored in an enterprise, clustered servers that can take over for one another automatically, redundant disk arrays that can tolerate the failure of a disk in the pack without going offline, and network protocols that can automatically reroute traffic to an entirely different city in the event that an Internet circuit fails.
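To see why duplication works, consider the arithmetic of a mirrored pair of disks. This is a minimal sketch only: the 99 percent availability figure is a hypothetical assumption, and it treats the two disks' failures as independent, which real hardware only approximates.

```python
# Availability of a single component vs. a mirrored (duplicated) pair.
# The 0.99 figure is hypothetical; real disks differ, and real failures
# are not perfectly independent.

single = 0.99                      # one disk: available 99% of the time
mirrored = 1 - (1 - single) ** 2   # the pair is down only if BOTH disks fail

print(f"Single disk:   {single:.4%} available")    # 99.0000%
print(f"Mirrored pair: {mirrored:.4%} available")  # 99.9900%
```

Under those assumptions, duplicating the one disk cuts expected downtime by a factor of one hundred; the same reasoning drives clustering, redundant circuits, and off-site backups.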
Causes for Loss

To plan correctly for fault tolerance, you should consider what types of loss are likely to occur. Different types of loss require different fault tolerance measures, and not all types of loss are likely to occur to all clients.

fault tolerance  The ability of a system to withstand failure and remain operational.

At the end of each of the following sections, a tip box lists the fault tolerance measures that can effectively mitigate that cause for loss. To create an effective fault tolerance policy, rank the following causes for loss in the order you think they're likely to occur in your system. Then list the effective remedy measures for those causes in the same order, and implement those remedies in top-down order until you exhaust your budget.

Note  The solutions mentioned in this section are covered in the second half of this chapter.

Human Error

User error is the most common cause of loss. Everyone has accidentally lost information by deleting a file or overwriting it with something else. Users also frequently play with configuration settings without really understanding what those settings do, which can cause problems as well.

Believe it or not, most computer downtime in businesses is caused by the activities of the computer maintenance staff. Deploying patches without testing them first can cause servers to fail, and performing maintenance during working hours can cause bugs to manifest and servers to crash. Leading-edge solutions are far more likely to have undiscovered problems, and routinely selecting them over more mature solutions means that your systems will be less stable.

Tip  A good archiving policy provides the means to recover from human error easily. Use permissions to prevent users' mistakes from causing widespread damage.

Routine Failure Events

Routine failure events are the second most likely cause of loss. Routine failures fall into a few categories that are each handled differently.

Hardware Failure

Hardware failure is the second most common cause of loss and is highly likely to occur in servers and client computers. It is considerably less likely to occur in devices that do not contain moving parts. The primary rule of disk management is: Stay in the mass market; don't get esoteric. Unusual solutions are harder to maintain, are more likely to have buggy drivers, and are usually more complex than they are worth.

Every hard disk will eventually fail. This bears repeating: Every hard disk will eventually fail. Hard disks run constantly in servers at high speed, and they generate the very heat that destroys their spindle lubricant. These two conditions combine to ensure that hard disks wear out through normal use within about 10 years.

Note  Early in the computer industry, the Mean Time Between Failures (MTBF) of a hard disk drive was an important selling point.

Mean Time Between Failures (MTBF)  The average life expectancy of electronic equipment.

Most hard disks have an MTBF of about five years. The real problem with disk failure is that hard disks are the only components in a computer that can't simply be swapped out, because they are individually customized with your data. To tolerate a disk failure without losing that data, you must keep a copy of it elsewhere. That elsewhere can be another hard disk in the same computer or in another computer, on tape, or on removable media.

removable media
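To put an MTBF figure like the one above in perspective, here is a rough back-of-the-envelope sketch. The fleet size and the five-year MTBF are illustrative assumptions, and the constant-failure-rate model ignores wear-out near end of life, so treat the result as a floor rather than a prediction.

```python
# Rough estimate of how often disks fail across a small server room.
# Assumes a five-year MTBF and 100 disks in service; both numbers are
# illustrative. A constant failure rate understates end-of-life wear-out.

mtbf_years = 5          # quoted mean time between failures per disk
fleet_size = 100        # number of disks in service

failures_per_year = fleet_size / mtbf_years
print(f"Expected failures per year: {failures_per_year:.0f}")          # ~20
print(f"Roughly one failure every {365 / failures_per_year:.0f} days")  # ~18
```

Even with a generous MTBF, a modest fleet produces failures routinely, which is exactly why every disk must be assumed to fail eventually and why a copy of its data needs to live somewhere else.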