Unit VI Reliability
Reliability is defined as “a measure of the success with which the system conforms to some authoritative specification of its behavior… When the behavior deviates from that which is specified for it, this is called a Failure”.
Basic Concept
- The reliability can be divided into two parts:
– Application Dependent.
– Application Independent.
- The application-independent specification of reliability consists in requiring that transactions maintain atomicity, durability, serializability & isolation properties.
- The application-dependent part consists of requiring that transactions fulfill the general system’s specifications.
- We emphasize two aspects of reliability:
– Correctness.
– Availability.
- Example :- Consider a distributed database (DD) consisting of two sites, 1 & 2.
Let X1 & X2 be copies of X at sites 1 & 2. Consider a transaction T that updates X. T will perform the following operations:
Lock X1, Lock X2, Prepare Update & Perform 2-Phase Commitment.
Suppose the communication network fails after both sites have decided to commit but before the commit is sent from site 1 (the coordinator) to site 2.
There are two possible strategies to handle the problem.
- The first favors the correctness requirement by keeping X2 locked until the failure is repaired.
- The second maximizes availability at the risk of …
Following are the problems when we try to design a reliable distributed database system.
- Commitment of transactions :- If we use the 2-Phase commitment protocol, we can lose availability, because the protocol may block. We can use different protocols which allow a transaction to terminate properly even in the presence of failures. These are called Termination Protocols.
- Multiple copies of data & robustness of concurrency control :-
- Determining the state of the network :-
- Detection & resolution of inconsistencies :-
- Checkpoints & Cold restart :-
Nonblocking Commitment Protocols
- A commitment protocol is called blocking if the occurrence of some kinds of failures forces some of the participating sites to wait until the failure is repaired.
- A transaction which cannot be terminated at a site is called pending at this site.
- The 2-Phase commitment protocol is blocking if the coordinator fails & some participant has at the same time declared itself ready to commit.
- In this case, the participant must wait for recovery of the coordinator.
[Figure: State diagram for the 2-Phase-Commitment Protocol (coordinator and participant)]
Messages:
PM = Prepare Message
RM = Ready Answer Message
AAM = Abort Answer Message
ACM = Abort Command Message
CM = Commit Command Message
Notes (states):
I = Initial State
U = Uncertain (waiting for some information)
R = Ready to Commit
A = Abort
C = Commit
Local conditions:
ua = local unilateral abort
tm = timeout
Transition labels: α / β, where α is the incoming message or local condition & β is the generated message. The diagram distinguishes transitions which are due to an exchange of messages from unilateral transitions (unilateral abort or timeout).
- If a state diagram of this kind is used for analyzing reliability aspects of a protocol, care must be taken in assuming that transitions from one state to another are atomic.
- For example, consider a transition from state X to state Y with input I & output O.
- The following behavior is assumed.
1. The input message I is received.
2. The new state Y is recorded on stable storage.
3. The output message O is sent.
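The three steps of a transition can be sketched in Python as follows. This is only an illustration under stated assumptions: `StableStorage`, `Site`, `handle`, and the `crash_after` parameter are hypothetical names, and stable storage is simulated by an in-memory list. The sketch shows why atomicity cannot be taken for granted: a crash may occur between any two steps.

```python
class StableStorage:
    """Stand-in for a durable log (assumption: a real system would force writes to disk)."""

    def __init__(self):
        self.records = []

    def write(self, state):
        self.records.append(state)


class Site:
    """Hypothetical protocol participant performing one state transition at a time."""

    def __init__(self, state, storage):
        self.state = state
        self.storage = storage
        self.outbox = []  # messages "sent" by this site

    def handle(self, incoming, new_state, outgoing, crash_after=None):
        """Move to new_state on input `incoming`, producing output `outgoing`.

        crash_after (1 or 2) simulates a site failure right after that step.
        """
        # Step 1: the input message is received.
        received = incoming
        if crash_after == 1:
            return "crashed after receiving " + received
        # Step 2: the new state is recorded on stable storage.
        self.storage.write(new_state)
        self.state = new_state
        if crash_after == 2:
            return "crashed before sending " + outgoing
        # Step 3: the output message is sent.
        self.outbox.append(outgoing)
        return "done"
```

If the site crashes after step 2, the new state is already durable but the output message was never sent, so a restart procedure has to take this intermediate situation into account.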
Nonblocking Commitment Protocols with Site Failures
- We are interested in designing a termination protocol for the 2-Phase Commitment protocol which allows the transaction to be terminated at all operational sites when a failure of the coordinator site occurs.
- This is possible only in these two cases:
1. At least one of the participants has received the command.
2. None of the participants has received the command.
The 3‐phase commitment protocol
- In this protocol, the participants do not directly commit the transaction during the second phase of commitment; instead, they reach in this phase a new Prepared-to-Commit (PC) state.
[Figure: State diagram for the 3-Phase-Commitment Protocol (coordinator and participant), which also marks possible restart transitions]
New States: PC = Prepared-to-Commit
New Messages: PCM = Enter the PC state
- This new protocol eliminates the blocking problem of the 2-phase-commitment protocol because:
1. If one of the operational participants has received the command and the command was ABORT, then the operational participants can abort the transaction.
2. If one of the operational participants has received the command and the command was ENTER-PREPARED-STATE, then all the operational participants can commit the transaction.
3. If none of the operational participants has received the ENTER-PREPARED-STATE command, we have the case which cannot be terminated with the 2PC protocol. With the new state, the operational participants can abort the transaction, since the failed participant cannot have committed yet; the failed participant will therefore abort the transaction at restart.
Termination protocols for 3-phase-commitment
- The design of termination protocols is based on the following property.
- If at least one operational participant has not entered the Prepared-to-Commit state, then the transaction can be safely aborted.
- If at least one operational participant has entered the Prepared-to-Commit state, then the transaction can be safely committed.
- Since the above conditions are not mutually exclusive, in several cases the termination protocol can decide whether to commit or abort.
- A protocol which always commits the transaction when both cases are possible is called Progressive.
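The property above can be sketched as a small Python function. This is a minimal sketch, assuming that each operational participant reports either `"PC"` (Prepared-to-Commit) or some earlier state such as `"R"`, and that no operational participant has already committed or aborted; the function names and the string encoding of states are illustrative assumptions.

```python
def safe_decisions(states):
    """Return the decisions that are safe, given the states of the operational participants.

    states: non-empty list of strings; "PC" marks a participant in Prepared-to-Commit.
    """
    decisions = set()
    if any(s != "PC" for s in states):
        decisions.add("ABORT")   # at least one participant has not entered PC
    if any(s == "PC" for s in states):
        decisions.add("COMMIT")  # at least one participant has entered PC
    return decisions


def progressive_decision(states):
    """A progressive protocol commits whenever committing is among the safe decisions."""
    return "COMMIT" if "COMMIT" in safe_decisions(states) else "ABORT"
```

With one participant in PC and another still in R, both decisions are safe, which is exactly the overlap that separates progressive from nonprogressive protocols.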
- The simplest termination protocol is the centralized, nonprogressive protocol.
- First, a new coordinator is elected by the operational participants.
- The new coordinator behaves as follows.
1. If the new coordinator is in the Prepared-to-Commit state, it issues to all operational participants the command to enter this state as well; when it has received all the OK messages, it issues the COMMIT command.
2. If the new coordinator is in the commit state, i.e., it has committed the transaction, it issues the COMMIT command to all the participants.
3. If the new coordinator is in the abort state, it issues the ABORT command to all the participants.
4. Otherwise, the new coordinator orders all participants to go back to a state previous to the Prepared-to-Commit, and after it has …
- This protocol is similar to the 3-Phase-Commitment protocol.
- In case of failure of the new coordinator, the same termination protocol can be reentered by the remaining operational sites by electing a new coordinator.
- Disadvantage :- It is nonprogressive.
- There are several ways in which a new coordinator can be selected.
- One way to select the new coordinator is to assign a total ordering to all sites & choose the first operational site in this order.
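The election rule and the behavior of the new coordinator can be sketched as follows. This is a hedged sketch, not the definitive protocol: site identifiers are assumed to be comparable values whose natural order is the total ordering, the command name `GO-BACK-FROM-PREPARED-STATE` is hypothetical, and the final ABORT in rule 4 is an assumption, because the corresponding sentence in the text is truncated.

```python
def elect_coordinator(operational_sites):
    """Choose the first operational site in a fixed total order (here: ascending site id)."""
    return min(operational_sites)


def termination_commands(coordinator_state):
    """Return the commands the new coordinator issues, based on its own local state."""
    if coordinator_state == "PC":
        # Rule 1: bring everyone to Prepared-to-Commit, then commit
        # (COMMIT is sent only after all OK messages have arrived).
        return ["ENTER-PREPARED-STATE", "COMMIT"]
    if coordinator_state == "C":
        return ["COMMIT"]  # Rule 2: the coordinator has already committed.
    if coordinator_state == "A":
        return ["ABORT"]   # Rule 3: the coordinator has already aborted.
    # Rule 4: order participants back to a state previous to Prepared-to-Commit.
    # ASSUMPTION: the source sentence is cut off; ending with ABORT is a guess
    # consistent with the protocol being nonprogressive.
    return ["GO-BACK-FROM-PREPARED-STATE", "ABORT"]
```

The decision depends only on the state of the new coordinator, which is why the protocol may abort even when some other participant is already in the Prepared-to-Commit state.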
Restart Protocols for 3‐Phase‐Commitment
- A restart protocol is executed by a site when it recovers from a failure.
- In the case of 2-Phase-Commitment, the restart protocol requires accessing remote recovery information if the participant failed while it was in the ready state.
- With 3-Phase-Commitment & a termination protocol, the restart procedure will have to access remote recovery information if the participant has completed the first phase, independently of whether it has reached the prepared-to-commit state or not, because at restart it is not known how the transaction was terminated.
Commitment Protocols & Network Partitions
Existence of nonblocking protocols for partitions
- The problem of the existence of a nonblocking protocol in case of partition can be addressed by considering a different problem: the existence of protocols which allow independent recovery in case of site failures.
- Suppose that we can build a protocol such that if one site, say site2, fails, then
1. The other site, site1, terminates the transaction.
2. Site2 at restart terminates the transaction correctly without requiring any additional information from site1.
- The modified protocol is based on the following assumptions :-
1. A site discovers that another site is down by not receiving a required message within a given timeout.
2. A message can be lost only because of a site failure.
3. Each site receives a message, changes state, and sends the required answer as an atomic transition.
Protocols which can deal with partitions
Primary site approach :- If the 2PC protocol is used together with a primary site approach, then it is possible to terminate all the transactions of the group of the primary site if and only if the coordinators of all pending transactions belong to this group.
- This can be achieved by assigning to the primary site the coordinator function for all transactions.
Majority approach and quorum‐based protocols
The basic rules of a quorum-based protocol are:
1. Each site i has associated with it a number of votes Vi, Vi being a positive integer.
2. Let V indicate the sum of the votes of all sites of the network.
3. A transaction must collect a commit quorum Vc before committing.
4. A transaction must collect an abort quorum Va before aborting.
5. Va + Vc > V
Rule 5 ensures that a transaction is either committed or aborted.
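The quorum rules can be checked with a few lines of Python. This is a minimal sketch; the function names and the dictionary representation of the vote assignment are assumptions made for illustration. Because two disjoint groups of sites hold at most V votes together, rule 5 implies that one group cannot collect Vc while the other collects Va.

```python
def valid_quorums(votes, vc, va):
    """Check rules 1 and 5: every Vi is a positive integer and Va + Vc > V."""
    if any(v <= 0 for v in votes.values()):
        raise ValueError("each Vi must be a positive integer")
    total = sum(votes.values())  # rule 2: V is the sum of all votes
    return va + vc > total


def has_quorum(votes, sites, quorum):
    """True if the given group of sites holds at least `quorum` votes."""
    return sum(votes[s] for s in sites) >= quorum
```

For three sites with one vote each, Vc = 2 and Va = 2 satisfy rule 5, and in a partition into groups {1} and {2, 3} only the second group can reach either quorum.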
- A centralized termination protocol for the quorum-based 3PC has the following structure:
1. A new coordinator is elected.
2. The coordinator collects state information and acts according to the following rules:
a. If at least one site has committed (aborted), send a COMMIT (ABORT) command to the other sites.
b. If the number of votes of sites which have reached the prepare-to-commit state is greater than or equal to Vc, send a COMMIT command.
c. If the number of votes of sites in the prepare-to-abort state reaches the abort quorum, send an ABORT command.
d. If the number of votes of sites which have reached the prepare-to-commit state plus the number of votes of uncertain sites is greater than or equal to Vc, send a PREPARE-TO-COMMIT command to uncertain sites and wait for condition 2b to occur.
e. If the number of votes of sites which have reached the prepare-to-abort state plus the number of votes of uncertain sites is greater than or equal to Va, send a PREPARE-TO-ABORT command and wait for condition 2c to occur.
Reliability & Concurrency Control
- Suppose that there is a failure.
- How can we maximize the number of transactions which are executed during this failure by operational part of the system?
- The availability of a system which allows only one transaction to be run during failures is not satisfactory; therefore concurrency control must be taken into account.
Nonredundant Databases
- If the database is nonredundant, then it is very simple to determine which transactions can be executed.
- Assume that 2-Phase-Locking is used for concurrency control.
- As there is only one copy of each data item, this copy is either available or not.
- If we assume only site crashes but no partitions, then the availability of the items which belong to the write-set is not required & it is possible to spool update messages for these items.
- In general, if the database is nonredundant, there is not very much to do in order to increase its availability in the presence of failures.
Redundant Databases
- There are two reasons to have redundancy:
– To increase locality of reads.
– To increase availability & reliability of the system.
- We have seen three main approaches to concurrency control based on 2-PL:
– Write-locks-all.
– Majority locking.
– Primary copy locking.
- Let us see the basic tradeoffs of these approaches with an example.
Example :- Consider a distributed database consisting of three sites & three data items X, Y, Z which are stored as shown in fig (a).
All possible ways in which the network can be partitioned are shown in fig (b).
The read- and write-sets of a few transactions are shown in fig (c).

(a) Allocation of copies of data items at sites

Data item   Site 1   Site 2   Site 3
X           x1       x2       ---
Y           y1       ---      y3
Z           ---      z2       z3

(b) Possible partitions

Partition   Group 1   Group 2   Group 3
A)          1         2, 3      ---
B)          2         1, 3      ---
C)          3         1, 2      ---
D)          1         2         3

(c) Read-sets and write-sets of transactions 1 to 12 (table entries not recoverable)
The three approaches compared on this example are:
- Write-locks-all.
- Weighted majority locking.
- Primary copy locking.
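To make the comparison concrete, the following Python sketch tests whether a transaction can run inside one group of a partition, using the allocation of fig (a) (X at sites 1 & 2, Y at sites 1 & 3, Z at sites 2 & 3). The rules encoded here are the usual textbook formulations and should be read as assumptions: write-locks-all needs one copy of every item read and all copies of every item written; majority locking (shown unweighted, one vote per copy) needs a strict majority of the copies of every item accessed; primary copy locking needs the primary copy of every item accessed. The `PRIMARY` assignment is hypothetical.

```python
ALLOCATION = {"x": {1, 2}, "y": {1, 3}, "z": {2, 3}}  # item -> sites holding a copy
PRIMARY = {"x": 1, "y": 1, "z": 2}                    # hypothetical primary-copy sites


def can_run_write_locks_all(group, read_set, write_set, allocation=ALLOCATION):
    """One reachable copy per item read, every copy reachable per item written."""
    reads_ok = all(allocation[i] & group for i in read_set)
    writes_ok = all(allocation[i] <= group for i in write_set)
    return reads_ok and writes_ok


def can_run_majority(group, read_set, write_set, allocation=ALLOCATION):
    """A strict majority of the copies of every accessed item must be in the group."""
    items = set(read_set) | set(write_set)
    return all(2 * len(allocation[i] & group) > len(allocation[i]) for i in items)


def can_run_primary_copy(group, read_set, write_set, primary=PRIMARY):
    """The primary copy of every accessed item must be in the group."""
    items = set(read_set) | set(write_set)
    return all(primary[i] in group for i in items)
```

Under partition A (groups {1} and {2, 3}), a read-only transaction on x can run in both groups with write-locks-all, whereas an update of x can run in neither; with the assumed primary copies, the same update can run in group {1}.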
Determining a Consistent View of the Network
- There are two aspects to this:
– Monitoring the state of the network.
– Propagating new state information to all sites consistently.
- We can use timeouts in the algorithm to discover if a site is down.
- But the use of timeouts may lead to an inconsistent view of the network.
- Example :- Consider a 3-Site Network.
- We assume that a generalized networkwide mechanism is built such that all higher-level programs are provided with the following facilities.
1. There is at each site a state table containing an entry for each site. The entry can be up or down.
2. Any program can set a “watch” on any site, so that it receives an interrupt when the site changes state.
- A site considers up only those sites with which it can communicate; therefore all crashed sites, and all sites which belong to a different group in case of partitions, are considered down.
- We will consider separately the problems of monitoring & propagating state information.
Monitoring the State of the Network
- Generally the basic mechanism for deciding whether a site is up or down is to request a message from it & wait for a timeout.
- Let us call the requesting site the controller & the other site the controlled site.
- In a monitoring algorithm, instead of having the controller request messages from the controlled site, it is easier to have the controlled site send an I-AM-UP message periodically to the controller.
- Using this mechanism for detecting whether a site is up or down, the problem consists of assigning controllers to each site so that the overall message overhead is minimized.
- A possible solution is to assign a circular ordering to the sites and to assign to each site the function of controller of its predecessor.
- In the absence of failures, each site periodically sends an I-AM-UP message to its successor & checks that the I-AM-UP message from its predecessor arrives in time.
- If the I-AM-UP message from the predecessor does not arrive in time, then the controller assumes that the controlled site has failed, updates the state table & broadcasts the updated state table to all other sites.
- If the predecessor of the site is down, then the site has to control its predecessor….
States of sites:
Sites:    …   K-3   K-2    K-1    K
States:   …   UP    DOWN   DOWN   UP
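The circular monitoring scheme can be sketched in Python as follows. The sketch rests on assumptions that go beyond the text: since the sentence about a down predecessor is elided, the rule used here (walk backward through the ordering until a site marked up is found) is only suggested by the "States of sites" figure; the function names, the list representation of the ring, and the timestamp bookkeeping are all hypothetical.

```python
def controlled_site(site, ring, state_table):
    """Return the nearest predecessor of `site` in the circular ordering that is marked up.

    ASSUMPTION: down predecessors are skipped, as the K-3 ... K figure suggests.
    """
    n = len(ring)
    i = ring.index(site)
    for step in range(1, n):
        candidate = ring[(i - step) % n]
        if state_table[candidate] == "up":
            return candidate
    return None  # no other site is up


def check_timeouts(last_heard, now, timeout, state_table):
    """Mark as down every controlled site whose last I-AM-UP message is older than `timeout`.

    Returns the list of sites whose state changed (to be broadcast to the other sites).
    """
    changed = []
    for s, t in last_heard.items():
        if state_table[s] == "up" and now - t > timeout:
            state_table[s] = "down"
            changed.append(s)
    return changed
```

With the states shown in the figure, site K ends up controlling site K-3, because K-1 and K-2 are both down.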
Broadcasting a New State
- Each time the monitor function detects a state change, it broadcasts the new state table so that all sites of the same group have the same state table.
- Since this function could be activated by several sites in parallel, some mechanism is needed to control interference.
- A possible mechanism is to attach a globally unique timestamp to each new version of a state table.
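One way to realize a globally unique timestamp is a pair (counter, site id), compared lexicographically. The following Python sketch illustrates the idea; the `StateTable` class, its method names, and the choice of timestamp format are assumptions for illustration, not something prescribed by the text.

```python
class StateTable:
    """Per-site state table whose versions carry a (counter, site_id) timestamp."""

    def __init__(self, site_id, states):
        self.site_id = site_id
        self.states = dict(states)
        self.timestamp = (0, site_id)  # unique across sites because site ids differ

    def local_change(self, site, new_state):
        """Record a locally detected change and return the version to broadcast."""
        self.states[site] = new_state
        self.timestamp = (self.timestamp[0] + 1, self.site_id)
        return self.timestamp, dict(self.states)

    def receive(self, timestamp, states):
        """Install a broadcast version only if it is newer than the one held."""
        if timestamp > self.timestamp:
            self.timestamp = timestamp
            self.states = dict(states)
            return True
        return False
```

If two sites broadcast concurrently, every site keeps the version with the larger timestamp, so all sites of the group converge to the same table regardless of the order in which the broadcasts arrive.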
Detection & Resolution of Inconsistency
- When a partition of the network occurs, transactions should be run in at most one group of sites if we want to preserve consistency of the database.
- But in some applications transactions are allowed to run in all partitions where there is at least one copy of the necessary data, to get more availability.
- When a failure is repaired, one can try to eliminate the inconsistencies.
- To do this it is necessary first to discover which portions of the data have become inconsistent (Detection of inconsistency) & then to assign to these portions a value which is most reasonable (Resolution of inconsistency).
Detection of Inconsistency
- Let us assume that during a partition, transactions have been executed in two or more groups of sites & independent updates may have been performed on different copies of the same fragment.
- The general approach of comparing the contents of copies to check whether they are identical is inefficient & incorrect.
- A correct approach to the detection of inconsistencies can be based on version numbers.
- For each data item, the copies stored at sites of the group which is allowed to operate on it during a partition are called Master copies; the others are called Isolated copies.
- During normal operation all copies are master copies & mutually consistent.
- For each copy an Original version number & a Current version number are maintained.
- Initially the original version number is set to 0 & the current version number is set to 1.
- Each time an update is performed on the copy, only the current version number is incremented.
- When a partition occurs, the original version number of each isolated copy is set to the value of its current version number.
- The original version number thus records the current version number of the isolated copy before any “partitioned updates” are performed on it.
- The original version number is not altered until the partition is repaired.
- Example :- Let copies x1, x2 & x3 of data item x be stored at three different sites.
- Let V1, V2 & V3 be their version numbers, written as (original, current).
- Initially all copies are consistently updated.
- Assume that one update is performed, so
V1 = (0,2) V2 = (0,2) V3 = (0,2)
- Now a partition occurs separating x3 from the other two copies.
- Let x1 & x2 be the master copies.
- The version numbers now become
V1 = (0,2) V2 = (0,2) V3 = (2,2)
- Suppose that only the master copies are updated
V1 = (0,3) V2 = (0,3) V3 = (2,2)
- After repair it is possible to see that x3 has not been modified, since its current & original version numbers are the same.
- In this case, no inconsistency occurred & it is sufficient to perform the updates on x3.
- Now suppose that only x3 is updated during the partition
V1 = (0,2) V2 = (0,2) V3 = (2,3)
- Since the original version number of x3 is equal to the current version number of x1 & x2, the master copies have not been updated.
- If there are no other copies then we can apply to the master copies the updates of x3.
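The version-number comparison performed after repair can be sketched in Python. This is a minimal sketch under stated assumptions: the `Copy` class, the `classify` function, and the returned labels are hypothetical names, and one master copy is compared with one isolated copy at a time.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    original: int = 0  # original version number
    current: int = 1   # current version number

    def update(self):
        self.current += 1             # only the current version number changes

    def isolate(self):
        self.original = self.current  # performed when the partition occurs


def classify(master, isolated):
    """Compare a master copy with an isolated copy after the partition is repaired."""
    isolated_changed = isolated.current != isolated.original
    master_changed = master.current != isolated.original
    if isolated_changed and master_changed:
        return "inconsistent"
    if master_changed:
        return "apply master updates to isolated copy"
    if isolated_changed:
        return "apply isolated updates to master copies"
    return "consistent"
```

For the two cases of the example, `classify(Copy(0, 3), Copy(2, 2))` reports that the master updates must be applied to x3, and `classify(Copy(0, 2), Copy(2, 3))` reports that the updates of x3 can be applied to the master copies; only when both sides have changed is a real inconsistency detected.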
Checkpoints & Cold Restart
- Cold restart is required after some catastrophic failure which has caused the loss of log information on stable storage.
- In a DDB, cold restart is difficult because if one site has to establish an earlier state, then all other sites also have to establish earlier states.
- The recovery process is global, affecting all sites of the database.
- A consistent global state C is characterized by the following properties.
– For each transaction T, C contains the updates performed by all subtransactions of T at any site, or it does not contain any of them.
– If a transaction T is contained in C, then all conflicting transactions which have preceded T in the serialization order are also contained in C.
- The simplest way to reconstruct a globally consistent state in a DDB is to use local dumps, local logs & global checkpoints.
- A global checkpoint is a set of local checkpoints which are performed at all sites of the network & are synchronized by the condition “If a subtransaction of a transaction T is contained in the local checkpoint at some site, then all other subtransactions of T must be contained in the corresponding local checkpoints at other sites.”
- If global checkpoints are available then the reconstruction problem is solved as follows.
- At the failed site the latest local checkpoint which can be considered safe is determined.
- This determines which earlier global state has to be reconstructed.
- Then all other sites are requested to reestablish the local states of the corresponding local checkpoints.
- The main problem with this approach consists in recording global checkpoints.
- There are three possible solutions:
1. To find less expensive ways to record global checkpoints, so-called loosely synchronized checkpoints. All sites are asked by a coordinator to record a global checkpoint.
2. To avoid building global checkpoints at all & let the recovery procedure take the responsibility of reconstructing a consistent global state at cold restart.
3. To use the 2-Phase-Commitment protocol for guaranteeing that the local checkpoints created by …
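The synchronization condition quoted in the definition of a global checkpoint can be checked mechanically. The following Python sketch assumes a simple representation that is not in the text: each local checkpoint is the set of transaction identifiers whose subtransaction it contains, and a second mapping lists the sites at which each transaction has subtransactions.

```python
def is_global_checkpoint(local_checkpoints, subtransaction_sites):
    """Check the all-or-nothing condition on subtransactions.

    local_checkpoints: site -> set of transaction ids contained in that local checkpoint.
    subtransaction_sites: transaction id -> set of sites where it has subtransactions.
    """
    for transaction, sites in subtransaction_sites.items():
        included = [transaction in local_checkpoints[s] for s in sites]
        # Either every subtransaction is in its site's checkpoint, or none is.
        if any(included) and not all(included):
            return False
    return True
```

A set of local checkpoints in which T2 appears at site 2 but not at site 1 fails the check, so it cannot serve as a global checkpoint for cold restart.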
DDB Administration
- It deals with a variety of activities for development, control, maintenance & testing of the software of database applications.
- An important issue in database administration is the degree of site autonomy. Two cases are:
1. Absence of Local Autonomy :- The functions of a global DBA are similar to those of a centralized DBA.
2. Complete Local Autonomy :- The functions of a global DBA are very limited, since every site is independently administered.
Catalog Management in DDB
- Catalogs are used for
1. Translating Applications :- Data referenced by applications at different levels of transparency are mapped to physical data.
2. Optimizing Applications :- Data allocation, access methods available at each site & statistical information are required for producing access plans.
3. Executing Applications :- Catalog information is used to verify that access plans are valid & that the users have appropriate access rights.
Content of Catalog
1. Global Schema Description
2. Fragmentation Description
3. Allocation Description
4. Mapping to Local Names
5. Access Method Description
6. Statistics on the Database
7. Consistency Information (Protection & Integrity Constraints)
Distribution of Catalog
- Catalogs can be allocated in a DDB in many different ways. The basic ways are
1. Centralized Catalogs.
2. Fully Replicated Catalogs.
3. Local Catalogs.
- Several intermediate alternatives are possible, e.g. a centralized catalog at one site & local catalogs at all other sites.
Object Naming & Catalog Management with Site Autonomy
- The major requirement is to allow each local user to create & name his data independently, as well as allowing several users to share data. Therefore
– Data definition should be performed locally.
– Different users should be able to give the same name to different data.
– Different users at different sites should be able to reference the same data.
1. Systemwide Names
- A unique name given to each object in the system consists of
1. The ID of the user who creates the object.
2. The site of that user.
3. The object name.
4. The birth site of the object.
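The four components of a systemwide name can be represented as a small immutable record. This is a sketch only: the `SystemwideName` class, its field names, and in particular the textual form `user@site.object@birth_site` are illustrative assumptions, since the text lists the components but not a concrete syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemwideName:
    user_id: str      # 1. ID of the user who creates the object
    user_site: str    # 2. site of that user
    object_name: str  # 3. object name chosen by the user
    birth_site: str   # 4. site where the object was created

    def __str__(self):
        # Hypothetical textual form, chosen only for readability.
        return f"{self.user_id}@{self.user_site}.{self.object_name}@{self.birth_site}"
```

Two users can both call an object `EMP` and still obtain distinct systemwide names, because the creator and the sites are part of the name, while any user at any site can reference the same object by quoting its full name.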