
19.7 Distributed Query Processing

In Chapter 13, we saw that there are a variety of methods for computing the answer to a query. We examined several techniques for choosing a strategy for processing a query that minimize the amount of time that it takes to compute the answer. For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses. In a distributed system, we must take into account several other matters, including:

• The cost of data transmission over the network.

• The potential gain in performance from having several sites process parts of the query in parallel.

The relative cost of data transfer over the network and data transfer to and from disk varies widely depending on the type of network and on the speed of the disks. Thus, in general, we cannot focus solely on disk costs or on network costs. Rather, we must find a good trade-off between the two.

19.7.1 Query Transformation

Consider an extremely simple query: “Find all the tuples in the account relation.” Although the query is simple—indeed, trivial—processing it is not trivial, since the account relation may be fragmented, replicated, or both, as we saw in Section 19.2. If the account relation is replicated, we must choose which replica to use. If no replicas are fragmented, we choose the replica for which the transmission cost is lowest. However, if a replica is fragmented, the choice is not so easy to make, since we need to compute several joins or unions to reconstruct the account relation. In this case, the number of strategies for our simple example may be large. Query optimization by exhaustive enumeration of all alternative strategies may not be practical in such situations.

Fragmentation transparency implies that a user may write a query such as:

σ_{branch_name = “Hillside”}(account)

Since account is defined as:

account1 ∪ account2

the expression that results from the name translation scheme is:

σ_{branch_name = “Hillside”}(account1 ∪ account2)

Using the query-optimization techniques of Chapter 13, we can simplify the preceding expression automatically. The result is the expression:

σ_{branch_name = “Hillside”}(account1) ∪ σ_{branch_name = “Hillside”}(account2)

which includes two subexpressions. The first involves only account1, and thus

can be evaluated at the Hillside site. The second involves only account2, and thus can be evaluated at the Valleyview site. There is a further optimization that can be made in evaluating:

σ_{branch_name = “Hillside”}(account1)

Since account1 has only tuples pertaining to the Hillside branch, we can eliminate the selection operation. In evaluating:

σ_{branch_name = “Hillside”}(account2)

we can apply the definition of the account2 fragment to obtain:

σ_{branch_name = “Hillside”}(σ_{branch_name = “Valleyview”}(account))

This expression is the empty set, regardless of the contents of the account relation.

Thus, our final strategy is for the Hillside site to return account1 as the result of the query.
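The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not a real optimizer: the relations, tuples, and helper names are all invented for the example, and each fragment is assumed to carry the branch name that defines it.

```python
def select(predicate, tuples):
    """Relational selection: keep only tuples satisfying the predicate."""
    return [t for t in tuples if predicate(t)]

# Horizontal fragments of account, as in the text: account1 holds the
# Hillside tuples and account2 holds the Valleyview tuples.
account1 = [{"branch_name": "Hillside", "number": "A-101", "balance": 500}]
account2 = [{"branch_name": "Valleyview", "number": "A-215", "balance": 700}]
fragments = {"Hillside": account1, "Valleyview": account2}

# Naive plan: reconstruct account by union, then apply the selection.
naive = select(lambda t: t["branch_name"] == "Hillside",
               account1 + account2)

# Optimized plan: push the selection into each fragment.  A fragment whose
# defining branch differs from "Hillside" yields the empty set, so it need
# not be touched (or shipped) at all.
optimized = [t for branch, frag in fragments.items()
             if branch == "Hillside"
             for t in frag]

assert naive == optimized == account1
```

The optimized plan never reads account2, mirroring the conclusion that only the Hillside site participates in answering the query.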

19.7.2 Simple Join Processing

As we saw in Chapter 13, a major decision in the selection of a query-processing strategy is choosing a join strategy. Consider the following relational-algebra expression:

account ✶ depositor ✶ branch

Assume that the three relations are neither replicated nor fragmented, and that account is stored at site S1, depositor at S2, and branch at S3. Let SI denote the site at which the query was issued. The system needs to produce the result at site SI. Among the possible strategies for processing this query are these:

• Ship copies of all three relations to site SI. Using the techniques of Chapter 13, choose a strategy for processing the entire query locally at site SI.

• Ship a copy of the account relation to site S2, and compute temp1 = account ✶ depositor at S2. Ship temp1 from S2 to S3, and compute temp2 = temp1 ✶ branch at S3. Ship the result temp2 to SI.

• Devise strategies similar to the previous one, with the roles of S1, S2, S3 exchanged.

No one strategy is always the best one. Among the factors that must be

considered are the volume of data being shipped, the cost of transmitting a block of data between a pair of sites, and the relative speed of processing at each site. Consider the first two strategies listed. Suppose indices present at S2 and S3 are useful for computing the join. If we ship all three relations to SI, we would need to either re-create these indices at SI, or use a different, possibly more expensive, join strategy. Re-creation of indices entails extra processing overhead and extra disk accesses.

With the second strategy, a potentially large relation (account ✶ depositor) must be shipped from S2 to S3. This relation repeats the name of a customer once for each account that the customer has. Thus, the second strategy may result in extra network transmission compared to the first strategy.
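To make the trade-off concrete, here is a back-of-the-envelope comparison of the two strategies that counts only blocks shipped over the network. All block counts below are invented for illustration; they are assumptions, not figures from the text, and real plans would also weigh per-site processing speed and index availability.

```python
def strategy1_blocks(account, depositor, branch):
    # Strategy 1: ship all three relations to the issuing site SI.
    return account + depositor + branch

def strategy2_blocks(account, temp1, temp2):
    # Strategy 2: ship account to S2, temp1 = account join depositor
    # from S2 to S3, and the final result temp2 from S3 to SI.
    return account + temp1 + temp2

# If the intermediate join result is small, strategy 2 ships less data:
assert strategy2_blocks(account=100, temp1=20, temp2=5) < \
       strategy1_blocks(account=100, depositor=80, branch=10)

# But account join depositor repeats a customer's name once per account,
# so if temp1 is large, strategy 2 can ship more than strategy 1:
assert strategy2_blocks(account=100, temp1=300, temp2=5) > \
       strategy1_blocks(account=100, depositor=80, branch=10)
```

The crossover depends entirely on the size of the intermediate result, which is why no one strategy is always best.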

19.7.3 Semijoin Strategy

Suppose that we wish to evaluate the expression r1 ✶ r2, where r1 and r2 are stored at sites S1 and S2, respectively. Let the schemas of r1 and r2 be R1 and R2. Suppose that we wish to obtain the result at S1. If there are many tuples of r2 that do not join with any tuple of r1, then shipping r2 to S1 entails shipping tuples that fail to contribute to the result. We want to remove such tuples before shipping data to S1, particularly if network costs are high.

A possible strategy to accomplish all this is:

1. Compute temp1 ← Π_{R1 ∩ R2}(r1) at S1.

2. Ship temp1 from S1 to S2.

3. Compute temp2 ← r2 ✶ temp1 at S2.

4. Ship temp2 from S2 to S1.

5. Compute r1 ✶ temp2 at S1. The resulting relation is the same as r1 ✶ r2.

Before considering the efficiency of this strategy, let us verify that the strategy

computes the correct answer. In step 3, temp2 has the result of r2 ✶ Π_{R1 ∩ R2}(r1). In step 5, we compute:

r1 ✶ r2 ✶ Π_{R1 ∩ R2}(r1)

Since join is associative and commutative, we can rewrite this expression as:

(r1 ✶ Π_{R1 ∩ R2}(r1)) ✶ r2

Since r1 ✶ Π_{R1 ∩ R2}(r1) = r1, the expression is, indeed, equal to r1 ✶ r2, the expression we are trying to evaluate.

This strategy is particularly advantageous when relatively few tuples of r2 contribute to the join. This situation is likely to occur if r1 is the result of a relational-algebra expression involving selection. In such a case, temp2 may have significantly fewer tuples than r2. The cost savings of the strategy result from having to ship only temp2, rather than all of r2, to S1. Additional cost is incurred in shipping temp1 to S2. If a sufficiently small fraction of tuples in r2 contribute to the join, the overhead of shipping temp1 will be dominated by the savings of shipping only a fraction of the tuples in r2.

This strategy is called a semijoin strategy, after the semijoin operator of the relational algebra, denoted ⋉. The semijoin of r1 with r2, denoted r1 ⋉ r2, is:

Π_{R1}(r1 ✶ r2)

Thus, r1 ⋉ r2 selects those tuples of relation r1 that contributed to r1 ✶ r2. In step 3, temp2 = r2 ⋉ r1. For joins of several relations, this strategy can be extended to a series of semijoin steps. A substantial body of theory has been developed regarding the use of semijoins for query optimization. Some of this theory is referenced in the bibliographical notes.
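The five steps can be traced with a small Python sketch. Relations are lists of dictionaries, the join attribute set R1 ∩ R2 is hard-coded, and all relation and attribute names are invented for illustration; the "shipping" steps are only comments, since everything here runs in one process.

```python
def project(attrs, r):
    """Projection with duplicate elimination."""
    seen = {tuple(t[a] for a in attrs) for t in r}
    return [dict(zip(attrs, values)) for values in seen]

def natural_join(r, s, on):
    """Natural join of r and s on the listed common attributes."""
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in on)]

r1 = [{"cust": "Adams", "branch": "Hillside"}]          # stored at S1
r2 = [{"cust": "Adams", "acct": "A-101"},               # stored at S2
      {"cust": "Young", "acct": "A-215"}]               # never joins with r1

on = ["cust"]                               # R1 intersect R2
temp1 = project(on, r1)                     # step 1, computed at S1
# step 2: ship temp1 (one tuple) to S2 instead of fetching all of r2
temp2 = natural_join(r2, temp1, on)         # step 3: r2 semijoin r1, at S2
# step 4: ship temp2 (only the joining tuples of r2) back to S1
result = natural_join(r1, temp2, on)        # step 5, computed at S1

assert result == natural_join(r1, r2, on)   # same answer as r1 join r2
assert len(temp2) < len(r2)                 # fewer tuples shipped than all of r2
```

The final assertions check exactly the two claims in the text: the semijoin plan computes the same result as r1 ✶ r2, while shipping only the tuples of r2 that actually contribute to it.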

19.7.4 Join Strategies that Exploit Parallelism

Consider a join of four relations:

r1 ✶ r2 ✶ r3 ✶ r4

where relation ri is stored at site Si. Assume that the result must be presented at site S1. There are many possible strategies for parallel evaluation. (We studied the issue of parallel processing of queries in detail in Chapter 18.) In one such strategy, r1 is shipped to S2, and r1 ✶ r2 is computed at S2. At the same time, r3 is shipped to S4, and r3 ✶ r4 is computed at S4. Site S2 can ship tuples of (r1 ✶ r2) to S1 as they are produced, rather than wait for the entire join to be computed. Similarly, S4 can ship tuples of (r3 ✶ r4) to S1. Once tuples of (r1 ✶ r2) and (r3 ✶ r4) arrive at S1, the computation of (r1 ✶ r2) ✶ (r3 ✶ r4) can begin, with the pipelined join technique of Section 12.7.2.2. Thus, computation of the final join result at S1 can be done in parallel with the computation of (r1 ✶ r2) at S2, and with the computation of (r3 ✶ r4) at S4.
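The shape of this strategy can be sketched with threads standing in for sites and queues standing in for network links. Everything below is an assumption made for illustration: the tiny relations, the sentinel protocol, and the final step, which simply drains both streams before combining them, whereas a true pipelined join at S1 would probe each tuple as it arrives.

```python
import queue
import threading

def join(r, s, on):
    """Natural join on the listed common attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in on)]

# Toy relations; r_i is imagined to live at site S_i.
r1 = [{"a": 1}]
r2 = [{"a": 1, "b": 2}]
r3 = [{"c": 3}]
r4 = [{"c": 3, "d": 4}]

left_link, right_link = queue.Queue(), queue.Queue()  # links to S1
SENTINEL = None                                       # end-of-stream marker

def site2():
    # S2 computes r1 join r2 and streams tuples to S1 as produced.
    for t in join(r1, r2, ["a"]):
        left_link.put(t)
    left_link.put(SENTINEL)

def site4():
    # S4 computes r3 join r4 in parallel with S2.
    for t in join(r3, r4, ["c"]):
        right_link.put(t)
    right_link.put(SENTINEL)

for target in (site2, site4):
    threading.Thread(target=target).start()

def drain(q):
    out = []
    while (t := q.get()) is not SENTINEL:
        out.append(t)
    return out

# Site S1 combines the two incoming streams.  The two subjoins share no
# attributes here, so the final join degenerates to a cross product.
left, right = drain(left_link), drain(right_link)
result = [{**a, **b} for a in left for b in right]

assert result == [{"a": 1, "b": 2, "c": 3, "d": 4}]
```

The two producer threads run concurrently, mirroring the parallel work at S2 and S4, and the queues capture the idea that S1 can start consuming tuples before either subjoin is fully finished.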