12.5.4.1 Merge-Join Algorithm

Figure 12.7 shows the merge-join algorithm. In the algorithm, JoinAttrs refers to the attributes in R ∩ S, and t_r ⋈ t_s, where t_r and t_s are tuples that have the same values for JoinAttrs, denotes the concatenation of the attributes of the tuples, followed by projecting out repeated attributes.

pr := address of first tuple of r;
ps := address of first tuple of s;
while (ps ≠ null and pr ≠ null) do
  begin
    t_s := tuple to which ps points;
    S_s := {t_s};
    set ps to point to next tuple of s;
    done := false;
    while (not done and ps ≠ null) do
      begin
        t_s′ := tuple to which ps points;
        if (t_s′[JoinAttrs] = t_s[JoinAttrs])
          then begin
            S_s := S_s ∪ {t_s′};
            set ps to point to next tuple of s;
          end
          else done := true;
      end
    t_r := tuple to which pr points;
    while (pr ≠ null and t_r[JoinAttrs] < t_s[JoinAttrs]) do
      begin
        set pr to point to next tuple of r;
        t_r := tuple to which pr points;
      end
    while (pr ≠ null and t_r[JoinAttrs] = t_s[JoinAttrs]) do
      begin
        for each t_s in S_s do
          begin
            add t_s ⋈ t_r to result;
          end
        set pr to point to next tuple of r;
        t_r := tuple to which pr points;
      end
  end.

Figure 12.7 Merge join.

The merge-join algorithm associates one pointer with each relation. These pointers point initially to the first tuple of the respective relations. As the algorithm proceeds, the pointers move through the relations. A group of tuples of one relation with the same value on the join attributes is read into S_s. The algorithm in Figure 12.7 requires that every set of tuples S_s fit in main memory; we discuss extensions of the algorithm to avoid this requirement shortly. Then, the corresponding tuples (if any) of the other relation are read in, and are processed as they are read.
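
The pointer movement described above can be sketched in Python. This is only an illustrative in-memory version, assuming each relation is a list of dictionaries already sorted on the join attributes; the function name merge_join, the list indices standing in for the pointers pr and ps, and the sample relations are assumptions made for this sketch and do not come from Figure 12.7.

```python
def merge_join(r, s, join_attrs):
    """Natural join of r and s, both lists of dicts sorted on join_attrs."""
    def key(t):
        return tuple(t[a] for a in join_attrs)

    result = []
    pr, ps = 0, 0  # list indices play the role of the pointers pr and ps
    while ps < len(s) and pr < len(r):
        # Build S_s: the run of s-tuples sharing the current join value.
        t_s = s[ps]
        S_s = [t_s]
        ps += 1
        while ps < len(s) and key(s[ps]) == key(t_s):
            S_s.append(s[ps])
            ps += 1
        # Advance pr past r-tuples with a smaller join value.
        while pr < len(r) and key(r[pr]) < key(t_s):
            pr += 1
        # Pair every matching r-tuple with each tuple in S_s.
        while pr < len(r) and key(r[pr]) == key(t_s):
            for u in S_s:
                # Merging the dicts keeps a single copy of the shared attributes.
                result.append({**u, **r[pr]})
            pr += 1
    return result


# Hypothetical sample relations, sorted on the join attribute a1.
r = [{"a1": "a", "a2": 3}, {"a1": "b", "a2": 1},
     {"a1": "d", "a2": 8}, {"a1": "d", "a2": 13}, {"a1": "f", "a2": 7}]
s = [{"a1": "a", "a3": "A"}, {"a1": "c", "a3": "L"},
     {"a1": "d", "a3": "N"}, {"a1": "d", "a3": "M"}]
joined = merge_join(r, s, ["a1"])
```

With these sample inputs, the value a matches once and the value d matches twice on each side, so the result holds five tuples. Each relation is scanned once, which is the main attraction of the method when both inputs are already sorted.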


Figure 12.8 Sorted relations for merge join.

Figure 12.8 shows two relations that are sorted on their join attribute a1. It is instructive to go through the steps of the merge-join algorithm on the relations shown in the figure.
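
One way to go through the steps is to log what the algorithm does at each stage. The sketch below is a hedged illustration: the function trace_merge_join and the sample tuples are hypothetical and are not taken from Figure 12.8. Each relation is assumed to be a list of (a1, value) pairs sorted on a1.

```python
def trace_merge_join(r, s):
    """Return a log of merge-join steps for lists of (a1, value) pairs sorted on a1."""
    log = []
    pr, ps = 0, 0
    while ps < len(s) and pr < len(r):
        v = s[ps][0]
        start = ps
        # Gather the group S_s of s-tuples whose a1 equals v.
        while ps < len(s) and s[ps][0] == v:
            ps += 1
        log.append(f"S_s for a1={v}: {s[start:ps]}")
        # Skip r-tuples whose a1 is smaller than v.
        while pr < len(r) and r[pr][0] < v:
            log.append(f"  skip r tuple {r[pr]}")
            pr += 1
        # Join r-tuples whose a1 equals v with the whole group.
        while pr < len(r) and r[pr][0] == v:
            log.append(f"  join r tuple {r[pr]} with {ps - start} tuple(s) of S_s")
            pr += 1
    return log


# Hypothetical sample data; not necessarily the values in the figure.
r = [("a", 3), ("b", 1), ("d", 8), ("d", 13), ("f", 7), ("m", 5), ("q", 6)]
s = [("a", "A"), ("b", "G"), ("c", "L"), ("d", "N"), ("m", "B")]
for line in trace_merge_join(r, s):
    print(line)
```

In this sample, the s value c finds no partner in r, the r tuple with value f is skipped while looking for m, and the r tuple with value q is never examined because s is exhausted first.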

The merge-join algorithm of Figure 12.7 requires that each set S_s of all tuples with the same value for the join attributes fit in main memory. This requirement can usually be met, even if the relation s is large. If there are some join attribute values for which S_s is larger than available memory, a block nested-loop join can be performed for such sets S_s, matching them with corresponding blocks of tuples in r with the same values for the join attributes.

If either of the input relations r and s is not sorted on the join attributes, it can be sorted first, and the merge-join algorithm can then be used. The merge-join algorithm can also be easily extended from natural joins to the more general case of equi-joins.
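
Both points can be illustrated with a short Python sketch. It is an assumption-laden example rather than a definitive implementation: the function name sort_merge_equijoin is hypothetical, Python's in-memory sorted() stands in for whatever sorting method a real system would use on large relations, and the two relations are assumed to have disjoint attribute names so that an equi-join on r_attr = s_attr can keep both attributes in the output.

```python
from operator import itemgetter


def sort_merge_equijoin(r, s, r_attr, s_attr):
    """Equi-join of r and s on r[r_attr] == s[s_attr], sorting both inputs first."""
    # Sort step: in-memory stand-in for sorting the relations on the join attributes.
    r = sorted(r, key=itemgetter(r_attr))
    s = sorted(s, key=itemgetter(s_attr))

    result = []
    pr, ps = 0, 0
    while pr < len(r) and ps < len(s):
        v = s[ps][s_attr]
        group = []  # s-tuples sharing the join value v
        while ps < len(s) and s[ps][s_attr] == v:
            group.append(s[ps])
            ps += 1
        while pr < len(r) and r[pr][r_attr] < v:
            pr += 1
        while pr < len(r) and r[pr][r_attr] == v:
            # Unlike the natural join, both join attributes are kept.
            result.extend({**r[pr], **u} for u in group)
            pr += 1
    return result


# Hypothetical unsorted inputs with differently named join attributes.
people = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
items = [{"owner": 2, "item": "x"}, {"owner": 1, "item": "y"}, {"owner": 2, "item": "z"}]
pairs = sort_merge_equijoin(people, items, "id", "owner")
```

The only change from the natural-join case is that the comparison names one attribute from each relation, and the output tuple retains both.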