12.5.4.1 Merge-Join Algorithm
Figure 12.7 shows the merge-join algorithm. In the algorithm, JoinAttrs refers to the attributes in R ∩ S, and t_r ⋈ t_s, where t_r and t_s are tuples that have the same values for JoinAttrs, denotes the concatenation of the attributes of the tuples, followed by projecting out repeated attributes.

    pr := address of first tuple of r;
    ps := address of first tuple of s;
    while (ps ≠ null and pr ≠ null) do
      begin
        t_s := tuple to which ps points;
        S_s := {t_s};
        set ps to point to next tuple of s;
        done := false;
        while (not done and ps ≠ null) do
          begin
            t_s′ := tuple to which ps points;
            if (t_s′[JoinAttrs] = t_s[JoinAttrs])
              then begin
                     S_s := S_s ∪ {t_s′};
                     set ps to point to next tuple of s;
                   end
              else done := true;
          end
        t_r := tuple to which pr points;
        while (pr ≠ null and t_r[JoinAttrs] < t_s[JoinAttrs]) do
          begin
            set pr to point to next tuple of r;
            t_r := tuple to which pr points;
          end
        while (pr ≠ null and t_r[JoinAttrs] = t_s[JoinAttrs]) do
          begin
            for each t_s in S_s do
              begin
                add t_s ⋈ t_r to result;
              end
            set pr to point to next tuple of r;
            t_r := tuple to which pr points;
          end
      end.

Figure 12.7 Merge join.

The merge-join algorithm associates one pointer with each relation. These pointers point initially to the first tuple of the respective relations. As the algorithm proceeds, the pointers move through the relations. A group of tuples of one relation with the same value on the join attributes is read into S_s. The algorithm in Figure 12.7 requires that every set of tuples S_s fit in main memory; we discuss extensions of the algorithm to avoid this requirement shortly. Then, the corresponding tuples (if any) of the other relation are read in, and are processed as they are read.
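The pointer movement described above can be sketched in Python. This is a minimal in-memory illustration, not the figure's exact procedure: it assumes each relation is a list of dicts already sorted on the join attributes, uses list indexes in place of the pointers pr and ps, and the function name merge_join and the sample relations are hypothetical.

```python
def merge_join(r, s, join_attrs):
    """Natural join of r and s, both assumed sorted on join_attrs.

    r and s are lists of dicts; join_attrs lists the attribute names
    the two relations have in common (JoinAttrs).
    """
    def key(t):
        return tuple(t[a] for a in join_attrs)

    result = []
    pr, ps = 0, 0  # list indexes standing in for the pointers pr and ps
    while ps < len(s) and pr < len(r):
        # Read the group S_s of s-tuples that share one join value.
        ts = s[ps]
        group = [ts]
        ps += 1
        while ps < len(s) and key(s[ps]) == key(ts):
            group.append(s[ps])
            ps += 1
        # Advance past r-tuples whose join value is smaller.
        while pr < len(r) and key(r[pr]) < key(ts):
            pr += 1
        # Join each matching r-tuple with every tuple in the group.
        while pr < len(r) and key(r[pr]) == key(ts):
            for t in group:
                # Merging the dicts keeps one copy of the join attributes.
                result.append({**t, **r[pr]})
            pr += 1
    return result


# Hypothetical sample relations, sorted on a1.
r = [{"a1": "a", "a2": 3}, {"a1": "b", "a2": 1},
     {"a1": "d", "a2": 8}, {"a1": "d", "a2": 13}, {"a1": "f", "a2": 7}]
s = [{"a1": "a", "a3": "X"}, {"a1": "c", "a3": "Y"},
     {"a1": "d", "a3": "Z"}, {"a1": "d", "a3": "W"}]
rows = merge_join(r, s, ["a1"])
```

Each relation is scanned once; only the current group of s-tuples is held aside, which mirrors the requirement that S_s fit in main memory.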
Figure 12.8 Sorted relations for merge join.
Figure 12.8 shows two relations that are sorted on their join attribute a1. It is instructive to go through the steps of the merge-join algorithm on the relations shown in the figure.
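One way to walk through the steps is to log each action the algorithm takes. The sketch below does that for two small relations sorted on a1. The relation contents are hypothetical and are not the contents of Figure 12.8, which is not reproduced here; the function name merge_join_trace and the representation of tuples as (a1, other) pairs are also assumptions made for illustration.

```python
def merge_join_trace(r, s):
    """Return a log of the steps merge join takes.

    r and s are lists of (a1, other) pairs, each assumed sorted on a1.
    """
    log = []
    pr = ps = 0
    while ps < len(s) and pr < len(r):
        # Gather the group of s-tuples with the current a1 value.
        value = s[ps][0]
        start = ps
        while ps < len(s) and s[ps][0] == value:
            ps += 1
        group = s[start:ps]
        log.append(f"group a1={value}: {len(group)} s-tuple(s)")
        # Skip r-tuples with a smaller a1 value.
        while pr < len(r) and r[pr][0] < value:
            log.append(f"skip r-tuple {r[pr]}")
            pr += 1
        # Emit one result per (group tuple, matching r-tuple) pair.
        while pr < len(r) and r[pr][0] == value:
            for ts in group:
                log.append(f"emit {ts} joined with {r[pr]}")
            pr += 1
    return log


# Hypothetical relations, sorted on a1.
r = [("a", 3), ("b", 1), ("d", 8)]
s = [("a", "X"), ("c", "Y"), ("d", "Z")]
for line in merge_join_trace(r, s):
    print(line)
```

In the log, the value c forms a group in s but matches nothing in r, and the r-tuple with value b is skipped without producing output.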
The merge-join algorithm of Figure 12.7 requires that each set S_s of all tuples with the same value for the join attributes fit in main memory. This requirement can usually be met, even if the relation s is large. If there are some join attribute values for which S_s is larger than available memory, a block nested-loop join can be performed for such sets S_s, matching them with corresponding blocks of tuples in r with the same values for the join attributes.
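The fallback for oversized groups can be sketched as follows. This is an illustrative in-memory model, not the text's procedure in detail: the parameter max_group stands in for the number of s-tuples that fit in memory, each relation is a list of dicts sorted on the join key, and the name merge_join_bounded is hypothetical. For every join value, the s-group is loaded one chunk at a time and the matching range of r is rescanned for each chunk, in the manner of a block nested-loop join.

```python
def merge_join_bounded(r, s, key, max_group):
    """Merge join holding at most max_group s-tuples at a time.

    r and s are lists of dicts sorted on key(t). max_group models the
    memory available for one group S_s (an assumption of this sketch).
    """
    result = []
    pr = ps = 0
    while ps < len(s) and pr < len(r):
        value = key(s[ps])
        # Find where the s-group ends without copying it.
        s_end = ps
        while s_end < len(s) and key(s[s_end]) == value:
            s_end += 1
        # Skip r-tuples with smaller join values.
        while pr < len(r) and key(r[pr]) < value:
            pr += 1
        # Find where the matching run of r-tuples ends.
        r_end = pr
        while r_end < len(r) and key(r[r_end]) == value:
            r_end += 1
        # Block nested loop: one chunk of the s-group per pass over
        # the matching r-tuples.
        for start in range(ps, s_end, max_group):
            chunk = s[start:min(start + max_group, s_end)]
            for i in range(pr, r_end):
                for ts in chunk:
                    result.append({**ts, **r[i]})
        ps, pr = s_end, r_end
    return result
```

When a group has at most max_group tuples, there is a single chunk and the behavior matches the plain merge join. For larger groups the same result tuples are produced, though in a different order, at the cost of rescanning the matching r-tuples once per chunk.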
If either of the input relations r and s is not sorted on the join attributes, it can be sorted first, and then the merge-join algorithm can be used. The merge-join algorithm can also be easily extended from natural joins to the more general case of equi-joins.
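Both points can be illustrated together: sort the inputs, then run the merge step on an equality condition between differently named attributes. The sketch below makes several assumptions. Python's built-in sorted stands in for whatever sort a real system would use on relations larger than memory; the attribute names of r and s are assumed to be disjoint, so merging two dicts loses nothing; and the names sort_merge_equijoin, r_attr, and s_attr are hypothetical.

```python
from operator import itemgetter


def sort_merge_equijoin(r, s, r_attr, s_attr):
    """Equi-join on r.r_attr = s.s_attr, sorting both inputs first.

    r and s are lists of dicts with disjoint attribute names
    (an assumption of this sketch).
    """
    r = sorted(r, key=itemgetter(r_attr))  # in-memory stand-in for a sort step
    s = sorted(s, key=itemgetter(s_attr))
    result = []
    pr = ps = 0
    while ps < len(s) and pr < len(r):
        # Gather the group of s-tuples with the current join value.
        value = s[ps][s_attr]
        group = []
        while ps < len(s) and s[ps][s_attr] == value:
            group.append(s[ps])
            ps += 1
        # Skip r-tuples with smaller join values.
        while pr < len(r) and r[pr][r_attr] < value:
            pr += 1
        # Unlike the natural join, both join attributes stay in the output.
        while pr < len(r) and r[pr][r_attr] == value:
            for ts in group:
                result.append({**ts, **r[pr]})
            pr += 1
    return result


# Hypothetical unsorted inputs.
r = [{"dept_id": 2, "name": "Bo"}, {"dept_id": 1, "name": "Al"},
     {"dept_id": 3, "name": "Cy"}]
s = [{"id": 3, "dept": "Ops"}, {"id": 1, "dept": "R&D"}]
rows = sort_merge_equijoin(r, s, "dept_id", "id")
```

The only changes from the natural-join case are the two separate attribute names in the comparisons and the fact that no repeated attribute is projected out.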