18.5.2.2 Fragment-and-Replicate Join
Partitioning is not applicable to all types of joins. For instance, if the join condition is an inequality, such as $r \bowtie_{r.a < s.b} s$, it is possible that all tuples in r join with some tuple in s (and vice versa). Thus, there may be no easy way of partitioning r and s so that tuples in partition $r_i$ join with only tuples in partition $s_i$.
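To make the problem concrete, here is a minimal sketch on hypothetical toy data. The values, the partition count, and the `partition` helper are assumptions chosen for illustration. It computes the inequality join once over the whole relations and once only within matching partitions, then collects the pairs that the partition-local joins miss.

```python
# Hypothetical toy data: r holds values of attribute a, s holds values of b.
r = [1, 2, 3, 4]
s = [1, 2, 3, 4]
n = 2  # number of partitions (assumed for illustration)


def partition(values, n):
    """Hash-partition values into n buckets (here simply value mod n)."""
    parts = [[] for _ in range(n)]
    for v in values:
        parts[v % n].append(v)
    return parts


# The complete join on the inequality condition r.a < s.b.
full = {(a, b) for a in r for b in s if a < b}

# The join computed only between matching partitions r_i and s_i.
r_parts, s_parts = partition(r, n), partition(s, n)
local = {
    (a, b)
    for r_i, s_i in zip(r_parts, s_parts)
    for a in r_i
    for b in s_i
    if a < b
}

# Pairs that satisfy the condition but lie in different partitions.
missing = full - local
```

With this data, `missing` is non-empty: a pair such as (1, 2) satisfies r.a < s.b, yet 1 and 2 land in different partitions, so no partition-local join can produce it.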
We can parallelize such joins by using a technique called fragment and replicate.
We first consider a special case of fragment and replicate, asymmetric fragment-and-replicate join, which works as follows:
1. The system partitions one of the relations, say r. Any partitioning technique can be used on r, including round-robin partitioning.
2. The system replicates the other relation, s, across all the processors.
3. Processor $P_i$ then locally computes the join of $r_i$ with all of s, using any join technique.
The asymmetric fragment-and-replicate scheme appears in Figure 18.3a. If r is already stored by partitioning, there is no need to partition it further in step 1. All that is required is to replicate s across all processors.
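The three steps above can be sketched in a single process, with each list standing in for the data held by one processor. This is only an illustrative simulation: the function names, the use of round-robin partitioning, and the nested-loop local join are assumptions, and a real system would ship the fragments over a network and run the local joins concurrently.

```python
def round_robin(tuples, n):
    """Step 1 helper: split tuples into n fragments in round-robin order."""
    parts = [[] for _ in range(n)]
    for k, t in enumerate(tuples):
        parts[k % n].append(t)
    return parts


def local_join(r_i, s, theta):
    """Nested-loop join at one processor; any join technique would do."""
    return [(tr, ts) for tr in r_i for ts in s if theta(tr, ts)]


def asymmetric_fragment_and_replicate(r, s, n, theta):
    # Step 1: partition r across the n (simulated) processors.
    r_parts = round_robin(r, n)
    # Step 2: replicate s, so every processor holds a full copy.
    s_copies = [list(s) for _ in range(n)]
    # Step 3: processor i joins its fragment r_i with all of s.
    per_processor = [
        local_join(r_parts[i], s_copies[i], theta) for i in range(n)
    ]
    # The overall result is the union of the per-processor results.
    return [pair for part in per_processor for pair in part]
```

For example, `asymmetric_fragment_and_replicate([1, 5, 3], [2, 4], 2, lambda a, b: a < b)` evaluates the inequality join from the start of this section. Every tuple of r meets every tuple of s at exactly one processor, so no result pair is lost or duplicated.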
The general case of fragment-and-replicate join appears in Figure 18.3b; it works this way: The system partitions relation r into n partitions, $r_0, r_1, \ldots, r_{n-1}$, and partitions s into m partitions, $s_0, s_1, \ldots, s_{m-1}$. As before, any partitioning technique may be used on r and on s. The values of m and n do not need to be equal, but they must be chosen so that there are at least $m \cdot n$ processors. Asymmetric fragment and replicate is simply a special case of general fragment and replicate, where m = 1. Fragment and replicate reduces the sizes of the relations at each processor, compared to asymmetric fragment and replicate.
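The size reduction can be made concrete under the simplifying assumption, not stated in the text, that both relations are partitioned evenly. For the same value of n, the number of tuples held at each processor is then roughly

$$
\text{asymmetric } (m = 1):\ \frac{|r|}{n} + |s|
\qquad\qquad
\text{general}:\ \frac{|r|}{n} + \frac{|s|}{m}
$$

As a hypothetical illustration, take $|r| = 1200$, $|s| = 600$, $n = 4$, and $m = 3$. Each processor then holds about $300 + 600 = 900$ tuples in the asymmetric scheme and about $300 + 200 = 500$ tuples in the general scheme, which in turn needs $m \cdot n = 12$ processors rather than 4.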
Figure 18.3 Fragment-and-replicate schemes: (a) asymmetric fragment and replicate; (b) fragment and replicate.
Let the processors be $P_{0,0}, P_{0,1}, \ldots, P_{0,m-1}, P_{1,0}, \ldots, P_{n-1,m-1}$. Processor $P_{i,j}$ computes the join of $r_i$ with $s_j$. Each processor must get those tuples in the partitions on which it works. To accomplish this, the system replicates $r_i$ to processors $P_{i,0}, P_{i,1}, \ldots, P_{i,m-1}$ (which form a row in Figure 18.3b), and replicates $s_j$ to processors $P_{0,j}, P_{1,j}, \ldots, P_{n-1,j}$ (which form a column in Figure 18.3b). Any join technique can be used at each processor $P_{i,j}$.
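The processor grid described above can be simulated in one process as follows. This is a hedged sketch: the function names, the round-robin partitioning, and the nested-loop local join are illustrative assumptions, and the double loop stands in for n * m processors working in parallel.

```python
def round_robin(tuples, k):
    """Split tuples into k fragments in round-robin order."""
    parts = [[] for _ in range(k)]
    for idx, t in enumerate(tuples):
        parts[idx % k].append(t)
    return parts


def fragment_and_replicate(r, s, n, m, theta):
    r_parts = round_robin(r, n)  # r_0 .. r_{n-1}
    s_parts = round_robin(s, m)  # s_0 .. s_{m-1}
    result = []
    for i in range(n):        # row i of the grid receives a copy of r_i
        for j in range(m):    # column j of the grid receives a copy of s_j
            # Simulated processor P_{i,j}: join r_i with s_j locally.
            r_i, s_j = r_parts[i], s_parts[j]
            result.extend(
                (tr, ts) for tr in r_i for ts in s_j if theta(tr, ts)
            )
    return result
```

Each pair of fragments $(r_i, s_j)$ is handled by exactly one simulated processor, so every tuple of r is tested against every tuple of s exactly once. Calling the function with `m = 1` reproduces the asymmetric scheme.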
Fragment and replicate works with any join condition, since every tuple in r can be tested with every tuple in s. Thus, it can be used where partitioning cannot be.
Fragment and replicate usually has a higher cost than partitioning when both relations are of roughly the same size, since at least one of the relations has to be replicated. However, if one of the relations (say, s) is small, it may be cheaper to replicate s across all processors, rather than to repartition r and s on the join attributes. In such a case, asymmetric fragment and replicate is preferable, even though partitioning could be used.