Determination of Semi-Join Programs in SDD-1

  • In the SDD-1 approach, semi-joins are used for reducing the cardinalities of relations; when they have been applied to the maximum extent, all relations are collected at the same site, where the query can be executed.
  • The SDD-1 approach consists of a basic algorithm for determining a feasible execution strategy, followed by postoptimization criteria for improving it.

  

Basic SDD-1 Algorithm

  • The basic SDD-1 algorithm constructs reducer programs for relations; reducers consist of unary operations and semi-joins, which are selected on the basis of their cost.
  • Consider the semi-join R SJ_{A=B} S; it has no cost when R and S are stored at the same site.
  • When R and S are at different sites, the cost is

      Cost(R SJ_{A=B} S) = C_0 + val(B[S]) × size(B) × C_1

  • The benefit of the semi-join is

      benefit(R SJ_{A=B} S) = (1 - ρ) × size(R) × card(R) × C_1

    where ρ is the selectivity of the semi-join.
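As a rough illustration of the two formulas, the following Python sketch evaluates the cost and the benefit of a semi-join. The function and parameter names are invented for this example, and the default constants c0 = 0 and c1 = 1 are an assumption chosen because they reproduce the unit-cost figures of the worked example later in these slides.

```python
def semijoin_cost(val_b_s, size_b, same_site=False, c0=0.0, c1=1.0):
    """Cost of R SJ_{A=B} S: fixed cost plus shipping the distinct B values of S."""
    if same_site:
        return 0.0  # no transmission when R and S are stored at the same site
    return c0 + val_b_s * size_b * c1


def semijoin_benefit(rho, size_r, card_r, c1=1.0):
    """Transmission saved by shrinking R: (1 - rho) * size(R) * card(R) * C1."""
    return (1.0 - rho) * size_r * card_r * c1


# SUPPLY SJ SUPPLIER on SNUM, with the profiles of the worked example:
# val(SNUM[SUPPLIER]) = 200, size(SNUM) = 4, size(SUPPLY) = 6, card(SUPPLY) = 5000.
cost_p1 = semijoin_cost(val_b_s=200, size_b=4)
benefit_p1 = semijoin_benefit(rho=0.2, size_r=6, card_r=5000)
```

A semi-join is worth considering only when its benefit exceeds its cost, which is the test applied in the algorithm that follows.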

  The algorithm can be given as follows:

  1. Basis: A join graph G is given. All local reductions to relations appearing in G have already been applied.

  2. Method: While there are profitable semi-joins, include either the most profitable or the cheapest one in the reducer program of the relation to which it applies, then reevaluate the benefits and costs of the affected semi-joins.

  3. Termination: The site that requires the least transmission is selected for collecting all the relations.
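The method step can be sketched as a greedy loop. This is only an outline under assumptions: "profitable" is taken to mean benefit greater than cost, "most profitable" is taken to mean the largest difference between benefit and cost, and `evaluate` and `apply` are hypothetical callbacks supplied by the caller (they stand in for the profile estimation that SDD-1 performs).

```python
def build_reducer_program(candidates, evaluate, apply):
    """Greedy loop of the basic algorithm (illustrative sketch).

    candidates: iterable of semi-join identifiers.
    evaluate(sj): returns (benefit, cost) under the current profiles.
    apply(sj): updates the profiles after sj has been chosen.
    """
    program = []
    remaining = set(candidates)
    while True:
        # Re-evaluate every remaining semi-join against the current profiles.
        scored = {sj: evaluate(sj) for sj in remaining}
        profitable = [sj for sj, (b, c) in scored.items() if b > c]
        if not profitable:
            return program
        # Assumed criterion: largest net gain (benefit minus cost).
        best = max(profitable, key=lambda sj: scored[sj][0] - scored[sj][1])
        program.append(best)
        remaining.discard(best)
        apply(best)
```

Choosing the cheapest profitable semi-join instead would only change the `key` function; the slides allow either criterion.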

  

Query optimization using SDD-1 algorithm

  Definition of Optimization Problem

  Join graph:

    DEPT --(DEPTNUM=DEPTNUM)-- SUPPLY --(SNUM=SNUM)-- SUPPLIER

  Profiles:

    Card(SUPPLIER) = 200, Site(SUPPLIER) = 1
              SNUM   NAME
      SIZE    4      20
      VAL     200    200

    Card(SUPPLY) = 5000, Site(SUPPLY) = 2
              SNUM   DEPTNUM
      SIZE    4      2
      VAL     1000   100

    Card(DEPT) = 20, Site(DEPT) = 3
              DEPTNUM   NAME
      SIZE    2         3
      VAL     20        5

  Assumptions:
  • All values of SNUM in SUPPLIER are present in SUPPLY.
  • All values of DEPTNUM in DEPT are present in SUPPLY.

  

Description of all possible semi-joins

    Semi-join                  Selectivity    Benefit           Cost
    p1: SUPPLY NSJ SUPPLIER    ρ(p1) = 0.2    0.8 × 6 × 5000    4 × 200
    p2: SUPPLY NSJ DEPT        ρ(p2) = 0.2    0.8 × 6 × 5000    2 × 20
    p3: SUPPLIER NSJ SUPPLY    ρ(p3) = 1      -                 4 × 1000
    p4: DEPT NSJ SUPPLY        ρ(p4) = 1      -                 2 × 100

  • Profitable semi-joins: p1 and p2
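The first selection can be checked with a few lines of Python. This is a minimal sketch under assumptions: unit transmission costs, "profitable" meaning benefit greater than cost, and "most profitable" meaning the largest difference between benefit and cost. The dictionary layout is illustrative only.

```python
# (benefit, cost) per semi-join, from the table above; rho = 1 gives benefit 0.
semijoins = {
    "p1": (0.8 * 6 * 5000, 4 * 200),   # SUPPLY NSJ SUPPLIER
    "p2": (0.8 * 6 * 5000, 2 * 20),    # SUPPLY NSJ DEPT
    "p3": (0.0, 4 * 1000),             # SUPPLIER NSJ SUPPLY
    "p4": (0.0, 2 * 100),              # DEPT NSJ SUPPLY
}

# Keep only the semi-joins whose benefit exceeds their cost.
profitable = sorted(name for name, (b, c) in semijoins.items() if b > c)

# Pick the one with the largest net gain.
best = max(profitable, key=lambda name: semijoins[name][0] - semijoins[name][1])
```

Because p1 and p2 have the same benefit and p2 is cheaper, p2 comes out first under either the "most profitable" or the "cheapest" criterion.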

  • Iteration 1: p2 selected
  • Effect on the profile of SUPPLY:

    Card(SUPPLY) = 1000, Site(SUPPLY) = 2
              SNUM   DEPTNUM
      SIZE    4      2
      VAL     666    20

    c(n,m,r) is used for SNUM, with n = 5000, r = 1000, m = val(SNUM[SUPPLY]) = 1000:

      c(n,m,r) = r,          for r < m/2
                 (r+m)/3,    for m/2 <= r < 2m
                 m,          for r >= 2m

  • Effect on other semi-joins:

    Semi-join                  Selectivity      Benefit             Cost
    p1: SUPPLY NSJ SUPPLIER    ρ(p1) = 0.2      0.8 × 6 × 5000      4 × 200
    p3: SUPPLIER NSJ SUPPLY    ρ(p3) = 0.666    0.333 × 24 × 200    4 × 666
    p4: DEPT NSJ SUPPLY        ρ(p4) = 1        -                   2 × 20

  • Profitable semi-joins: p1 and p3
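The estimate c(n, m, r) for the number of distinct values left after a reduction can be written as a small function. The sketch below assumes that (r + m)/3 is truncated to an integer, since that is what reproduces the value 666 in the profile; the function name is made up for this example.

```python
def distinct_after_reduction(n, m, r):
    """Estimate c(n, m, r): distinct values left when a relation of n tuples
    with m distinct values of an attribute is reduced to r tuples.

    n is kept in the signature to match the slides, although the three
    cases given there depend only on m and r.
    """
    if r < m / 2:
        return r
    if r < 2 * m:
        return (r + m) // 3  # truncation is an assumption of this sketch
    return m


# SNUM of SUPPLY after p2: n = 5000, m = 1000, r = 1000.
val_snum = distinct_after_reduction(n=5000, m=1000, r=1000)
```

The same function covers the third case as well, for instance when m = 20 and r = 200 the result is simply m.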

  • Iteration 2: p1 selected
  • Effect on the profile of SUPPLY:

    Card(SUPPLY) = 200, Site(SUPPLY) = 2
              SNUM   DEPTNUM
      SIZE    4      2
      VAL     123    20

    c(n,m,r) is used for DEPTNUM, with n = 1000, r = 200, m = val(DEPTNUM[SUPPLY’]) = 20:

      c(n,m,r) = r,          for r < m/2
                 (r+m)/3,    for m/2 <= r < 2m
                 m,          for r >= 2m

  • Effect on other semi-joins:

    Semi-join                  Selectivity      Benefit             Cost
    p3: SUPPLIER NSJ SUPPLY    ρ(p3) = 0.666    0.333 × 24 × 200    4 × 123
    p4: DEPT NSJ SUPPLY        ρ(p4) = 1        -                   2 × 20

  • Profitable semi-joins: p3

  • Iteration 3: p3 selected
  • Effect on the profile of SUPPLIER:

    Card(SUPPLIER) = 123, Site(SUPPLIER) = 1
              SNUM   NAME
      SIZE    4      20
      VAL     123    123

  • Effect on other semi-joins:

    Semi-join                  Selectivity    Benefit    Cost
    p4: DEPT NSJ SUPPLY        ρ(p4) = 1      -          2 × 20

  • No other profitable semi-joins

  

Selection of the site for collecting all the relations

  Cost(site 1) = 6 × 200 + 5 × 20 = 1300
  Cost(site 2) = 24 × 123 + 5 × 20 = 3052
  Cost(site 3) = 24 × 123 + 6 × 200 = 4152

  Site 1 is selected.
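The three site costs can be reproduced with a short Python sketch. It assumes a unit cost per transmitted byte and no fixed cost per transfer, and takes each tuple size as the sum of the attribute sizes in the profiles (24 for SUPPLIER, 6 for SUPPLY, 5 for DEPT). The data layout and function name are illustrative.

```python
# Reduced profiles after the three iterations: site, tuple size, cardinality.
relations = {
    "SUPPLIER": {"site": 1, "size": 24, "card": 123},
    "SUPPLY":   {"site": 2, "size": 6,  "card": 200},
    "DEPT":     {"site": 3, "size": 5,  "card": 20},
}


def collection_cost(target_site):
    """Data shipped when every relation stored elsewhere is sent to target_site."""
    return sum(r["size"] * r["card"]
               for r in relations.values() if r["site"] != target_site)


costs = {site: collection_cost(site) for site in (1, 2, 3)}
best_site = min(costs, key=costs.get)
```

Site 1 wins because SUPPLIER, the relation with the widest tuples, is already stored there and never has to be shipped.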

  

Postoptimization

  • To improve the obtained solution, a postoptimization can be made. The postoptimization obeys two criteria:

  1. Eliminating the semi-joins whose only effect is to reduce relations that are already at the site selected for executing the query.

  2. Delaying expensive semi-joins R SJ S until after the reduction of S by means of other semi-joins; this requires changing the order of application of semi-join operations.

  Example

  • Postoptimization: Since semi-join p3 has the only effect of reducing relation SUPPLIER, which is at the selected site 1, p3 is not useful.
  • Summary: DEPT is sent to site 1 without semi-join, at a cost of 100. SUPPLY is reduced by the two semi-joins with SUPPLIER and DEPT, at a cost of 840, then sent to site 1, at a cost of 1200.
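The figures quoted in the summary can be traced back to the profiles with a little arithmetic. The sketch below assumes unit transmission costs; the overall total is not stated in the slides and is simply the sum of the three quoted costs.

```python
# Costs quoted in the summary above (unit transmission cost assumed).
dept_transfer = 5 * 20                 # DEPT shipped unreduced to site 1
supply_semijoins = 4 * 200 + 2 * 20    # p1 and p2 applied to SUPPLY
supply_transfer = 6 * 200              # reduced SUPPLY shipped to site 1

# Sum of the three components of the final strategy.
total_cost = dept_transfer + supply_semijoins + supply_transfer
```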

  Apers, Hevner and Yao (AHY) algorithm

  General Queries

  • General queries are queries with joins and unions in their optimization graph.
  • The basic transformation criterion used is the commutativity of joins and unions for generating distributed joins.

  Effect of Commuting Joins & Unions

  The commutation of joins and unions can be represented in a figure showing three different optimization graphs of the same query. In fig (a), fragments are first collected and then joined; this is called a nondistributed join.

  In fig (b), fragments are first joined and then collected; this is called a distributed join.

  

  1. Nondistributed join: This optimization problem is much simpler. It reduces to determining a pair of sites (possibly the same site) at which the union operations are performed. If the sites are different, then the query is reduced to a simple join query between two relations.

  2. Distributed join: This optimization problem is much harder. The join graph of the join between R and S lies within the hypernode representing the union operation. Knowledge of the fragmentation criteria must be used for eliminating edges from the join graph.

  Once the minimal join graph has been determined, the execution of the joins appearing in it must be optimized. Finally, the join results are sent to the same site for performing the union.

  • It is also possible to perform partial unions before performing joins (as in fig (c)).
  • In building the optimization graph G’ of fig (c) from the optimization graph G of fig (b), the following rules are used:

  1. Fragments on which partial unions are performed are enclosed in a hypernode ({R1, R2} and {S2, S3}).

  2. If two fragments Ri and Sj are connected by an arc in G, then the nodes to which they belong are also connected by an arc in G’.

  (The edge between R1 and S2 in G generates the edge …
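The two rules for building G’ from G can be sketched in Python. The function name and the edge list below are hypothetical: the actual edges of fig (b) are not reproduced in this text, so the example graph is invented purely to exercise the rules.

```python
def build_partial_union_graph(edges, hypernodes):
    """Derive the edges of G' from the edges of G.

    edges: iterable of (Ri, Sj) fragment pairs connected in G.
    hypernodes: iterable of fragment groups on which partial unions are
    performed; fragments not listed remain singleton nodes.
    """
    # Rule 1: every fragment of a group maps to the same hypernode.
    node_of = {}
    for group in hypernodes:
        node = frozenset(group)
        for fragment in node:
            node_of[fragment] = node

    # Rule 2: an arc between two fragments becomes an arc between their nodes.
    new_edges = set()
    for a, b in edges:
        na = node_of.get(a, frozenset([a]))
        nb = node_of.get(b, frozenset([b]))
        new_edges.add((na, nb))
    return new_edges


# Invented example graph G with the hypernodes named in rule 1.
g_edges = [("R1", "S2"), ("R2", "S3"), ("R3", "S1")]
g_prime = build_partial_union_graph(g_edges, [{"R1", "R2"}, {"S2", "S3"}])
```

Using a set for the result means that several arcs of G between the same pair of hypernodes collapse into a single arc of G’.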

  • Partitioned join graphs are an important class of join graphs for optimization.
  • In partitioned join graphs, each subgraph can be independently optimized due to the following facts:
    – The optimization of joins of disconnected subgraphs can be performed independently.
    – The union operation is not affected by the order in which operands are collected.

  • This property allows the building of a variety of strategies involving partial unions for each operation.

  • Searching for the best execution of a given query requires:
    – Generating all the possible query optimization graphs.
    – Applying join query methods to optimize joins and adding the cost of unions.
    – Selecting the best query processing policy among them.
  • The figure shows the four alternative ways of computing the cost of the join graph of two relations R and S having two fragments each.