Consider the join R S where the join predicate is R.a = S.b, given the following
ID: 3850127 • Letter: C
Question
Consider the join R S where the join predicate is R.a = S.b, given the following metadata about R and S:
• Relation R contains 20,000 tuples and has 10 tuples per block
• Relation S contains 5,000 tuples and has 10 tuples per block
• Attribute b of relation S is the primary key for S, and every tuple in S matches 3 tuples in R
• There exists a unclustered (secondary) index on R.a with height 3
• There exists a clustered (primary) index on S.b with height 2
• The main memory buffer can hold 5 blocks (B=5)
Answer the following questions:
a. If R S is evaluated with a block nested loop join, which relation should be the outer relation? Justify your answer. What is the cost of the join in number of I/O’s?
b. If R S is evaluated with an index nested loop join, what will be the cost of the join in number of I/O’s? Show your cost analysis.
c. What is the cost of a plan that evaluates this query using sort-merge join. Show the details of your cost analysis.
d. Evaluate the cost of computing the R S using hash join assuming: i) The main memory buffer can hold 202 blocks, ii) The main memory buffer can hold 11 blocks
Explanation / Answer
a. Block nested-loop join: If r1 is the outer relation, we need 800 M1 50+ 800 disk accesses, if r2 is the outer relation we need 1500 M1 800 + 1500 disk accesses.
b.index nested loop join:Using r1 as the outer relation we need 20000 1500 + 800 = 30, 000, 800 disk accesses, if r2 is the outer relation we need 5000 800 + 1500 = 36, 001, 500 disk accesses.
c.sort-Merge-join: Assuming thatr1 and r2 are not initially sorted on the join key, the total sorting cost inclusive of the output is Bs = 1500(2logM1(1500/M)+2+ 800(2logM1(800/M) + 2) disk accesses. Assuming all tuples with the same value for the join attributes fit in memory, the total cost is Bs + 1500 + 800 disk accesses.
d. Hash-join: We assume no overflow occurs. Since r1 is smaller, we use it as the build relation and r2 as the probe relation. If M > 800/M, i.e. no need for recursive partitioning, then the cost is 3(1500+800) = 6900 disk accesses, else the cost is 2(1500 + 800)logM1(800) 1 + 1500 + 800 disk accesses.