Problem 5: Query Evaluation Consider the join between relations R and S, where t
ID: 3711503 • Letter: P
Question
Problem 5: Query Evaluation
Consider the join between relations R and S, where the join condition is R.a = S.b. We are given the following information about the two relations. The cost metric is the number of page I/Os, and the cost of writing out the result should be uniformly ignored.
Relation R contains 10,000 tuples and has 10 tuples per page.
Relation S contains 2,000 tuples and also has 10 tuples per page.
Attribute b of relation S is the primary key for S.
Both relations are stored as simple heap files.
Neither relation has any indexes built on it.
52 buffer pages are available.
Questions:
What is the cost of joining R and S using a simple nested loops join?
What is the cost of joining R and S using a block nested loops join?
What is the cost of joining R and S using a sort-merge join?
What is the cost of joining R and S using a hash join?
How many tuples will the join of R and S produce, at most, and how many pages would be required to store the result of the join back on disk?
If secondary B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using an index nested loops join) than a block nested loops join? Explain.
If primary B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using the index nested loops algorithm) than a block nested loops join? Explain.
Explanation / Answer
Solution:
Number of tuples in relation R = 10,000
Number of tuples per page in relation R = 10
Number of total number of tuples in R = 1000
Number of tuples in relation S = 2,000
Number of tuples per page in relation S = 10
Number of total number of tuples in S = 200
What is the cost of joining R and S using a block nested loops join?
There are 2 options
1. R is outer, S is inner relation
2. S is outer, R is inner relation
Buffer utilization
Output: 1 page
Inner relation: 1 page
Outer relation: 50 pages
1. R is outer, S is inner relation
Cost to read outer = 1,000 I/Os
Join = 4,000 I/Os
Total cost = (1,000 + 4,000) = 5,000 I/Os
2. S is outer, R is inner relation
Cost to read outer = 200 I/Os
Join = 4,000 I/Os
Total cost = (200 + 4,000) = 4,200 I/Os
Best case is the first option
Therefore cost is 4,200 I/Os
To minimize the cost of the join, the buffer size should be 202 pages so that we can store entire smaller relation in the buffer.
What is the cost of joining R and S using a sort-merge join?
Sorting Phase
Since we have a 52 page buffer, we have enough space to hold sorted sublists = 21 sublists < 52
Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/Os
Cost to sort S = 2* ( 200 + 200 ) = 800 I/Os
Merging phase
Since b is the primary key for S, we do not have to worry about duplicates.
Cost for merging = 1,000 + 200 = 1,200 I/Os
Total Cost = (4,000 + 800 + 1,200) = 6,000 I/Os
What is the cost of joining R and S using a hash join?
Partitioning Phase
I/O cost = 2*1000(read and write) + 2*200(read and write) = 2,400 I/Os
Probing Phase
Need to check whether the buffer size is enough to hold all buckets. It should satisfy the following condition
B > f ? M/(B ? 1)
Since 52 > 1.1 ? 1, 000/(52 ? 1) = 21.5, the buffer size is adequate.
Therefore, I/O cost = 1000(read) + 200(read) = 1,200 I/Os
Total cost = 2,400 + 1,200 = 3,600 I/Os
Note: please repost others.
I hope this helps if you find any problem. Please comment below. Don't forget to give a thumbs up if you liked it. :)