Consider the join of R and S where R.a = S.b , given the following information a
ID: 3686475 • Letter: C
Question
Consider the join of R and S where R.a = S.b, given the following information about the relations to be joined. The cost metric is the number of disk I/Os and the cost of writing out the result is ignored.
Relation R contains 10,000 tuples; each tuple is 400 bytes long.
Relation S contains 2,000 tuples; each tuple is 400 bytes long.
A page size is 4096 bytes and the unpacked, bitmap page format is used. For each page, 96 bytes are reserved and cannot be used to store data. The rest of the page is used to store as many tuples as possible.
Attribute b of relation S is the primary key for S.
Both relations are stored as simple heap files. Neither relation has any index built on it.
The available memory buyer has 52 pages.
The fudge factor is 1.1.
Answer the following questions and explain why:
a) What is the cheapest cost of joining R and S using a block nested loops join for the given amount of memory buyer space? What should the number of buffer pages be so that the cost of the join is the minimum?
b) What is the cheapest cost of joining R and S using a GRACE hash join?
c) What is the cheapest cost of joining R and S using a sort-merge join?
Explanation / Answer
Number of tuples in relation R = 10,000
Number of tuples per page in relation R = 10
Number of total number of tuples in R = 1000
Number of tuples in relation S = 2,000
Number of tuples per page in relation S = 10
Number of total number of tuples in S = 200
• Block nested loops join
There are 2 options
1. R is outer, S is inner relation
2. S is outer, R is inner relation
Buffer utilization
Output: 1 page
Inner relation: 1 page
Outer relation: 50 pages
1. R is outer, S is inner relation
Cost to read outer = 1,000 I/Os
Join = 4,000 I/Os
Total cost = (1,000 + 4,000) = 5,000 I/Os
2. S is outer, R is inner relation
Cost to read outer = 200 I/Os
Join = 4,000 I/Os
Total cost = (200 + 4,000) = 4,200 I/Os
Best case is the first option
Therefore cost is 4,200 I/Os
To minimize the cost of the join, the buffer size should be 202 pages so that we can store entire smaller relation in the buffer.
• Grace hash join
Partitioning Phase
I/O cost = 2*1000(read and write) + 2*200(read and write) = 2,400 I/Os
Probing Phase
Need to check whether the buffer size is enough to hold all buckets. It should satisfy the following condition
B > f M/(B 1)
Since 52 > 1.1 1, 000/(52 1) = 21.5, the buffer size is adequate.
Therefore, I/O cost = 1000(read) + 200(read) = 1,200 I/Os
Total cost = 2,400 + 1,200 = 3,600 I/Os
• Sort-Merge join
Sorting Phase
Since we have a 52 page buffer, we have enough space to hold sorted sublists = 21 sublists < 52
Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/Os
Cost to sort S = 2* ( 200 + 200 ) = 800 I/Os
Merging phase
Since b is the primary key for S, we do not have to worry about duplicates.
Cost for merging = 1,000 + 200 = 1,200 I/Os
Total Cost = (4,000 + 800 + 1,200) = 6,000 I/Os