Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Consider the join of R and S where R.a = S.b , given the following information a

ID: 3686475 • Letter: C

Question

Consider the join of R and S where R.a = S.b, given the following information about the relations to be joined. The cost metric is the number of disk I/Os and the cost of writing out the result is ignored.

Relation R contains 10,000 tuples; each tuple is 400 bytes long.

Relation S contains 2,000 tuples; each tuple is 400 bytes long.

A page size is 4096 bytes and the unpacked, bitmap page format is used. For each page, 96 bytes are reserved and cannot be used to store data. The rest of the page is used to store as many tuples as possible.

Attribute b of relation S is the primary key for S.

Both relations are stored as simple heap files. Neither relation has any index built on it.

The available memory buyer has 52 pages.

The fudge factor is 1.1.

Answer the following questions and explain why:

a) What is the cheapest cost of joining R and S using a block nested loops join for the given amount of memory buyer space? What should the number of buffer pages be so that the cost of the join is the minimum?

b) What is the cheapest cost of joining R and S using a GRACE hash join?

c) What is the cheapest cost of joining R and S using a sort-merge join?

Explanation / Answer

Number of tuples in relation R = 10,000

Number of tuples per page in relation R = 10

Number of total number of tuples in R = 1000

Number of tuples in relation S = 2,000

Number of tuples per page in relation S = 10

Number of total number of tuples in S = 200

• Block nested loops join

There are 2 options

1. R is outer, S is inner relation

2. S is outer, R is inner relation

Buffer utilization

Output: 1 page

Inner relation: 1 page

Outer relation: 50 pages

1. R is outer, S is inner relation

Cost to read outer = 1,000 I/Os

Join = 4,000 I/Os

Total cost = (1,000 + 4,000) = 5,000 I/Os

2. S is outer, R is inner relation

Cost to read outer = 200 I/Os

Join = 4,000 I/Os

Total cost = (200 + 4,000) = 4,200 I/Os

Best case is the first option

Therefore cost is 4,200 I/Os

To minimize the cost of the join, the buffer size should be 202 pages so that we can store entire smaller relation in the buffer.

• Grace hash join

Partitioning Phase

I/O cost = 2*1000(read and write) + 2*200(read and write) = 2,400 I/Os

Probing Phase

Need to check whether the buffer size is enough to hold all buckets. It should satisfy the following condition

B > f M/(B 1)

Since 52 > 1.1 1, 000/(52 1) = 21.5, the buffer size is adequate.

Therefore, I/O cost = 1000(read) + 200(read) = 1,200 I/Os

Total cost = 2,400 + 1,200 = 3,600 I/Os

• Sort-Merge join

Sorting Phase

Since we have a 52 page buffer, we have enough space to hold sorted sublists = 21 sublists < 52

Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/Os

Cost to sort S = 2* ( 200 + 200 ) = 800 I/Os

Merging phase

Since b is the primary key for S, we do not have to worry about duplicates.

Cost for merging = 1,000 + 200 = 1,200 I/Os

Total Cost = (4,000 + 800 + 1,200) = 6,000 I/Os