Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Problem 5: Query Evaluation Consider the join between relations R and S, where t

ID: 3711503 • Letter: P

Question

Problem 5: Query Evaluation

Consider the join between relations R and S, where the join condition is R.a = S.b. We are given the following information about the two relations. The cost metric is the number of page I/Os, and the cost of writing out the result should be uniformly ignored.

Relation R contains 10,000 tuples and has 10 tuples per page.

Relation S contains 2,000 tuples and also has 10 tuples per page.

Attribute b of relation S is the primary key for S.

Both relations are stored as simple heap files.

Neither relation has any indexes built on it.

52 buffer pages are available.

Questions:

What is the cost of joining R and S using a simple nested loops join?

What is the cost of joining R and S using a block nested loops join?

What is the cost of joining R and S using a sort-merge join?

What is the cost of joining R and S using a hash join?

How many tuples will the join of R and S produce, at most, and how many pages would be required to store the result of the join back on disk?

If secondary B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using an index nested loops join) than a block nested loops join? Explain.

If primary B+ indexes existed on R.a and S.b, would either provide a cheaper alternative for performing the join (using the index nested loops algorithm) than a block nested loops join? Explain.

Explanation / Answer

Solution:

Number of tuples in relation R = 10,000

Number of tuples per page in relation R = 10

Number of total number of tuples in R = 1000

Number of tuples in relation S = 2,000

Number of tuples per page in relation S = 10

Number of total number of tuples in S = 200

What is the cost of joining R and S using a block nested loops join?

There are 2 options

1. R is outer, S is inner relation

2. S is outer, R is inner relation

Buffer utilization

Output: 1 page

Inner relation: 1 page

Outer relation: 50 pages

1. R is outer, S is inner relation

Cost to read outer = 1,000 I/Os

Join = 4,000 I/Os

Total cost = (1,000 + 4,000) = 5,000 I/Os

2. S is outer, R is inner relation

Cost to read outer = 200 I/Os

Join = 4,000 I/Os

Total cost = (200 + 4,000) = 4,200 I/Os

Best case is the first option

Therefore cost is 4,200 I/Os

To minimize the cost of the join, the buffer size should be 202 pages so that we can store entire smaller relation in the buffer.

What is the cost of joining R and S using a sort-merge join?

Sorting Phase

Since we have a 52 page buffer, we have enough space to hold sorted sublists = 21 sublists < 52

Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/Os

Cost to sort S = 2* ( 200 + 200 ) = 800 I/Os

Merging phase

Since b is the primary key for S, we do not have to worry about duplicates.

Cost for merging = 1,000 + 200 = 1,200 I/Os

Total Cost = (4,000 + 800 + 1,200) = 6,000 I/Os

What is the cost of joining R and S using a hash join?

Partitioning Phase

I/O cost = 2*1000(read and write) + 2*200(read and write) = 2,400 I/Os

Probing Phase

Need to check whether the buffer size is enough to hold all buckets. It should satisfy the following condition

B > f ? M/(B ? 1)

Since 52 > 1.1 ? 1, 000/(52 ? 1) = 21.5, the buffer size is adequate.

Therefore, I/O cost = 1000(read) + 200(read) = 1,200 I/Os

Total cost = 2,400 + 1,200 = 3,600 I/Os

Note: please repost others.

I hope this helps if you find any problem. Please comment below. Don't forget to give a thumbs up if you liked it. :)