In this assignment, you will have a chance to implement the hash-join algorithm
ID: 3689783 • Letter: I
Question
In this assignment, you will have a chance to implement the hash-join algorithm for a DBMS. You can use C, C++, Java or any other high-level programming language to write the program. However, your program should include sufficient comments to make it readable. You need to turn in a brief report/description about your program design and implementation (e.g., high-level program diagram and data/file structures) and program usage; your program source code; proof of compilation (e.g., the screen snapshot of a successful compilation); and sample execution outputs. Please assemble all the above required contents in a single Word or PDF file for your submission. The program specificiation is given as follows. Let R_1 (a_1, a_2, a_3) and R_2 (b_1, b_2, b_3, b_4) be two relations with all integer attributes. Tuples in these two relations are sequentially stored in two data files, respectively. Use the hash-join algorithm to implement a join (equijoin) of R_1 and R_2. Assume that the hash function is f(k) = k mod N, where N is the number of buckets allowed in your hash structure/table. Your program should allow a user to choose the joining attributes from the two relations, i.e., performing R_1 R_1 middot a_i = R_2 middot b_j R_2 for any chosen pair of a_i and b_j, where a_i is the i-th attribute in R_1 (1 lessthanorequalto i lessthanorequalto 3) and b_j is the j-th attribute in R_2 (1 lessthanorequalto j lessthanorequalto 4). For example, a user may want to perform R_1 R_1 middot a_2 = R_2 middot b_3 R_2. Your program should display the join result and output the selectivity of the join. You may request a user to interactively input the necessary parameters, such as the data file names for R_1 and R_2, the number of tuples in each relation, and the joining attributes (e.g., 1 for the 1st attribute, 3 for the 3rd attribute). Use your program to perform several joins for different relation instances of R_1 and R_2.Explanation / Answer
Hash Join algorithm
for each tuple t in T do
let h = hash on join attributes t(b)
place t in hash table Th in bucket keyed by hash value h
for each tuple x in X do
let h = hash on join attributes x(a)
if h indicates a nonempty bucket (B) of hash table Th
if h matches any t in B
concatenate x and t
place relation in R
Tables to be joined using hash join algorithm
Code in java
import java.util.*;
public class HashJoinDBMS {
public static void main(String[] args) {
String[][] tab1 = {{"27", "John"}, {"18", "Alice"}, {"28", "Gloria"},
{"18", "Patrick"}, {"28", "Alice"}};
String[][] tab2 = {{"John", "Willy"}, {"John", "Spidy"},
{"Alice", "Gummy"}, {"Alice", "Zompy"}, {"Gloria", "Betty"},
{"Bob", "foo"}};
HashJoinDBMS(tab1, 1, tab2, 0).stream()
.forEach(s -> System.out.println(Arrays.deepToString(s)));
}
static List<String[][]> HashJoinDBMS(String[][] rec1, int id1,
String[][] rec2, int id2) {
List<String[][]> res = new ArrayList<>();
m<String,List<String[]>> m=new HashMap<>();
for (String[] rec : rec1) {
List<String[]> v = m.getOrDefault(rec[id1], new ArrayList<>());
v.add(rec);
m.put(rec[id1], v);
}
for (String[] rec : rec2) {
List<String[]> list = m.get(rec[id2]);
if (list != null) {
list.stream().forEach(s -> {
res.add(new String[][]{s, rec});
});
}
}
return res;
}
}