Suppose we want to build a B+ tree that has space for 200,000 data entries in it
ID: 3809282 • Letter: S
Question
Suppose we want to build a B+ tree that has space for 200,000 data entries in its leaf pages. Each data entry is made up of a key and its corresponding data value. Let us assume the following specifications. Each page (leaf or internal) is 4096 bytes long. Each page holds three 8-byte pointers (parent, left sibling, right sibling) in addition to the bytes consumed by all of the keys and their accompanying 8-byte values. (In the case of leaf pages, these “values” are simply pointers that locate the full data record corresponding to the key (e.g., customer record for the given key.) We want to use an even number of data entries in the leaf pages. Internally, we also want an even number. The keys have unique values. For all of the following questions, show your work:
a) Suppose each key is 56 bytes long, and suppose we fill the leaf pages to capacity (i.e., as much as possible). Compute the number of leaf pages that we need.
b) Compute the number of internal pages, at each level, that we “need” (i.e., assume that you can fill the parents to capacity (to the maximum even number of keys)). This will result in a structure that has the fewest number of pages. Note, however, that the root can have as few as one key.
Explanation / Answer
a)
Since a page is 4,096 bytes, it can only store a limited number of index entries. Also, we are building a B+ tree that is filled up as much as possible so we will store as many index entries as can fit in a disk page.
An index entry is a pair of values: [search key value, reference to tuple], and in our case, size(search key value)=56 and size(reference to tuple)=8.
So, to determine the number of index entries that can fit in a leaf page we can use this calculation:
Number of index entries per leaf page = Floor( Page_size / (size(search key value) + size(reference to tuple))) = Floor( 4096 / (56 + 8)) = 64
Now we must determine the number of leaf pages that are necessary. Since the table we are indexing is stored as a heap, the tuples are not ordered in the data file. So, our B+ -tree index must contain one index enter per tuple in the table; this is known as a dense index. So, there will be 200000 index entries collectively stored in the leaf pages.
The number of leaf pages necessary is given by: –
Number of leaf pages = Ceiling ( (Number of index entries to be stored) / (Number of index entries per leaf page) ) = Number of leaf pages = Ceiling ( 200000 / 64 ) = 3125.
b)
Next, we must determine the branching factor, m, of our B+ -tree. This can be done by repeating step (a) above with some slight modifications. In a B+ tree of order m, each internal node must be able to store up to m branches (i.e. pointers to nodes in the level below) and thus, up to m-1 search key values (usually called separator values when they appear in the internal nodes).
Page_size >= (m-1) * size(search key value) + m * size(pointer)
In our case, we have: 4096 >= (m-1) * 56+ m * 8
Solving for m, which must be a whole number, we get: m = Floor( ( 4,096 + 56 ) / 64 ) = 64
So, each internal node will store 64 pointers to nodes in the level below and a corresponding 63 search key values.
Now that we have m, we can determine the number of pages at each level in the B + tree. Since every leaf in our tree needs to be referenced from the level above, collectively we will need 3125 pointers (i.e. branches), one per leaf page.
Therefore the number of internal pages in the level above the leaves is: –
Number of pages in level above leaves = Ceiling( Number of leaf pages / branching factor)
Number of pages in level above leaves = Ceiling( 3125 / 64 ) = 49
– We need to continue doing this computation until we reach the point where we have just one internal node that will be the root of our tree
– Number of pages in the next level above = Ceiling( 49 / 64 ) = 1