Midterm 15w2
Midterm 15w2
Name Student No
(PRINT) (Last) (First)
Signature
In case of ambiguity, please write down any reasonable assumptions that you need to make.
1. {4 marks} Which of the following statements about Alt. 2 extendible hash indexes are generally true?
Assume that we are dealing with indexes on unique keys (i.e., no duplicates). A “bucket” is a “page”.
a) A bad hash function will still distribute keys pretty evenly across the entire hash structure.
b) It is often faster to find a unique key in a hash index than in a B+ tree index.
c) Usually, hash indexes can support range queries efficiently.
d) The local depth nub in an extendible hashing index bucket tells you how many <key, rid>
pairs are in the bucket.
e) Hash buckets need to be at least half full.
Page 2
2. {9 marks} Consider the following schema for a mining table created by an investment firm. Assume
that we use 4K pages (4096 bytes), that a tuple cannot span pages, and that we try to fill pages to
capacity. All rid and child pointers are 10 bytes long, and as usual, we’ll ignore housekeeping bytes.
Don’t worry about whether there is an odd or an even number of data entries in your answer.
Details: Of the many thousands of mines around the world, supposes an investment firm tracks 200
mines and has 10 full years’ worth of data for each of those mines (assume 365.25 days/year). Numeric
fields take up 4 bytes, dates are stored in integer format (e.g., 20150211 means February 11, 2015), and
strings are 15 bytes each. Each data record contains 110 bytes. The table and its fields are:
a) {5 marks} Compute the number of leaf pages (only) in this index. Show your work.
b) {2 marks} How many complete records will fit on one data page if we plan to leave approximately
30% free space on each data page of the table? Show your work. Make reasonable assumptions.
c) {2 marks} Note that the leading field of the clustering index’s key is the date. Why is this firm
planning to leave 30% free space on each data page? Be specific.
Page 3
3. {5 marks} Suppose we have the following entries that we’re going to insert into an empty Extendible
Hash index structure shown below. You can assume that each bucket can hold up to 3 entries. Use the
same hash function from class (i.e., modulo last n bits of each binary value). The index contains no keys
so far, but we’ll start with pointers to 2 initially empty buckets.
Add these keys, in this order: 12, 15, 7, 24, 8, 6, 16—and show the resulting structure every time the
directory doubles, and at the end. Note that 12 = (1100)2.
For the “end”, it is OK to add any remaining keys directly to the existing structure if it doesn’t split
again (to save you writing time).
Page 4
4. {4 marks} A data entry is a <key, rid> pair in a hash bucket. What is the maximum key length k
permitted so that at least 90 data entries will be able to fit on a 4K (4096-byte) bucket in an Alt. 2 hash
index? In other words, if the key length is any longer than k, then we won’t be able to fit 90 data entries
on a page. As usual, assume that each rid is 10 bytes long. Write down any reasonable assumptions, if
any, that you have to make. Show your work.
5. {4 marks} Suppose we have a B+ tree of height 2 containing 28,000 unique primary keys, kept
sorted in alphanumeric order. The leaf pages can contain a maximum of 140 data entries per page, and
these leaf pages are filled to capacity. How many pages need to be retrieved into an empty buffer pool
to determine the first 10,000 primary keys (only), in order? Show your work.
Page 5
6. {4 marks} Suppose a parent node in a B+ tree has 3 children: A, B, and C—where the middle node
is B. All 3 of A, B, and C are leaf pages, and they currently contain 56 data entries each. The order of
the tree is 42. Using the principles of B+ tree deletions and maintenance from class, how many keys can
we delete from B without having to merge two of A, B, and C? Assume that only node B has keys that
are being deleted from the tree. Briefly, justify your answer.
7. {4 marks} Consider a B+ tree index of order d = 5 (i.e., max. 10 entries for any of its nodes/pages).
What is the maximum possible number of data entries at the leaf level that can appear in this B+ tree, if
its height is exactly 2 (i.e., has 3 levels)?
Page 6
8. {5 marks} Compute the total number of cylinders that the disk arm has to move for each of these 2
algorithms: (a) the regular elevator algorithm (SCAN with LOOK), just like in class; and (b) the SSTF
(Shortest Seek Time First) algorithm. Let us assume that the disk assembly is currently at cylinder 0
and is heading higher.
The cylinder requests that have already arrived are, in order: 1500, 2100, 4000, and 800. Assume that
immediately after fetching the page(s) (i.e., completing the service) for cylinder 2100 that the following
requests arrive nearly simultaneously: 1300, 2000, 4500, and 1600. Thus, there are 8 service requests
in all.
Page 7
9. {4 marks} Consider the SYSIBM.SYSCOLUMNS and SYSIBM.SYSTABLES DB2 catalog tables
on the next 2 pages, and the query below. Recall that a TBCREATOR is a unique identifier in case
multiple tables have the same name.
In plain English, in one sentence, explain what the purpose of the following (relatively simple) SQL
catalog query is.
Page 8