DB2_20240617 solution

Databases 2 - exam - June 17, 2024 - Dur. 2h
S. Comai, P. Fraternali, D. Martinenghi
A. Concurrency control (10 points, up to 2 per class)
Classify the following schedule with respect to VSR, CSR, 2PL, Strict 2PL, and TS multi (with the
conventions adopted for TS Multi under Snapshot Isolation, used for the exercises). Motivate all your
answers.
If it is VSR, provide all the possible serializations. If it is not 2PL/2PL strict, explain which lock/unlock
requests cause this.
r1(Z) r2(X) w2(Y) w3(Y) w4(Z) w2(Z) w5(X) r5(Y) w4(X) r6(Y) w6(X)
Solution.
• CSR: NO - the conflict graph has a cycle between T2 and T4, due to r2(X) w4(X) (arc 2→4) and w4(Z) w2(Z) (arc 4→2)

• VSR: NO – swapping the blind writes w5(X) and w4(X) does not remove the cycle
• 2PL: NO - T2 must release the lock on Y before w3(Y), but it cannot anticipate the lock request
for w2(Z), since w4(Z) comes in between - likewise, T4 must release the lock on Z before w2(Z), but
it cannot anticipate the lock request for w4(X), since w5(X) comes in between - to rule out 2PL, it
would also suffice to notice that the schedule is not in CSR, so it cannot be in 2PL
• Strict 2PL: NO, it is not 2PL
• TS multi: NO, w2(Z) cannot be after w4(Z)
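The CSR answer can be double-checked mechanically. The following is a minimal sketch in Python, not part of the original solution: it builds the conflict graph of the schedule and looks for a cycle. The names `SCHEDULE`, `conflict_edges`, and `has_cycle` are illustrative assumptions.

```python
from itertools import combinations

# Schedule from the exercise as (operation, transaction, resource) triples.
SCHEDULE = [
    ("r", 1, "Z"), ("r", 2, "X"), ("w", 2, "Y"), ("w", 3, "Y"),
    ("w", 4, "Z"), ("w", 2, "Z"), ("w", 5, "X"), ("r", 5, "Y"),
    ("w", 4, "X"), ("r", 6, "Y"), ("w", 6, "X"),
]


def conflict_edges(schedule):
    """Return the arcs (ti, tj) of the conflict graph.

    Two operations conflict when they come from different transactions,
    touch the same resource, and at least one of them is a write.
    combinations() keeps schedule order, so the first item of each pair
    is always the earlier operation.
    """
    edges = set()
    for (op1, t1, res1), (op2, t2, res2) in combinations(schedule, 2):
        if res1 == res2 and t1 != t2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Detect a cycle in a directed graph with a depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {node: 0 for node in graph}  # 0 = unvisited, 1 = on stack, 2 = done

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[node] == 0 and visit(node) for node in graph)


EDGES = conflict_edges(SCHEDULE)
IS_CSR = not has_cycle(EDGES)
print(sorted(EDGES))
print("CSR:", IS_CSR)
```

The sketch only covers conflict-serializability; the VSR, 2PL, and TS multi answers still rely on the arguments given in the bullets.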
B. Ranking (12+2 points, up to 3 for Questions 3 and 5, up to 2 for the remaining questions)
A group of people from Milan and Como are looking for a meeting point allowing them to easily travel
back and forth. They want to minimize the mean travel time m to both cities and to have balanced times.
Thus, they devise the following scoring function for any place with a travel time x to Milan and y to Como:
$f(x, y) = m + s$, with $m = \frac{x+y}{2}$ and $s = \sqrt{(x - m)^2 + (y - m)^2}$,
where m is the mean time and s is akin to the standard deviation of the population {x, y} (geometrically, s
is the distance of ⟨x, y⟩ to the bisector of the first quadrant of the Cartesian plane, where x = y, i.e., times
are balanced). The group is only given three options, A, B, and C, vertically distributed over two rankings
as follows (for your calculations, consider that f (20, 100) ≈ 116.57):
x          y
B: 20      A: 20
A: 100     B: 100
C: 110     C: 110
1. What is the skyline of this very small dataset? Do not show any step or calculation, just the result.
2. Compute the top-1 meeting point with TA according to f (x, y). Show depth and accesses.
3. The result found by TA is not a skyline point. Explain how this is possible.
4. Compute the top-1 meeting point with FA according to f (x, y). Show depth and accesses.
5. The result found by FA is wrong. Explain how this is possible.
6. (Bonus for the more mathematically inclined) Can you draw the shape of the iso-score curves of f ?
Solution.
1. The skyline is {A, B}.
2. A and B are discovered during round 1, with the same score ≈ 116.57 (they are symmetric and so is f ).
The threshold is T = 20 at round 1 and T = 100 at round 2 (the threshold point lies on the bisector, so
the distance to it is 0). At round 3, C is found, with f (C) = 110 (C is on the bisector), and T = 110.
We stop and output C. Depth=3, 6 s.a., and 3 r.a. (or 2, depending on implementation).
3. The skyline is the set of all optimal points according to some monotone function, but f is not monotone.
4. FA stops at depth 2 (with 4 sorted accesses), having discovered only A and B, and makes no random
access in this case. A and B are tied at 116.57; either is returned.
5. Again, f is not a monotone scoring function and FA’s correctness requires a monotone function.
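6. A few iso-score curves are shown here: [figure: iso-score curves of f].
Indeed, the locus of points having a score c is characterized as follows:
$f(x, y) = c \Leftrightarrow$ (by substitution) $\frac{x+y}{2} + \sqrt{\left(\frac{x-y}{2}\right)^2 + \left(\frac{y-x}{2}\right)^2} = c \Leftrightarrow x + y + \sqrt{2}\,|x - y| = 2c$
When x > y this reduces to the straight line $y = x \frac{\sqrt{2}+1}{\sqrt{2}-1} - \frac{2c}{\sqrt{2}-1}$,
and for x < y to $y = x \frac{\sqrt{2}-1}{\sqrt{2}+1} + \frac{2c}{\sqrt{2}+1}$
(and when x = y then x = y = c).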
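The TA and FA runs described in answers 2 to 5 can be replayed with a short script. This is a minimal sketch in Python under stated assumptions, not the course's reference implementation: lower scores are better, both lists are scanned in lockstep, and only depth is tracked (sorted and random accesses are not counted). The function names are illustrative.

```python
import math

# Sorted lists from the exercise (ascending: a lower travel time is better).
BY_X = [("B", 20), ("A", 100), ("C", 110)]
BY_Y = [("A", 20), ("B", 100), ("C", 110)]
X = dict(BY_X)
Y = dict(BY_Y)


def f(x, y):
    """Score f = m + s from the exercise text (to be minimized)."""
    m = (x + y) / 2
    return m + math.sqrt((x - m) ** 2 + (y - m) ** 2)


def threshold_algorithm():
    """Top-1 with TA; returns (winner, depth)."""
    best = None  # (score, name) of the best object seen so far
    for depth in range(1, len(BY_X) + 1):
        (name_x, val_x), (name_y, val_y) = BY_X[depth - 1], BY_Y[depth - 1]
        for name in (name_x, name_y):
            # The missing coordinate is fetched by random access.
            candidate = (f(X[name], Y[name]), name)
            if best is None or candidate < best:
                best = candidate
        # Threshold: score of the point made of the last values seen.
        if best[0] <= f(val_x, val_y):
            return best[1], depth
    return best[1], len(BY_X)


def fagin_algorithm():
    """Top-1 with FA; returns (winner, depth)."""
    seen_x, seen_y = set(), set()
    for depth in range(1, len(BY_X) + 1):
        seen_x.add(BY_X[depth - 1][0])
        seen_y.add(BY_Y[depth - 1][0])
        if seen_x & seen_y:  # one object has been seen in both lists
            break
    candidates = seen_x | seen_y
    winner = min(candidates, key=lambda name: (f(X[name], Y[name]), name))
    return winner, depth


TA_RESULT = threshold_algorithm()
FA_RESULT = fagin_algorithm()
print("TA:", TA_RESULT)
print("FA:", FA_RESULT)
print("f(B) =", round(f(20, 100), 2), " f(C) =", round(f(110, 110), 2))
```

The last line prints the scores of B and C: B is better than C on both coordinates and yet scores worse, which is the non-monotonicity that answers 3 and 5 point to. Ties between A and B are broken alphabetically here, an arbitrary choice.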
C. Physical databases (10 points)
Table Customer(CUSTID, LastName, FirstName, Country) contains 800K tuples (primary key is in capital
letters) and Purchase (PRODUCTID, CUSTID, DATE, Qty) contains 8M tuples. Estimate the cost of running
the following query in the scenarios described below. Describe an efficient query plan for the following
scenarios. Estimate their execution costs (write the complete formula for the query) and explain all
the steps of the plan and their costs. The evaluation of the exercise also considers the plan’s degree of
efficiency.
SELECT *
FROM Customer C join Purchase P ON C.CUSTID=P.CUSTID
WHERE Country = 'Italy' OR Country = 'Spain'
1. Table Purchase is entry-sequenced and occupies 40K blocks. Table Customer is stored in a hash table
with a hash function defined on the key. It occupies 8K blocks and has a negligible overflow chain.
Val(Country)=160.
2. As in point 1, plus: Table Purchase has a secondary hash index built on attribute CustID with the
same hash function as Customer, but with an access cost of 1.3 due to overflow. Including the overflow
blocks, it occupies 10.5K blocks.
3. As in point 2, plus: Table Customer has a B+ index (3 levels, 1K leaf nodes) with search key Country.
Solution.
1. Plan 1 (first scan Customer): Scan (sequentially) table Customer. For each tuple satisfying
Country=“Italy” or Country=“Spain”, scan table Purchase and compute the join condition.
Plan1 = 8K + 2*(800K/160) * 40K = 8K + 10K * 40K ≈ 400M I/O accesses
Plan 2 (first scan Purchase): Scan table Purchase; for each purchase, access the Customer hash table
by CUSTID and check if Country=“Italy” or Country=“Spain”.
Plan2 = 40K + 8M * 1 = 8.04M I/O accesses (best plan)
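Plan 3 (Nested loop join): Scan table Customer; for each block of Customer, scan table Purchase and
compute the join, keeping only Italian and Spanish customers.
Plan3 = 8K + 8K * 40K ≈ 320M I/O accesses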
2. Plan 4 (improve Plan 1): Scan (sequentially) table Customer; for each tuple satisfying Country=“Italy”
or Country=“Spain”, find CustID in the hash index and follow the pointers to retrieve the purchases
of the customer.
Plan4 = 8K + 2*(800K/160) * (1.3 + 10) = 8K + 10K * 11.3 = 121K I/O accesses
Plan 5 (Hash join + retrieve purchases): compute the hash join; for each Italian/Spanish customer,
retrieve their purchases.
Plan5 = 8K + 10.5K + 2*(800K/160) * (8M/800K) = 8K + 10.5K + 10K * 10 = 118.5K I/O accesses
(best plan)
3. Plan 6 (extend Plan 4): Find Italy and Spain in the B+ index, follow the pointers to retrieve Italian
and Spanish customers, then lookup customers in the hash index and retrieve their purchases.
Plan6 = 2 *(2 + (5K/800) + 5K) + 10K * (1.3 + 10) = 123K I/O accesses (it does not improve Plan
4!)
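The cost formulas of the six plans can be recomputed from the figures given in the exercise. The following is a small Python sketch offered as a sanity check, not as part of the original solution. All variable names are illustrative, and the constant `BTREE_DESCENT = 2` simply reproduces the "2" in the Plan 6 formula; reading it as the non-leaf levels of the 3-level index is an assumption.

```python
K, M = 1_000, 1_000_000

# Figures from the exercise text.
CUSTOMER_TUPLES, CUSTOMER_BLOCKS = 800 * K, 8 * K
PURCHASE_TUPLES, PURCHASE_BLOCKS = 8 * M, 40 * K
VAL_COUNTRY = 160
INDEX_BLOCKS = 10.5 * K   # secondary hash index on Purchase.CustID
OVERFLOW_COST = 1.3       # cost of one lookup in that hash index
BTREE_LEAVES = 1 * K
BTREE_DESCENT = 2         # assumption: non-leaf levels of the 3-level B+ index

# Derived quantities.
per_country = CUSTOMER_TUPLES / VAL_COUNTRY                  # 5K customers
selected = 2 * per_country                                   # Italy or Spain: 10K
purchases_per_customer = PURCHASE_TUPLES / CUSTOMER_TUPLES   # 10
keys_per_leaf = CUSTOMER_TUPLES / BTREE_LEAVES               # 800

COSTS = {
    "Plan 1": CUSTOMER_BLOCKS + selected * PURCHASE_BLOCKS,
    "Plan 2": PURCHASE_BLOCKS + PURCHASE_TUPLES * 1,
    "Plan 3": CUSTOMER_BLOCKS + CUSTOMER_BLOCKS * PURCHASE_BLOCKS,
    "Plan 4": CUSTOMER_BLOCKS
    + selected * (OVERFLOW_COST + purchases_per_customer),
    "Plan 5": CUSTOMER_BLOCKS + INDEX_BLOCKS
    + selected * purchases_per_customer,
    "Plan 6": 2 * (BTREE_DESCENT + per_country / keys_per_leaf + per_country)
    + selected * (OVERFLOW_COST + purchases_per_customer),
}

for plan, cost in COSTS.items():
    print(f"{plan}: {cost:,.1f} I/O accesses")
```

The exact totals differ slightly from the rounded figures in the solution (for instance, Plan 1 is 400M plus the 8K blocks of the initial scan), but the ranking of the plans within each scenario is unchanged.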
