10 DistQueryOptimization
Introduction
Background
Distributed DBMS Architecture
Distributed Database Design
Distributed Query Processing
Query Processing Methodology
Distributed Query Optimization
Distributed Transaction Management (Extensive)
Building Distributed Database Systems (RAID)
Mobile Database Systems
Privacy, Trust, and Authentication
Peer to Peer Systems
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 7-9. 1
Useful References
Textbook: Principles of Distributed Database Systems, Chapter 7
Distributed Query Processing Methodology
[Figure: layered query processing flow. Input: Calculus Query on Distributed Relations. Layer: Query Decomposition (uses GLOBAL SCHEMA). Intermediate result: Fragment Query. Layer: Global Optimization (uses STATS ON FRAGMENTS). Output: Optimized Local Queries.]
Restructuring
Convert relational calculus to relational algebra
Make use of query trees
Example
Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.
SELECT ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME ≠ “J. Doe”
AND PNAME = “CAD/CAM”
AND (DUR = 12 OR DUR = 24)
[Figure: query tree, top to bottom. Project: ENAME. Select: DUR=12 OR DUR=24. Select: PNAME=“CAD/CAM”. Select: ENAME≠“J. DOE”. Join: PNO. Join: ENO.]
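The query tree for this example can be evaluated bottom-up on toy data. The following is a minimal sketch, not the textbook's implementation: the rows in EMP, ASG, and PROJ are made up for illustration, and the helper names select, project, and join are hypothetical.

```python
# Toy relations as lists of dicts; every row here is invented illustration data.
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe"},
    {"ENO": "E2", "ENAME": "M. Smith"},
    {"ENO": "E3", "ENAME": "A. Lee"},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "DUR": 12},
    {"ENO": "E2", "PNO": "P1", "DUR": 24},
    {"ENO": "E3", "PNO": "P1", "DUR": 36},
    {"ENO": "E3", "PNO": "P2", "DUR": 12},
]
PROJ = [
    {"PNO": "P1", "PNAME": "CAD/CAM"},
    {"PNO": "P2", "PNAME": "Maintenance"},
]


def select(pred, rel):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(attrs, rel):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def join(attr, left, right):
    """Equi-join of two relations on one shared attribute (nested loops)."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def query_tree_plan():
    """Evaluate the tree from the leaves up: joins, selections, projection."""
    t = join("ENO", ASG, EMP)
    t = join("PNO", PROJ, t)
    t = select(lambda r: r["ENAME"] != "J. Doe", t)
    t = select(lambda r: r["PNAME"] == "CAD/CAM", t)
    t = select(lambda r: r["DUR"] in (12, 24), t)
    return project(["ENAME"], t)


result = query_tree_plan()
print(result)
```

In this plan the selections run only after both joins, so every intermediate row is carried to the top of the tree. This is the kind of plan that restructuring tries to improve.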
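Restructuring – Transformation Rules
Commutativity of binary operations
$R \times S \Leftrightarrow S \times R$
$R \bowtie S \Leftrightarrow S \bowtie R$
$R \cup S \Leftrightarrow S \cup R$
Associativity of binary operations
$(R \times S) \times T \Leftrightarrow R \times (S \times T)$
$(R \bowtie S) \bowtie T \Leftrightarrow R \bowtie (S \bowtie T)$
Idempotence of unary operations
$\Pi_{A'}(\Pi_{A''}(R)) \Leftrightarrow \Pi_{A'}(R)$
$\sigma_{p_1(A_1)}(\sigma_{p_2(A_2)}(R)) \Leftrightarrow \sigma_{p_1(A_1) \wedge p_2(A_2)}(R)$
where $R[A]$ and $A' \subseteq A$, $A'' \subseteq A$ and $A' \subseteq A''$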
Commuting selection with projection
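Restructuring – Transformation Rules
Commuting selection with binary operations
$\sigma_{p(A_i)}(R \times S) \Leftrightarrow (\sigma_{p(A_i)}(R)) \times S$
$\sigma_{p(A_i)}(R \bowtie_{(A_j,B_k)} S) \Leftrightarrow (\sigma_{p(A_i)}(R)) \bowtie_{(A_j,B_k)} S$
$\sigma_{p(A_i)}(R \cup T) \Leftrightarrow \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(T)$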
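The transformation rules can be spot-checked on small relations. The sketch below assumes set semantics and uses a natural join for the join operator. The data, the helper names (rel, select, project, natural_join), and the predicates are all hypothetical. Passing checks on one dataset illustrate the rules but do not prove them.

```python
def rel(rows):
    """Build a relation (a set of tuples) from a list of dicts."""
    return {frozenset(r.items()) for r in rows}


def select(pred, r):
    """Selection under set semantics."""
    return {t for t in r if pred(dict(t))}


def project(attrs, r):
    """Projection under set semantics (duplicates collapse automatically)."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


# Invented illustration data.
EMP = rel([{"ENO": 1, "ENAME": "J. Doe"}, {"ENO": 2, "ENAME": "M. Smith"}])
EMP2 = rel([{"ENO": 3, "ENAME": "A. Lee"}, {"ENO": 2, "ENAME": "M. Smith"}])
ASG = rel([
    {"ENO": 1, "PNO": 10, "DUR": 12},
    {"ENO": 2, "PNO": 10, "DUR": 24},
    {"ENO": 2, "PNO": 20, "DUR": 6},
])
PROJ = rel([{"PNO": 10, "PNAME": "CAD/CAM"}, {"PNO": 20, "PNAME": "Maintenance"}])


def not_doe(t):
    return t["ENAME"] != "J. Doe"


def long_dur(t):
    return t["DUR"] >= 12


def on_p10(t):
    return t["PNO"] == 10


checks = {
    "join commutes":
        natural_join(EMP, ASG) == natural_join(ASG, EMP),
    "join associates":
        natural_join(natural_join(EMP, ASG), PROJ)
        == natural_join(EMP, natural_join(ASG, PROJ)),
    "projection cascade":
        project(["ENO"], project(["ENO", "PNO"], ASG)) == project(["ENO"], ASG),
    "selection cascade":
        select(long_dur, select(on_p10, ASG))
        == select(lambda t: long_dur(t) and on_p10(t), ASG),
    "selection through join":
        select(not_doe, natural_join(EMP, ASG))
        == natural_join(select(not_doe, EMP), ASG),
    "selection through union":
        select(not_doe, EMP | EMP2)
        == select(not_doe, EMP) | select(not_doe, EMP2),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The "selection through join" check is the one that matters most for optimization: filtering EMP before the join shrinks the join input without changing the result.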
Equivalent Query
[Figure: equivalent query tree. Surviving labels: ENAME; PNO, ENO.]
Restructuring
[Figure: restructured query tree. Surviving labels: ENAME; PNO; PNO,ENAME; ENO.]
Cost Functions
Total Time (or Total Cost)
Reduce each cost component (in terms of time) individually
Do as little of each cost component as possible
Optimizes the utilization of the resources
Increases system throughput
Response Time
Do as many things as possible in parallel
May increase total time because of increased total activity
Total Cost
Summation of all cost factors
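As a rough illustration of the summation, the sketch below assumes the cost factors are CPU, I/O, and communication, with communication split into a fixed per-message cost and a per-byte transfer cost. This breakdown is common in distributed query cost models but is an assumption here; the slide does not list the factors. The class UnitCosts, the function total_cost, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitCosts:
    """Assumed unit costs, all in the same (arbitrary) time unit."""
    cpu_per_instruction: float
    io_per_disk_access: float
    msg_fixed: float          # fixed cost to initiate and receive one message
    transfer_per_byte: float  # cost to transmit one byte between sites


def total_cost(units, instructions, disk_ios, messages, bytes_sent):
    """Sum the assumed cost factors: CPU + I/O + communication."""
    cpu = units.cpu_per_instruction * instructions
    io = units.io_per_disk_access * disk_ios
    comm = units.msg_fixed * messages + units.transfer_per_byte * bytes_sent
    return cpu + io + comm


# Illustrative numbers only.
units = UnitCosts(cpu_per_instruction=1.0, io_per_disk_access=1000.0,
                  msg_fixed=5000.0, transfer_per_byte=2.0)
cost = total_cost(units, instructions=2000, disk_ios=30,
                  messages=4, bytes_sent=1500)
print(cost)
```

Because every factor is simply added, total cost rewards plans that do less work overall, regardless of whether the work could have run in parallel.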
Total Cost Factors
Response Time
Example
[Figure: Site 1 and Site 2 are each connected to Site 3; the Site 1 link is labeled x units and the Site 2 link y units.]
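The difference between total time and response time can be sketched for this three-site example. The assumptions are: Site 1 sends x units and Site 2 sends y units to Site 3, the two transfers run in parallel, and only communication cost is counted. The parameter names t_msg (fixed cost per message) and t_tr (cost per unit transferred) are hypothetical, as are the numbers.

```python
def total_time(t_msg, t_tr, x, y):
    """Total time: add up both transfers, ignoring that they may overlap."""
    return 2 * t_msg + t_tr * (x + y)


def response_time(t_msg, t_tr, x, y):
    """Response time: the transfers run in parallel, so the slower one
    determines when Site 3 has all of its input."""
    return max(t_msg + t_tr * x, t_msg + t_tr * y)


# Illustrative numbers only.
t_msg, t_tr, x, y = 10, 1, 100, 40
print(total_time(t_msg, t_tr, x, y))     # 2*10 + 1*(100 + 40) = 160
print(response_time(t_msg, t_tr, x, y))  # max(10 + 100, 10 + 40) = 110
```

Under these assumptions the response time is never larger than the total time, and the gap widens as the two transfers become more evenly sized.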