CAS CS 460/660 Introduction To Database Systems Query Optimization

The document discusses query optimization in database systems. It provides examples of alternative query execution plans for a sample query joining the Reserves and Sailors tables. Selections (and, with care, projections) can be "pushed down" in the query tree, and the join order and join algorithm can be varied, to reduce the number of I/Os compared to a naive nested loop plan. The goal is to find an efficient plan that computes the same results while minimizing estimated execution cost, such as I/O.

Uploaded by

Arnaldo Canelas

CAS CS 460/660

Introduction to Database Systems

Query Optimization

1.1
Review
- Implementation of relational operations as iterators
  - Focus largely on external algorithms (sorting/hashing)
  - Choices depend on indexes, memory, statistics, …
- Joins
  - Blocked nested loops: simple, exploits extra memory
  - Indexed nested loops: best if one relation is small and the other is indexed
  - Sort-merge join: good with a small amount of memory, bad with many duplicates
  - Hash join: fast (given enough memory), bad with skewed data; relatively easy to parallelize
- Sort- and hash-based aggregation and duplicate elimination

1.2
Query Optimization Overview
- A query can be converted to relational algebra.
- The relational algebra expression is converted to a tree, with joins as branches.
- Each operator has implementation choices.
- Operators can also be applied in different orders!

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
      R.bid=100 AND S.rating>5

$\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$

As a tree: $\pi_{sname}$ at the root, then $\sigma_{bid=100 \wedge rating>5}$, then the join $\bowtie_{sid=sid}$ with Reserves and Sailors as its leaves.
1.3
Iterator Interface (pull from the top)
Recall:
- Relational operators at the nodes of the plan support a uniform iterator interface: Open( ), get_next( ), close( ).
- Unary ops: on Open(), call Open() on the child.
- Binary ops: call Open() on the left child, then on the right.
- By convention, the outer is on the left.

Plan tree: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$, with Reserves as the left (outer) input.

The alternative is pipelining (i.e., a "push"-based approach).

Push and pull can be combined using special operators.
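A pull-based plan can be sketched with a few operator classes that share the Open( ), get_next( ), close( ) protocol. This is a minimal in-memory illustration, not a real engine. It assumes relations are Python lists of dicts and that `None` signals end of input. The class names (`Scan`, `Select`, `Project`, `NestedLoopJoin`) and the toy rows are hypothetical. The plan built at the end mirrors the tree above, with Reserves as the outer.

```python
from typing import Callable, Optional

Row = dict

class Scan:
    """Leaf: iterates over an in-memory list of rows."""
    def __init__(self, rows: list):
        self.rows = rows
    def open(self) -> None:
        self.pos = 0
    def get_next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None                      # end of input
        self.pos += 1
        return self.rows[self.pos - 1]
    def close(self) -> None:
        pass

class Select:
    """Unary: open() opens the child; get_next() pulls until a row passes."""
    def __init__(self, child, pred: Callable[[Row], bool]):
        self.child, self.pred = child, pred
    def open(self) -> None:
        self.child.open()
    def get_next(self) -> Optional[Row]:
        while (row := self.child.get_next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self) -> None:
        self.child.close()

class Project:
    """Unary: keeps only the named columns of each child row."""
    def __init__(self, child, cols: list):
        self.child, self.cols = child, cols
    def open(self) -> None:
        self.child.open()
    def get_next(self) -> Optional[Row]:
        row = self.child.get_next()
        return None if row is None else {c: row[c] for c in self.cols}
    def close(self) -> None:
        self.child.close()

class NestedLoopJoin:
    """Binary: opens the left (outer) child, then the right (inner) child.
    The inner is closed and re-opened (rescanned) for every outer row."""
    def __init__(self, outer, inner, pred: Callable[[Row, Row], bool]):
        self.outer, self.inner, self.pred = outer, inner, pred
    def open(self) -> None:
        self.outer.open()
        self.inner.open()
        self.cur = self.outer.get_next()
    def get_next(self) -> Optional[Row]:
        while self.cur is not None:
            while (r := self.inner.get_next()) is not None:
                if self.pred(self.cur, r):
                    return {**self.cur, **r}
            self.inner.close()
            self.inner.open()
            self.cur = self.outer.get_next()
        return None
    def close(self) -> None:
        self.outer.close()
        self.inner.close()

# Hypothetical toy rows; Reserves is the outer (left) input.
reserves = [{"sid": 22, "bid": 100}, {"sid": 58, "bid": 103}]
sailors = [{"sid": 22, "sname": "dustin", "rating": 7},
           {"sid": 58, "sname": "rusty", "rating": 10}]

plan = Project(
    Select(NestedLoopJoin(Scan(reserves), Scan(sailors),
                          lambda r, s: r["sid"] == s["sid"]),
           lambda t: t["bid"] == 100 and t["rating"] > 5),
    ["sname"])

plan.open()                                  # Open() propagates down the tree
results = []
while (t := plan.get_next()) is not None:    # each call pulls one row up the tree
    results.append(t)
plan.close()
print(results)
```

No operator materializes its input. Each get_next() call at the root pulls just enough rows from below to produce one output row.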

1.4
Query Optimization Overview (cont)
- Logical plan: a tree of relational algebra (R.A.) operators.
- Physical plan: a tree of R.A. operators, with a choice of algorithm for each operator.
- Two main issues:
  - For a given query, what plans are considered?
    - An algorithm searches the plan space for the cheapest (estimated) plan.
  - How is the cost of a plan estimated?
- Ideally: we want to find the best plan.
- Reality: avoid the worst plans!

1.5
Cost-based Query Sub-System
Figure: queries (e.g., SELECT * FROM Blah B WHERE B.blah = blah) go to the Query Parser and then to the Query Optimizer. The optimizer's Plan Generator and Plan Cost Estimator consult the Catalog Manager (schema and statistics). The chosen plan goes to the Query Plan Evaluator.
Usually there is a heuristics-based rewriting step before the cost-based steps.

1.6
Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)

Reserves (sid: integer, bid: integer, day: date, rname: string)

- As seen in previous lectures…
- Reserves:
  - Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
  - Let's say there are 100 boats.
- Sailors:
  - Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
  - Let's say there are 10 different ratings.
- Assume we have 5 pages in our buffer pool.
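The derived sizes used on later slides follow from these statistics. This is a small sketch. Like the later slides, it assumes bid values are spread uniformly over the 100 boats and ratings uniformly over the values 1 to 10, so rating > 5 keeps half of the Sailors pages. The constant names are hypothetical.

```python
# Statistics from the schema slide.
RESERVES_PAGES, RESERVES_PER_PAGE, NUM_BOATS = 1000, 100, 100
SAILORS_PAGES, SAILORS_PER_PAGE, NUM_RATINGS = 500, 80, 10

reserves_tuples = RESERVES_PAGES * RESERVES_PER_PAGE   # 100,000 reservations
sailors_tuples = SAILORS_PAGES * SAILORS_PER_PAGE      # 40,000 sailors

# Uniformity assumptions: bid=100 keeps 1/100 of Reserves;
# ratings are 1..10, so rating > 5 keeps 5 of the 10 values.
bid100_pages = RESERVES_PAGES // NUM_BOATS             # 10 pages
rating_gt5_pages = SAILORS_PAGES * 5 // NUM_RATINGS    # 250 pages
print(reserves_tuples, sailors_tuples, bid100_pages, rating_gt5_pages)
```

Both relations come to 4,000 bytes of tuple data per page (40 × 100 and 50 × 80), which is consistent with pages of roughly 4 KB.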

1.7
Motivating Example

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND
      R.bid=100 AND S.rating>5

Plan: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Sailors \bowtie_{sid=sid} Reserves))$. The projection and selection are computed on-the-fly, and the join uses page-oriented nested loops with Sailors as the outer.
- Cost: 500 + 500*1000 I/Os.
- By no means the worst plan!
- It misses several opportunities: selections could have been "pushed" earlier, no use is made of any available indexes, etc.
- Goal of optimization: to find more efficient plans that compute the same answer.
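The optimizer's job of generating candidate plans and estimating their cost can be sketched for this two-table query. This is a minimal sketch with several assumptions. It uses the page counts and 5-buffer pool from the schema slide. It considers only two join methods (page-oriented and block nested loops). Both selections are applied on the fly above the join, so they do not change the join's I/O cost. `CATALOG`, `enumerate_plans` and the other names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical catalog: page counts from the schema slide.
CATALOG = {"Reserves": 1000, "Sailors": 500}
BUFFERS = 5  # buffer pool pages assumed in the lecture

def page_nl(outer_pages: int, inner_pages: int) -> int:
    # Scan the outer once; scan the whole inner once per outer page.
    return outer_pages + outer_pages * inner_pages

def block_nl(outer_pages: int, inner_pages: int) -> int:
    # B-2 pages hold a block of the outer, 1 page for inner input, 1 for output.
    blocks = math.ceil(outer_pages / (BUFFERS - 2))
    return outer_pages + blocks * inner_pages

METHODS = {"page-oriented NL": page_nl, "block NL": block_nl}

def enumerate_plans():
    """Plan generator: every (outer, inner) order with every join method,
    paired with its estimated cost."""
    for outer, inner in permutations(CATALOG, 2):
        for name, cost_fn in METHODS.items():
            yield (outer, inner, name), cost_fn(CATALOG[outer], CATALOG[inner])

plans = dict(enumerate_plans())
best = min(plans, key=plans.get)
for plan, cost in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{cost:>9,}  outer={plan[0]:<8} inner={plan[1]:<8} {plan[2]}")
```

Under these assumptions the slide's plan (Sailors as outer, page-oriented nested loops) costs 500,500 I/Os. Block nested loops with the same outer drops that to 167,500. Pushing selections, as on the following slides, does much better still.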

1.8
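Alternative Plans – Push Selects (No Indexes)
- Left: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Sailors \bowtie_{sid=sid} Reserves))$. The projection and selection are on-the-fly; the join uses page-oriented nested loops with Sailors as the outer. Cost: 500,500 I/Os.
- Right: $\pi_{sname}(\sigma_{bid=100}(\sigma_{rating>5}(Sailors) \bowtie_{sid=sid} Reserves))$. The selections and projection are on-the-fly; the join uses page-oriented nested loops with Sailors as the outer. Cost: 250,500 I/Os.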

1.9
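Alternative Plans – Push Selects (No Indexes)
- Left: $\pi_{sname}(\sigma_{bid=100}(\sigma_{rating>5}(Sailors) \bowtie_{sid=sid} Reserves))$. The selections and projection are on-the-fly; the join uses page-oriented nested loops. Cost: 250,500 I/Os.
- Right: $\pi_{sname}(\sigma_{rating>5}(Sailors) \bowtie_{sid=sid} \sigma_{bid=100}(Reserves))$. Both selections are on-the-fly; the join uses page-oriented nested loops. Cost: 250,500 I/Os.
- Pushing $\sigma_{bid=100}$ onto the inner does not help here. With an on-the-fly selection, Reserves is still read in full for each of the 250 outer pages.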
1.10
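Alternative Plans – Push Selects (No Indexes)
- Left: $\pi_{sname}(\sigma_{rating>5}(Sailors) \bowtie_{sid=sid} \sigma_{bid=100}(Reserves))$. Both selections are on-the-fly; the join uses page-oriented nested loops with Sailors as the outer. Cost: 250,500 I/Os.
- Right: $\pi_{sname}(\sigma_{rating>5}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} Sailors))$. The selections and projection are on-the-fly; the join uses page-oriented nested loops with Reserves as the outer. Cost: 1000 + (10 * 500) = 6000 I/Os.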
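The costs on these push-select slides all follow one rule, which can be checked in a few lines. This sketch uses the slides' accounting. The outer is read once. The whole inner is rescanned once per page's worth of outer tuples that survive the outer's selection. An on-the-fly selection on the inner filters tuples but does not reduce the pages read. The selection sizes (10 and 250 pages) are the uniform-distribution estimates. The function and variable names are hypothetical.

```python
# Page counts from the schema slide; selection sizes are the uniform estimates.
RESERVES, SAILORS = 1000, 500
RES_BID100 = 10        # bid=100 keeps 1/100 of Reserves
SAIL_RATING_GT5 = 250  # rating>5 keeps 1/2 of Sailors

def page_nl(outer_read: int, outer_after_select: int, inner_pages: int) -> int:
    """Page-oriented nested loops with on-the-fly selections.

    The outer is read once. The whole inner is rescanned for each page's worth of
    outer tuples that pass the outer's selection; a selection on the inner filters
    tuples as they stream by but does not reduce the pages read.
    """
    return outer_read + outer_after_select * inner_pages

plans = {
    "no pushdown (Sailors outer)": page_nl(SAILORS, SAILORS, RESERVES),
    "rating>5 on outer Sailors": page_nl(SAILORS, SAIL_RATING_GT5, RESERVES),
    "rating>5 on outer, bid=100 on inner": page_nl(SAILORS, SAIL_RATING_GT5, RESERVES),
    "bid=100 on outer Reserves": page_nl(RESERVES, RES_BID100, SAILORS),
}
for name, cost in plans.items():
    print(f"{cost:>8,} I/Os  {name}")
```

The last entry shows why the choice of outer matters. The highly selective bid=100 predicate shrinks the outer to 10 pages, so Sailors is scanned only 10 times.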
1.11
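Alternative Plans – Push Selects (No Indexes)
- Left: $\pi_{sname}(\sigma_{rating>5}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} Sailors))$. The selections and projection are on-the-fly; the join uses page-oriented nested loops with Reserves as the outer. Cost: 6000 I/Os.
- Right: $\pi_{sname}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} \sigma_{rating>5}(Sailors))$. $\sigma_{bid=100}$ is on-the-fly on the outer Reserves; $\sigma_{rating>5}$ scans Sailors and writes the result to temp T2; the join uses page-oriented nested loops. Cost: 1000 + 500 + 250 + (10 * 250) = 4250 I/Os.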

1.12
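Alternative Plans – Push Selects (No Indexes)
- Left: $\pi_{sname}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} \sigma_{rating>5}(Sailors))$. $\sigma_{bid=100}$ is on-the-fly on the outer Reserves; $\sigma_{rating>5}$ scans Sailors and writes to temp T2; the join uses page-oriented nested loops. Cost: 4250 I/Os.
- Right: $\pi_{sname}(\sigma_{rating>5}(Sailors) \bowtie_{sid=sid} \sigma_{bid=100}(Reserves))$. $\sigma_{rating>5}$ is on-the-fly on the outer Sailors; $\sigma_{bid=100}$ scans Reserves and writes to temp T2; the join uses page-oriented nested loops. Cost: 500 + 1000 + 10 + (250 * 10) = 4010 I/Os.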
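Materializing the inner's selection changes the arithmetic. The inner is read once and its qualifying tuples are written to a temp; the join loop then rescans only the temp. This is a sketch under the same assumptions (10 and 250 pages after selection, page-oriented nested loops). `nl_with_temp_inner` is a hypothetical helper.

```python
RESERVES, SAILORS = 1000, 500
RES_BID100, SAIL_RATING_GT5 = 10, 250   # pages after each selection

def nl_with_temp_inner(outer_scan: int, outer_after_select: int,
                       inner_scan: int, inner_temp: int) -> int:
    """Read outer + read inner once + write temp T2 + rescan T2 per outer page."""
    return outer_scan + inner_scan + inner_temp + outer_after_select * inner_temp

reserves_outer = nl_with_temp_inner(RESERVES, RES_BID100, SAILORS, SAIL_RATING_GT5)
sailors_outer = nl_with_temp_inner(SAILORS, SAIL_RATING_GT5, RESERVES, RES_BID100)
print(reserves_outer, sailors_outer)  # 4250 4010
```

Both orders spend 2,500 I/Os in the join loop (10 × 250 either way). The 240-I/O difference is the size of the temp that has to be written: 250 pages versus 10.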

1.13
Alternative Plans 1 (No Indexes)
Plan: $\pi_{sname}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} \sigma_{rating>5}(Sailors))$, with the projection on-the-fly and a sort-merge join. $\sigma_{bid=100}$ scans Reserves and writes to temp T1; $\sigma_{rating>5}$ scans Sailors and writes to temp T2.
- Main difference: sort-merge join.
- With 5 buffers, cost of plan:
  - Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
  - Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  - Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250).
  - Total: 4060 page I/Os. (Note: sorting T2 takes 4 passes with B=5.)
- If we use a BNL join, join = 10 + 4*250, total cost = 2770.
- We can also "push" projections, but must be careful!
  - T1 has only sid; T2 has only sid and sname.
  - T1 fits in 3 pages, the cost of BNL is under 250 pages, and the total is < 2000.
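The sort-merge and BNL totals can be reproduced with the standard external merge sort pass count. This sketch assumes pass 0 produces ceil(N/B) sorted runs, each later pass does a (B-1)-way merge, and every pass reads and writes every page. It also assumes BNL uses B-2 pages for each outer block. The helper names are hypothetical.

```python
import math

B = 5                                             # buffer pages
RESERVES, SAILORS, T1, T2 = 1000, 500, 10, 250    # T1/T2 sizes from the slide

def sort_passes(pages: int, buffers: int) -> int:
    """Pass 0 makes ceil(N/B) runs; each later pass merges B-1 runs at a time."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def sort_cost(pages: int, buffers: int) -> int:
    return 2 * pages * sort_passes(pages, buffers)   # read + write per pass

select_and_write = RESERVES + T1 + SAILORS + T2      # 1760
smj = select_and_write + sort_cost(T1, B) + sort_cost(T2, B) + (T1 + T2)
bnl = select_and_write + T1 + math.ceil(T1 / (B - 2)) * T2
print(sort_passes(T1, B), sort_passes(T2, B))  # 2 4
print(smj, bnl)                                # 4060 2770
```

With B=5 the 250-page T2 needs three merge passes after pass 0, which is why its sort alone costs 2,000 I/Os. The BNL join avoids sorting altogether.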

1.14
Alt Plan 2: Indexes
Plan: $\pi_{sname}(\sigma_{rating>5}(\sigma_{bid=100}(Reserves) \bowtie_{sid=sid} Sailors))$. The projection and $\sigma_{rating>5}$ are on-the-fly. The join uses index nested loops with pipelining. $\sigma_{bid=100}$ uses the hash index and does not write to a temp.
- With a clustered hash index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
- INL with the outer not materialized.
  - Projecting out unnecessary fields from the outer doesn't help.
- The join column sid is a key for Sailors.
  - There is at most one matching tuple, so an unclustered index on sid is OK.
- The decision not to push rating>5 before the join is based on the availability of the sid index on Sailors.
- Cost: selecting the Reserves tuples takes 10 I/Os; then, for each, we must get the matching Sailors tuple (1000*1.2); total 1210 I/Os.
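The 1210 figure is the clustered index scan of the outer plus one Sailors lookup per qualifying reservation. This sketch takes the slide's 1.2 I/Os per lookup as a given average. The constant names are hypothetical.

```python
# Assumptions from the slide: clustered hash index on Reserves.bid,
# index on Sailors.sid costing about 1.2 I/Os per matching-tuple fetch.
RESERVES_TUPLES, RESERVES_PAGES, NUM_BOATS = 100_000, 1000, 100
PROBE_IO = 1.2

matching_tuples = RESERVES_TUPLES // NUM_BOATS       # 1000 reservations for boat 100
matching_pages = RESERVES_PAGES // NUM_BOATS         # clustered, so on 10 pages
cost = matching_pages + matching_tuples * PROBE_IO   # outer read + one probe per tuple
print(matching_tuples, matching_pages, round(cost))  # 1000 10 1210
```

The rating > 5 filter then runs on the fly above the join and adds no I/O.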

1.15
What is needed for optimization?
- Iterator interface
- Cost estimation
- Statistics and catalogs
- Size estimation and reduction factors
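Size estimation can be illustrated with the classic System R style estimate. Result cardinality is the product of the input cardinalities times the product of the reduction factors (RFs), assuming uniform values and independent predicates. The RF rules used here are assumptions layered on the example's statistics, not formulas from these slides: 1/NKeys for an equality on a key, 1/100 for bid = 100, and 1/2 for rating > 5 with ratings 1 to 10.

```python
from fractions import Fraction
from math import prod

# Cardinalities from the schema slide: pages x tuples per page.
card = {"Reserves": 1000 * 100, "Sailors": 500 * 80}   # 100,000 and 40,000

# Reduction factors under uniformity/independence (assumed, see lead-in).
rf = {
    "R.sid = S.sid": Fraction(1, card["Sailors"]),  # sid is a key of Sailors
    "R.bid = 100": Fraction(1, 100),                # 100 distinct boats
    "S.rating > 5": Fraction(5, 10),                # ratings 1..10, 5 of them > 5
}

estimate = prod(card.values()) * prod(rf.values())
print(estimate)  # 500
```

This agrees with the slides' page arithmetic: 1000 reservations for boat 100, each joining exactly one sailor, half of whom have rating > 5.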

1.16
Query Blocks: Units of Optimization

SELECT S.sname
FROM Sailors S
WHERE S.age IN
      (SELECT MAX (S2.age)
       FROM Sailors S2
       GROUP BY S2.rating)

Outer block: the whole query. Nested block: the parenthesized subquery.

- An SQL query is parsed into a collection of query blocks, and these are optimized one block at a time.
- Inner blocks are usually treated as subroutines, computed:
  - once per query (for uncorrelated sub-queries), or
  - once per outer tuple (for correlated sub-queries).
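The difference between uncorrelated and correlated nested blocks comes down to how often the subroutine runs. This sketch counts the calls in each case. The toy rows are hypothetical. The correlated variant is a different query (maximum age within the sailor's own rating), included only to contrast how often the nested block is evaluated.

```python
# Hypothetical toy rows: (sname, rating, age)
SAILORS = [("dustin", 7, 45.0), ("lubber", 8, 55.5), ("rusty", 10, 35.0),
           ("horatio", 7, 35.0), ("zorba", 10, 16.0)]

calls = {"uncorrelated": 0, "correlated": 0}

def max_age_per_rating() -> set:
    """Nested block of the slide's query: MAX(age) GROUP BY rating (no outer reference)."""
    calls["uncorrelated"] += 1
    best = {}
    for _, rating, age in SAILORS:
        best[rating] = max(age, best.get(rating, age))
    return set(best.values())

def max_age_for(rating: int) -> float:
    """Correlated variant: MAX(S2.age) WHERE S2.rating = S.rating."""
    calls["correlated"] += 1
    return max(age for _, r, age in SAILORS if r == rating)

# Uncorrelated: evaluate the subroutine once, then scan the outer block.
ages = max_age_per_rating()
uncorrelated = [name for name, _, age in SAILORS if age in ages]

# Correlated: the subroutine is re-invoked for every outer tuple.
correlated = [name for name, rating, age in SAILORS if age == max_age_for(rating)]

print(uncorrelated, correlated, calls)
```

horatio appears only in the first result. The IN list mixes the maxima of all rating groups, and 35.0 happens to be the maximum for rating 10.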

1.17
Translating SQL to Relational Algebra
SELECT S.sid, MIN (R.day)
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
  AND S.rating = (SELECT MAX (S2.rating) FROM Sailors S2)
GROUP BY S.sid
HAVING COUNT (*) >= 2

For each sailor with the highest rating (over all sailors) and at least two reservations for red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.

1.18
Translating SQL to Relational Algebra

SELECT S.sid, MIN (R.day)
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
  AND S.rating = (SELECT MAX (S2.rating) FROM Sailors S2)
GROUP BY S.sid
HAVING COUNT (*) >= 2

Inner block: SELECT MAX (S2.rating) FROM Sailors S2; its result is the constant val.
Outer block:
$\pi_{S.sid,\,MIN(R.day)}(\mathrm{HAVING}_{COUNT(*) \geq 2}(\mathrm{GROUP\ BY}_{S.sid}(\sigma_{B.color='red' \wedge S.rating=val}(Sailors \bowtie Reserves \bowtie Boats))))$
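The translation above can be evaluated operator by operator on toy data to make the order of operations concrete. This is a minimal sketch. The instances of Sailors, Reserves and Boats are hypothetical toy rows with only the columns the query needs, and the joins are on sid and bid. It shows the operator order, not an efficient plan.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy instances: Sailors(sid, rating), Reserves(sid, bid, day), Boats(bid, color)
sailors = [(22, 7), (31, 8), (58, 10), (71, 10)]
reserves = [(58, 102, date(2024, 10, 10)), (58, 104, date(2024, 10, 7)),
            (71, 102, date(2024, 11, 12)), (31, 102, date(2024, 11, 6)),
            (31, 103, date(2024, 11, 1))]
boats = [(102, "red"), (103, "green"), (104, "red")]

# Inner block, evaluated once (uncorrelated): val = MAX(S2.rating)
val = max(rating for _, rating in sailors)

# sigma_{color='red' AND rating=val}(Sailors JOIN Reserves JOIN Boats)
joined = [(s_sid, day)
          for s_sid, rating in sailors
          for r_sid, r_bid, day in reserves if r_sid == s_sid
          for b_bid, color in boats if b_bid == r_bid
          if color == "red" and rating == val]

# GROUP BY S.sid
groups = defaultdict(list)
for sid, day in joined:
    groups[sid].append(day)

# HAVING COUNT(*) >= 2, then project S.sid, MIN(R.day)
result = {sid: min(days) for sid, days in groups.items() if len(days) >= 2}
print(result)
```

Sailor 71 also has the top rating but only one red-boat reservation, so the HAVING clause removes it.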

1.19
Relational Algebra Equivalences
- Allow us to choose different operator orders and to "push" selections and projections ahead of joins.
- Selections:
  - $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots(\sigma_{c_n}(R))\dots)$ (Cascade)
  - $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commute)
- Projections:
  - $\pi_{a_1}(R) \equiv \pi_{a_1}(\dots(\pi_{a_n}(R))\dots)$ (Cascade), if $a_n \supseteq a_{n-1} \supseteq \dots \supseteq a_1$
- Joins:
  - $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative)
  - $(R \bowtie S) \equiv (S \bowtie R)$ (Commute)
  - These two mean we can do joins in any order.
1.20
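These equivalences can be spot-checked on a small instance. Passing on one instance illustrates the rules but does not prove them. This sketch assumes hypothetical toy relations represented as Python sets of tuples, with joins on sid. Join commutativity holds only up to column order, so `join_sr` reorders its output columns. The last check is the selection pushdown the first bullet refers to, with each predicate moved onto the relation that owns its attribute. Associativity is not checked because it needs a third relation.

```python
from itertools import product

# Hypothetical toy relations: Reserves(sid, bid), Sailors(sid, sname, rating)
R = {(22, 100), (31, 101), (58, 100)}
S = {(22, "dustin", 7), (31, "lubber", 8), (58, "rusty", 3)}

def select(pred, rel):
    return {t for t in rel if pred(t)}

def join_rs(r_rel, s_rel):
    """Reserves JOIN Sailors on sid; output (sid, bid, sname, rating)."""
    return {(r[0], r[1], s[1], s[2]) for r, s in product(r_rel, s_rel) if r[0] == s[0]}

def join_sr(s_rel, r_rel):
    """Sailors JOIN Reserves on sid, columns reordered to match join_rs."""
    return {(s[0], r[1], s[1], s[2]) for s, r in product(s_rel, r_rel) if r[0] == s[0]}

def c1(t):  # bid = 100
    return t[1] == 100

def c2(t):  # rating > 5
    return t[3] > 5

J = join_rs(R, S)
assert select(lambda t: c1(t) and c2(t), J) == select(c1, select(c2, J))  # cascade
assert select(c1, select(c2, J)) == select(c2, select(c1, J))             # commute
assert join_rs(R, S) == join_sr(S, R)                                     # join commute

# Push each selection onto the relation that owns its attribute.
pushed = join_rs(select(lambda r: r[1] == 100, R), select(lambda s: s[2] > 5, S))
assert select(lambda t: c1(t) and c2(t), J) == pushed
print(sorted(pushed))
```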
