1 Background
Previous discussions of query execution assumed that queries were executed with a single worker (i.e., a
single thread). In practice, however, queries are often executed in parallel with multiple workers.
Parallel execution provides a number of key benefits for DBMSs:
• Increased performance in throughput (more queries per second) and latency (less time per query).
• Increased responsiveness and availability from the perspective of external clients of the DBMS.
• Potentially lower total cost of ownership (TCO). This cost includes both the hardware procurement
and software license, as well as the labor overhead of deploying the DBMS and the energy needed
to run the machines. This has become of particular relevance in increasingly cloud-centric systems.
There are two types of parallelism that DBMSs support: inter-query parallelism and intra-query parallelism.
3 Process Models
A DBMS process model defines how the system supports concurrent requests from a multi-user application/environment.
The DBMS comprises one or more workers that are responsible for executing tasks
on behalf of the client and returning the results. An application may send a large request or multiple
requests at the same time that must be divided across different workers.
Fall 2024 – Lecture #14 Query Execution II
There are two major process models that a DBMS may adopt: process per worker and thread per worker.
A third common database usage pattern takes an embedded approach.
Embedded DBMS
A very different usage pattern for databases involves running the system in the same address space of the
application, as opposed to a client-server model where the database stands independent of the application.
In this scenario, the application will set up the threads and tasks to run on the database system. The
application itself will largely be responsible for scheduling. A diagram of an embedded DBMS’s scheduling
behaviors is shown in Figure 3.
DuckDB, SQLite, and RocksDB are the most famous embedded DBMSs.
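Python's built-in sqlite3 module is one convenient way to observe the embedded pattern in action: the database engine runs inside the application's process and address space, with no separate server, and the application's own code drives execution. A minimal sketch:

```python
import sqlite3

# SQLite runs inside this process: no server, no network connection.
# ":memory:" keeps the database entirely in the application's address space.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO kv VALUES (?, ?)", [("a", 1), ("b", 2)])

# The application decides when (and on which thread) queries run;
# the embedded engine simply executes them in-process.
total = conn.execute("SELECT SUM(v) FROM kv").fetchone()[0]
print(total)  # 3
```

Contrast this with a client-server DBMS, where the same statements would travel over a connection to an independent server process that does its own scheduling.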
Scheduling
For each query plan, the DBMS has to decide where, when, and how to execute it. Relevant
questions include:
• How many tasks should it use?
• How many CPU cores should it use?
• What CPU cores should the tasks execute on?
• Where should a task store its output?
When making decisions regarding query plans, the DBMS always knows more about its workload than the
OS does, so the DBMS's own scheduling decisions should take priority over the OS's.
4 Inter-Query Parallelism
In inter-query parallelism, the DBMS executes different queries concurrently. Since multiple workers run
requests simultaneously, overall throughput increases and average latency decreases.
If the queries are read-only, then little coordination is required between queries. However, if multiple
queries are updating the database concurrently, more complicated conflicts arise. These issues are discussed
further in lecture 16.
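The read-only case can be sketched with a toy in-memory "table" and a thread pool (the table, queries, and worker count here are all hypothetical): because no worker updates the shared data, the two queries run concurrently with no coordination beyond sharing the structure.

```python
from concurrent.futures import ThreadPoolExecutor

# A toy read-only "database" relation: name -> age.
table = {"alice": 30, "bob": 25, "carol": 41}

def run_query(predicate):
    # Each worker executes a complete, independent query over the same data.
    return sorted(k for k, v in table.items() if predicate(v))

# Inter-query parallelism: two different queries execute at the same time,
# each handled end-to-end by its own worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    older = pool.submit(run_query, lambda age: age > 28)
    younger = pool.submit(run_query, lambda age: age <= 28)

print(older.result())    # ['alice', 'carol']
print(younger.result())  # ['bob']
```

If either query instead wrote to `table`, the workers would need concurrency control to avoid conflicts, which is exactly the complication deferred to Lecture 16.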
Figure 4: Intra-Operator Parallelism – The query plan for this SELECT is a se-
quential scan on A that is fed into a filter operator. To run this in parallel, the query
plan is partitioned into disjoint fragments. A given plan fragment is operated on
by a distinct worker. The exchange operator calls Next concurrently on all fragments
which then retrieve data from their respective pages.
5 Intra-Query Parallelism
In intra-query parallelism, the DBMS executes the operations of a single query in parallel. This decreases
latency for long-running queries.
The organization of intra-query parallelism can be thought of in terms of a producer/consumer paradigm.
Each operator is a producer of data as well as a consumer of data from some operator running below it.
Parallel algorithms exist for every relational operator. The DBMS can either have multiple threads access
centralized data structures or use partitioning to divide work up.
Within intra-query parallelism, there are three types of parallelism: intra-operator, inter-operator, and
bushy. These approaches are not mutually exclusive. It is the DBMS’ responsibility to combine these
techniques in a way that optimizes performance on a given workload.
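The partitioning approach for a single operator can be sketched as follows (the table contents, fragment count, and filter are hypothetical): the same scan-and-filter pipeline runs on each disjoint fragment, and a final step plays the role of the exchange operator by combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# A toy table of 20 tuples, split into 4 disjoint fragments.
table = list(range(20))
fragments = [table[i::4] for i in range(4)]

def scan_and_filter(fragment):
    # Intra-operator parallelism: every worker runs the SAME operator
    # pipeline, each over its own fragment of the data.
    return [x for x in fragment if x % 3 == 0]

# The "exchange" step: run all fragments in parallel, then merge the
# partial results into one output stream.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = pool.map(scan_and_filter, fragments)
result = sorted(x for part in partials for x in part)

print(result)  # [0, 3, 6, 9, 12, 15, 18]
```

A real exchange operator would pull tuples incrementally via Next rather than materializing each fragment's output, but the structure is the same: identical operator instances over disjoint partitions, merged at a single point.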
Figure 6: Bushy Parallelism – To perform a 4-way JOIN on four tables, the query
plan is divided into four fragments as shown. Different portions of the query plan
run at the same time, in a manner similar to inter-operator parallelism.
Inter-operator (pipeline) parallelism is widely used in stream processing systems, which are systems that
continually execute a query over a stream of input tuples.
Bushy Parallelism
Bushy parallelism is a hybrid of intra-operator and inter-operator parallelism where workers execute
multiple operators from different segments of the query plan at the same time.
The DBMS still uses exchange operators to combine intermediate results from these segments. An example
is shown in Figure 6.
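A minimal sketch of the idea, with hypothetical tables and a toy hash join: the fragments computing A JOIN B and C JOIN D belong to different segments of the plan, yet they execute simultaneously before their results would be combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy tables: each is a list of (key, value) tuples.
A = [(1, "a1"), (2, "a2")]
B = [(1, "b1"), (3, "b3")]
C = [(1, "c1"), (2, "c2")]
D = [(2, "d2")]

def hash_join(left, right):
    # A simple hash join on the first attribute: build on left, probe with right.
    index = {k: v for k, v in left}
    return [(k, index[k], v) for k, v in right if k in index]

# Bushy parallelism: A JOIN B and C JOIN D come from different segments
# of the same query plan and run at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    ab = pool.submit(hash_join, A, B)
    cd = pool.submit(hash_join, C, D)

print(ab.result())  # [(1, 'a1', 'b1')]
print(cd.result())  # [(2, 'c2', 'd2')]
```

In a full plan, an exchange operator would then feed both intermediate results into the final join, as Figure 6 depicts.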
6 I/O Parallelism
Using additional processes/threads to execute queries in parallel will not improve performance if the disk
is always the main bottleneck. Therefore, it is important to be able to split the database across multiple
storage devices. To do this, DBMSs use I/O parallelism to spread an installation across multiple devices.
Two approaches to I/O parallelism are multi-disk parallelism and database partitioning.
Multi-Disk Parallelism
In multi-disk parallelism, the OS/hardware is configured to store the DBMS's files across multiple storage
devices. This can be done through storage appliances or RAID configurations chosen based on performance,
durability, and capacity constraints. Because all of the storage setup is transparent to the DBMS, workers
cannot deliberately operate on different devices: the DBMS is unaware of the underlying parallelism.
Database Partitioning
In database partitioning, the database is split up into disjoint subsets that can be assigned to discrete disks.
Some DBMSs allow for specification of the disk location of each individual database. This is easy to do at
the file-system level if the DBMS stores each database in a separate directory. The log file of changes made
is usually shared.
The idea of logical partitioning is to split a single logical table into disjoint physical segments that are
stored/managed separately. Such partitioning is ideally transparent to the application. That is, the
application should be able to access logical tables without caring how things are stored.
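The transparency property can be sketched with a toy hash-partitioned table (the partition count, routing function, and storage lists here are all hypothetical; in a real system each partition would live on a different disk or directory): the application only calls `insert` and `lookup`, and the routing to a partition is hidden inside those functions.

```python
# A minimal sketch of logical (hash) partitioning.
NUM_PARTITIONS = 4
partitions = [[] for _ in range(NUM_PARTITIONS)]  # stand-ins for separate disks

def insert(key, value):
    # The partition is chosen by hashing the key. The application never
    # sees this routing; it just calls insert() on the logical table.
    partitions[hash(key) % NUM_PARTITIONS].append((key, value))

def lookup(key):
    # Reads are routed to the single partition that can hold the key,
    # so only one "disk" needs to be touched.
    part = partitions[hash(key) % NUM_PARTITIONS]
    return next((v for k, v in part if k == key), None)

for i in range(100):
    insert(i, f"row-{i}")

print(lookup(42))                       # row-42
print(sum(len(p) for p in partitions))  # 100
```

The disjointness of the partitions is what enables I/O parallelism: different workers can scan different partitions on different devices at the same time.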
We will cover these approaches later in the semester when discussing distributed databases.