
HPA - Notes

13 May 2024 11:56

Introduction
Architecture of the Central Processing Unit

Parts of a CPU:
1. ALU - The arithmetic logic unit executes all calculations within
the CPU
2. CU - control unit, coordinates how data moves around, decodes
instructions
3. Registers - memory locations within the processor itself that work at very fast speeds and hold instructions awaiting decoding or execution:
   1. PC - program counter - stores the address of the next instruction in RAM
   2. MAR - memory address register - stores the address of the current instruction being executed
   3. MDR - memory data register - stores the data that is to be sent to or fetched from memory
   4. CIR - current instruction register - stores the actual instruction that is being decoded and executed
   5. ACC - accumulator - stores the result of calculations

Buses
1. address bus - carries the ADDRESS of the instruction or data
2. data bus - carries data between the processor and the memory
3. control bus - sends control signals such as: memory read, memory write

• Memory holds both data and instructions
• The arithmetic/logic unit is capable of performing arithmetic and logic operations on data
• A processor register is a quickly accessible location available to a digital processor's central processing unit (CPU). Registers usually consist of a small amount of fast storage; some registers have specific hardware functions and may be read-only or write-only.
• The control unit controls the flow of data within the CPU - this is the Fetch-Execute cycle (sketched below)
• Input arrives into a CPU via a bus
• Output exits the CPU via a bus
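
A minimal sketch in C of how these registers cooperate during the fetch-decode-execute cycle. The toy instruction set (LOAD, ADD, STORE, HALT), the program, and the data values are illustrative assumptions, not part of the notes:

    /* Sketch of a fetch-decode-execute loop. The instruction set
       (LOAD/ADD/STORE/HALT), program, and data are toy assumptions. */
    #include <stdio.h>

    enum { LOAD, ADD, STORE, HALT };
    typedef struct { int op; int addr; } Instr;

    int main(void) {
        Instr ram_code[] = { {LOAD, 0}, {ADD, 1}, {STORE, 2}, {HALT, 0} };
        int   ram_data[] = { 5, 7, 0 };

        int pc = 0;                        /* PC: address of the next instruction   */
        int acc = 0;                       /* ACC: result of calculations           */
        for (;;) {
            int mar = pc;                  /* MAR: address being fetched            */
            Instr cir = ram_code[mar];     /* CIR: instruction being decoded/run    */
            pc = pc + 1;                   /* PC now points to the next instruction */

            if      (cir.op == LOAD)  acc = ram_data[cir.addr];     /* via MDR */
            else if (cir.op == ADD)   acc = acc + ram_data[cir.addr]; /* ALU   */
            else if (cir.op == STORE) ram_data[cir.addr] = acc;     /* via MDR */
            else                      break;                        /* HALT    */
        }
        printf("result = %d\n", ram_data[2]);   /* 5 + 7 = 12 */
        return 0;
    }

Each pass of the loop mirrors one cycle: the PC supplies the address (via the MAR), the fetched instruction sits in the CIR while it is decoded, the ALU updates the ACC, and the MDR would carry any value moving to or from memory.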

Layers of Abstraction

Layers of abstraction: https://slikts.github.io/concurrency-glossary/?id=layers-of-abstraction

Instruction Level Parallelism

• Different instructions within a stream can be executed in parallel
• Pipelining, out-of-order execution, speculative execution, VLIW
• Dataflow

Data Parallelism
• Different pieces of data can be operated on in parallel (contrasted with task parallelism in the sketch below)
• SIMD: Vector processing, array processing
• Systolic arrays, streaming processors

Task Level Parallelism
• Different “tasks/threads” can be executed in parallel
• Multithreading
• Multiprocessing (multi-core)
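
A small host-side C++ sketch (array size, thread split, and the two worker functions are illustrative) contrasting data parallelism - the same operation on different slices of the data - with task parallelism - different tasks running concurrently:

    // Contrast of data parallelism and task parallelism using C++ threads.
    // Array size, thread split, and the two tasks are illustrative choices.
    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Data parallelism: the SAME operation applied to DIFFERENT slices of the data.
    void scale_slice(std::vector<float>& v, size_t lo, size_t hi, float k) {
        for (size_t i = lo; i < hi; ++i) v[i] *= k;
    }

    // Task parallelism: DIFFERENT operations running concurrently.
    void compute_sum(const std::vector<float>& v, double& out) {
        out = std::accumulate(v.begin(), v.end(), 0.0);
    }
    void compute_max(const std::vector<float>& v, float& out) {
        out = *std::max_element(v.begin(), v.end());
    }

    int main() {
        std::vector<float> data(1000000, 1.0f);

        // Data-parallel: two threads run the same code on different halves.
        std::thread a(scale_slice, std::ref(data), 0, data.size() / 2, 2.0f);
        std::thread b(scale_slice, std::ref(data), data.size() / 2, data.size(), 2.0f);
        a.join(); b.join();

        // Task-parallel: two different tasks run at the same time.
        double sum = 0.0; float mx = 0.0f;
        std::thread t1(compute_sum, std::cref(data), std::ref(sum));
        std::thread t2(compute_max, std::cref(data), std::ref(mx));
        t1.join(); t2.join();
        std::printf("sum = %.0f, max = %.1f\n", sum, mx);
        return 0;
    }

Instruction level parallelism, by contrast, is exploited inside the processor itself (pipelining, out-of-order execution) and needs no explicit code.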

• Flynn's taxonomy distinguishes multi-processor computer architectures according to how they can be classified along the two independent dimensions of Instruction Stream and Data Stream. Each of these dimensions can have only one of two possible states: Single or Multiple.
• This gives the 4 possible classifications according to Flynn:

Single Instruction, Single Data (SISD)


• A serial (non-parallel) computer
• Single Instruction: Only one instruction stream is being acted on by the CPU during any one
clock cycle
• Single Data: Only one data stream is being used as input during any one clock cycle
• Deterministic execution
• This is the oldest type of computer
• Examples: older generation mainframes, minicomputers, workstations and single
processor/core PCs.

Single Instruction, Multiple Data (SIMD)


• A type of parallel computer
• Single Instruction: All processing units execute the same instruction at any given clock cycle
• Multiple Data: Each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of regularity, such as
graphics/image processing.
• Synchronous (lockstep) and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Most modern computers, particularly those with graphics processor units (GPUs), employ SIMD instructions and execution units (a simple example loop is sketched below).
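
For example, a plain SAXPY-style loop (function and parameter names are illustrative) is the kind of regular code SIMD hardware accelerates - an optimizing compiler can map several iterations onto a single vector instruction:

    // SAXPY as a plain loop. On a CPU with vector units, an optimizing compiler
    // can execute several iterations with one SIMD instruction (same operation,
    // multiple data elements). Function and parameter names are illustrative.
    void saxpy(float a, const float* x, const float* y, float* out, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = a * x[i] + y[i];
    }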

Multiple Instruction, Single Data (MISD)


• A type of parallel computer
• Multiple Instruction: Each processing unit operates on the data independently via separate
instruction streams.
• Single Data: A single data stream is fed into multiple processing units.
• Few (if any) actual examples of this class of parallel computer have ever existed.
• Some conceivable uses might be:
○ multiple frequency filters operating on a single signal stream
○ multiple cryptography algorithms attempting to crack a single coded message.

Multiple Instruction, Multiple Data (MIMD)

• A type of parallel computer
• Multiple Instruction: Every processor may be executing a different instruction stream
• Multiple Data: Every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Currently, the most common type of parallel computer - most modern supercomputers fall into
this category.
• Examples: most current supercomputers, networked parallel computer clusters and "grids",
multi-processor SMP computers, multi-core PCs.

Parallel Computer Memory Architectures


Shared Memory
General Characteristics
• Shared memory parallel computers vary widely, but generally have in common the
ability for all processors to access all memory as global address space.
• Multiple processors can operate independently but share the same memory
resources.
• Changes in a memory location effected by one processor are visible to all other
processors.
• Historically, shared memory machines have been classified as UMA and NUMA,
based upon memory access times.

Uniform Memory Access (UMA)


• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
• Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one
processor updates a location in shared memory, all the other processors know about
the update. Cache coherency is accomplished at the hardware level.

Non-Uniform Memory Access (NUMA)

• Often made by physically linking two or more SMPs


• One SMP can directly access memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across link is slower
• If cache coherency is maintained, then may also be called CC-NUMA - Cache
Coherent NUMA

Advantages
• Global address space provides a user-friendly programming perspective to memory
• Data sharing between tasks is both fast and uniform due to the proximity of memory to
CPUs
Disadvantages
• Primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs geometrically increases traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increases traffic associated with cache/memory management.
• Programmer responsibility for synchronization constructs that ensure "correct" access of global memory (a minimal example is sketched below).
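
A minimal sketch of that synchronization burden, assuming C++ threads and a mutex (names and counts are illustrative): two threads update one location in the global address space, and the lock is what makes the access "correct":

    // Two threads update one shared counter; the mutex is the synchronization
    // construct the programmer must supply. Names and counts are illustrative.
    #include <cstdio>
    #include <mutex>
    #include <thread>

    long counter = 0;            // lives in the global (shared) address space
    std::mutex counter_lock;

    void add_many(int n) {
        for (int i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> guard(counter_lock);  // ensures "correct" access
            ++counter;
        }
    }

    int main() {
        std::thread t1(add_many, 1000000);
        std::thread t2(add_many, 1000000);
        t1.join(); t2.join();
        std::printf("counter = %ld\n", counter);  // 2000000 with the lock;
        return 0;                                 // unpredictable without it
    }

An atomic variable (std::atomic<long>) would be an equally valid construct here; either way the correctness burden sits with the programmer.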

Distributed Memory
General Characteristics
• Like shared memory systems, distributed memory systems vary widely but share a
common characteristic. Distributed memory systems require a communication network
to connect inter-processor memory.
• Processors have their own local memory. Memory addresses in one processor do not
map to another processor, so there is no concept of global address space across all
processors.
• Because each processor has its own local memory, it operates independently.
Changes it makes to its local memory have no effect on the memory of other
processors. Hence, the concept of cache coherency does not apply.
• When a processor needs access to data in another processor, it is usually the task of
the programmer to explicitly define how and when data is communicated.
Synchronization between tasks is likewise the programmer's responsibility.
• The network "fabric" used for data transfer varies widely, though it can be as simple as
Ethernet.

Advantages
• Memory is scalable with the number of processors. Increase the number of processors
and the size of memory increases proportionately.
• Each processor can rapidly access its own memory without interference and without
the overhead incurred with trying to maintain global cache coherency.
• Cost effectiveness: can use commodity, off-the-shelf processors and networking.

Disadvantages
• The programmer is responsible for many of the details associated with data communication between processors (a message-passing sketch follows this list).
• It may be difficult to map existing data structures, based on global memory, to this
memory organization.
• Non-uniform memory access times - data residing on a remote node takes longer to
access than node local data.
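
A sketch of that explicit communication. The notes do not name a particular library; MPI is assumed here as the usual message-passing interface. Rank 0's value exists only in its own local memory until it is explicitly sent:

    // Explicit message passing between two processes with separate local memories.
    // MPI is assumed here; run with a launcher such as "mpirun -np 2 ./a.out".
    #include <cstdio>
    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double value = 0.0;
        if (rank == 0) {
            value = 3.14;   // exists only in rank 0's local memory until it is sent
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::printf("rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }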


Hybrid Distributed-Shared Memory


General Characteristics
• The largest and fastest computers in the world today employ both shared and
distributed memory architectures.
• The shared memory component can be a shared memory machine and/or graphics
processing units (GPU).
• The distributed memory component is the networking of multiple shared memory/GPU
machines, which know only about their own memory - not the memory on another
machine. Therefore, network communications are required to move data from one
machine to another.
• Current trends seem to indicate that this type of memory architecture will continue to
prevail and increase at the high end of computing for the foreseeable future.
Advantages and Disadvantages
• Whatever applies to shared and to distributed memory architectures individually also applies here.
• Increased scalability is an important advantage
• Increased programmer complexity is an important disadvantage, since both levels must be programmed (as in the combined sketch below)
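
A combined sketch under the same assumptions as the two previous ones (MPI across nodes, C++ threads within a node; sizes and the thread count are illustrative): each rank sums its local slice with shared-memory threads, then the per-node results are combined with explicit messages:

    // Hybrid sketch: message passing between nodes (MPI) plus shared-memory
    // threads within each node. Sizes and the thread count of 4 are illustrative.
    #include <cstdio>
    #include <mpi.h>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nranks = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        // Each rank owns a slice of the problem in its own local memory.
        std::vector<double> local(1000000, 1.0);

        // Inside the node, 4 threads share that memory directly.
        double partial[4] = {0, 0, 0, 0};
        std::vector<std::thread> pool;
        const size_t chunk = local.size() / 4;
        for (int t = 0; t < 4; ++t)
            pool.emplace_back([&, t] {
                partial[t] = std::accumulate(local.begin() + t * chunk,
                                             local.begin() + (t + 1) * chunk, 0.0);
            });
        for (auto& th : pool) th.join();
        double node_sum = partial[0] + partial[1] + partial[2] + partial[3];

        // Across nodes, results are combined only by explicit communication.
        double total = 0.0;
        MPI_Reduce(&node_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) std::printf("total = %.0f across %d ranks\n", total, nranks);

        MPI_Finalize();
        return 0;
    }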

Scalability Prospects
High performance computing (HPC) systems face several key scalability challenges as they continue to
grow in size and complexity:

Bandwidth Scaling
Maintaining sufficient memory bandwidth is critical for HPC performance. As core counts increase, the
memory bandwidth per core tends to decrease, leading to memory bandwidth becoming a bottleneck.
Techniques like 3D stacking, wide I/O, and high-bandwidth memory can help increase memory
bandwidth, but scaling bandwidth remains a major challenge

Latency Scaling
Latency between processors and memory is another key challenge. As systems scale, the average distance between processors and memory increases, leading to higher latency. Techniques like non-uniform memory access (NUMA) can help mitigate this, but latency will continue to be a concern.

Cost Scaling
Building and operating large-scale HPC systems is extremely expensive. The costs of the hardware,
power, cooling, and facilities grow rapidly as systems scale. Reducing these costs while maintaining
performance is crucial for the continued growth of HPC

Physical Scaling
There are physical limits to how large HPC systems can be built. Factors like the size of data centers,
power delivery, and cooling capacity constrain the maximum size. Innovative approaches to system
architecture and cooling will be needed to push the boundaries of physical scaling

Generic Scaling Methods for Applications


Ensuring HPC applications can effectively utilize large-scale systems is challenging. Techniques like parallelization, load balancing, and efficient algorithms are needed, but as systems scale, the complexity of these techniques increases. Developing generic methods to scale applications to exascale and beyond is an active area of research.

In summary, while HPC systems continue to grow rapidly, bandwidth, latency, cost, physical size, and application scaling remain major challenges that must be addressed through continued innovation in hardware and software. Overcoming these challenges will be crucial for realizing the full potential of exascale and future generations of HPC.

SIMT
• SIMT is the thread equivalent of SIMD. While the latter uses Execution Units or Vector Units, SIMT
expands it to leverage threads. In SIMT, multiple threads perform the same instruction on
different data sets. The main advantage of SIMT is that it reduces the latency that comes with
instruction prefetching.

• SIMD is generally used in CPUs while SIMT is used in GPUs


• SIMT is generally used in Super-scalar processors to implement SIMD. So technically, each core is
scalar in nature but it still works similarly to an SIMD model by leveraging multiple threads to do
the same task on various data sets.

• Every time the GPU needs to execute a particular instruction, the data and instructions are fetched from memory and then decoded and executed. In this case, all the data sets (up to a certain limit) that need the same instruction are prefetched and executed simultaneously using the various threads available to the processor (a kernel sketch follows below).
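
A minimal CUDA kernel sketch of SIMT (the kernel name, array size, and launch configuration are illustrative): every thread executes the same instruction stream, but indexes its own data element:

    // SIMT sketch: every thread runs the same kernel code, each on its own
    // array element. Kernel name, size, and launch shape are illustrative.
    #include <cuda_runtime.h>

    __global__ void scale(float* data, float k, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
        if (i < n)
            data[i] *= k;        // same instruction, different data per thread
    }

    int main() {
        const int n = 1 << 20;
        float* d = nullptr;
        cudaMalloc(&d, n * sizeof(float));
        cudaMemset(d, 0, n * sizeof(float));

        // 256 threads per block; enough blocks to cover all n elements.
        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
        cudaDeviceSynchronize();

        cudaFree(d);
        return 0;
    }

Threads are issued in warps, so groups of threads really do execute each instruction in lockstep on different data elements.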

Execution Model CUDA


CUDA is a C extension consisting of:

- Serial CPU code


- Parallel GPU code (kernels)

A GPU kernel is a C function that operates as follows:

- Each thread executes kernel code


- A group of threads form a thread block (1D, 2D, or 3D)
- Thread blocks are organized into a grid (1D, 2D, or 3D)
- Threads within the same thread block can synchronize execution and share access to local scratchpad
memory

Key idea: hierarchy of parallelism to handle thousands of threads.

Thread blocks are allocated (dynamically) to Streaming Multiprocessors (SMs) and run to completion.
Threads (warps) within a block run on the same SM, allowing them to share data and synchronize.
Different blocks in a grid cannot interact with each other.
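
A sketch of that hierarchy (the kernel name, sizes, and the fixed block size of 256 threads are illustrative assumptions): threads in a block cooperate through __shared__ scratchpad memory and __syncthreads(), while blocks remain independent:

    // Hierarchy sketch: a grid of blocks, each block of 256 threads cooperating
    // through __shared__ scratchpad memory and __syncthreads(). Kernel name,
    // sizes, and the fixed block size are illustrative.
    #include <cuda_runtime.h>

    __global__ void block_sum(const float* in, float* per_block, int n) {
        __shared__ float scratch[256];          // visible only within this block

        int tid = threadIdx.x;
        int i   = blockIdx.x * blockDim.x + tid;
        scratch[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                        // all threads of the block sync here

        // Tree reduction inside the block (blockDim.x must be a power of two).
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                scratch[tid] += scratch[tid + stride];
            __syncthreads();
        }

        if (tid == 0)
            per_block[blockIdx.x] = scratch[0]; // one partial result per block
    }

    int main() {
        const int n = 1 << 20, threads = 256, blocks = (n + threads - 1) / threads;
        float *d_in = nullptr, *d_part = nullptr;
        cudaMalloc(&d_in,   n * sizeof(float));
        cudaMalloc(&d_part, blocks * sizeof(float));
        cudaMemset(d_in, 0, n * sizeof(float));

        block_sum<<<blocks, threads>>>(d_in, d_part, n);
        cudaDeviceSynchronize();
        // Blocks in the grid cannot interact, so the per-block sums must be
        // combined by a second kernel launch or on the host.
        cudaFree(d_in); cudaFree(d_part);
        return 0;
    }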

Fermi Architecture
The NVIDIA Fermi GPU architecture, introduced in 2010, represents a significant advancement
in GPU design for high-performance computing. The key aspects of the Fermi architecture and
CUDA execution model are:

SMs
• Each Fermi GPU consists of multiple SMs, with each SM containing 32 CUDA cores
• The SMs are supported by a second-level cache, host interface, GigaThread scheduler, and
multiple DRAM interfaces

Memory Hierarchy
• Fermi introduced improvements to the memory hierarchy compared to previous GPU
architectures
• Each SM has its own L1 cache, instead of multiple SMs sharing a cache
• The memory was upgraded to GDDR5, capable of up to 144 GB/s of bandwidth

CUDA Cores and Execution


• Each SM has two ports that can access either the ALU or FPU, but not both simultaneously
• Each port has 16 execution units, meaning a warp (32 threads) takes 2 cycles to complete
• Warps are scoreboarded to track structural hazards more carefully

CUDA Programming Model


• CUDA is NVIDIA's parallel programming model that utilizes the Fermi architecture
• It is based on a hierarchy of abstraction layers, including threads, thread blocks, and grids.
• Threads within a block can communicate through shared memory and are executed in a SIMD
fashion

Improvements over Previous Architectures


• Fermi aimed to address issues with previous GPU architectures by allowing a wider variety of
work to be executed on the GPU
• It introduced constructs previously exclusive to CPUs, such as exceptions and support for
multiple kernels
• The number of threads per SM was increased to 1536, enabling greater parallelism and support
for multiple, independent kernels
The Fermi architecture and CUDA programming model represented a significant step towards
making GPUs more suitable for general-purpose high-performance computing applications,
while still maintaining their strengths in graphics processing.

