Open navigation menu

Scribd

0% found this document useful (0 votes)

9 views

Exploring the Gpu Architecture

The document explores GPU architecture, highlighting its role in High Performance Computing (HPC) and its advantages over CPUs in handling parallel processing tasks. It discusses the differences in latency and throughput between CPUs and GPUs, emphasizing the GPU's ability to manage large-scale computations efficiently. The conclusion reinforces that GPUs significantly enhance HPC workloads when integrated with VMware vSphere ESXi.

Uploaded by

Copyright

© © All Rights Reserved

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views

Exploring the Gpu Architecture

The document explores GPU architecture, highlighting its role in High Performance Computing (HPC) and its advantages over CPUs in handling parallel processing tasks. It discusses the differences in latency and throughput between CPUs and GPUs, emphasizing the GPU's ability to manage large-scale computations efficiently. The conclusion reinforces that GPUs significantly enhance HPC workloads when integrated with VMware vSphere ESXi.

Uploaded by

Copyright

© © All Rights Reserved

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Exploring the GPU Architecture

VMware AI/ML

©️ VMware LLC.
Exploring the GPU Architecture

Table of contents

Exploring the GPU Architecture .................................................................................................................... 3

Overview ................................................................................................................................................... 3

Latency vs Throughput ............................................................................................................................... 4

Diﬀerences and Similarities ......................................................................................................................... 5

Exploring the GPU Architecture .................................................................................................................... 6

To Conclude ............................................................................................................................................... 8

©️ VMware LLC.
Exploring the GPU Architecture

Exploring the GPU Architecture

Overview
A Graphics Processor Unit (GPU) is mostly known for the hardware device used when running applications that weigh heavy on
graphics, i.e. 3D modeling software or VDI infrastructures. In the consumer market, a GPU is mostly used to accelerate gaming
graphics. Today, GPGPU’s (General Purpose GPU) are the choice of hardware to accelerate computational workloads in modern
High Performance Computing (HPC) landscapes.
HPC in itself is the platform serving workloads like Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI). Using
a GPGPU is not only about ML computations that require image recognition anymore. Calculations on tabular data is also a
common exercise in i.e. healthcare, insurance and financial industry verticals. But why do we need a GPU for these types of all
these workloads? This blogpost will go into the GPU architecture and why they are a good fit for HPC workloads running on vSphere
ESXi.

©️ VMware LLC. Document | 3

Exploring the GPU Architecture

Latency vs Throughput
Let’s first take a look at the main differences between a Central Processing Unit (CPU) and a GPU. A common CPU is optimized to
be as quick as possible to finish a task at a as low as possible latency, while keeping the ability to quickly switch between
operations. It’s nature is all about processing tasks in a serialized way. A GPU is all about throughput optimization, allowing to push
as many as possible tasks through is internals at once. It does so by being able to parallel process a task. The following exemplary
diagram shows the ‘core’ count of a CPU and GPU. It emphasizes that the main contrast between both is that a GPU has a lot more
cores to process a task.

©️ VMware LLC. Document | 4

Exploring the GPU Architecture

Diﬀerences and Similarities

However, it is not only about the number of cores. And when we speak of cores in a NVIDIA GPU, we refer to CUDA cores that
consists of ALU’s (Arithmetic Logic Unit). Terminology may vary between vendors.
Looking at the overall architecture of a CPU and GPU, we can see a lot of similarities between the two. Both use the memory
constructs of cache layers, memory controller and global memory. A high-level overview of modern CPU architectures indicates it
is all about low latency memory access by using signiﬁcant cache memory layers. Let’s ﬁrst take a look at a diagram that shows an
generic, memory focussed, modern CPU package (note: the precise lay-out strongly depends on vendor/model).

A single CPU package consists of cores that contains separate data and instruction layer-1 caches, supported by the layer-2 cache.
The layer-3 cache, or last level cache, is shared across multiple cores. If data is not residing in the cache layers, it will fetch the
data from the global DDR-4 memory. The numbers of cores per CPU can go up to 28 or 32 that run up to 2.5 GHz or 3.8 GHz with
Turbo mode, depending on make and model. Caches sizes range up to 2MB L2 cache per core.

©️ VMware LLC. Document | 5

Exploring the GPU Architecture

Exploring the GPU Architecture

If we inspect the high-level architecture overview of a GPU (again, strongly depended on make/model), it looks like the nature of a
GPU is all about putting available cores to work and it’s less focussed on low latency cache memory access.

A single GPU device consists of multiple Processor Clusters (PC) that contain multiple Streaming Multiprocessors (SM). Each SM
accommodates a layer-1 instruction cache layer with its associated cores. Typically, one SM uses a dedicated layer-1 cache and a
shared layer-2 cache before pulling data from global GDDR-5 (or GDDR-6 in newer GPU models) memory. Its architecture is
tolerant of memory latency.
Compared to a CPU, a GPU works with fewer, and relatively small, memory cache layers. Reason being is that a GPU has more
transistors dedicated to computation meaning it cares less how long it takes the retrieve data from memory. The potential memory

©️ VMware LLC. Document | 6

Exploring the GPU Architecture

access ‘latency’ is masked as long as the GPU has enough computations at hand, keeping it busy.

A GPU is optimized for data parallel throughput computations.

Looking at the numbers of cores it quickly shows you the possibilities on parallelism that is it is capable of. When examining the
2019 NVIDIA ﬂagship oﬀering, the Tesla V100, one device contains 80 SM’s, each containing 64 cores making a total of 5120
cores! Tasks aren’t scheduled to individual cores, but to processor clusters and SM’s. That’s how it’s able to process in parallel.
Now combine this powerful hardware device with a programming framework so applications can fully utilize the computing power
of a GPU.

©️ VMware LLC. Document | 7

Exploring the GPU Architecture

To Conclude

High Performance Computing (HPC) is the use of parallel processing for running advanced application
programs eﬃciently, reliably and quickly.

This is exactly why GPU’s are a perfect fit for HPC workloads. Workloads can greatly benefit from using GPU’s as it enables them to
have massive increases in throughput. A HPC platform using GPU’s will become much more versatile, flexible and efficient when
running it on top of the VMware vSphere ESXi hypervisor. It allows for GPU-based workloads to allocate GPU resources in a very
flexible and dynamic way.

©️ VMware LLC. Document | 8

©️ VMware LLC. Copyright © 2005-2024 Broadcom. All Rights Reserved. The term “Broadcom” refers to Broadcom Inc.
and/or its subsidiaries.

You might also like

Report On Gpu
No ratings yet
Report On Gpu
39 pages
System Modelling (ESL)
No ratings yet
System Modelling (ESL)
45 pages
GPU Introduction
No ratings yet
GPU Introduction
52 pages
Comp Arch Project 2 Final
No ratings yet
Comp Arch Project 2 Final
29 pages
Why GPU?: CS8803SC Software and Hardware Cooperative Computing
No ratings yet
Why GPU?: CS8803SC Software and Hardware Cooperative Computing
14 pages
10 GPU-IntroCUDA3
No ratings yet
10 GPU-IntroCUDA3
141 pages
Gpu Series i Cpu vs Gpu 1720694318
No ratings yet
Gpu Series i Cpu vs Gpu 1720694318
4 pages
GPGPU
No ratings yet
GPGPU
139 pages
Lecture - 01 - CUDA Programming
No ratings yet
Lecture - 01 - CUDA Programming
52 pages
3-1
No ratings yet
3-1
35 pages
GPU in Supercomputer
No ratings yet
GPU in Supercomputer
7 pages
gpus
No ratings yet
gpus
32 pages
Intro Computing BCSM-F18-071 - Assignment 1
No ratings yet
Intro Computing BCSM-F18-071 - Assignment 1
10 pages
0-gpu-computing-i-give-it
No ratings yet
0-gpu-computing-i-give-it
57 pages
Graphics Processing Unit
No ratings yet
Graphics Processing Unit
9 pages
GPU Architecture
No ratings yet
GPU Architecture
12 pages
GPU Khoruzhenko
No ratings yet
GPU Khoruzhenko
5 pages
UNIT-4
No ratings yet
UNIT-4
48 pages
Chapter 5 - General Purpose PGPU, CUDA
No ratings yet
Chapter 5 - General Purpose PGPU, CUDA
70 pages
Graphics Processing Unit (Gpu) Memory Hierarchy: Presented by Vu Dinh and Donald Macintyre
No ratings yet
Graphics Processing Unit (Gpu) Memory Hierarchy: Presented by Vu Dinh and Donald Macintyre
24 pages
2024-aq-compute-blogpost_cpu-vs-gpu
No ratings yet
2024-aq-compute-blogpost_cpu-vs-gpu
9 pages
WINSEM2022-23_CSE4001_ETH_VL2022230503160_Reference_Material_I_10-01-2023_2.2_GPU
No ratings yet
WINSEM2022-23_CSE4001_ETH_VL2022230503160_Reference_Material_I_10-01-2023_2.2_GPU
34 pages
Lecture GPUArchCUDA01
No ratings yet
Lecture GPUArchCUDA01
57 pages
NVIDIA GPU Computing - A Journey From PC Gaming To Deep Learning
100% (1)
NVIDIA GPU Computing - A Journey From PC Gaming To Deep Learning
91 pages
GPU Cluster4
No ratings yet
GPU Cluster4
31 pages
Kirk+Hwu GPU
No ratings yet
Kirk+Hwu GPU
92 pages
GPU (Graphics Processing Unit)
No ratings yet
GPU (Graphics Processing Unit)
23 pages
HPC 5th Unit - 240504 - 160548
No ratings yet
HPC 5th Unit - 240504 - 160548
18 pages
Lecture 1
No ratings yet
Lecture 1
17 pages
Presentation Prepared by Saatwik Kumar 1101219423 ETC, ET-2
No ratings yet
Presentation Prepared by Saatwik Kumar 1101219423 ETC, ET-2
18 pages
COE4590_15_GPU1
No ratings yet
COE4590_15_GPU1
14 pages
Gpgpu Workshop Cuda
No ratings yet
Gpgpu Workshop Cuda
10 pages
Graphic Processing Unit
100% (1)
Graphic Processing Unit
20 pages
Lecture 1: Introduction: Graphics Processing Units (Gpus) : Architecture and Programming
No ratings yet
Lecture 1: Introduction: Graphics Processing Units (Gpus) : Architecture and Programming
33 pages
GPU Architecture Ebook
No ratings yet
GPU Architecture Ebook
67 pages
Graphics Processing Unit
No ratings yet
Graphics Processing Unit
14 pages
A Look Into Parallel Architectures
No ratings yet
A Look Into Parallel Architectures
43 pages
Developers Had To Map Scientific Calculations Onto Problems That Could Be Represented by Triangles and Polygons
No ratings yet
Developers Had To Map Scientific Calculations Onto Problems That Could Be Represented by Triangles and Polygons
2 pages
Comparative Study On CPU GPU and TPU
No ratings yet
Comparative Study On CPU GPU and TPU
9 pages
Graphics Processing Unit Graphics Processing Unit: Dhan V Sagar CB - EN.P2CSE13007
No ratings yet
Graphics Processing Unit Graphics Processing Unit: Dhan V Sagar CB - EN.P2CSE13007
21 pages
An Introduction To Graphical Processing Unit: Jayshree Ghorpade, Jitendra Parande, Rohan Kasat, Amit Anand
No ratings yet
An Introduction To Graphical Processing Unit: Jayshree Ghorpade, Jitendra Parande, Rohan Kasat, Amit Anand
6 pages
GPU Architecture
No ratings yet
GPU Architecture
8 pages
Gpu Research Paper
No ratings yet
Gpu Research Paper
6 pages
Demystifying GPU microarchitecture through microbenchmarking
No ratings yet
Demystifying GPU microarchitecture through microbenchmarking
12 pages
Compute Unified Device Architecture
No ratings yet
Compute Unified Device Architecture
6 pages
CUDA Tutorial
No ratings yet
CUDA Tutorial
50 pages
GPGPU Programming With CUDA: Leandro Avila - University of Northern Iowa
No ratings yet
GPGPU Programming With CUDA: Leandro Avila - University of Northern Iowa
29 pages
GPU Architecture
0% (2)
GPU Architecture
28 pages
Graphics Processing Unit
No ratings yet
Graphics Processing Unit
22 pages
GPU Architecture & Implications: David Luebke NVIDIA Research
No ratings yet
GPU Architecture & Implications: David Luebke NVIDIA Research
94 pages
Gpgpu Final
No ratings yet
Gpgpu Final
124 pages
Graphics Processing Unit (Gpu) : BY Amal Raj.R Electronics C.P.T.C
No ratings yet
Graphics Processing Unit (Gpu) : BY Amal Raj.R Electronics C.P.T.C
30 pages
Introduction To GP-GPU and CUDA: High Performance Computing Center Hanoi University of Science & Technology
No ratings yet
Introduction To GP-GPU and CUDA: High Performance Computing Center Hanoi University of Science & Technology
43 pages
gpu (1)
No ratings yet
gpu (1)
11 pages
Graphics Processing Unit Thesis
100% (2)
Graphics Processing Unit Thesis
4 pages
Using GPUs
No ratings yet
Using GPUs
18 pages
Graphics Processing Unit: Shashwat Shriparv Infinitysoft
No ratings yet
Graphics Processing Unit: Shashwat Shriparv Infinitysoft
39 pages
CAO Report
No ratings yet
CAO Report
17 pages
Unit 2 - GPU DFG
No ratings yet
Unit 2 - GPU DFG
27 pages
GPU Programming: Dr. Florian Ferreira
No ratings yet
GPU Programming: Dr. Florian Ferreira
101 pages
789
No ratings yet
789
5 pages
Newscast CW27
No ratings yet
Newscast CW27
1 page
Sri Suktam - Devanagari - Vaidika Vignanam
No ratings yet
Sri Suktam - Devanagari - Vaidika Vignanam
5 pages
C++ Tips and Tricks (Bruce Merry)
No ratings yet
C++ Tips and Tricks (Bruce Merry)
51 pages
Mde 4871
No ratings yet
Mde 4871
338 pages
Compare Laptop
No ratings yet
Compare Laptop
7 pages
SMART MIRROR - Final
No ratings yet
SMART MIRROR - Final
17 pages
DX Diag
No ratings yet
DX Diag
16 pages
CCW SDR-4+ Receiver Quick Start Guide v1.6
No ratings yet
CCW SDR-4+ Receiver Quick Start Guide v1.6
12 pages
FastbootET01Ver1005 Operation Instruction
100% (1)
FastbootET01Ver1005 Operation Instruction
8 pages
Android Emulator
100% (1)
Android Emulator
23 pages
Atme College of Engineering: First Internal Assesment
No ratings yet
Atme College of Engineering: First Internal Assesment
1 page
MacBook Pro - Apple (UK)
No ratings yet
MacBook Pro - Apple (UK)
5 pages
Splc780d Ds
No ratings yet
Splc780d Ds
48 pages
Opencpn Raspberry Pi4 Plotter V1hv4
No ratings yet
Opencpn Raspberry Pi4 Plotter V1hv4
9 pages
Chapter1 Basic Concept of Computer
No ratings yet
Chapter1 Basic Concept of Computer
36 pages
Imx Linux Users Guide
No ratings yet
Imx Linux Users Guide
128 pages
The Architecture of Pentium Microprocessor
No ratings yet
The Architecture of Pentium Microprocessor
3 pages
Ram Timeline
No ratings yet
Ram Timeline
8 pages
Ai6808hg L F
No ratings yet
Ai6808hg L F
4 pages
Microcontrollers For Embedded Systems Nec022r
No ratings yet
Microcontrollers For Embedded Systems Nec022r
1 page
Synthesis
No ratings yet
Synthesis
5 pages
Operate Personal Computer-1 UC 2 2016
No ratings yet
Operate Personal Computer-1 UC 2 2016
44 pages
Hardware
No ratings yet
Hardware
22 pages
GPIO Programming
No ratings yet
GPIO Programming
12 pages
MCU - PIC24FV32KA304 - MICROCHIP - Section 8. Interrupts - 39707a PDF
No ratings yet
MCU - PIC24FV32KA304 - MICROCHIP - Section 8. Interrupts - 39707a PDF
26 pages
08 Lab Manual - Program To Find Out Factorial of A Number
No ratings yet
08 Lab Manual - Program To Find Out Factorial of A Number
4 pages
Travelmate P253-M: Model: Tmp253-M-33114G32Mnks Part No: Nx.V7Ved.005 Family: Tmp253-M Ean: 4712196600709
No ratings yet
Travelmate P253-M: Model: Tmp253-M-33114G32Mnks Part No: Nx.V7Ved.005 Family: Tmp253-M Ean: 4712196600709
1 page
Engineering 4862 Microprocessors: Assignment 2
No ratings yet
Engineering 4862 Microprocessors: Assignment 2
6 pages
Abbey Road Keyboards Info
No ratings yet
Abbey Road Keyboards Info
1 page
Penis
No ratings yet
Penis
13 pages
Automated Aquarium Feeder
No ratings yet
Automated Aquarium Feeder
11 pages
GROUP (1)
No ratings yet
GROUP (1)
28 pages
Linux Boot Process
No ratings yet
Linux Boot Process
9 pages

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Alternative Proxies:

Alternative Proxy