FinalProject


Parallelize and optimize an application

4.0
fit@hcmus
Parallelizing/optimizing a program
What part(s) should be parallelized/optimized?
• Measure the time of each part to decide
• When optimizing (after parallelizing), you can quickly measure the times of GPU activities with:
• nvprof --print-gpu-trace ./a.out
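Measuring the time of each part can be sketched on the host side as below. This is a minimal sketch, not the course's own timing code: `time_ms`, `load_input`, `process`, and `report_stage_times` are hypothetical names, and the two stages are placeholders for the parts of a real application.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: runs `fn` once and returns the elapsed wall-clock time in ms.
template <typename Fn>
double time_ms(Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Placeholder stages (assumptions) standing in for the parts of a real application.
void load_input(std::vector<float>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<float>(i % 256);
}

void process(std::vector<float>& data) {
    for (float& x : data) x = x * 0.5f + 1.0f;
}

// Times each stage separately, prints one line per stage, and returns the total in ms.
double report_stage_times(std::size_t n) {
    std::vector<float> data(n);
    double t_load = time_ms([&] { load_input(data); });
    double t_proc = time_ms([&] { process(data); });
    std::printf("load_input: %.3f ms\n", t_load);
    std::printf("process:    %.3f ms\n", t_proc);
    return t_load + t_proc;
}
```

Calling `report_stage_times(1 << 20)` from `main` prints one timing per stage; the stage with the largest share is the first candidate for parallelization. When a stage launches GPU kernels, the host must wait for the GPU to finish (for example with `cudaDeviceSynchronize()`) before the timer stops, because kernel launches are asynchronous.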

The process is a loop of four steps:
1. Analyze: What part(s) should be parallelized/optimized?
2. Design: How to parallelize/optimize?
3. Implement
4. Evaluate: Does the idea work? If not, do you know why?

• Each loop creates a new version based on previous versions
• We should go step by step: from sequential to parallel, then from parallel to optimized parallel
How to go through this process as well as possible?

Some advice:
• Keep a calm, focused mind
• Keep the code clean
• Code fast or slow?
• Use a good editor and learn how to use it efficiently
General optimization guidelines
 Expose enough independent tasks to utilize GPU hardware resources
   Expose enough blocks to utilize SMs
   In each SM, expose enough independent instructions (coming from the same warp, or from different warps) to utilize execution pipelines and hide latency
 Access DRAM efficiently
   Don’t let threads in the same warp access scattered addresses in DRAM
   Use SMEM to reduce DRAM accesses, as well as to access DRAM efficiently
 Reduce warp divergence
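The guideline about scattered addresses can be illustrated with a small host-side model. This is a rough sketch under stated assumptions, not a description of any specific GPU: it assumes memory is fetched in aligned 128-byte segments (real transaction granularity depends on the architecture), and `segments_touched` is a hypothetical helper. It counts how many segments one warp of 32 threads touches when thread `t` reads the `float` at index `t * stride`.

```cpp
#include <cstddef>
#include <set>

constexpr int kWarpSize = 32;               // threads per warp on NVIDIA GPUs
constexpr std::size_t kSegmentBytes = 128;  // ASSUMPTION: simplified segment size

// Counts the distinct aligned segments touched when thread t of one warp
// reads the float at element index t * stride_in_elems.
int segments_touched(std::size_t stride_in_elems) {
    std::set<std::size_t> segments;
    for (int t = 0; t < kWarpSize; ++t) {
        std::size_t byte_addr = t * stride_in_elems * sizeof(float);
        segments.insert(byte_addr / kSegmentBytes);
    }
    return static_cast<int>(segments.size());
}
```

Under this model, `segments_touched(1)` (neighbouring threads read neighbouring floats) touches a single segment, while `segments_touched(32)` touches 32 segments for the same amount of useful data. That gap is why scattered accesses within a warp waste DRAM bandwidth.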
Final project - Contents of Colab notebook

1. Application description
 What is your chosen application?
 Input? Output? (include an example, e.g. an image, showing what the input is and what the output is)
 Use cases?
 Does it need to be sped up?


2. Sequential implementation
 Design: Describe the steps to go from input to output, in as much detail as possible (don’t show code)
 Evaluate:
 Describe your experiment setup (which dataset you test on, the train/test sizes, which GPU you use)
 Run the code to see results
 Does it run correctly?


3. Parallel implementation (naive version)

 Analyze: Which steps do you parallelize? Why these steps?
 Design: How do you parallelize? (don’t show code)
 Evaluate:
 Describe your experiment setup
 Run the code to see results
 Does it run correctly & faster? If not, do you know why?
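The "correctly & faster" check can be sketched as a comparison against the sequential output plus a speedup ratio. This is a minimal sketch, not a prescribed method: `outputs_match` and `speedup` are hypothetical helper names, and the default tolerance `1e-5f` is an assumption that should be adapted to the application. A tolerance is used because a parallel version may perform floating-point operations in a different order, so bitwise-identical results are not generally expected.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if both outputs have the same length and every pair of
// elements differs by at most `tol` (ASSUMPTION: absolute tolerance).
bool outputs_match(const std::vector<float>& ref,
                   const std::vector<float>& out,
                   float tol = 1e-5f) {
    if (ref.size() != out.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        // Written as !(diff <= tol) so that a NaN in either output fails the check.
        if (!(std::fabs(ref[i] - out[i]) <= tol)) return false;
    }
    return true;
}

// Speedup of a new version relative to a baseline, both measured in ms.
double speedup(double baseline_ms, double new_ms) {
    return baseline_ms / new_ms;
}
```

A speedup below 1 means the new version is slower than the baseline, which is the case where the "do you know why?" question applies.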


4. Parallel implementation + optimization


You should have ≥ 2 optimized versions.
For each version:
 Analyze: Which parts (often: which kernels) do you optimize? Why these parts?
 Design: How do you optimize? (don’t show code)
 Evaluate:
 Describe your experiment setup
 Run the code to see results
 Does it run correctly & faster? If not, do you know why?


5. Reflection
 Each member: What difficulties have you encountered?
 Each member: What have you learned?
 Your team: If you had more time, what would you do?


6. References
To finish this project, what materials have you
consulted?

Final project - Code files

Each version (sequential version, parallel version, 1st optimized parallel version, 2nd optimized parallel version, …) should be in a separate file

Final project - Teamwork

Your team should have a plan file

All members of your team should understand the team’s project thoroughly (including, of course, the code)

Final project - Submission & presentation
 x = presentation day
   x will be one day from xxx to xxx (I will decide and let you know later)
 Before 23:55 on day x-1: upload your team’s project to a link in Moodle, including:
   Team plan file and work distribution
   Colab notebook file
   All source code files and an instruction file on how to set up and run your project
   A presentation video of about 15-20 minutes, uploaded to YouTube with the Unlisted option
 Day x: present in person in the classroom (use the Colab notebook file to present; no need to prepare slides)
   Each team will have ~15 minutes to present (each member will present ~1/2 of the contents, and I will decide who presents which part) and ~10 minutes for Q&A

Thank you
