gpu-object-linking

CUDA 5.0 introduces separate compilation and linking, allowing independent object files to be built and linked into a single program, which eases code porting and reduces build times. It supports the creation of static libraries and enables the use of third-party libraries, facilitating code reuse. The document also highlights compatibility warnings and the need for relinking objects for different architectures.

Separate Compilation in CUDA 5.0
by Mike Murphy

No Separate Compilation in earlier releases

[Diagram: a.cu, b.cu, and c.cu are included into all.cu; main.cpp + all.cu -> program.exe]

Include files together to build

Earlier CUDA releases required all device code used by a kernel to be in a single source file

No linking external device code

CUDA 5: Separate Compilation & Linking

[Diagram: a.cu, b.cu, c.cu -> a.o, b.o, c.o; main.cpp + a.o + b.o + c.o -> program.exe]

Separate compilation allows building independent object files

CUDA 5 can link multiple object files into one program

Benefits of Separate Compilation

Eases porting code

no longer have to include files together
“extern” attribute is respected

Incremental compilation reduces build time

e.g. a 47,000-line app used to take 50 seconds to build; split into multiple files, it takes 4 seconds to rebuild when only one file has changed

Can create and use 3rd party libraries
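
As a minimal sketch of what a respected "extern" means in practice, the listing below holds two files (split them at the marked comments): one defines a device function, the other calls it from a kernel. The names scale, apply, scale.cu, and main.cu are hypothetical, and error checking is omitted. The build lines in the trailing comment follow the nvcc flags from the Example usage slide. sm_20 is the architecture used in the slides, so substitute one that your toolkit and GPU support.

```cuda
// ---- scale.cu : device function defined in its own translation unit ----
// Hypothetical helper, not taken from the slides.
__device__ int scale(int x)
{
    return 3 * x;
}

// ---- main.cu : kernel that calls a device function defined elsewhere ----
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only. The definition lives in scale.cu and is resolved
// by the device linker when the objects are linked.
extern __device__ int scale(int x);

__global__ void apply(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = scale(data[i]);
}

int main()
{
    const int n = 8;
    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = i;

    int *dev = NULL;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    apply<<<1, n>>>(dev, n);   // one block of n threads

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%d ", host[i]);
    printf("\n");
    return 0;
}

// Build sketch (architecture as in the slides; adjust for your GPU):
//   nvcc -arch=sm_20 -dc scale.cu main.cu      # produces scale.o and main.o
//   nvcc -arch=sm_20 scale.o main.o -o program # device link + host link
```

Without -dc, compiling main.cu on its own would fail, because scale has no definition inside that translation unit.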

CUDA 5: Library Support

[Diagram: a.cu, b.cu -> a.o + b.o -> ab.a; main.cpp + foo.cu + ab.a -> program.exe]

Can combine object files into static libraries

Link and externally call device code

[Diagram: the same ab.a is reused by a second program; main.cpp + foo.cu + ab.a -> program.exe; main2.cpp + bar.cu + ab.a -> program2.exe]

Facilitates code reuse, reduces compile time

CUDA 5: Callbacks

Enables closed-source device libraries to call user-defined device callback functions

[Diagram: main.cpp + foo.cu + vendor.a + callback.cu -> program.exe]
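
The callback arrangement can be sketched as follows, assuming a hypothetical vendor library that declares a device function and leaves its definition to the user. The listing holds three files (split at the marked comments). vendor.a and callback.cu are the names from the diagram. vendor.h, vendor_kernel, vendor_transform, and user_callback are invented for illustration, and for brevity main is placed in callback.cu rather than in a separate main.cpp. Error checking is omitted.

```cuda
// ---- vendor.h : header shipped with the (hypothetical) vendor library ----
#ifndef VENDOR_H
#define VENDOR_H
// The library only declares the callback; the user must define it.
extern __device__ float user_callback(float x);
// Host entry point provided by the library.
void vendor_transform(float *dev_data, int n);
#endif

// ---- vendor.cu : built by the vendor with -dc, shipped inside vendor.a ----
#include "vendor.h"

__global__ void vendor_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = user_callback(data[i]);   // resolved at device-link time
}

void vendor_transform(float *dev_data, int n)
{
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    vendor_kernel<<<blocks, threads>>>(dev_data, n);
}

// ---- callback.cu : written by the user ----
#include <cstdio>
#include <cuda_runtime.h>
#include "vendor.h"

// User-supplied definition of the callback the library calls.
__device__ float user_callback(float x)
{
    return x * x;
}

int main()
{
    const int n = 4;
    float host[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    float *dev = NULL;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    vendor_transform(dev, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        printf("%g ", host[i]);
    printf("\n");
    return 0;
}

// Build sketch (architecture as in the slides; adjust for your GPU):
//   vendor side: nvcc -arch=sm_20 -dc vendor.cu
//                ar rcs vendor.a vendor.o
//   user side:   nvcc -arch=sm_20 -dc callback.cu
//                nvcc -arch=sm_20 callback.o vendor.a -o program
```

The vendor ships only vendor.h and vendor.a, so the library source stays closed while the device linker still binds user_callback to the user's definition.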

Separate Compilation Features

sm_2x and above (Fermi & Kepler, no support for sm_1x)

All platforms (Linux, Windows, and MacOS)
All CUDA features
Optimized and Debug (-G) compilations
Supports both the previous whole-program compilation and the new separate compilation
Default is whole-program compilation; you have to opt in to separate compilation

Libraries

Can link static host libraries (.a, .lib) that contain device code
Shared libraries (.dylib, .so, .dll) are ignored by the device linker
libcublas_device.a is a linkable device library that we ship; it is used for dynamic parallelism

Example usage

nvcc -arch=sm_20 -dc *.cu

-c is used for host compile to object, so we invented -dc
-dc == --device-c == --relocatable-device-code=true -c
Without -dc we default to the old whole-program compilation
nvcc -arch=sm_20 *.o
The device linker is implicitly run for sm_20 and above, but does nothing if it does not find relocatable device code.

If you want to use the host linker:

nvcc -arch=sm_20 *.o -dlink -o link.o
create new object; -dlink == --device-link
g++ *.o -lcudart
link all objects, including the new link.o
CUDA host objects must be passed to both device and host linkers

Demo

Multiple Device Links

Can do multiple device links within a single host executable

nvcc a.o b.o -dlink -o link1.o
nvcc c.o d.o -dlink -o link2.o
g++ a.o b.o c.o d.o link1.o link2.o
Useful when code sections are separate
Similar to how we previously allowed multiple device modules in a single host executable (x.cu and y.cu)
If a library writer wants to device-link some code together, the user can still invoke the device linker on their own code
Can reduce resource requirements: e.g. with function pointers, it may have to be assumed that code from another section is reachable, which requires more registers than really needed

Compatibility warning

Current 5.0 linker will not JIT to future architectures

SASS is linked, not PTX
PTX can be input to the linker, but is first compiled to SASS and then linked
Must relink objects for each architecture
nvcc -arch=compute_20 -code=sm_20,sm_30
Will support JIT linking in a future release

Summary

Separate Compilation of device code is supported in CUDA 5.0
Eases porting
Incremental Recompilation
Library Support
For more info, see the “Using Separate Compilation in CUDA” section at the end of the NVCC document.

nvcc compile

[Diagram: a.cu -> Frontend -> device code and host code; device code -> Device Compiler -> Fatbinary; host code + Fatbinary -> Host Compiler -> a.o]

nvcc separate compilation and link

[Diagram: a.cu and b.cu each pass through Frontend -> device code and host code; device code -> Device Compiler -> Fatbinary; host code + Fatbinary -> Host Compiler -> a.o, b.o; c.cpp -> Host Compiler -> c.o; a.o + b.o -> Device Linker -> dlink.o; a.o + b.o + c.o + dlink.o -> Host Linker -> Executable]
