[ROCm] Improve Type Safety of C10_WARP_SIZE #158271
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158271
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 3b0d6ba with merge base 6e07d6a. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lint error due to broken main
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 65374b3 to 6c4c465
Force-pushed 6c4c465 to 2b83205
Move the change to
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 2b83205 to 57b83f2
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
If compiling with HIPCC (i.e. `__HIPCC__` is [defined](https://rocm.docs.amd.com/projects/HIP/en/docs-develop/how-to/hip_porting_guide.html#compiler-defines-summary)):

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` for the host-compilation pass (as compared to `static constexpr int C10_WARP_SIZE = 1;` set in 538a57d)
* Define `C10_WARP_SIZE` to be constexpr `64` for `__GFX9__`, and `32` otherwise, for the device-compilation pass

If not compiling with HIPCC:

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()`

For host-compilation cases where we need a constexpr value of warp size (e.g. launch bounds), use `C10_WARP_SIZE_STATIC`, defined as `64` (better to err on 64 for launch bounds).

Fixes SWDEV-542227

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Successfully rebased 57b83f2 to 3b0d6ba
Background
The `C10_WARP_SIZE` value, although always 32 on the CUDA platform, varies across different AMD GPUs. Therefore, to refer to this value correctly, host code must use a variable rather than a macro-defined literal or a `constexpr int`. This PR may cause more compiler errors for third-party code on AMD GPUs, which is intentional: having a fixed `C10_WARP_SIZE` value in host code on AMD GPUs only defers a compile-time error to runtime. This PR is recommended to be included in the release notes, since it is an API change for whoever uses this macro.

Users are recommended to use `C10_WARP_SIZE` directly, which adapts to the various scenarios, or to define their own macro in terms of `C10_WARP_SIZE`. Assigning this macro to symbols shared by host and device code causes problems on the ROCm platform (see the fix in `aten/src/ATen/native/cuda/layer_norm_kernel.cu` for a concrete example).
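For illustration, a minimal sketch of the pattern that now fails and the patterns that keep working; the kernel and symbol names are hypothetical (not taken from the `layer_norm_kernel.cu` fix), and the headers are the usual locations assumed for these symbols:

```cpp
#include <ATen/cuda/CUDAContext.h>  // at::cuda::warp_size() (runtime query)
#include <c10/macros/Macros.h>      // C10_WARP_SIZE

// Problematic on ROCm after this change: in the host-compilation pass
// C10_WARP_SIZE expands to the non-constexpr at::cuda::warp_size(), so a
// file-scope constant shared by host and device code no longer compiles.
// constexpr int kWarpSize = C10_WARP_SIZE;   // error in the host pass on ROCm

__global__ void lane_id_kernel(int* lane_out, int n) {
  // Using C10_WARP_SIZE directly inside kernel code keeps working: the macro
  // adapts to whichever compilation pass is active.
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    lane_out[idx] = idx % C10_WARP_SIZE;
  }
}

void launch_lane_id(int* lane_out, int n) {
  // Host code queries the warp size at runtime instead of assuming a constant.
  const int warp_size = at::cuda::warp_size();
  const int blocks = (n + warp_size - 1) / warp_size;
  lane_id_kernel<<<blocks, warp_size>>>(lane_out, n);
}
```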
Behaviors

If compiling with HIPCC (i.e. `__HIPCC__` is defined):

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` for the host-compilation pass (as compared to `static constexpr int C10_WARP_SIZE = 1;` set in 04bd7e6)
* Define `C10_WARP_SIZE` to be a function returning `constexpr int`: `64` for `__GFX9__`, and `32` otherwise, for the device-compilation pass (`__GFX8__` is also 64, but we do not support any GFX8 GPU)

If not compiling with HIPCC:

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` (these rules are sketched in code below)
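To make the list above concrete, here is a hedged sketch of how these rules could map onto preprocessor logic in the ROCm branch of `c10/macros/Macros.h`; the helper name and the exact guard macros are assumptions, not the merged implementation (on the CUDA platform `C10_WARP_SIZE` stays `32`):

```cpp
// Sketch only; the real definitions may be arranged differently.
#if defined(__HIPCC__)
  #if defined(__HIP_DEVICE_COMPILE__)
    // Device-compilation pass: warp size is a genuine compile-time constant.
    constexpr int c10_warp_size_constexpr() {  // hypothetical helper name
    #if defined(__GFX9__)
      return 64;
    #else
      return 32;
    #endif
    }
    #define C10_WARP_SIZE (c10_warp_size_constexpr())
  #else
    // Host-compilation pass under HIPCC: only known at runtime.
    #define C10_WARP_SIZE (at::cuda::warp_size())
  #endif
#else
  // Not compiling with HIPCC (plain host compiler in a ROCm build):
  // also resolved at runtime on the host.
  #define C10_WARP_SIZE (at::cuda::warp_size())
#endif

// Compile-time fallback for contexts that require a constant in every pass
// (e.g. launch bounds); errs on the larger GFX9 warp size.
#define C10_WARP_SIZE_STATIC 64
```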
`constexpr` variant for host code

For host-compilation cases where a `constexpr` value is needed for the warp size (e.g. launch bounds), use `C10_WARP_SIZE_STATIC`, which is defined as `64`. This macro follows the pre-04bd7e6 behavior of `C10_WARP_SIZE`.
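As a usage illustration (a sketch; the kernel and constant names are hypothetical), launch bounds take the static value, while per-warp logic inside the kernel can keep using `C10_WARP_SIZE`:

```cpp
#include <c10/macros/Macros.h>  // C10_WARP_SIZE, C10_WARP_SIZE_STATIC

// Launch bounds are evaluated in both the host and device passes, so they need
// a value that is constexpr everywhere; C10_WARP_SIZE_STATIC (64) errs high.
constexpr int kMaxThreadsPerBlock = 8 * C10_WARP_SIZE_STATIC;  // illustrative name

__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale_kernel(float* out, const float* in, int n) {
  // Inside device code C10_WARP_SIZE remains a compile-time constant
  // (64 on GFX9, 32 otherwise), so per-warp logic keeps working.
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    const int lane = idx % C10_WARP_SIZE;
    out[idx] = in[idx] * static_cast<float>(lane + 1);
  }
}
```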
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd