[ROCm] Improve Type Safety of C10_WARP_SIZE #158271


Open: wants to merge 2 commits into main from xinyazhang/new_c10_warp_size

Conversation

@xinyazhang (Collaborator) commented Jul 14, 2025

Background

C10_WARP_SIZE, although always 32 on the CUDA platform, varies across AMD GPUs.
Therefore, to refer to this value correctly, host code must read it from a variable rather than from a macro-defined literal or a constexpr int.

This PR may cause more compiler errors for third-party code on AMD GPUs, which is intentional: keeping a fixed C10_WARP_SIZE value in host code on AMD GPUs only defers compile-time errors to runtime.

It is recommended that this PR be called out in the release notes, as it is an API change for anyone who uses this macro.

Users are advised to use C10_WARP_SIZE directly, which adapts to each scenario, or to define their own macros in terms of C10_WARP_SIZE. Assigning this macro to symbols shared by host and device code causes problems on the ROCm platform (see the fix in aten/src/ATen/native/cuda/layer_norm_kernel.cu for a concrete example, and the sketch below).
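
To illustrate, here is a minimal sketch, assuming a hypothetical kernel and launcher (the include path and all names below are illustrative, not the actual layer_norm_kernel.cu fix):

```cpp
#include <c10/macros/Macros.h>  // illustrative include; the definition now lives in torch/headeronly/macros/Macros.h

// Breaks on ROCm after this PR: in the HIP host-compilation pass,
// C10_WARP_SIZE expands to the non-constexpr at::cuda::warp_size(),
// so it can no longer initialize a constant shared by host and device:
//
//   static constexpr int kWarpSize = C10_WARP_SIZE;  // compile error on ROCm
//
// Use the macro directly on each side instead.

__global__ void scale_kernel(float* data, int n) {
  // Device pass: C10_WARP_SIZE is a compile-time constant here
  // (64 on __GFX9__, 32 otherwise), so constant contexts still work.
  __shared__ float buf[C10_WARP_SIZE];
  (void)buf;  // kernel body elided
}

void launch_scale(float* data, int n) {
  // Host pass: C10_WARP_SIZE queries the current device's warp size at
  // runtime, so the launch configuration adapts per GPU.
  const int warp_size = C10_WARP_SIZE;
  scale_kernel<<<(n + warp_size - 1) / warp_size, warp_size>>>(data, n);
}
```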

Behaviors

  • If compiling with HIPCC (i.e. defined(__HIPCC__)):
    • Define C10_WARP_SIZE as the non-constexpr at::cuda::warp_size() for the host-compilation pass (as compared to the static constexpr int C10_WARP_SIZE = 1; set in 04bd7e6).
    • Define C10_WARP_SIZE as a function returning constexpr int 64 for __GFX9__, and 32 otherwise, for the device-compilation pass (see the sketch after this list).
      • __GFX8__ is also 64, but we do not support any GFX8 GPU.
  • If not compiling with HIPCC:
    • Define C10_WARP_SIZE as the non-constexpr at::cuda::warp_size().
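
Taken together, the behaviors above amount to something like the following simplified sketch; the helper name c10_internal_warp_size is illustrative, not the actual code in torch/headeronly/macros/Macros.h:

```cpp
#if defined(__HIPCC__)
#if defined(__HIP_DEVICE_COMPILE__)
// Device-compilation pass: the warp size is fixed per target ISA.
#if defined(__GFX9__)
constexpr int c10_internal_warp_size() { return 64; }  // wave64 on GFX9
#else
constexpr int c10_internal_warp_size() { return 32; }  // wave32 otherwise
#endif
#define C10_WARP_SIZE (c10_internal_warp_size())
#else
// Host-compilation pass: the warp size depends on the device the kernel
// will actually run on, so it must be queried at runtime.
#define C10_WARP_SIZE (at::cuda::warp_size())
#endif
#else
// Not compiling with HIPCC: likewise defer to the runtime query.
#define C10_WARP_SIZE (at::cuda::warp_size())
#endif
```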

constexpr variant for host code

For host-compilation cases where a constexpr warp size is needed (e.g. launch bounds), use C10_WARP_SIZE_STATIC, which is defined as 64. This macro follows the pre-04bd7e6 behavior of C10_WARP_SIZE; a usage sketch follows.
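
For instance, __launch_bounds__ requires an integral constant expression, where the runtime C10_WARP_SIZE cannot appear; a minimal sketch with a hypothetical kernel:

```cpp
// __launch_bounds__ needs a compile-time constant, so use the static
// variant; per the PR, it is better to err on 64 here.
__global__ void __launch_bounds__(C10_WARP_SIZE_STATIC * 4)
elementwise_double(const float* in, float* out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i] * 2.0f;
  }
}

void launch_elementwise_double(const float* in, float* out, int n) {
  // The actual launch configuration can still adapt at runtime:
  // C10_WARP_SIZE * 4 never exceeds the C10_WARP_SIZE_STATIC * 4 bound.
  const int block = C10_WARP_SIZE * 4;
  elementwise_double<<<(n + block - 1) / block, block>>>(in, out, n);
}
```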

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

pytorch-bot bot commented Jul 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158271

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 3b0d6ba with merge base 6e07d6a:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the module: rocm and release notes: sparse labels Jul 14, 2025
@xinyazhang (Collaborator, Author)

lint error due to broken main

@jithunnair-amd changed the title from "[ROCm] Improve Type Safty of C10_WARP_SIZE" to "[ROCm] Improve Type Safety of C10_WARP_SIZE" Jul 14, 2025
@pytorch-bot bot added and then removed the ciflow/rocm label Jul 14, 2025
@jeffdaily marked this pull request as ready for review July 15, 2025 22:46
@jeffdaily requested review from eqy and syed-ahmed as code owners July 15, 2025 22:46
@jithunnair-amd added the release notes: rocm and release notes: cuda labels and removed the release notes: sparse label Jul 16, 2025
@xinyazhang (Collaborator, Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased xinyazhang/new_c10_warp_size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xinyazhang/new_c10_warp_size && git pull --rebase)

@pytorchmergebot force-pushed the xinyazhang/new_c10_warp_size branch from 65374b3 to 6c4c465 on July 16, 2025 16:13
@xinyazhang force-pushed the xinyazhang/new_c10_warp_size branch from 6c4c465 to 2b83205 on July 16, 2025 20:30
@xinyazhang (Collaborator, Author)

Move the change to torch/headeronly/macros/Macros.h

@jeffdaily added the ciflow/rocm label and removed the release notes: cuda label Jul 16, 2025
@xinyazhang (Collaborator, Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased xinyazhang/new_c10_warp_size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xinyazhang/new_c10_warp_size && git pull --rebase)

@pytorchmergebot force-pushed the xinyazhang/new_c10_warp_size branch from 2b83205 to 57b83f2 on July 18, 2025 16:08
@pytorch-bot bot removed the ciflow/rocm label Jul 18, 2025
@jeffdaily added the ciflow/rocm label Jul 18, 2025
@jithunnair-amd added the ciflow/periodic, ciflow/rocm-mi300, ciflow/periodic-rocm-mi300, and ciflow/inductor-rocm labels Jul 18, 2025
@jeffdaily (Collaborator)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

xinyazhang and others added 2 commits July 18, 2025 23:36
If compiling with HIPCC (i.e. `__HIPCC__` is
[defined](https://rocm.docs.amd.com/projects/HIP/en/docs-develop/how-to/hip_porting_guide.html#compiler-defines-summary)):
* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` for
host-compilation pass (as compared to `static constexpr int
C10_WARP_SIZE = 1;` set in
538a57d)
* Define `C10_WARP_SIZE` to be constexpr `64` for `__GFX9__`, and `32`
otherwise, for device-compilation pass

If not compiling with HIPCC:
* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()`

For host-compilation cases where we need a constexpr value of warp size
(e.g. launch bounds), use `C10_WARP_SIZE_STATIC`, defined as `64` (better
to err on 64 for launch bounds)

Fixes SWDEV-542227

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
@pytorchmergebot (Collaborator)

Successfully rebased xinyazhang/new_c10_warp_size onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout xinyazhang/new_c10_warp_size && git pull --rebase)

@pytorchmergebot force-pushed the xinyazhang/new_c10_warp_size branch from 57b83f2 to 3b0d6ba on July 18, 2025 23:36
@pytorch-bot bot removed the ciflow/periodic, ciflow/rocm, ciflow/inductor-rocm, ciflow/rocm-mi300, and ciflow/periodic-rocm-mi300 labels Jul 18, 2025
Labels: module: rocm, open source, release notes: rocm