[ROCm] Improve Type Safety of C10_WARP_SIZE #158271
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158271
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 unrelated failure) As of commit 3b0d6ba with merge base 6e07d6a. UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
lint error due to broken main
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 65374b3 to 6c4c465
Force-pushed 6c4c465 to 2b83205
Move the change to
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
Successfully rebased 2b83205 to 57b83f2
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here
If compiling with HIPCC (i.e. `__HIPCC__` is [defined](https://rocm.docs.amd.com/projects/HIP/en/docs-develop/how-to/hip_porting_guide.html#compiler-defines-summary)):

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` for the host-compilation pass (as compared to `static constexpr int C10_WARP_SIZE = 1;` set in 538a57d)
* Define `C10_WARP_SIZE` to be constexpr `64` for `__GFX9__`, and `32` otherwise, for the device-compilation pass

If not compiling with HIPCC:

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()`

For host-compilation cases where we need a constexpr value of warp size (e.g. launch bounds), use `C10_WARP_SIZE_STATIC`, defined as `64` (better to err on 64 for launch bounds).

Fixes SWDEV-542227

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Successfully rebased 57b83f2 to 3b0d6ba
Background
The `C10_WARP_SIZE` value, although always 32 on the CUDA platform, varies across different AMD GPUs. Therefore, to refer to this value correctly, host code must use a variable rather than a macro-defined literal or a `constexpr int`. This PR may cause more compiler errors for third-party code on AMD GPUs, which is intentional: having a fixed `C10_WARP_SIZE` value in host code on AMD GPUs only defers a compile-time error to runtime. This PR is recommended to be included in the release notes, since it is an API change for whoever uses this macro.

Users are recommended to use `C10_WARP_SIZE` directly, which adapts to the various scenarios, or to define their own macro in terms of `C10_WARP_SIZE`. Assigning this macro to symbols shared by host and device code causes problems on the ROCm platform (see the fix in `aten/src/ATen/native/cuda/layer_norm_kernel.cu` for a concrete example).
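For illustration, a minimal sketch of the pattern that now fails and the patterns that keep working; the kernel and symbol names are hypothetical (not taken from the `layer_norm_kernel.cu` fix), and the headers are the usual locations assumed for these symbols:

```cpp
#include <ATen/cuda/CUDAContext.h>  // at::cuda::warp_size() (runtime query)
#include <c10/macros/Macros.h>      // C10_WARP_SIZE

// Problematic on ROCm after this change: in the host-compilation pass
// C10_WARP_SIZE expands to the non-constexpr at::cuda::warp_size(), so a
// file-scope constant shared by host and device code no longer compiles.
// constexpr int kWarpSize = C10_WARP_SIZE;   // error in the host pass on ROCm

__global__ void lane_id_kernel(int* lane_out, int n) {
  // Using C10_WARP_SIZE directly inside kernel code keeps working: the macro
  // adapts to whichever compilation pass is active.
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    lane_out[idx] = idx % C10_WARP_SIZE;
  }
}

void launch_lane_id(int* lane_out, int n) {
  // Host code queries the warp size at runtime instead of assuming a constant.
  const int warp_size = at::cuda::warp_size();
  const int blocks = (n + warp_size - 1) / warp_size;
  lane_id_kernel<<<blocks, warp_size>>>(lane_out, n);
}
```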
Behaviors

If compiling with HIPCC (i.e. `__HIPCC__` is defined):

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` for the host-compilation pass (as compared to `static constexpr int C10_WARP_SIZE = 1;` set in 04bd7e6)
* Define `C10_WARP_SIZE` to be a function returning `constexpr int`: `64` for `__GFX9__`, and `32` otherwise, for the device-compilation pass (`__GFX8__` is also 64, but we do not support any GFX8 GPU)

If not compiling with HIPCC:

* Define `C10_WARP_SIZE` to be non-constexpr `at::cuda::warp_size()` (these rules are sketched in code below)
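To make the list above concrete, here is a hedged sketch of how these rules could map onto preprocessor logic in the ROCm branch of `c10/macros/Macros.h`; the helper name and the exact guard macros are assumptions, not the merged implementation (on the CUDA platform `C10_WARP_SIZE` stays `32`):

```cpp
// Sketch only; the real definitions may be arranged differently.
#if defined(__HIPCC__)
  #if defined(__HIP_DEVICE_COMPILE__)
    // Device-compilation pass: warp size is a genuine compile-time constant.
    constexpr int c10_warp_size_constexpr() {  // hypothetical helper name
    #if defined(__GFX9__)
      return 64;
    #else
      return 32;
    #endif
    }
    #define C10_WARP_SIZE (c10_warp_size_constexpr())
  #else
    // Host-compilation pass under HIPCC: only known at runtime.
    #define C10_WARP_SIZE (at::cuda::warp_size())
  #endif
#else
  // Not compiling with HIPCC (plain host compiler in a ROCm build):
  // also resolved at runtime on the host.
  #define C10_WARP_SIZE (at::cuda::warp_size())
#endif

// Compile-time fallback for contexts that require a constant in every pass
// (e.g. launch bounds); errs on the larger GFX9 warp size.
#define C10_WARP_SIZE_STATIC 64
```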
`constexpr` variant for host code

For host-compilation cases where a `constexpr` value is needed for the warp size (e.g. launch bounds), use `C10_WARP_SIZE_STATIC`, which is defined as `64`. This macro follows the pre-04bd7e6 behavior of `C10_WARP_SIZE`.
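As a usage illustration (a sketch; the kernel and constant names are hypothetical), launch bounds take the static value, while per-warp logic inside the kernel can keep using `C10_WARP_SIZE`:

```cpp
#include <c10/macros/Macros.h>  // C10_WARP_SIZE, C10_WARP_SIZE_STATIC

// Launch bounds are evaluated in both the host and device passes, so they need
// a value that is constexpr everywhere; C10_WARP_SIZE_STATIC (64) errs high.
constexpr int kMaxThreadsPerBlock = 8 * C10_WARP_SIZE_STATIC;  // illustrative name

__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale_kernel(float* out, const float* in, int n) {
  // Inside device code C10_WARP_SIZE remains a compile-time constant
  // (64 on GFX9, 32 otherwise), so per-warp logic keeps working.
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < n) {
    const int lane = idx % C10_WARP_SIZE;
    out[idx] = in[idx] * static_cast<float>(lane + 1);
  }
}
```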
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd