[MPS] Add native strided API for MPSNDArray starting with macOS 15 #128393
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128393
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 Unrelated Failures)
As of commit 132fb66 with merge base 63e5b09:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
e3617e7 to 0064862
54c057f to a0a2517
a0a2517 to e7da1c1
e7da1c1 to 80733f5
cc98e6a to beb4421
df6b189 to 1f343ba
@pytorchbot merge
This PR needs to be approved by an authorized maintainer before merge.
@pytorchbot merge
This PR needs to be approved by an authorized maintainer before merge.
test/test_mps.py
@@ -3543,6 +3547,116 @@ def test_slice(self):
        mps_slice4 = mps_x[1, :].to('cpu')
        self.assertEqual(cpu_slice4, mps_slice4)

    def test_slice_reshape_view_api_test_1(self):
These tests seem to have a lot in common.
- Are they not already covered by OpInfo-based testing?
- If not, should they be refactored so that they have a lot less boilerplate and consistent output metadata checking?
- While making them generic, you can use the parametrize decorator to run them on all dtypes (avoiding int-specific tests).
These tests are needed in addition to the OpInfo tests. I'll parametrize them as you recommended.
I've updated the tests, but I can't get parametrize to work. It keeps returning a missing argument error.
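One possible cause of a missing-argument error like this (not confirmed anywhere in this thread) is that the decorated test was never expanded. In PyTorch's test utilities, `parametrize` only records the values, and `instantiate_parametrized_tests` (both from `torch.testing._internal.common_utils`) generates the concrete test methods. The sketch below is a toy, stdlib-only re-implementation of that mechanism, not PyTorch's code; the class name `TestSliceView` and the dtype strings are placeholders chosen for illustration.

```python
import unittest


def parametrize(name, values):
    """Toy stand-in: only records the parameter values on the function."""
    def decorator(fn):
        fn._params = (name, list(values))
        return fn
    return decorator


def instantiate_parametrized_tests(cls):
    """Toy stand-in: replace each marked test with one method per value."""
    for attr, fn in list(vars(cls).items()):
        params = getattr(fn, "_params", None)
        if params is None:
            continue
        name, values = params
        delattr(cls, attr)  # drop the unexpanded template method
        for value in values:
            # Bind fn and the keyword argument via defaults to avoid late binding.
            def test(self, _fn=fn, _kw={name: value}):
                return _fn(self, **_kw)
            setattr(cls, f"{attr}_{name}_{value}", test)
    return cls


class TestSliceView(unittest.TestCase):  # hypothetical test class
    @parametrize("dtype", ["float32", "int32"])
    def test_slice_reshape_view(self, dtype):
        self.assertIn(dtype, ("float32", "int32"))


# Before expansion, the runner calls the method without `dtype`.
try:
    TestSliceView("test_slice_reshape_view").test_slice_reshape_view()
    missing_arg_error = None
except TypeError as exc:
    missing_arg_error = str(exc)

# After expansion, one concrete test exists per parameter value.
instantiate_parametrized_tests(TestSliceView)
generated = sorted(n for n in dir(TestSliceView) if n.startswith("test_"))
```

With the real helpers, the equivalent step would be calling `instantiate_parametrized_tests(YourTestClass)` after the class definition.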
@@ -608,17 +608,20 @@ Tensor index_select_mps(const Tensor& self, int64_t dim, const Tensor& index) {
      newCachedGraph->outputTensor_ = outputTensor;
    });

  // MPS TODO: MPS Gather is failing with MPS strided API. Fallback to old gather.
There are a couple of TODOs lying around. Should we open issues to track them?
I will update with the issue number.
Ok!
This essentially undoes the large skips of nn.modules tests on everything but macOS Sequoia that were introduced by #128393. Instead, it uses the existing `xfail`, but guards it on the `_macos15_or_newer` boolean.
Before the change, if run on macOS 14:
```
% python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.053s
OK (skipped=32)
```
After:
```
% python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.229s
OK (skipped=10, expected failures=2)
```
Pull Request resolved: #134858
Approved by: https://github.com/janeyx99
The issue reported in #135223 was already solved in #128393. This PR adds a regression test for it.
Fixes #135223
Pull Request resolved: #135440
Approved by: https://github.com/ezyang
[MPS] Add regression test for `fft.fftfreq` (#135440)
The issue reported in #135223 was already solved in #128393. This PR adds a regression test for it.
Fixes #135223
Pull Request resolved: #135440
Approved by: https://github.com/ezyang
(cherry picked from commit 09287e3)
Co-authored-by: Roy Hvaara <roy@lightyear.no>
…`nn.Conv3d` (#141780)
When the input tensor to Conv3d is in the channels_last_3d memory format, the Conv3d op generates incorrect output (see the example image in #141471). This PR checks whether the op is 3d and, if so, attempts to convert the input tensor to contiguous. It also adds a regression test that verifies the output by running the same op on the CPU.
I'm unsure if Conv3d supports the channels last memory format after #128393. If it does, we should consider updating the logic to use it, as that would be more efficient. Perhaps @DenisVieriu97 knows or has more context?
Fixes #141471
Pull Request resolved: #141780
Approved by: https://github.com/malfet
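To make the memory-format distinction concrete, here is a pure-Python sketch of the strides involved. It is an illustration only, not the code from that PR: the function names are hypothetical, and the layout check is simplified (it ignores size-1 dimensions, which can be contiguous under either layout).

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a tensor of the given shape."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides for an (N, C, D, H, W) tensor stored physically as N, D, H, W, C."""
    n, c, d, h, w = shape
    return (d * h * w * c, 1, h * w * c, w * c, c)


def needs_contiguous_copy(shape, strides):
    """Simplified check: True when the layout is not plain row-major."""
    return tuple(strides) != contiguous_strides(shape)


shape = (2, 3, 4, 5, 6)  # N, C, D, H, W
plain = contiguous_strides(shape)
channels_last = channels_last_3d_strides(shape)
```

In PyTorch itself, the conversion the description refers to corresponds to calling `.contiguous()` on the input tensor, which copies the data into the row-major layout.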
Caused by #128393, which changed the semantics of `needsGather` and resulted in silent correctness errors on macOS 15+ if the output tensor is non-contiguous.
Fixes #145203
Pull Request resolved: #146085
Approved by: https://github.com/dcci
Add support for native strides in MPS starting with macOS Sequoia. This removes the additional gather and scatter operations that were needed to resolve the strides or storage offsets of tensors.
Summary of changes (starting with macOS 15):
OSes older than macOS 15 will run the old gather/scatter code path to resolve strides/storage offsets.
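As a conceptual illustration of the two code paths, the pure-Python sketch below reads a strided view directly from flat storage and, for comparison, first gathers it into a new contiguous buffer. It does not use or model the MPS API; the function names and the example view are assumptions made for illustration.

```python
def strided_read(storage, shape, strides, offset):
    """Read a 2-D view directly from flat storage using strides and an offset."""
    rows, cols = shape
    return [
        [storage[offset + i * strides[0] + j * strides[1]] for j in range(cols)]
        for i in range(rows)
    ]


def gather_to_contiguous(storage, shape, strides, offset):
    """Gather-style path: copy the view into a new contiguous buffer first."""
    rows, cols = shape
    return [
        storage[offset + i * strides[0] + j * strides[1]]
        for i in range(rows)
        for j in range(cols)
    ]


storage = list(range(12))  # flat storage of a 3x4 row-major tensor x

# View equivalent to x[1:, ::2]: shape (2, 2), strides (4, 2), storage offset 4.
view = strided_read(storage, (2, 2), (4, 2), 4)
copy = gather_to_contiguous(storage, (2, 2), (4, 2), 4)
```

Both paths see the same elements; the difference is that the gather path materializes an extra buffer, which is the overhead the native strided path is meant to avoid.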
A couple of performance stats collected from torchbench comparing macOS 15 vs macOS 14:
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10