[MPS] Add native strided API for MPSNDArray starting with macOS 15 #128393

Closed

DenisVieriu97 (Collaborator) commented Jun 11, 2024

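Add support for native strides in MPS starting with macOS Sequoia. This removes the additional gather and scatter operations previously needed to resolve the strides or storage offsets of tensors.

Summary of changes (starting with macOS 15):

- Add support for the MPS strided API (strides, storage offsets, etc.):
  - initWithBuffer:offset:descriptor:
  - arrayViewWithCommandBuffer:descriptor:aliasing:
  - arrayViewWithShape:strides:
  - reshapeWithCommandBuffer:sourceArray:shape:destinationArray:
- Add native support for NHWC convolutions (without incurring any extra copy from NCHW -> NHWC -> NCHW).
- Add support for strided output buffers (previously we would create a contiguous buffer

OSes older than macOS 15 will run the old gather/scatter code path to resolve strides and storage offsets.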
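
To make the terms above concrete, here is a minimal pure-Python sketch of how a shape, strides, and a storage offset select elements from a flat buffer, and of what a gather step does (it copies the view into a new contiguous buffer). The helper names `strided_offsets` and `gather` are hypothetical and are not part of PyTorch or MPS; the sketch only illustrates the indexing arithmetic, not the actual code path in this PR.

```python
from itertools import product


def strided_offsets(shape, strides, storage_offset=0):
    """Yield the flat buffer index of each element of a strided view, in row-major order."""
    for idx in product(*(range(n) for n in shape)):
        yield storage_offset + sum(i * s for i, s in zip(idx, strides))


def gather(buffer, shape, strides, storage_offset=0):
    """Copy a strided view into a new contiguous list (the extra copy a gather pays for)."""
    return [buffer[o] for o in strided_offsets(shape, strides, storage_offset)]


# A 3x4 row-major buffer holding 0..11.
buffer = list(range(12))

# View 1: the transpose (shape 4x3). Only the strides change, not the data.
transposed = gather(buffer, shape=(4, 3), strides=(1, 4))

# View 2: the slice [1:, ::2] (shape 2x2), which needs a storage offset of 4.
sliced = gather(buffer, shape=(2, 2), strides=(4, 2), storage_offset=4)

print(transposed)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(sliced)      # [4, 6, 8, 10]
```

A backend that accepts strides and offsets natively can read the original buffer through `strided_offsets`-style addressing directly, so the copy made by `gather` is no longer needed.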


A few performance stats collected from torchbench, comparing macOS 15 with macOS 14:

- test_train[functorch_maml_omniglot-mps]: 27% faster
- test_train[timm_vision_transformer-mps]: 12% faster
- test_train[hf_T5-mps]: 9.46% faster

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot bot commented Jun 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128393

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 132fb66 with merge base 63e5b09:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the ciflow/mps, module: cpu, and release notes: mps labels Jun 11, 2024
DenisVieriu97 force-pushed the dev/denis/strided_mps_support branch from e3617e7 to 0064862 June 11, 2024 13:18
colesbury added the triaged label Jun 12, 2024
DenisVieriu97 force-pushed the dev/denis/strided_mps_support branch from 54c057f to a0a2517 June 27, 2024 20:01
skotapati (Collaborator) commented

@pytorchbot merge

pytorch-bot bot commented Aug 6, 2024

This PR needs to be approved by an authorized maintainer before merge.

skotapati (Collaborator) commented

@pytorchbot merge

pytorch-bot bot commented Aug 7, 2024

This PR needs to be approved by an authorized maintainer before merge.

test/test_mps.py Outdated
@@ -3543,6 +3547,116 @@ def test_slice(self):
mps_slice4 = mps_x[1, :].to('cpu')
self.assertEqual(cpu_slice4, mps_slice4)

def test_slice_reshape_view_api_test_1(self):
Collaborator

These tests seem to have a lot in common.

  1. Are they not already covered by OpInfo-based testing?
  2. If not, should they be refactored to reduce boilerplate and check output metadata consistently?
  3. While making them generic, you can use the parametrize decorator to run them on all dtypes (avoiding int-specific tests).

skotapati (Collaborator) Aug 13, 2024

These tests are needed in addition to the OpInfo tests. I'll parametrize them as you recommended.

Collaborator

I've updated the tests, but I can't get parametrize to work. It keeps returning a missing-argument error.
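
The thread does not say what caused the missing-argument error. One common cause with decorator-based parametrization is that the decorator only records the parameter values, and a separate class-level step has to expand them into concrete test methods. The stdlib-only sketch below illustrates that failure mode under this assumption; `parametrize` and `instantiate_parametrized` here are hypothetical stand-ins written for this example, not PyTorch's helpers. PyTorch's internal test utilities follow a similar two-step pattern (`parametrize` plus `instantiate_parametrized_tests` in `torch.testing._internal.common_utils`).

```python
import unittest


def parametrize(name, values):
    """Hypothetical stand-in: only records the parameter values on the function."""
    def decorator(fn):
        fn._params = (name, values)
        return fn
    return decorator


def instantiate_parametrized(cls):
    """Hypothetical stand-in: replace each tagged method with one test per value."""
    for attr, fn in list(vars(cls).items()):
        if not hasattr(fn, "_params"):
            continue
        name, values = fn._params
        delattr(cls, attr)
        for value in values:
            # Bind the current function and value through default arguments.
            def test(self, _fn=fn, _kw={name: value}):
                return _fn(self, **_kw)
            setattr(cls, f"{attr}_{name}_{value}", test)
    return cls


class SliceTests(unittest.TestCase):
    @parametrize("dtype", ["int32", "float32"])
    def test_slice(self, dtype):
        self.assertIn(dtype, ("int32", "float32"))


# Before expansion, the runner would call test_slice(self) with no dtype,
# which raises a TypeError about a missing argument.
missing_arg = False
try:
    SliceTests("test_slice").test_slice()
except TypeError:
    missing_arg = True

# After expansion, each parameter value has its own zero-argument test method.
instantiate_parametrized(SliceTests)
generated = sorted(n for n in dir(SliceTests) if n.startswith("test_slice"))
print(missing_arg, generated)
```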

@@ -608,17 +608,20 @@ Tensor index_select_mps(const Tensor& self, int64_t dim, const Tensor& index) {
newCachedGraph->outputTensor_ = outputTensor;
});

// MPS TODO: MPS Gather is failing with MPS strided API. Fallback to old gather.
Collaborator

There are a couple of TODOs lying around. Should we open issues to track them?

Collaborator, Author

I will update with the issue number

albanD (Collaborator) left a comment

Ok!

malfet added a commit that referenced this pull request Aug 30, 2024
This essentially undoes the large skips on everything but macOS Sequoia that #128393 added for nn.modules tests.

Instead, it uses the existing `xfail`, but guards it on the `_macos15_or_newer` boolean.
pytorchmergebot pushed a commit that referenced this pull request Aug 31, 2024
This essentially undoes the large skips on everything but macOS Sequoia that #128393 added for nn.modules tests.

Instead, it uses the existing `xfail`, but guards it on the `_macos15_or_newer` boolean.

Before the change if run on MacOS 14:
```
 % python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.053s

OK (skipped=32)
```
After
```
% python3 ../test/test_modules.py -v -k Hardswish 2>&1|tail -n3
Ran 57 tests in 0.229s

OK (skipped=10, expected failures=2)
```

Pull Request resolved: #134858
Approved by: https://github.com/janeyx99
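
The commit message above replaces blanket skips with an expected failure guarded by an OS-version boolean. A minimal stdlib sketch of that pattern follows. The helpers `macos_major` and `xfail_if` and the test class are hypothetical, and the direction of the guard (expected failure on systems older than macOS 15) is chosen only for illustration; the message does not state which way the real guard points.

```python
import platform
import unittest


def macos_major(version=None):
    """Return the macOS major version as an int, or None when not on macOS."""
    version = platform.mac_ver()[0] if version is None else version
    return int(version.split(".")[0]) if version else None


def xfail_if(condition):
    """Apply unittest.expectedFailure only when `condition` is true."""
    return unittest.expectedFailure if condition else (lambda fn: fn)


# Hypothetical flag, named after the boolean mentioned in the commit message.
_major = macos_major()
_macos15_or_newer = _major is not None and _major >= 15


class GuardedTests(unittest.TestCase):
    @xfail_if(not _macos15_or_newer)
    def test_needs_native_strides(self):
        # Stand-in for a check that only passes where the newer code path exists.
        self.assertTrue(_macos15_or_newer)
```

Unlike a skip, an expected failure still runs the test, so the run reports an unexpected success if the underlying problem is later fixed.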
pytorchmergebot pushed a commit that referenced this pull request Sep 9, 2024
The issue reported in #135223 was already solved in #128393. This PR adds a regression test for it.

Fixes #135223

Pull Request resolved: #135440
Approved by: https://github.com/ezyang
yushangdi pushed a commit that referenced this pull request Sep 12, 2024
Pull Request resolved: #135440
Approved by: https://github.com/ezyang
malfet pushed a commit to aditew01/pytorch that referenced this pull request Sep 13, 2024
…ytorch#128393)

Add support for native strides in MPS starting with macOS Sequoia. This removes the additional gather and scatter operations previously needed to resolve the strides or storage offsets of tensors.

Summary of changes (starting with macOS 15):
- Add support for **MPS strided API** (strides/storage offsets etc):
   - [initWithBuffer:offset:descriptor:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4391636-initwithbuffer?language=objc)
   - [arrayViewWithCommandBuffer:descriptor:aliasing:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/3114040-arrayviewwithcommandbuffer?language=objc)
   - [arrayViewWithShape:strides:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarray/4408694-arrayviewwithshape?language=objc)
   - [reshapeWithCommandBuffer:sourceArray:shape:destinationArray:](https://developer.apple.com/documentation/metalperformanceshaders/mpsndarrayidentity/4438557-reshapewithcommandbuffer?language=objc)
- Add native support for NHWC convolutions (without incurring any extra copy from NCHW -> NHWC -> NCHW).
- Add support for strided output buffers (previously we would create a contiguous buffer

OSes older than macOS 15 will run the old gather/scatter code path to resolve strides and storage offsets.

---

A few performance stats collected from torchbench, comparing macOS 15 with macOS 14:
```
- test_train[functorch_maml_omniglot-mps]: 27% faster
- test_train[timm_vision_transformer-mps]: 12% faster
- test_train[hf_T5-mps]: 9.46% faster
```

Pull Request resolved: pytorch#128393
Approved by: https://github.com/albanD

Co-authored-by: Siddharth Kotapati <skotapati@apple.com>
tolleybot pushed a commit to tolleybot/pytorch that referenced this pull request Sep 14, 2024
Pull Request resolved: pytorch#134858
Approved by: https://github.com/janeyx99
tolleybot pushed a commit to tolleybot/pytorch that referenced this pull request Sep 14, 2024
Pull Request resolved: pytorch#135440
Approved by: https://github.com/ezyang
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Pull Request resolved: pytorch#134858
Approved by: https://github.com/janeyx99
Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Sep 20, 2024
Pull Request resolved: pytorch#135440
Approved by: https://github.com/ezyang
pytorchbot pushed a commit that referenced this pull request Oct 2, 2024
Pull Request resolved: #135440
Approved by: https://github.com/ezyang

(cherry picked from commit 09287e3)
kit1980 pushed a commit that referenced this pull request Oct 2, 2024
[MPS] Add regression test for `fft.fftfreq` (#135440)

Pull Request resolved: #135440
Approved by: https://github.com/ezyang

(cherry picked from commit 09287e3)

Co-authored-by: Roy Hvaara <roy@lightyear.no>
pytorchmergebot pushed a commit that referenced this pull request Dec 1, 2024
…`nn.Conv3d` (#141780)

When the input tensor to Conv3d is in the channels_last_3d memory format the Conv3d op will generate incorrect output (see example image in #141471). This PR checks if the op is 3d, and then attempts to convert the input tensor to contiguous.

Added a regression test that verifies the output by running the same op on the CPU.

I'm unsure if Conv3d supports the channels last memory format after #128393. If it does, we should consider updating the logic to utilize this as it would be more efficient. Perhaps @DenisVieriu97 knows or has more context?

Fixes #141471
Pull Request resolved: #141780
Approved by: https://github.com/malfet
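
For context on the memory formats named in the commit message above, the following pure-Python sketch computes the element strides of a 5-D (N, C, D, H, W) tensor in the default contiguous layout and in a channels-last-3d layout, where the channel dimension varies fastest in memory. The function names are hypothetical; the sketch shows only the stride arithmetic, not how the MPS Conv3d op handles either layout.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a dense tensor of the given shape."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_3d_strides(shape):
    """Strides for an (N, C, D, H, W) tensor stored in N, D, H, W, C memory order."""
    n, c, d, h, w = shape
    return (d * h * w * c, 1, h * w * c, w * c, c)


shape = (2, 3, 4, 5, 6)  # N, C, D, H, W
print(contiguous_strides(shape))        # (360, 120, 30, 6, 1)
print(channels_last_3d_strides(shape))  # (360, 1, 90, 18, 3)
```

Converting the input to contiguous, as the fix describes, amounts to copying the data so that its strides match the first layout.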
pytorchmergebot pushed a commit to sujoysaraswati/pytorch that referenced this pull request Dec 2, 2024
…`nn.Conv3d` (pytorch#141780)

Pull Request resolved: pytorch#141780
Approved by: https://github.com/malfet
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
…`nn.Conv3d` (pytorch#141780)

Pull Request resolved: pytorch#141780
Approved by: https://github.com/malfet
malfet added a commit that referenced this pull request Jan 30, 2025
Caused by #128393, which changed the semantics of `needsGather`, resulting in silent correctness errors on macOS 15+ if the output tensor is non-contiguous

Fixes #145203
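
The class of bug described above, where a result is written into a non-contiguous output as if it were contiguous, can be illustrated with a short pure-Python sketch. The helpers `is_contiguous` and `scatter` are hypothetical and are not the `needsGather` logic from the PyTorch source; they only show why the layout check matters.

```python
from itertools import product


def is_contiguous(shape, strides):
    """True if the strides describe a dense row-major layout for this shape."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


def scatter(values, out_buffer, shape, strides, storage_offset=0):
    """Write row-major `values` into a strided output view of `out_buffer`."""
    for value, idx in zip(values, product(*(range(n) for n in shape))):
        out_buffer[storage_offset + sum(i * s for i, s in zip(idx, strides))] = value


result = [1, 2, 3, 4, 5, 6]      # a 2x3 result in row-major order
shape, strides = (2, 3), (1, 2)  # output viewed as 2x3 with transposed strides

# Treating the output as contiguous: a plain copy that puts values in the wrong slots.
wrong = list(result)

# Respecting the output strides: each value lands where the view expects it.
out = [0] * 6
if not is_contiguous(shape, strides):
    scatter(result, out, shape, strides)

print(wrong)  # [1, 2, 3, 4, 5, 6]
print(out)    # [1, 4, 2, 5, 3, 6]
```

Nothing fails loudly in the wrong case, since both buffers hold the same values in a different order, which is why such errors are silent.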
pytorchmergebot pushed a commit that referenced this pull request Jan 30, 2025
Pull Request resolved: #146085
Approved by: https://github.com/dcci
mori360 pushed a commit to mori360/pytorch that referenced this pull request Feb 6, 2025
Pull Request resolved: pytorch#146085
Approved by: https://github.com/dcci
Labels
ciflow/mps, ciflow/trunk, Merged, module: cpu, open source, release notes: mps, triaged
7 participants