
[dtensor][view_op] add as_strided op support to DTensor in FakeTensorMode #151495


Open

XilunWu wants to merge 5 commits into gh/XilunWu/131/base

Conversation

@XilunWu XilunWu (Contributor) commented Apr 17, 2025

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

* #151497
* #151507
* __->__ #151495

## Introduction

`flex_attention`'s FakeTensor propagation `flex_attention_fake_impl` [permutes](https://github.com/pytorch/pytorch/blob/fb6ac2f16132f7953711ce6924bc2ee4a033228c/torch/_higher_order_ops/flex_attention.py#L459) the stride of `out` (the attention score) based on `query`'s stride. Enabling `flex_attention` calls on DTensor therefore requires us to add `as_strided` support on DTensor in `FakeTensorMode`.
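For intuition, here is a minimal sketch of that stride-permutation trick; the helper name and example shapes are assumptions, not the actual `flex_attention_fake_impl`:

```python
import torch

def permute_strides_like(out: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: order dims from largest to smallest query stride,
    # then compute the strides a contiguous tensor would have in that order.
    order = sorted(range(query.dim()), key=lambda d: query.stride(d), reverse=True)
    strides = [0] * out.dim()
    acc = 1
    for d in reversed(order):
        strides[d] = acc
        acc *= out.size(d)
    # The size is unchanged, which is exactly the restricted case supported below.
    return out.as_strided(out.size(), strides)

query = torch.empty(2, 8, 16, 32).permute(0, 2, 1, 3)  # non-contiguous layout
out = torch.empty(2, 16, 8, 32)                         # same shape as query
print(permute_strides_like(out, query).stride())        # matches query's order
```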

## Limited Support

Due to the complexity of supporting the full `as_strided` semantics on DTensor, I chose to enable only a limited subset (a sketch of the resulting checks follows this list):

1. `as_strided` only works correctly in `FakeTensorMode`, i.e. for shape and stride propagation.
2. `as_strided` is only allowed in the case where `size == input.shape`, because this PR specifically unblocks the use case of `flex_attention_fake_impl`.
3. `as_strided` requires `storage_offset=None`, because the other case is not defined for DTensor.
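A minimal sketch of the guard logic this subset implies, reconstructed from the code fragments quoted in the review below; the exact import paths and argument handling are assumptions:

```python
import torch
from torch.distributed.tensor._op_schema import (  # import paths assumed
    OpSchema,
    OpStrategy,
    RuntimeSchemaInfo,
    StrategyType,
)
from torch.distributed.tensor._ops.utils import register_op_strategy

aten = torch.ops.aten

@register_op_strategy(aten.as_strided.default, schema_info=RuntimeSchemaInfo(1))
def as_strided_strategy(op_schema: OpSchema) -> StrategyType:
    # NOTE: only correct for shape/stride propagation under FakeTensorMode.
    inp_strategy = op_schema.args_schema[0]
    assert isinstance(inp_strategy, OpStrategy)
    target_size = tuple(op_schema.args_schema[1])
    target_stride = tuple(op_schema.args_schema[2])
    inp_size = tuple(inp_strategy.shape)

    # Restriction 2: the target size must equal the input size.
    assert target_size == inp_size, (
        f"as_strided only supports the same size: "
        f"input has size {inp_size} but target size is {target_size}"
    )
    assert len(target_size) == len(target_stride), (
        f"size and stride should have the same length, "
        f"but got {len(target_size)} and {len(target_stride)}"
    )
    # Restriction 3: a storage_offset is not defined for DTensor.
    if len(op_schema.args_schema) > 3:
        assert op_schema.args_schema[3] is None, "as_strided requires storage_offset=None"

    # The PR follows the input placements by reusing the default strategy.
    from torch.distributed.tensor._ops._tensor_ops import default_strategy
    return default_strategy(op_schema)
```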

## Test

`pytest test/distributed/tensor/test_view_ops.py -s -k test_as_strided`
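Schematically, the test body looks like the loop quoted later in the review; the harness details below (test base class, mesh helper, shapes) are assumptions:

```python
import itertools

import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch.distributed.tensor import Replicate, distribute_tensor

def test_as_strided(self):  # assumes a DTensorTestBase-style harness
    mesh = self.build_device_mesh()
    with FakeTensorMode():
        tensor_x = torch.randn(2, 4, 8, device=self.device_type)
        dtensor_x = distribute_tensor(tensor_x, mesh, [Replicate()])
        size, stride = tensor_x.shape, tensor_x.stride()
        # Only same-size stride permutations are supported, so compare
        # DTensor's fake-mode propagation against plain tensors.
        for target_stride in itertools.permutations(stride):
            dtensor_y = dtensor_x.as_strided(size, target_stride)
            tensor_y = tensor_x.as_strided(size, target_stride)
            self.assertEqual(dtensor_y.stride(), tensor_y.stride())
```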

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @tianyu-l


pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151495

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6a84ea5 with merge base b7c7000:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the ciflow/inductor and oncall: distributed labels Apr 17, 2025
@XilunWu XilunWu added the module: dtensor label Apr 17, 2025
@XilunWu XilunWu changed the title [dtensor][view_op] add as_strided op support to DTensor [dtensor][view_op] add as_strided op support to DTensor in FakeTensorMode Apr 17, 2025
@XilunWu XilunWu added the topic: not user facing label Apr 17, 2025
@XilunWu XilunWu requested review from wz337, tianyu-l and wanchaol and removed request for tianyu-l April 17, 2025 02:34
… FakeTensorMode"

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

*  #151497
*  #151496
* __->__ #151495

## Introduction
`flex_attention`'s FakeTensor propagation `flex_attention_fake_impl` [permutes](https://github.com/pytorch/pytorch/blob/fb6ac2f16132f7953711ce6924bc2ee4a033228c/torch/_higher_order_ops/flex_attention.py#L459) the stride of `out` (the attention score) based on `query`'s stride. To enable `flex_attention` call on DTensor, this requires us add `as_strided` support on DTensor in `FakeTensorMode`. 

## Limited Support
Due to the complexity of supporting actual `as_strided` on DTensor, I choose to only enable a limited subset:
1. `as_strided` only works correctly in `FakeTensorMode` i.e. shape and strided propagation.
2. `as_strided` is only allowed in case where `size == input.shape` because this PR specifically unblocks the use case of `flex_attention_fake_impl`. 
3. `as_strided` requires `storage_offset=None` because the other case is not defined in DTensor.

## Test
`pytest test/distributed/tensor/test_view_ops.py -s -k test_as_strided`

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k tianyu-l

[ghstack-poisoned]
@XilunWu XilunWu requested a review from bdhirsh April 17, 2025 02:34
XilunWu added a commit that referenced this pull request Apr 17, 2025
… FakeTensorMode"

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

*  #151497
*  #151496
* __->__ #151495

## Introduction
`flex_attention`'s FakeTensor propagation `flex_attention_fake_impl` [permutes](https://github.com/pytorch/pytorch/blob/fb6ac2f16132f7953711ce6924bc2ee4a033228c/torch/_higher_order_ops/flex_attention.py#L459) the stride of `out` (the attention score) based on `query`'s stride. To enable `flex_attention` call on DTensor, this requires us add `as_strided` support on DTensor in `FakeTensorMode`. 

## Limited Support
Due to the complexity of supporting actual `as_strided` on DTensor, I choose to only enable a limited subset:
1. `as_strided` only works correctly in `FakeTensorMode` i.e. shape and strided propagation.
2. `as_strided` is only allowed in case where `size == input.shape` because this PR specifically unblocks the use case of `flex_attention_fake_impl`. 
3. `as_strided` requires `storage_offset=None` because the other case is not defined in DTensor.

## Test
`pytest test/distributed/tensor/test_view_ops.py -s -k test_as_strided`

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k tianyu-l

[ghstack-poisoned]
XilunWu added a commit that referenced this pull request Apr 17, 2025
wconstab (Contributor) left a comment:

Can you explain further why we need this support only in fake mode? Is it because we trace through this op and the resulting graph needs to capture a local as_strided operation, but the operation would run on a plain tensor after compilation?



```python
@register_op_strategy(aten.as_strided.default, schema_info=RuntimeSchemaInfo(1))
def as_strided_strategy(op_schema: OpSchema) -> StrategyType:
```
Where do we enforce that this is only used in fake mode?

@XilunWu XilunWu (Author) commented Apr 17, 2025

This is the part I don't feel confident about: I did not enforce that. I only tested that it works for my use case in FakeTensorMode, and I rely entirely on users reading the comment and not using it in other cases.

> Can you explain further why we need this support only in fake mode?

When I tried calling `flex_attention` on DTensor in #145353, I hit an error because `as_strided` was not implemented for DTensor. This is because `flex_attention` in dynamo calls `flex_attention_fake_impl` to trace the shape and stride of its output, and `flex_attention_fake_impl` uses `as_strided`. As @bdhirsh said in a #145353 comment, one solution is to have dynamo call `flex_attention` rather than `flex_attention_fake_impl`, but that requires a code change in dynamo. This PR is a quick workaround.

… FakeTensorMode"

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

*  #151497
*  #151507
* __->__ #151495

## Introduction
`flex_attention`'s FakeTensor propagation `flex_attention_fake_impl` [permutes](https://github.com/pytorch/pytorch/blob/fb6ac2f16132f7953711ce6924bc2ee4a033228c/torch/_higher_order_ops/flex_attention.py#L459) the stride of `out` (the attention score) based on `query`'s stride. To enable `flex_attention` call on DTensor, this requires us add `as_strided` support on DTensor in `FakeTensorMode`. 

## Limited Support
Due to the complexity of supporting actual `as_strided` on DTensor, I choose to only enable a limited subset:
1. `as_strided` only works correctly in `FakeTensorMode` i.e. shape and strided propagation.
2. `as_strided` is only allowed in case where `size == input.shape` because this PR specifically unblocks the use case of `flex_attention_fake_impl`. 
3. `as_strided` requires `storage_offset=None` because the other case is not defined in DTensor.

## Test
`pytest test/distributed/tensor/test_view_ops.py -s -k test_as_strided`

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k tianyu-l

[ghstack-poisoned]
XilunWu added a commit that referenced this pull request Apr 18, 2025
pytorchmergebot (Collaborator) commented:

Starting merge as part of PR stack under #151507

… FakeTensorMode"

Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

*  #151497
*  #151507
* __->__ #151495

## Introduction
`flex_attention`'s FakeTensor propagation `flex_attention_fake_impl` [permutes](https://github.com/pytorch/pytorch/blob/fb6ac2f16132f7953711ce6924bc2ee4a033228c/torch/_higher_order_ops/flex_attention.py#L459) the stride of `out` (the attention score) based on `query`'s stride. To enable `flex_attention` call on DTensor, this requires us add `as_strided` support on DTensor in `FakeTensorMode`. 

## Limited Support
Due to the complexity of supporting actual `as_strided` on DTensor, I choose to only enable a limited subset:
1. `as_strided` only works correctly in `FakeTensorMode` i.e. shape and strided propagation.
2. `as_strided` is only allowed in case where `size == input.shape` because this PR specifically unblocks the use case of `flex_attention_fake_impl`. 
3. `as_strided` requires `storage_offset=None` because the other case is not defined in DTensor.

## Test
`pytest test/distributed/tensor/test_view_ops.py -s -k test_as_strided`

cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k tianyu-l

[ghstack-poisoned]
XilunWu added a commit that referenced this pull request Apr 22, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
wanchaol (Collaborator) left a comment:

See comments inline.


```python
for target_stride in itertools.permutations(stride):
    dtensor_y = dtensor_x.as_strided(size, target_stride)
    tensor_y = tensor_x.as_strided(size, target_stride)
```
I think you should also test the shape/stride assertions you added in the op?

```python
target_size = op_schema.args_schema[1]
assert isinstance(target_size, (tuple, list))
target_stride = op_schema.args_schema[2]
assert isinstance(target_stride, (tuple, list))
```
You don't need the tuple/list assertions if you are converting them to tuples anyway?
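A sketch of the simplification this suggests, using the names from the quoted fragment:

```python
# tuple(...) already fails on non-iterable arguments, which makes the
# separate isinstance assertions redundant.
target_size = tuple(op_schema.args_schema[1])
target_stride = tuple(op_schema.args_schema[2])
```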

f"as_strided only supports the same size: input has size {inp_size} but target size is {target_size}"
)

assert len(target_size) == len(target_stride), (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't you also need to guard the stride are the same?

f"size and stride should have the same length, but got {len(target_size)} and {len(target_stride)}"
)

```python
from torch.distributed.tensor._ops._tensor_ops import default_strategy
```
I would prefer not to reuse the default strategy; it's simple enough to write its own.
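A hedged sketch of what a standalone strategy could look like, simply following the input placements; the class and field names are assumptions based on DTensor internals:

```python
from torch.distributed.tensor._op_schema import (  # names assumed
    OpSchema,
    OpStrategy,
    PlacementStrategy,
    StrategyType,
)

def as_strided_strategy(op_schema: OpSchema) -> StrategyType:
    inp_strategy = op_schema.args_schema[0]
    assert isinstance(inp_strategy, OpStrategy)
    # The size (and hence the sharding) is unchanged, so each output spec
    # can simply mirror the corresponding input spec.
    return OpStrategy(
        [
            PlacementStrategy(
                output_specs=s.output_spec,
                input_specs=[s.output_spec],
            )
            for s in inp_strategy.strategies
        ]
    )
```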

github-actions bot commented:

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jun 23, 2025
superiwan pushed a commit to superiwan/pytorch that referenced this pull request Jul 14, 2025