
RuntimeError when running backward on MPS: "view size is not compatible" with self-attention block #142344

@NripeshN


🐛 Describe the bug

I’m running a simple model with a self-attention block on an Apple M2 Max using the MPS backend. The code runs fine on CPU and CUDA, but fails on MPS with a RuntimeError during the backward pass. The message points to an incompatible view size/stride, i.e. a .view() call on a non-contiguous tensor. Even after replacing .view() with .contiguous().reshape() throughout my code, the problem persists, and only on MPS.

Minimal Reproducer:

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

class SelfAttentionBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.query_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()
        # 1x1 projections, then flatten the spatial dims to length H*W.
        query = self.query_conv(x).contiguous().reshape(B, -1, H*W)
        key   = self.key_conv(x).contiguous().reshape(B, -1, H*W)
        value = self.value_conv(x).contiguous().reshape(B, -1, H*W)

        # Attention over spatial positions: (B, HW, HW).
        attention = torch.bmm(query.permute(0, 2, 1), key)
        attention = torch.softmax(attention, dim=-1)
        out = torch.bmm(value, attention.permute(0, 2, 1))
        out = out.contiguous().reshape(B, C, H, W)
        # Learnable residual scaling.
        return self.gamma * out + x

class SimpleDenoiseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.attn = SelfAttentionBlock(32)
        self.conv_out = nn.Conv2d(32, 3, kernel_size=3, padding=1)
    
    def forward(self, x):
        x = self.conv_in(x)
        x = self.attn(x)
        x = self.conv_out(x)
        return x

model = SimpleDenoiseNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input_data = torch.rand(2, 3, 32, 32, device=device)
target_data = torch.rand(2, 3, 32, 32, device=device)

optimizer.zero_grad()
output = model(input_data)
loss = criterion(output, target_data)
loss.backward()  # Fails on MPS, works on CPU/CUDA
optimizer.step()
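
For completeness (this check is my addition, not part of the reproducer above): the identical graph and data run cleanly once moved off MPS, which is what makes this look like an MPS-only backward issue.

# Sanity check: the same model and tensors, moved to CPU, complete
# forward + backward without error.
cpu_model = SimpleDenoiseNet().to("cpu")
cpu_out = cpu_model(input_data.cpu())
nn.MSELoss()(cpu_out, target_data.cpu()).backward()
print("CPU backward OK")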

Error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Full Traceback (if applicable):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[135], line 52
     50 output = model(input_data)
     51 loss = criterion(output, target_data)
---> 52 loss.backward()  # Check if this triggers the RuntimeError on MPS
     53 optimizer.step()
     55 print("Forward and backward pass completed successfully!")

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    571 if has_torch_function_unary(self):
    572     return handle_torch_function(
    573         Tensor.backward,
    574         (self,),
   (...)
    579         inputs=inputs,
    580     )
--> 581 torch.autograd.backward(
    582     self, gradient, retain_graph, create_graph, inputs=inputs
    583 )

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    342     retain_graph = create_graph
    344 # The reason we repeat the same comment below is that
    345 # some Python versions print out the first line of a multi-line function
    346 # calls in the traceback and some print out the last line
--> 347 _engine_run_backward(
    348     tensors,
    349     grad_tensors_,
    350     retain_graph,
    351     create_graph,
    352     inputs,
    353     allow_unreachable=True,
    354     accumulate_grad=True,
    355 )

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
    823     unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
    824 try:
--> 825     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    826         t_outputs, *args, **kwargs
    827     )  # Calls into the C++ engine to run the backward pass
    828 finally:
    829     if attach_logging_hooks:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

This seems like a backend-specific bug. Any guidance or fix would be appreciated.
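
In case it helps with triage, one possible mitigation to experiment with (untested here, and purely a guess at the cause): force every gradient entering a submodule's backward to be contiguous via tensor hooks, on the theory that some MPS backward kernel is calling .view() on a non-contiguous gradient. If the failing .view() sits deeper inside the C++ convolution backward itself, this will not help.

def force_contiguous_grads(module):
    # Hypothetical helper (not a known fix): make every gradient flowing back
    # into a submodule contiguous before the backward kernel consumes it.
    def hook(mod, inputs, output):
        if isinstance(output, torch.Tensor) and output.requires_grad:
            output.register_hook(lambda g: g.contiguous())
    for m in module.modules():
        m.register_forward_hook(hook)

force_contiguous_grads(model)  # then re-run the forward/backward step above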

Versions

(venv) nripeshn@Nripeshs-MacBook-Pro ~/D/P/ML123 (main)> python collect_env.py
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.24.1)
CMake version: version 3.30.3
Libc version: N/A

Python version: 3.10.14 (main, May 12 2024, 02:15:34) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] nirtorch==1.0
[pip3] numpy==2.1.3
[pip3] snntorch==0.9.1
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen

Labels

high priority
module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
module: mps - Related to Apple Metal Performance Shaders framework
module: regression - It used to work, and now it doesn't
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
