DISABLED test_graph_partition_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests) #157723

Description

Platforms: rocm

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 16 workflow(s) with 32 failures and 16 successes.

Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers, so CI will be green, but the relevant failures will be harder to find in the logs.
To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_graph_partition_forward_with_skipped_cudagraphed_backward
  4. There should be several runs of the test (flaky tests are rerun in CI) whose logs you can study; one way to filter a downloaded log for this test is sketched below.
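
As a minimal illustrative sketch (not part of the bot's standard instructions): filtering a locally downloaded CI log for lines that mention this test. The log file name "ci_log.txt" is a placeholder for whatever raw log you saved from the workflow page.

# Sketch: scan a locally downloaded CI log for lines mentioning the flaky test.
# "ci_log.txt" is a placeholder file name, not something produced by CI itself.
from pathlib import Path

TEST_NAME = "test_graph_partition_forward_with_skipped_cudagraphed_backward"

def find_test_lines(log_path: str, test_name: str = TEST_NAME) -> list[str]:
    """Return every log line that mentions the test, prefixed with its line number."""
    hits = []
    text = Path(log_path).read_text(errors="replace")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if test_name in line:
            hits.append(f"{lineno}: {line.strip()}")
    return hits

if __name__ == "__main__":
    for hit in find_test_lines("ci_log.txt"):
        print(hit)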
Sample error message
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/inductor/test_cudagraph_trees.py", line 2936, in test_graph_partition_forward_with_skipped_cudagraphed_backward
    out.backward(back_inp)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 653, in backward
    torch.autograd.backward(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 354, in backward
    _engine_run_backward(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py", line 829, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 311, in apply
    return user_fn(self, *args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2260, in backward
    return impl_fn()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2246, in impl_fn
    out = CompiledFunction._backward_impl(ctx, all_args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2348, in _backward_impl
    CompiledFunction.compiled_bw = aot_config.bw_compiler(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 482, in __call__
    return self.compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 76, in _wrapped_bw_compiler
    disable(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 962, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 98, in wrapper_function
    return function(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2373, in bw_compiler
    return inner_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 767, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 124, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1021, in _compile_fx_inner
    compiled_graph.post_compile(example_inputs, constants, graph_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 637, in post_compile
    cudagraph_partition_post_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 274, in cudagraph_partition_post_compile
    maybe_handle_backward_generation(compiled_graph, boxed_forward_device_index)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 161, in maybe_handle_backward_generation
    assert manager is not None
AssertionError

Test file path: inductor/test_cudagraph_trees_expandable_segments.py
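
As a rough sketch (not taken from the original report): one way to re-run only this test from a local PyTorch source checkout, assuming pytest is installed and a ROCm (or CUDA) build of torch is importable. The checkout path is a placeholder.

# Sketch: re-run only the flaky test from a PyTorch source checkout.
# Assumes pytest is installed and a GPU-enabled torch build is importable;
# "path/to/pytorch" is a placeholder for your local checkout directory.
import subprocess
import sys

TEST_FILE = "test/inductor/test_cudagraph_trees_expandable_segments.py"
TEST_NAME = "test_graph_partition_forward_with_skipped_cudagraphed_backward"

result = subprocess.run(
    [sys.executable, "-m", "pytest", TEST_FILE, "-k", TEST_NAME, "-v"],
    cwd="path/to/pytorch",  # placeholder: your local pytorch checkout
)
print(f"pytest exited with code {result.returncode}")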

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @ptrblck @msaroufim @eqy @jerryzh168 @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @clee2000

Labels

    module: cuda (Related to torch.cuda, and CUDA support in general)
    module: flaky-tests (Problem is a flaky test in CI)
    module: rocm (AMD GPU support for PyTorch)
    skipped (Denotes a (flaky) test currently skipped in CI)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

