-
Notifications
You must be signed in to change notification settings - Fork 24.7k
Description
Platforms: rocm
This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.
Over the past 6 hours, it has been determined flaky in 16 workflow(s) with 32 failures and 16 successes.
Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs.
To find relevant log snippets:
- Click on the workflow logs linked above
- Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
- Grep for
test_graph_partition_forward_with_skipped_cudagraphed_backward
- There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
Sample error message
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/inductor/test_cudagraph_trees.py", line 2936, in test_graph_partition_forward_with_skipped_cudagraphed_backward
out.backward(back_inp)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_tensor.py", line 653, in backward
torch.autograd.backward(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 354, in backward
_engine_run_backward(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/graph.py", line 829, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 311, in apply
return user_fn(self, *args)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2260, in backward
return impl_fn()
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2246, in impl_fn
out = CompiledFunction._backward_impl(ctx, all_args)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2348, in _backward_impl
CompiledFunction.compiled_bw = aot_config.bw_compiler(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 482, in __call__
return self.compiler_fn(gm, example_inputs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 76, in _wrapped_bw_compiler
disable(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 962, in _fn
return fn(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 98, in wrapper_function
return function(*args, **kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2373, in bw_compiler
return inner_compile(
File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 767, in compile_fx_inner
return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 124, in debug_wrapper
inner_compiled_fn = compiler_fn(gm, example_inputs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1021, in _compile_fx_inner
compiled_graph.post_compile(example_inputs, constants, graph_kwargs)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 637, in post_compile
cudagraph_partition_post_compile(
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 274, in cudagraph_partition_post_compile
maybe_handle_backward_generation(compiled_graph, boxed_forward_device_index)
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 161, in maybe_handle_backward_generation
assert manager is not None
AssertionError
Test file path: inductor/test_cudagraph_trees_expandable_segments.py
For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.
cc @ptrblck @msaroufim @eqy @jerryzh168 @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @clee2000