July 5, 2025 – July 12, 2025

Overview

206 Active pull requests

282 Active issues

21 Pull requests merged by 7 people

[aarch64] Add sm_80 to CUDA SBSA build
#158118 merged Jul 11, 2025
[user triton] AOT inductor support for device-side TMA
#157241 merged Jul 11, 2025
Add sm_70 arch for linux cuda 12.8 and 12.9 builds
#157968 merged Jul 11, 2025
Revert "Turn on compile with NVSHMEM (#154538)"
#158040 merged Jul 11, 2025
Revert "Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS (#154568)"
#158039 merged Jul 11, 2025
[cherry-pick] revert #156552
#156767 merged Jul 10, 2025
cherrypick revert of #152932 for release 2.8
#158031 merged Jul 10, 2025
[inductor][user triton] sanitize triple-quoted docstrings in kernel definitions
#157454 merged Jul 9, 2025
[release] Triton pin update to 3.4
#157752 merged Jul 8, 2025
Bump urllib3 from 2.2.2 to 2.5.0 in /tools/build/bazel
#156390 merged Jul 8, 2025
[inductor][static launcher] Skip correctness test for test_floats
#157200 merged Jul 7, 2025
[ONNX] Bump onnxscript api for torch 2.8
#157137 merged Jul 7, 2025
Fix macOS build with USE_MPS=OFF
#156932 merged Jul 7, 2025
[dynamo] do not issue lru_cache warning for functions in the top-level torch namespace
#157718 merged Jul 7, 2025
[dynamo] Fix source for lru_cache method
#157308 merged Jul 7, 2025
[cherry-pick] Organize BUCK for torch/standalone and Rename torch::standalone to headeronly
#157418 merged Jul 7, 2025
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu
#157422 merged Jul 7, 2025
[ONNX] Fix conversion of attention - 4D
#157509 merged Jul 7, 2025
[dynamo] Fix bug in dict(mapping_proxy)
#157515 merged Jul 7, 2025
[cherry-pick] [fake tensor] fix issue of no attribute tags (#156689)
#157519 merged Jul 7, 2025
Add einops x torch.compile testing in PyTorch CI (#157416)
#157588 merged Jul 7, 2025

185 Pull requests opened by 106 people

[pt2 event logging] add configurable prefix
#157678 opened Jul 6, 2025
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list.
#157684 opened Jul 7, 2025
[BE] add `SHFMT` linter to format shell scripts
#157685 opened Jul 7, 2025
[BE][1/4] format shell scripts with `SHFMT`
#157686 opened Jul 7, 2025
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 opened Jul 7, 2025
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 opened Jul 7, 2025
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 opened Jul 7, 2025
[inductor] initial triton static config lookup table
#157699 opened Jul 7, 2025
wip, lookup table for reduction configs
#157700 opened Jul 7, 2025
Enhance APoT quantizer
#157710 opened Jul 7, 2025
Add support moviepy 2.x
#157712 opened Jul 7, 2025
Avoid writing temporary modules to disk
#157713 opened Jul 7, 2025
[c10d] Prototype of `group_split` for dist2 work
#157716 opened Jul 7, 2025
Address NaNs if SDPA is called with all values masked from query
#157727 opened Jul 7, 2025
Expose opt_einsum in torch.backends
#157740 opened Jul 7, 2025
Fuse matmul
#157743 opened Jul 7, 2025
Migrate pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11 -> pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc13
#157748 opened Jul 7, 2025
[export] Update docs
#157750 opened Jul 8, 2025
Fix einops x torch.compile interaction
#157754 opened Jul 8, 2025
[SymmMem] Find NVSHMEM from system installation
#157755 opened Jul 8, 2025
[SymmMem] find_path does not search /usr/local/lib
#157765 opened Jul 8, 2025
Ensure large tensor int32 -> int64 indexing is enabled
#157767 opened Jul 8, 2025
[inductor][lookup_table] log entries
#157768 opened Jul 8, 2025
Enable _int_mm on Intel GPU
#157769 opened Jul 8, 2025
[BE]: Fix NVSHMEM builds, add missing 12.9 dependency and update to latest for 2.8RC
#157774 opened Jul 8, 2025
[Test][Do Not Merge] Update Ideep to latest oneDNN commit
#157782 opened Jul 8, 2025
S390X: fix detection of magic number placeholder in inductor
#157784 opened Jul 8, 2025
unskipped mobilenet_v3 quantization and mobilenet_v2 quantization plus tests from https://github.com/pytorch/pytorch/issues/125438
#157786 opened Jul 8, 2025
[BE]: Reduce binary size 40% using aggressive fatbin compression.
#157791 opened Jul 8, 2025
Add flag to fx.passes.split_module to normalize input names
#157793 opened Jul 8, 2025
[PP] Allow schedules to run under torch.no_grad()
#157795 opened Jul 8, 2025
[claude-code] Add top-level module doc for torch/distributed/tensor/_op_schema.py
#157804 opened Jul 8, 2025
feat(dynamo): raise UnsupportedError for ndarray.astype(object)
#157810 opened Jul 8, 2025
Fix broken docs requirements symlink
#157811 opened Jul 8, 2025
Allow docker builds to deal with symlinks
#157812 opened Jul 8, 2025
Make functorch notebook symlinks PEP 517 valid
#157813 opened Jul 8, 2025
Improve MANIFEST.in for source distribution
#157814 opened Jul 8, 2025
Add PEP 517 compliant Python source distribution to release process
#157815 opened Jul 8, 2025
[canary] dedupe args + on by default
#157817 opened Jul 8, 2025
[Inductor] [Triton] Enabling TMA for flex-attention for supported device types
#157822 opened Jul 8, 2025
[WIP][Inductor][Intel GPU] Always use channel last for only freezing mode.
#157828 opened Jul 8, 2025
[SymmMem] Avoid library mismatch in CMake search
#157836 opened Jul 8, 2025
partial reads
#157838 opened Jul 8, 2025
remove allow-untyped-defs from torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py
#157846 opened Jul 8, 2025
remove allow-untyped-defs from torch/_higher_order_ops/run_const_graph.py
#157847 opened Jul 8, 2025
remove allow-untyped-defs from torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py
#157848 opened Jul 8, 2025
remove allow-untyped-defs from torch/fx/passes/backends/cudagraphs.py
#157849 opened Jul 8, 2025
remove allow-untyped-defs from torch/utils/tensorboard/_onnx_graph.py
#157850 opened Jul 8, 2025
remove allow-untyped-defs from torch/nn/utils/_expanded_weights/linear_expanded_weights.py
#157851 opened Jul 8, 2025
remove allow-untyped-defs from torch/ao/quantization/experimental/apot_utils.py
#157852 opened Jul 8, 2025
remove allow-untyped-defs from torch/autograd/_functions/utils.py
#157853 opened Jul 8, 2025
Make compiled mode broadcasting use 0 strides
#157854 opened Jul 8, 2025
Add functions to setup PrivateUse1 as a python backend device.
#157859 opened Jul 8, 2025
ScannedModule
#157864 opened Jul 8, 2025
[Draft][CUDA][CI] Test B200 Runner with Nightly Inductor Perf Test
#157870 opened Jul 9, 2025
[MPS] Improve performance of max_pool3d
#157875 opened Jul 9, 2025
[MPS] Move max_pool2d to Metal
#157876 opened Jul 9, 2025
[Inductor][Triton] Update TMA Compatibility Requirements
#157881 opened Jul 9, 2025
use maybe_mark_dynamic instead of mark_dynamic for -dynamic-batch-only option
#157885 opened Jul 9, 2025
Back out "[Inductor] Fix epilogue fusion decision with 1 Triton caller as choice"
#157887 opened Jul 9, 2025
[profiler] update CUDA runtime kernel identification logic
#157890 opened Jul 9, 2025
Partitioner: Fix to align partition node order with original graph
#157892 opened Jul 9, 2025
[CPU] Support GQA for flash attention
#157893 opened Jul 9, 2025
[Inductor] optimize welford reduction
#157902 opened Jul 9, 2025
[inductor] [cpu] fix the dtype hardcoded to int64 in store_reduction
#157904 opened Jul 9, 2025
Fix logdet returning finite values for singular matrices on CUDA
#157910 opened Jul 9, 2025
[autograd] Avoid creating and recording event when unnecessary
#157914 opened Jul 9, 2025
Normalize placeholder names in AOTAutogradCache
#157916 opened Jul 9, 2025
Add type assert for tensor_meta, based on real bug in autoparallel.
#157927 opened Jul 9, 2025
[WIP] try compress-mode balance on CUDA12.6
#157928 opened Jul 9, 2025
Reduce random reads for offset metadata when calling torch.load under FakeTensorMode
#157931 opened Jul 9, 2025
Adding a change to kick off the theme pull
#157932 opened Jul 9, 2025
[WIP] WindowsArm64 CI changes
#157935 opened Jul 9, 2025
Add cuda 12.9 periodic tests
#157939 opened Jul 9, 2025
Fix early return in `_EmptyStateDictLoadPlanner`
#157940 opened Jul 9, 2025
Small fix to the release-feature-request.yml
#157941 opened Jul 9, 2025
[WIP][dynamic shapes] unbacked-safe slicing
#157944 opened Jul 9, 2025
[AOTI][CPP] add flag TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL
#157949 opened Jul 9, 2025
[cherry-pick] temporarily disabling generation of weblinks for torch v2.8 & removing string literals for weblink generation
#157951 opened Jul 9, 2025
Allow dynamic shapes for DTensor slice
#157953 opened Jul 9, 2025
Refactor AOTInductorModelBase for Device Interface Abstraction
#157954 opened Jul 9, 2025
Add method get_fqn_to_index_mapping_from_metadata for HF storage reader
#157956 opened Jul 9, 2025
Add cuda 12.4 build in CI
#157958 opened Jul 9, 2025
improve typing in torch/test
#157962 opened Jul 9, 2025
[DCP][Prototype] Checkpoint replication via PGTransport
#157963 opened Jul 9, 2025
Ck recipe
#157964 opened Jul 9, 2025
[WIP] What does it take to avoid specializations in this program without unbacked semantics?
#157965 opened Jul 9, 2025
[do not review][benchmark] test all_gather communication time
#157967 opened Jul 9, 2025
[dynamo] Support more basic output types for `nonstrict_trace`
#157969 opened Jul 9, 2025
[Do not land] Test NVSHMEM build
#157970 opened Jul 9, 2025
[dynamo, nested graph breaks] implement new resume frame stack/locals/cell layout convention
#157971 opened Jul 9, 2025
[Dynamo][Hierarchical Compile] Allow parameters to be propagated to submodules
#157979 opened Jul 9, 2025
Define header type directly instead of using DeviceStreamType
#157982 opened Jul 10, 2025
[dynamo, docs] programming model dynamo core concepts
#157985 opened Jul 10, 2025
[Dynamo] Use proper sources for constructing dataclass defaults
#157993 opened Jul 10, 2025
[test][do not merge] Upgrade oneDNN to v3.9
#157994 opened Jul 10, 2025
Slightly improve error message from repeat_interleave kernel
#157996 opened Jul 10, 2025
[WIP][export] turn on backed_size_oblivious
#158004 opened Jul 10, 2025
Bump requests from 2.32.2 to 2.32.4 in /tools/build/bazel
#158006 opened Jul 10, 2025
Enable nightly PT2 benchmark on B200
#158011 opened Jul 10, 2025
[inductor] consolidate common GEMM triton param retrieval
#158015 opened Jul 10, 2025
remove unnecessary sync point in AveragedModel update
#158017 opened Jul 10, 2025
Update upstream opinfo to generate appropriately scaled sample inputs
#158018 opened Jul 10, 2025
[TESTING] Update pin to disable AMD triton buffer ops
#158019 opened Jul 10, 2025
Add property and setter annotations for Tensor attribute stubs
#158020 opened Jul 10, 2025
[build] remove `wheel` from build requirements
#158027 opened Jul 10, 2025
[precompile][wip] Increment frame and add compile ids when loading packages
#158028 opened Jul 10, 2025
Migrate c10/macros/cmake_macros.h.in to torch/headeronly
#158035 opened Jul 10, 2025
Support DeepSeek-style blockwise scaling scaled-mm for fp8 on Hopper+
#158037 opened Jul 10, 2025
Tag CPython test files with the commit or tag they were copied from.
#158038 opened Jul 10, 2025
[1/N] support of replication fallback strategy
#158046 opened Jul 10, 2025
Move AOTI static linkage header file generation to cpp_wrapper_cpu
#158047 opened Jul 10, 2025
Still run TritonBundler with BundledAOTAutogradCache, save autotune results
#158048 opened Jul 10, 2025
Enable tracing `LOAD_BUILD_CLASS` on CPython tests
#158049 opened Jul 10, 2025
Documentation Fix: torch.empty_like memory preservation
#158050 opened Jul 10, 2025
[DTensor] have split_strategy return OpStrategy instead of TupleStrategy
#158051 opened Jul 10, 2025
[cutlass backend][BE] remove force disable cache in tests
#158053 opened Jul 10, 2025
[docs, dynamo] add fullgraph=True, common graph breaks docs
#158055 opened Jul 10, 2025
[dict] Support `dict.update()` with no args
#158061 opened Jul 10, 2025
[simple_fsdp][inductor_collectives] rewrite reorder_collectives, sink_waits_iterative
#158062 opened Jul 10, 2025
[WIP] Rewrite reorder collectives/sink_wait to preserve peak memory
#158063 opened Jul 10, 2025
adding types to nn module init
#158065 opened Jul 10, 2025
[hop] call mode.__dispatch__ when no mode registered for the hop
#158067 opened Jul 10, 2025
c10d inductor tests: do not re initialize dist process group
#158068 opened Jul 10, 2025
Add torch compile force disable caches alias
#158072 opened Jul 10, 2025
[benchmarks] Add scalar loss as model output when training
#158074 opened Jul 10, 2025
Grab bag of (mostly) typing improvements
#158075 opened Jul 10, 2025
[hop] add supports_higher_order_operators flag to TorchDispatchMode
#158077 opened Jul 10, 2025
Remove references to TorchScript in PyTorch docs.
#158079 opened Jul 10, 2025
updated test cases to use MultithreadTestCase
#158082 opened Jul 11, 2025
[pickle] Add polyfills for pickle
#158084 opened Jul 11, 2025
Refactor and Improve the OpenReg Module
#158090 opened Jul 11, 2025
[inductor] add template hashing for template lookup table
#158091 opened Jul 11, 2025
[simplefsdp auto-bucketing] add ir node bucket helper function
#158097 opened Jul 11, 2025
[simplefsdp auto-bucketing] add ir node reorder helper function
#158098 opened Jul 11, 2025
cache sympy.expand in sizevars simplify
#158099 opened Jul 11, 2025
try cache_on_self on ir.py Layout class __str__ method
#158100 opened Jul 11, 2025
[DO NOT MERGE] Stress test #1 new capacity.
#158102 opened Jul 11, 2025
[DO NOT MERGE] Stress test #2 new capacity.
#158103 opened Jul 11, 2025
[build] pin `setuptools>=77` to enable PEP 639
#158104 opened Jul 11, 2025
[BE][Easy] split build system `requirements.txt` to a separate file
#158111 opened Jul 11, 2025
[DTensor][BE] improve DTensor ops correctness check utils
#158112 opened Jul 11, 2025
Fix AArch64 segfaults by disabling strict-aliasing in GridSamplerKernel for GCC 12 and above
#158117 opened Jul 11, 2025
[inductor_collectives] brute force preserve peak memory for sink_waits
#158119 opened Jul 11, 2025
Clarify that Store.check() was added in PyTorch v2.8
#158124 opened Jul 11, 2025
Add sm_70 to windows 12.9 build
#158126 opened Jul 11, 2025
[Optimus][fp8_activation_quantization] Only log when there's some node to be quantized
#158129 opened Jul 11, 2025
[DTensor] Assert DTensorSpec has valid placements
#158133 opened Jul 11, 2025
[Inductor] addmm + activation function fusion
#158137 opened Jul 11, 2025
Standalone compile API in _Exporter
#158139 opened Jul 11, 2025
[testing only] Layout optimization on
#158141 opened Jul 11, 2025
[testing only] Layout optimization off
#158142 opened Jul 11, 2025
Fix grouped MM output strides when compiled but not max-autotuned
#158143 opened Jul 11, 2025
Use a fixed size of a buffer in ShufflerIterDataPipe to not use append() and len()
#158144 opened Jul 11, 2025
[DO NOT MERGE] Add volta tests to periodic and pull
#158145 opened Jul 11, 2025
dist2: add support for passing custom configs directly to PG
#158147 opened Jul 11, 2025
[DRAFT][DELETE] Y do we have 7 build systems
#158148 opened Jul 11, 2025
Avoid AOTAutogradCache.load in stack trace on cache miss path
#158149 opened Jul 11, 2025
Inline dispatch_and_compile into its call site.
#158150 opened Jul 11, 2025
Modify c10::complex for CUDA 12.9 Win OOM
#158151 opened Jul 11, 2025
[CI] Update mobile build docker image
#158153 opened Jul 11, 2025
For discussion
#158156 opened Jul 11, 2025
[WIP] _convert_element_type_meta test what fails
#158157 opened Jul 11, 2025
[cutlass backend] cache a few things for codegen and properties
#158158 opened Jul 11, 2025
Fix torchrec multiprocess tests
#158159 opened Jul 11, 2025
Add transpose to torch/csrc/stable
#158160 opened Jul 11, 2025
[CI] Move main branch rocm binary builds to its own workflow
#158161 opened Jul 11, 2025
[CI] Do not run inductor rocm on ciflow/inductor
#158162 opened Jul 11, 2025
[CI][TD] Enable TD on all test configs
#158163 opened Jul 11, 2025
[ROCm] Fix tensor.item() for ROCm
#158165 opened Jul 11, 2025
[SymmMem] Fix NCCL Hang in NVSHMEM Triton Wait Until Test
#158167 opened Jul 11, 2025
[scan][cse] avoid cse zeros like gradient buffers
#158168 opened Jul 12, 2025
add eq function to NodeSource
#158170 opened Jul 12, 2025
[dynamo] turn off sys.monitoring if eval_frame is set
#158171 opened Jul 12, 2025
Pipeline _create_aot_dispatcher_function
#158173 opened Jul 12, 2025
Add inputs and outputs in Triton Kernel FX Graph segment
#158174 opened Jul 12, 2025
Hoist choose_dispatcher to top level, remove unnecessary returns
#158176 opened Jul 12, 2025
Update pr_time_benchmarks/expected_results.csv
#158177 opened Jul 12, 2025
[BE] Move repeated code into helper functions
#158178 opened Jul 12, 2025
[MPS] Extend atomic operations to more int types
#158179 opened Jul 12, 2025
don't error out in empty_cache under mempool context
#158180 opened Jul 12, 2025
Reproduce issue from #156097
#158181 opened Jul 12, 2025
[WIP][Inductor] Add cpu_max_other_dimension_decomposition for decompose_mm_pass
#158183 opened Jul 12, 2025
Fix compilation and "import torch" issues for cpython 3.14
#158184 opened Jul 12, 2025
Remove unnecessary CMake checks
#158185 opened Jul 12, 2025

163 Issues closed by 39 people

Inconsistent behavior between eager and compiled mode for `F.conv_transpose2d`
#157909 closed Jul 12, 2025
Question: Is it expected that `QuantStub` and `DeQuantStub` are skipped in `torch.compile`?
#157998 closed Jul 12, 2025
Discrepancy between Numpy and PyTorch advanced indexing
#158134 closed Jul 12, 2025
torch.compile with numpy code differs from numpy's behavior
#157569 closed Jul 12, 2025
dynamo+numpy 2.0 issue: TypeError: 'numpy.bool' object cannot be interpreted as an integer
#157973 closed Jul 12, 2025
Slicing of large tensors is wrong on MPS
#153560 closed Jul 11, 2025
torch.compile produces incorrect output
#155690 closed Jul 11, 2025
RuntimeError: Placeholder storage has not been allocated on MPS device! - when calling torch.export.export on mps
#158121 closed Jul 11, 2025
Bug in cmake/public/cuda.cmake: Incorrect use of set(${...}) leads to missing CUDA version in error message
#157354 closed Jul 11, 2025
[inductor] TorchInductor does not correctly recognize the grad status of model code
#125474 closed Jul 11, 2025
DISABLED test_op_has_batch_rule_linalg_vecdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142924 closed Jul 11, 2025
DISABLED test_forward_backward (__main__.CudaGraphTreeTests)
#156957 closed Jul 11, 2025
DISABLED test_sdpa_with_packed_in_proj_cuda_float32 (__main__.TestNestedTensorSubclassCUDA)
#120029 closed Jul 11, 2025
DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
#157110 closed Jul 11, 2025
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 closed Jul 11, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16 (__main__.TestForeachCUDA)
#150932 closed Jul 11, 2025
Disable bot spamming
#117132 closed Jul 11, 2025
Optimizers should use learning rates passed as tensors directly
#106802 closed Jul 11, 2025
Support TorchBind x PT2
#121266 closed Jul 11, 2025
38 Dynamo test are failing with "BuiltinVariable.tensor_args() got multiple values for argument 'self'".
#120643 closed Jul 11, 2025
Track the accurate check regress for DebertaForQuestionAnswering and nanogpt
#122987 closed Jul 11, 2025
Tracking issue for PT2 dashboard training passrate
#120129 closed Jul 11, 2025
[Dynamo] Prints, logging, and warnings
#93739 closed Jul 11, 2025
torch.compile fails with torch._dynamo.exc.TorchRuntimeError on a function that contains a torch script module
#97784 closed Jul 11, 2025
torch.compile slower than eager on simple MLP
#119611 closed Jul 11, 2025
Compilation Failure of torch.special.exp2 in torch.compile Optimized Mode
#112495 closed Jul 11, 2025
Compilation Failure of torch.cumsum in torch.compile Optimized Mode
#112492 closed Jul 11, 2025
Compile targets cuda:0 rather than the device the model is on
#97693 closed Jul 11, 2025
explain() has confusing explanation of graph breaks
#93656 closed Jul 11, 2025
[inductor] `aten.index_put_` runtime shape mismatch on H100 but not on A100
#126614 closed Jul 11, 2025
AOT inductor should generate source code instead of a library
#115965 closed Jul 11, 2025
DISABLED test_op_has_batch_rule_inner_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157090 closed Jul 11, 2025
DISABLED test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once (__main__.CudaGraphTreeTests)
#156954 closed Jul 11, 2025
Massive initial memory overhead GPU
#12873 closed Jul 11, 2025
Error in Qwen inference: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/CUDACachingAllocator.cpp
#157535 closed Jul 11, 2025
DISABLED test_sdpfa_xpu.(__main__.TestUnbackedSymintsXPU)
#158095 closed Jul 11, 2025
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 closed Jul 11, 2025
DISABLED test_dynamic_scalar_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156982 closed Jul 11, 2025
DISABLED test_hessian_vectorize_correctness_multi_input_cuda (__main__.TestHessianCUDA)
#157059 closed Jul 11, 2025
DISABLED test_op_has_batch_rule_dot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157069 closed Jul 11, 2025
DISABLED test_corrcoef_cuda_float32 (__main__.TestTorchDeviceTypeCUDA)
#156987 closed Jul 11, 2025
DISABLED test_inplace_on_view_non_contig_cpu (__main__.TestAutogradDeviceTypeCPU)
#156265 closed Jul 11, 2025
DISABLED test_inplace_on_view_backprop_base_cpu (__main__.TestAutogradDeviceTypeCPU)
#156143 closed Jul 11, 2025
DISABLED test_inplace_on_view_backprop_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156163 closed Jul 11, 2025
DISABLED test_op_has_batch_rule_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157037 closed Jul 11, 2025
DISABLED test_inplace_on_view_undefined_grad_output_cpu (__main__.TestAutogradDeviceTypeCPU)
#156363 closed Jul 11, 2025
DISABLED test_against_reference_multi_input_multi_output_jacfwd_cuda (__main__.TestJacCUDA)
#157036 closed Jul 11, 2025
DISABLED test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor (__main__.CudaGraphTreeTests)
#156922 closed Jul 11, 2025
DISABLED test_inplace_on_view_then_no_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156306 closed Jul 11, 2025
DISABLED test_inplace_on_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156289 closed Jul 11, 2025
DISABLED test_inplace_on_view_backprop_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156180 closed Jul 11, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8 (__main__.TestForeachCUDA)
#156960 closed Jul 11, 2025
miss doc for torch.segment_reduce
#153138 closed Jul 11, 2025
torch.compile with flex_attention: 'ShapeAsConstantBuffer' object has no attribute 'dtype'
#157833 closed Jul 11, 2025
test_generate_tensor_from_list_of_numpy_primitive_type fails if run under pytest
#103439 closed Jul 11, 2025
tensor.to_sparse() handling indices incorrectly under dynamo/fake tensor
#93493 closed Jul 11, 2025
[inductor] [dynamic shape] 5 HF models fails with `Constraints violated` using transformers v4.31.0
#107200 closed Jul 11, 2025
ShapeEnv.produce_guards uses a lot of (CPU) memory
#118222 closed Jul 11, 2025
export does not support boolean tensor indexing
#91990 closed Jul 11, 2025
Dynamo can not trace 'int(a_scalar_tensor.item())'
#93515 closed Jul 11, 2025
[PT2.0] [.Compile] [Dynamic] Pytorch FX/JIT graph's inputs/nodes ordering is changed when FX recompiles even though the graph operations are same
#108745 closed Jul 11, 2025
_assert_bound_is_rational can fail
#109514 closed Jul 11, 2025
dlrm and hf_T5_generate fails aot_eager with bfloat16+dynamic_shapes
#103760 closed Jul 11, 2025
Attempt to use minifier on sam model fails
#104301 closed Jul 11, 2025
detectron2_fcos_r_50_fpn and other models have enough graph breaks that we end up with multiple cache entries on module blocks
#103672 closed Jul 11, 2025
mark_dynamic may error too aggressively
#102814 closed Jul 11, 2025
[inductor] test_fft_real_inputs fails with dynamic shapes
#103194 closed Jul 11, 2025
tensor with dims marked with torch._dynamo.mark_dynamic loses dynamic dim marks after being moved to a different device
#100220 closed Jul 11, 2025
Dynamic shapes exhaustive tests should fail (not xfail) if data mismatch
#87576 closed Jul 11, 2025
SymIntType gets translated to int when going through pybind
#91753 closed Jul 11, 2025
Functionalization does something wrong with pad backward when it uses as_strided
#87575 closed Jul 11, 2025
Symbolic tensors are not printable
#82517 closed Jul 11, 2025
test_make_fx_symbolic_exhaustive should pass dynamic ints for shape arguments
#82318 closed Jul 11, 2025
[Inductor] AOTInductorTestABICompatibleGpu.test_on_gpu_device1_cuda fails
#157737 closed Jul 11, 2025
torch.logdet produces incorrect results for singular matrices on CUDA vs CPU
#154312 closed Jul 11, 2025
[Inductor] StableDiffusion unet with `cudagraphs` backend raises fake tensor mismatch error
#114525 closed Jul 10, 2025
Custom backend not called for compiling backward graph
#114189 closed Jul 10, 2025
[inductor] Assert that Inductor preserves output strides if `TORCHINDUCTOR_LAYOUT_OPTIMIZATION=0`
#114070 closed Jul 10, 2025
[inductor] [silent incorrectness] Multiple internal `torch.rand` can lead to inconsistent results with eager
#151524 closed Jul 10, 2025
In the torch.Tensor.scatter_ documentation, self, index, and src (if it is a tensor) should have the same number of dimensions, but in practice neither the CPU nor the GPU implementation checks this. Validation needs to be added.
#157419 closed Jul 10, 2025
The torch.gather documentation states that input and index must have the same number of dimensions. However, no corresponding validation is added.
#157425 closed Jul 10, 2025
Conjugate bit not handled properly in wrapped subclasses
#130646 closed Jul 10, 2025
Enable an MPS benchmark
#115201 closed Jul 10, 2025
Improve cuda OOM message: include by default more useful aggregate memory usage statistics, measures of memory fragmentation, semantic categories (like activations / weights / scratch workspaces), allocator states
#32101 closed Jul 10, 2025
DISABLED test_circular_dependencies (__main__.TestImports)
#110040 closed Jul 10, 2025
TransformerEncoderLayer precision loss when fast path is enabled
#158012 closed Jul 10, 2025
DISABLED test_mps_event_module (__main__.TestMPS)
#145052 closed Jul 10, 2025
[export] run_decompositions generates inefficient operations
#157289 closed Jul 10, 2025
`FractionalMaxPool3d` INTERNAL ASSERT FAILED when computing `jacrev`
#96316 closed Jul 10, 2025
CONTRIBUTING.md install command incorrect
#157680 closed Jul 10, 2025
FFT regression caused by MKL upgrading: MKL FFT error: Intel oneMKL DFTI ERROR: Inconsistent configuration parameters
#154477 closed Jul 10, 2025
Dynamo guard source not implemented due to int specialization
#157992 closed Jul 10, 2025
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_float16 (__main__.TestForeachCUDA)
#150509 closed Jul 10, 2025
Wrong error message for wrong dtypes in `torch.binomial`
#157195 closed Jul 10, 2025
Calling unbind on 2D NestedTensor throws RuntimeError
#157404 closed Jul 10, 2025
XPU build failure with DLE 2025.1.0
#150047 closed Jul 9, 2025
UNSTABLE rocm-mi300 / linux-jammy-rocm-py3.10-mi300 / test (default)
#156360 closed Jul 9, 2025
ResNet Onnx export dynamic batch size exported as fixed batch size
#157621 closed Jul 9, 2025
`FSDPModule.set_reduce_scatter_divide_factor` on subset of parameters is broken?
#157485 closed Jul 9, 2025
MPSInductor (aka torch.compile for Apple GPUs)
#157957 closed Jul 9, 2025
Triton pin update for PyTorch 2.8 / Triton 3.4
#154206 closed Jul 9, 2025
High-performance LLM quantization on X86 CPU with native PyTorch
#155435 closed Jul 9, 2025
[RFC][API-Unstable]A16W4 on XPU Device
#153019 closed Jul 9, 2025
[RFC][API-Unstable] Support 3rd party SYCL kernels with CPP Extension API
#153265 closed Jul 9, 2025
test_gradient_all Device Type test regression with Numpy >= 2.0.0
#132450 closed Jul 9, 2025
`setup.py develop` command is disappearing soon from `setuptools`
#152276 closed Jul 9, 2025
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16 (__main__.TestForeachCUDA)
#150467 closed Jul 9, 2025
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency1 (__main__.CudaGraphTreeTests)
#157900 closed Jul 9, 2025
Add RMS Norm layer
#128713 closed Jul 9, 2025
[Intel GPU] The failures do not block new pull request merge if the failures are also in the main branch for Intel GPU
#137701 closed Jul 9, 2025
[inductor] [cpu] cpu inductor incorrectly processes `.to(torch.uint8)`, resulting in numerical inconsistency
#156788 closed Jul 9, 2025
Inductor throws UnicodeDecodeError when compiling a simple model on Windows with MSVC
#157673 closed Jul 9, 2025
[autograd] Slowdown in backward after #151079
#157407 closed Jul 9, 2025
return type of torch.nn.functional.interpolate not working
#129053 closed Jul 9, 2025
Build error: unrecognizable insn with using gcc-14 on aarch64
#157842 closed Jul 9, 2025
DISABLED test_comprehensive_logcumsumexp_xpu_float16 (__main__.TestInductorOpInfoXPU)
#157697 closed Jul 9, 2025
aten._cdist_backward
#105561 closed Jul 9, 2025
aten.multilabel_margin_loss_backward
#105562 closed Jul 9, 2025
Reflection_pad1d
#105566 closed Jul 9, 2025
Move Inductor-specific decompositions to general decomposition registrations.
#105568 closed Jul 9, 2025
Lowering topk to reductions and pointwise when k is small
#105569 closed Jul 9, 2025
Using scans
#105570 closed Jul 9, 2025
Add color-coding to fx graph readable printouts :)
#105572 closed Jul 9, 2025
TorchInductor Hack-a-Day on July 19th
#105328 closed Jul 9, 2025
How to compose HSDP with CP?
#157393 closed Jul 9, 2025
[aarch64] Inductor benchmark - fail with IDEEP update
#157785 closed Jul 8, 2025
torch.ops._c10d_functional_autograd.all_to_all_single missing dynamic shapes support
#157479 closed Jul 8, 2025
PyTorch fails to detect AVX through it's detected
#157538 closed Jul 8, 2025
extern declaration of the entity XXX is treated as a static definition
#157674 closed Jul 8, 2025
PyTorch wheel binary size increase ~80mb
#150647 closed Jul 8, 2025
DISABLED test_progressive (__main__.TestSubprocess)
#157787 closed Jul 8, 2025
AOTI: Failure in compile_fx.py with FakeScriptObject (with possible fix)
#157401 closed Jul 8, 2025
MPS internal assertion with jacfwd and concatenation
#152701 closed Jul 8, 2025
The opp is not compatible with compile mode="reduce-overhead" and linear layers for large inputs.
#157363 closed Jul 8, 2025
UNSTABLE trunk / macos-py3-arm64 / test (mps)
#156833 closed Jul 8, 2025
DISABLED test_graph_partition_reorder_cpu_and_gpu (__main__.CudaGraphTreeTests)
#157760 closed Jul 8, 2025
Win32 Build crashes on startup (C++).
#146240 closed Jul 8, 2025
DISABLED test_non_contiguous_input_mm_plus_mm (__main__.TestMaxAutotune)
#126867 closed Jul 8, 2025
DISABLED test_add_complex_conj (__main__.ReproTests)
#156579 closed Jul 8, 2025
[Segfault Bug] Out-of-bounds write of at::native::cpubla::gemm (bfloat16) in at::native::cpu_flash_attention_backward
#156022 closed Jul 8, 2025
Vmap error raised by mask_mod of FlexAttention
#157543 closed Jul 8, 2025
DISABLED test_graph_partition_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157722 closed Jul 8, 2025
Cannot compile a block that contains Flex attention without graph breaks
#143163 closed Jul 8, 2025
[FlexAttention] Support the number of shared query heads in GQA to not be the power of 2
#143117 closed Jul 8, 2025
FlexAttention gives me an INTERNAL_ASSERT_FAILED during mask_mod
#140363 closed Jul 8, 2025
Dynamo's einops version check is bogus
#157451 closed Jul 7, 2025
Compiling the pytorch project produces unnecessary integer comparison warnings, which can lead to a poor user experience.
#157701 closed Jul 7, 2025
test_tensor_with_grad_to_scalar_warning failure
#157252 closed Jul 7, 2025
Cannot create a mask for each sequence in a batch with Flex Attention
#157675 closed Jul 7, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64 (__main__.TestForeachCUDA)
#156612 closed Jul 7, 2025
[RFC] Integrate NCCL scalable init API
#136539 closed Jul 7, 2025
[FSDP2] set_reduce_scatter_divide_factor errors with non-trivial MixedPrecisionPolicy
#155223 closed Jul 7, 2025
[CI] s390x-periodic tests broken with "No matching distribution found for cuda-bindings<13.0,>=12.0"
#157409 closed Jul 7, 2025
pytorch
#157531 closed Jul 7, 2025
DISABLED test_tracker_with_activation_checkpointing (__main__.TestTrackerFullyShard1DTrainingCompose)
#139814 closed Jul 7, 2025
DISABLED test_tracker_non_root_forward_backward (__main__.TestTrackerFullyShard1DTrainingCore)
#129692 closed Jul 7, 2025
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 closed Jul 7, 2025
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157642 closed Jul 7, 2025
Will the Metal4 update bring significant optimizations for future pytorch mps performance and compatibility?
#157660 closed Jul 6, 2025
`torch.compile` fails with `UnicodeDecodeError` when model contains extreme value injection
#156451 closed Jul 6, 2025
torch.utils.cpp_extension fails to parse clang version 20.1.7+libcxx
#157665 closed Jul 6, 2025
Mispelled "paramter" in test_fully_shard_training.py
#157564 closed Jul 5, 2025
torch.nonzero(t, as_tuple=...) does not work with the JIT because the as_tuple signatures are not exposed properly
#45499 closed Jul 5, 2025

119 Issues opened by 66 people

Flash Attention 2 + Dynamo + FSDP accelerate plugin + torch.compile error
#158186 opened Jul 12, 2025
UNSTABLE rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default)
#158182 opened Jul 12, 2025
`torch.fmin` has inconsistent overflow behavior on CPU and GPU
#158172 opened Jul 12, 2025
Invalid `torch.linalg.lstsq` check during vmapped implementation
#158169 opened Jul 12, 2025
`torch.nn.LPPool2d` throws inconsistent error on CPU and GPU
#158166 opened Jul 11, 2025
[dynamo] Incompatibility with sys.monitoring
#158164 opened Jul 11, 2025
`torch.native_channel_shuffle` throws FPE when `groups` > size of the second dimension of the input tensor
#158154 opened Jul 11, 2025
DISABLED test_one_shot_all_reduce (__main__.SymmMemCollectiveTest)
#158138 opened Jul 11, 2025
DISABLED test_cuda_nvlink_connectivity_detection (__main__.SymmetricMemoryTest)
#158136 opened Jul 11, 2025
DISABLED test_allow_overlapping_devices (__main__.SymmetricMemoryTest)
#158135 opened Jul 11, 2025
Eager success but inductor failed on torch.ops.aten._cudnn_rnn.default
#158131 opened Jul 11, 2025
A "Merge-when-ready" label when ci finishes?
#158127 opened Jul 11, 2025
index.Tensor doesn't properly account for boolean masks acting on multiple dimensions.
#158125 opened Jul 11, 2025
[RFC] A Distributed CUDA Unified Memory Backend for PyTorch
#158122 opened Jul 11, 2025
AttributeError: partially initialized module 'torch._dynamo' has no attribute 'disable' (most likely due to a circular import)
#158120 opened Jul 11, 2025
DISABLED test_gru (__main__.TestXNNPACKQuantizer)
#158116 opened Jul 11, 2025
DISABLED test_conv_transpose_unary_fusion_ops (__main__.TestMkldnnFusion)
#158115 opened Jul 11, 2025
DISABLED test_triton_signal_wait_until (__main__.NVSHMEMTritonTest)
#158114 opened Jul 11, 2025
Combination of USE_MEM_EFF_ATTENTION and AOTRITON_INSTALLED_PREFIX misbehaves
#158109 opened Jul 11, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158108 opened Jul 11, 2025
DISABLED test_pool_method_subprocess (__main__.TestAsyncCompile)
#158107 opened Jul 11, 2025
DISABLED test_triton_quiet (__main__.NVSHMEMTritonTest)
#158106 opened Jul 11, 2025
Exporting ONNX Model with Operator Name as Output Name Produces Invalid Model When dynamo=True
#158094 opened Jul 11, 2025
kvstore Documentation Inconsistency: check() Method Claimed in v2.7 Docs But Missing in Implementation
#158093 opened Jul 11, 2025
`torch.compile` doesn't properly raise eager fake tensor exception
#158088 opened Jul 11, 2025
`torch.compile` ignores nan check in `int(torch.tensor(torch.nan))`
#158087 opened Jul 11, 2025
`torch.compile` on `.sum() `and `.item()` calls errors from `tensorify_python_scalars`
#158083 opened Jul 11, 2025
`torch.compile` errors with inductor raised `GuardOnDataDependentSymNode` exception
#158081 opened Jul 11, 2025
loaded storage won't be cached during deserialization if location is meta
#158080 opened Jul 10, 2025
torch.compile on BFloat16 Segment Anything segfaults in cpp_CppMicroGemmRef_micro_gemm<false, false> on Mac
#158076 opened Jul 10, 2025
Mutating a tensor while serializing with safetensors crashes free-threaded PyTorch
#158071 opened Jul 10, 2025
InductorError: CppCompileError: C++ compile error on a function with a single `item` call
#158060 opened Jul 10, 2025
`torch.compile` on `torch.arange` hard errors with `PendingUnbackedSymbolNotFound`
#158058 opened Jul 10, 2025
DISABLED test_generalized_sq_cases (__main__.TestSolve)
#158054 opened Jul 10, 2025
Add GoLU Activation Function 🚀
#158043 opened Jul 10, 2025
[inductor] grouped_mm is autotuning under torch.compile default mode
#158042 opened Jul 10, 2025
pow_test fails on Aarch64
#158041 opened Jul 10, 2025
extension-cpp claims ROCM support with no additional changes; we should test this.
#158032 opened Jul 10, 2025
NCCL + cudagraphs + expandable segments result in IMA
#158029 opened Jul 10, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158025 opened Jul 10, 2025
DISABLED test_pool_method_spawn (__main__.TestAsyncCompile)
#158024 opened Jul 10, 2025
DISABLED test_triton_put_signal_set (__main__.NVSHMEMTritonTest)
#158023 opened Jul 10, 2025
empty_like(tensor, memory_format=torch.preserve_format) does not preserve strides for views
#158022 opened Jul 10, 2025
[2.8 regression] CUDAAllocator has BC-breaking changes
#158021 opened Jul 10, 2025
dist.all_to_all_single, when input tensors of different shapes result in undefined output behavior
#158016 opened Jul 10, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158010 opened Jul 10, 2025
DISABLED test_pool_method_fork (__main__.TestAsyncCompile)
#158009 opened Jul 10, 2025
DISABLED test_triton_put_signal_add (__main__.NVSHMEMTritonTest)
#158008 opened Jul 10, 2025
`torch.compile` cannot compile a model with a basic `LSTM`, even on latest main
#158007 opened Jul 10, 2025
BUG: numpy very slow after import torch
#158005 opened Jul 10, 2025
[inuctor] [triton] `torch.cumsum` outputs inconsistent results when meeting large tensors
#158003 opened Jul 10, 2025
[inductor] performance of torchbench alexnet has regression compare to 2.7 due to inductor changes a triton kernel
#158000 opened Jul 10, 2025
[XPU] torch.xpu.mem_get_info() query failed on BMG
#157989 opened Jul 10, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157987 opened Jul 10, 2025
DISABLED test_triton_put (__main__.NVSHMEMTritonTest)
#157986 opened Jul 10, 2025
support DTensor's local_map in compile
#157976 opened Jul 9, 2025
[Inductor] Online softmax disabled due to reduction split – Unexpected performance warning
#157975 opened Jul 9, 2025
Torch compile don't work correctly with divide by scalar
#157959 opened Jul 9, 2025
[RFC]: PyTorch Low-Precision GEMMs Public API
#157950 opened Jul 9, 2025
Dim argument of torch.max can currently be only int or name, but documentation says it can be int or tuple
#157948 opened Jul 9, 2025
torch.compile fails with InternalTorchDynamoError when slicing torch.linalg.svd results
#157945 opened Jul 9, 2025
torch.compile() disables torch.distributions parameter validation globally
#157926 opened Jul 9, 2025
TypeError: message must be a callable when calling grouped_mm with incompatible batch size for offsets
#157922 opened Jul 9, 2025
BC-breaking change to symint range constraints from 2.7 -> 2.8
#157921 opened Jul 9, 2025
tighten pt2 FX graph invariants
#157919 opened Jul 9, 2025
[Precompile] Umbrella task
#157918 opened Jul 9, 2025
tlparse symint guard user provenance slightly off
#157915 opened Jul 9, 2025
DISABLED test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_False_is_dynamic_True (__main__.TestPatternMatcher)
#157911 opened Jul 9, 2025
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency1 (__main__.CudaGraphTreeTests)
#157901 opened Jul 9, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157899 opened Jul 9, 2025
DISABLED test_triton_get_ring (__main__.NVSHMEMTritonTest)
#157898 opened Jul 9, 2025
DISABLED test_reduce_scatter_float8 (__main__.ProcessGroupNCCLOpTest)
#157897 opened Jul 9, 2025
DISABLED test_nccl_watchdog_cudagraph (__main__.ProcessGroupNCCLOpTest)
#157896 opened Jul 9, 2025
DISABLED test_2d_reductions_mixed_indexing_reduction_op0_cpu (__main__.TritonBlockPointerTestCPU)
#157895 opened Jul 9, 2025
Partitioner: Fix to align partition node order with original graph
#157891 opened Jul 9, 2025
Way to customize HOPs overrided arguments selection
#157888 opened Jul 9, 2025
DISABLED test_vmap_exhaustive_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157880 opened Jul 9, 2025
DISABLED test_allreduce_float8 (__main__.ProcessGroupNCCLOpTest)
#157879 opened Jul 9, 2025
DISABLED test_triton_template_generated_code_caching_mm_plus_mm (__main__.TestMaxAutotune)
#157878 opened Jul 9, 2025
DISABLED test_triton_get (__main__.NVSHMEMTritonTest)
#157877 opened Jul 9, 2025
DISABLED test_int64_index_intermediate (__main__.CudaReproTests)
#157872 opened Jul 9, 2025
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency (__main__.CudaGraphTreeTests)
#157871 opened Jul 9, 2025
Exporting tensor.to("cuda") under FakeTensorMode doesn't work on a CPU-only machine
#157869 opened Jul 8, 2025
Spurious "Grad strides do not match bucket view strides" warning for 1x1 convolution
#157862 opened Jul 8, 2025
CUDA ec2 runner with no cuda runtime
#157844 opened Jul 8, 2025
[FSDP2] should pass args as is instead of creating new ones
#157832 opened Jul 8, 2025
sparse_csr bfloat16 matrix multiplication backward is 10x slower than float16
#157808 opened Jul 8, 2025
[RFC] Replace setuptools build backend with scikit-build-core
#157807 opened Jul 8, 2025
Consider changing AOTAutograd cache to hit on graphs with different input and node names
#157792 opened Jul 8, 2025
DISABLED test_vmap_exhaustive_matmul_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157790 opened Jul 8, 2025
DISABLED test_triton_fence (__main__.NVSHMEMTritonTest)
#157789 opened Jul 8, 2025
DISABLED test_progressive (__main__.TestSubprocess)
#157788 opened Jul 8, 2025
torch.distributed.checkpoint.state_dict.set_state_dict stucks with StateDictOptions(full_state_dict=True)
#157781 opened Jul 8, 2025
test_optimizer_non_static_param got failed on Intel GPU
#157778 opened Jul 8, 2025
Does FSDP works with DDP ?
#157917 opened Jul 8, 2025
Support for _Float16/C++23 std::float16_t
#157776 opened Jul 8, 2025
Fatal on torch.xpu
#157775 opened Jul 8, 2025
Drop SSE4 support in oneDNN
#157764 opened Jul 8, 2025
Nightly C++ docs build timeout in CI after 4 hours
#157763 opened Jul 8, 2025
DISABLED test_vmap_exhaustive_linalg_vecdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157762 opened Jul 8, 2025
DISABLED test_graph_partition_reorder_cpu_and_gpu (__main__.CudaGraphTreeTests)
#157761 opened Jul 8, 2025
[dynamo] Missing meta kernel for `aten::quantize_per_tensor.tensor_qparams`
#157729 opened Jul 7, 2025
fill_ overflows on uint64 in range [2**63, 2**64) when profiler is engaged
#157728 opened Jul 7, 2025
DISABLED test_vmap_exhaustive_inner_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157726 opened Jul 7, 2025
DISABLED test_resnet (__main__.TestBlockStateAbsorption)
#157725 opened Jul 7, 2025
DISABLED test_progressive (__main__.GPUTests)
#157724 opened Jul 7, 2025
DISABLED test_graph_partition_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157723 opened Jul 7, 2025
[Precompile] Caching precompile and presence of packages in general should be safe re: guard serialization
#157721 opened Jul 7, 2025
[dynamo] `torch.compile` errors on numpy `astype("O")`
#157720 opened Jul 7, 2025
AssertHandler::printMessage On Intel GPU
#157714 opened Jul 7, 2025
/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found when IMPORT TORCH
#157709 opened Jul 7, 2025
`torch.randint` raises a `RuntimeError` if `dtype=torch.uint64` and `high >= 2**63`
#157707 opened Jul 7, 2025
Seg faults on macos / OSX
#157704 opened Jul 7, 2025
[inductor][fuzzer] Compilation Error in complex64+toint
#157683 opened Jul 7, 2025
Flex Attention breaks in certain cases when used with a learned bias
#157677 opened Jul 6, 2025
Feedback about Getting Started on Intel GPU
#157672 opened Jul 6, 2025
NCCL error caused due to use of NVLS in torch 2.7.1-cu128 on aarch64 gb200 cluster
#157668 opened Jul 6, 2025
ConvNd ops in channel last layout (N,L,C) / (N,H,W,C) / (N,D,H,W,C)
#157663 opened Jul 5, 2025
OffsetBasedRNGTracker's run_state_sync causes deadlock due to inconsistent broadcast order across ranks
#157662 opened Jul 5, 2025

565 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

DDE-Free select with unbacked index.
#157605 commented on Jul 12, 2025 • 16 new comments
[Device] Add support for PrivateUse1 device type in parse_type function
#157609 commented on Jul 12, 2025 • 13 new comments
[HOP, map] Rework of map autograd to the new interface
#153343 commented on Jul 11, 2025 • 11 new comments
Enable `_lazy_clone` between CPU and MPS
#148408 commented on Jul 11, 2025 • 9 new comments
[cutlass backend] cache maybe_append_choices
#156781 commented on Jul 12, 2025 • 7 new comments
Update the signature and test of torch.hamming_window()
#152682 commented on Jul 10, 2025 • 7 new comments
NUMA Binding Integration with torchrun
#149334 commented on Jul 12, 2025 • 6 new comments
[nativert] Move ModelRunnerBase to oss.
#157633 commented on Jul 7, 2025 • 5 new comments
Add test for user-managed weights with load_state_dict
#157496 commented on Jul 11, 2025 • 5 new comments
[dynamo] [guard] Add caching for inside torch.compile.disable function to avoid unnecessary recompilation.
#157566 commented on Jul 12, 2025 • 5 new comments
Deprecate DataLoader pin_memory_device param
#146821 commented on Jul 11, 2025 • 4 new comments
CI for Windows Arm64
#148753 commented on Jul 8, 2025 • 4 new comments
[PT2][fusion] ban fusions with large accumulated reads
#157563 commented on Jul 12, 2025 • 4 new comments
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 commented on Jul 12, 2025 • 4 new comments
allow user to pass in custom partitioner function
#157580 commented on Jul 8, 2025 • 3 new comments
allow _size_of to return individual element's size
#157582 commented on Jul 8, 2025 • 3 new comments
[Draft][CUDA] Upgrade torch._scaled_grouped_mm to SM100+
#156806 commented on Jul 9, 2025 • 3 new comments
[cpp_wrapper] Build main and kernel code in separate threads
#154551 commented on Jul 11, 2025 • 3 new comments
[struct] Add `struct.pack` and `struct.unpack` polyfills
#156977 commented on Jul 10, 2025 • 3 new comments
Always disable ShardingPropagation cache if compiling
#156868 commented on Jul 11, 2025 • 3 new comments
Parameterized CUDA Graph Launch
#152622 commented on Jul 11, 2025 • 3 new comments
[NOT FOR MERGE] Exploratory work on AOTInductor training
#155877 commented on Jul 11, 2025 • 3 new comments
[Misc] skip the case test_foreach_add_different_mesh if world size is…
#155563 commented on Jul 8, 2025 • 2 new comments
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on Jul 7, 2025 • 2 new comments
[iter] support `iter(callable, sentinel)`
#156416 commented on Jul 10, 2025 • 2 new comments
[DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP
#157216 commented on Jul 11, 2025 • 2 new comments
[BE] always use `uv pip` if possible in `pip_init.py` for `lintrunner init`
#157199 commented on Jul 10, 2025 • 2 new comments
Add DeviceAllocator as the base device allocator
#138222 commented on Jul 11, 2025 • 2 new comments
Do not checkout nccl for `USE_SYSTEM_LIBS`
#153807 commented on Jul 9, 2025 • 2 new comments
[Inductor] Set the default value of min_chunk_size to 512
#150762 commented on Jul 12, 2025 • 2 new comments
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on Jul 9, 2025 • 2 new comments
handling special case for pow(3) for GPU
#157537 commented on Jul 11, 2025 • 2 new comments
Add cascade sum support for Inductor CPP backend
#156296 commented on Jul 11, 2025 • 2 new comments
Deprecate c10::string_view
#156798 commented on Jul 9, 2025 • 2 new comments
Optimize scatter/gather kernel for ARM.
#156161 commented on Jul 11, 2025 • 2 new comments
[doc] Updates to distributed.md for XCCL backend
#155834 commented on Jul 8, 2025 • 2 new comments
Add cuda 12.9 periodic tests
#156900 commented on Jul 10, 2025 • 1 new comment
Add basic torch.hash_tensor op
#154149 commented on Jul 12, 2025 • 1 new comment
[Don't Review] Test CI
#139971 commented on Jul 11, 2025 • 1 new comment
Make open device registration tests standalone
#153855 commented on Jul 9, 2025 • 1 new comment
[PrivateUse1] Optimize 3rd party backend experiences
#155215 commented on Jul 9, 2025 • 1 new comment
[Easy] Show some clear error when torch.ops.load_library fails.
#157524 commented on Jul 6, 2025 • 1 new comment
[dynamo, docs] add dynamo programming model docs
#157527 commented on Jul 10, 2025 • 1 new comment
Enable TF32 as fp32 internal precision for matmul/linear/conv
#157520 commented on Jul 9, 2025 • 1 new comment
[BE][1/5] fix typos in aten/
#157550 commented on Jul 12, 2025 • 1 new comment
[Refactor][XPU] Refactor XPU quantization op and add header files.
#157430 commented on Jul 8, 2025 • 1 new comment
[BE][2/5] fix typos in aten/ (aten/src/ATen/native/)
#157551 commented on Jul 12, 2025 • 1 new comment
[BE][3/5] fix typos in aten/ (aten/src/ATen/native/)
#157552 commented on Jul 12, 2025 • 1 new comment
Preserve current stream in TestCuda::test_stream_compatibility
#157421 commented on Jul 7, 2025 • 1 new comment
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 commented on Jul 12, 2025 • 1 new comment
Fix doc issue 153531 by adding further explanation of STFT equation
#157595 commented on Jul 8, 2025 • 1 new comment
Add inductor lowerings for adaptive_avg_pool3d/adaptive_max_pool3d
#157331 commented on Jul 9, 2025 • 1 new comment
[BE] Replace `std::runtime_error` with `TORCH_CHECK` [2/N]
#152080 commented on Jul 11, 2025 • 1 new comment
enable windows inductor UT in CI
#151777 commented on Jul 11, 2025 • 0 new comments
Add adaptive_avg_pool2d input and output_size check
#151769 commented on Jul 10, 2025 • 0 new comments
[bazel] Fix unusual reference to cpuinfo workspace
#151578 commented on Jul 9, 2025 • 0 new comments
Allow to byteswap data when reading saved torch jit data
#151447 commented on Jul 9, 2025 • 0 new comments
[Environment Variable] Use thread-safe getenv functions
#152609 commented on Jul 12, 2025 • 0 new comments
[2/N] Use std::filesystem
#152586 commented on Jul 10, 2025 • 0 new comments
[BE] Update numba versions
#152557 commented on Jul 6, 2025 • 0 new comments
Remove Conda Instructions
#152546 commented on Jul 11, 2025 • 0 new comments
[compile async] [cache] testing
#152523 commented on Jul 6, 2025 • 0 new comments
Horizontal
#151780 commented on Jul 9, 2025 • 0 new comments
fix: outdated contents in dynamo overview
#152382 commented on Jul 6, 2025 • 0 new comments
[reland][ROCm] remove caffe2 from hipify
#151845 commented on Jul 8, 2025 • 0 new comments
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 commented on Jul 12, 2025 • 0 new comments
Enable type promotions in slice_scatter (pytorch#147842)
#151911 commented on Jul 11, 2025 • 0 new comments
Avoid rounding errors in _get_total_norm for a dTensor by using torch Dynamo
#152234 commented on Jul 11, 2025 • 0 new comments
Add dynamo config to HOP-ify context managers
#152159 commented on Jul 8, 2025 • 0 new comments
docs: add torch.e and torch.pi to constants table (#134964)
#151996 commented on Jul 7, 2025 • 0 new comments
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on Jul 7, 2025 • 0 new comments
[BE][3/6] fix typos in test/
#157637 commented on Jul 12, 2025 • 0 new comments
try relanding cublaslt autotuning support for TunableOp #
#153316 commented on Jul 9, 2025 • 0 new comments
[DEBUG] MTIA Module and Interface
#153308 commented on Jul 9, 2025 • 0 new comments
[DEBUG] memory profiler and combined trace
#153307 commented on Jul 9, 2025 • 0 new comments
[BE] fix skip_if_lt_x_gpu decorator and add test coverage
#153295 commented on Jul 9, 2025 • 0 new comments
[BE][FSDP] fix FSDP to skip tests where #GPUs < world_size before entering into init_pg
#153291 commented on Jul 9, 2025 • 0 new comments
test timm_efficientnet pass
#153290 commented on Jul 12, 2025 • 0 new comments
[MX] Add more ops to allowed set for e8
#153271 commented on Jul 8, 2025 • 0 new comments
[nocommit] bundled autograd cache test
#153269 commented on Jul 11, 2025 • 0 new comments
fix dtensor and tensor inconsistent compute mesh
#153268 commented on Jul 7, 2025 • 0 new comments
DEBUG PR Issue
#153267 commented on Jul 9, 2025 • 0 new comments
MXFP8 Fix broken bias support for mxfp8
#153254 commented on Jul 8, 2025 • 0 new comments
Fix integer overflow bug in triu/tril for large diagonal values
#153240 commented on Jul 12, 2025 • 0 new comments
Delete .github/workflows/docker-cache-mi300.yml
#153075 commented on Jul 10, 2025 • 0 new comments
Add CUDA support for Adagrad(fused=True)
#153038 commented on Jul 10, 2025 • 0 new comments
Allow zero sized dimensions in padding operations
#153037 commented on Jul 11, 2025 • 0 new comments
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 commented on Jul 6, 2025 • 0 new comments
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 commented on Jul 6, 2025 • 0 new comments
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 commented on Jul 7, 2025 • 0 new comments
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 commented on Jul 6, 2025 • 0 new comments
[dtensor] add privateuse1 SDPA op support to DTensor
#152949 commented on Jul 9, 2025 • 0 new comments
[feature] Channel Wise Parallel API for Conv layers
#152937 commented on Jul 6, 2025 • 0 new comments
Allow Inductor backends to attest their own availability
#152933 commented on Jul 5, 2025 • 0 new comments
Add unified memory APIs for torch.accelerator
#152932 commented on Jul 11, 2025 • 0 new comments
Add overall tensor similarity comparison (#152647)
#152920 commented on Jul 6, 2025 • 0 new comments
[DO NOT MERGE] update build tools version
#152820 commented on Jul 8, 2025 • 0 new comments
Update CMakeLists.txt
#152786 commented on Jul 6, 2025 • 0 new comments
Pattern matcher support for mutable ops with view inputs
#152776 commented on Jul 12, 2025 • 0 new comments
Handle less functions than number of segments
#152753 commented on Jul 6, 2025 • 0 new comments
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 commented on Jul 6, 2025 • 0 new comments
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on Jul 9, 2025 • 0 new comments
[WIP] Generalize device caching allocator
#151298 commented on Jul 10, 2025 • 0 new comments
[DO NOT MERGE] Update oneDNN to the latest main branch
#147073 commented on Jul 8, 2025 • 0 new comments
fake_tensor: Handle op errors more gracefully
#147049 commented on Jul 7, 2025 • 0 new comments
Porting Pytorch to AIX Operating System.
#146983 commented on Jul 8, 2025 • 0 new comments
Support contextlib.ExitStack
#146506 commented on Jul 9, 2025 • 0 new comments
Fix full_like decomposition to preserve strides
#144765 commented on Jul 9, 2025 • 0 new comments
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on Jul 12, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on Jul 12, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on Jul 12, 2025 • 0 new comments
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on Jul 10, 2025 • 0 new comments
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on Jul 7, 2025 • 0 new comments
Support LOAD_BUILD_CLASS opcode in dynamo
#139561 commented on Jul 10, 2025 • 0 new comments
`has_triton`: Use the device interface for detecting Triton availability
#139171 commented on Jul 7, 2025 • 0 new comments
Add overflow check for negtive integer div_floor and div_trunc on CPU
#138684 commented on Jul 12, 2025 • 0 new comments
Always produce XML
#138513 commented on Jul 7, 2025 • 0 new comments
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on Jul 9, 2025 • 0 new comments
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on Jul 9, 2025 • 0 new comments
Help fix numpy detection in cross compiled layouts
#137084 commented on Jul 8, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Jul 12, 2025 • 0 new comments
[pytree] support PyStructSequence types for Python pytree
#113258 commented on Jul 9, 2025 • 0 new comments
[docs] URL and link format proposal to make function page URLs more concise
#106664 commented on Jul 12, 2025 • 0 new comments
Python 3.14 support for PyTorch
#156856 commented on Jul 12, 2025 • 0 new comments
Illegal Memory Access when Using Trainable Biases in Flex Attention
#144511 commented on Jul 12, 2025 • 0 new comments
addition of muon optimizer to torch.optim
#148819 commented on Jul 12, 2025 • 0 new comments
UNSTABLE rocm / linux-jammy-rocm-py3.10 / test (default)
#156098 commented on Jul 12, 2025 • 0 new comments
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on Jul 12, 2025 • 0 new comments
[v.2.8.0] Release Tracker
#156745 commented on Jul 12, 2025 • 0 new comments
[inductor] [silence] inconsistent swap wih eager when compiling `torch.rot90-torch.randn_like`
#147847 commented on Jul 12, 2025 • 0 new comments
Inductor can not fuse cat with a pointwise
#125075 commented on Jul 12, 2025 • 0 new comments
[FSDP2] document the contract for modifying DTensor model.parameters()
#157391 commented on Jul 12, 2025 • 0 new comments
mps and cpu backends produce different training results with FFT and Adam
#151740 commented on Jul 8, 2025 • 0 new comments
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on Jul 5, 2025 • 0 new comments
NCCL: Fix cmake file when cross compiling.
#151234 commented on Jul 8, 2025 • 0 new comments
Implement MKLGenerator
#151218 commented on Jul 10, 2025 • 0 new comments
Fix `MaskedTensor` to device ignored mask
#151205 commented on Jul 10, 2025 • 0 new comments
[MPS] Get Vmap to work with mps backend
#151177 commented on Jul 12, 2025 • 0 new comments
Pin all root requirements to major versions
#150833 commented on Jul 7, 2025 • 0 new comments
[draft][distributed] add into 3d composability test at AMD CI test
#150694 commented on Jul 8, 2025 • 0 new comments
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on Jul 10, 2025 • 0 new comments
Add differentiable ops hint message in Module docs
#150291 commented on Jul 10, 2025 • 0 new comments
Add cmake variable USE_ROCM_CK
#150245 commented on Jul 5, 2025 • 0 new comments
[DLPack] Add support for missing keyword-arguments.
#150218 commented on Jul 9, 2025 • 0 new comments
Add path used by pip's build isolation procedure to DLL search
#150013 commented on Jul 11, 2025 • 0 new comments
AOTI freezing: fix test issues and enable by default
#149961 commented on Jul 11, 2025 • 0 new comments
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on Jul 6, 2025 • 0 new comments
Configure `cuda.cmake` to ensure consistent behavior downstream
#149861 commented on Jul 8, 2025 • 0 new comments
[test] sccache docker build
#149536 commented on Jul 7, 2025 • 0 new comments
Fix unexpected keyword argument 'mode' when calling `CompileCounterWithBackend`
#149271 commented on Jul 6, 2025 • 0 new comments
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on Jul 6, 2025 • 0 new comments
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on Jul 9, 2025 • 0 new comments
Implement fast access to individual elements of jagged nested tensors
#148497 commented on Jul 11, 2025 • 0 new comments
[triton hash update] update the pinned triton hash
#148492 commented on Jul 12, 2025 • 0 new comments
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on Jul 9, 2025 • 0 new comments
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on Jul 9, 2025 • 0 new comments
[pytree] simplify public API exposition with `__module__`
#148328 commented on Jul 9, 2025 • 0 new comments
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on Jul 9, 2025 • 0 new comments
Support `contextlib.suppress`
#147990 commented on Jul 9, 2025 • 0 new comments
[DO NOT MERGE] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn_linear and mkldnn_linear_backward
#147855 commented on Jul 8, 2025 • 0 new comments
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on Jul 9, 2025 • 0 new comments
[DO NOT MERGE][Inductor] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn._linear_pointwise and mkldnn._linear_pointwise.binary
#147360 commented on Jul 8, 2025 • 0 new comments
switch from deprecated `find_package(CUDA)` to `find_package(CUDAToolkit)`
#147300 commented on Jul 8, 2025 • 0 new comments
[nativert] libtorch kernel registry
#157150 commented on Jul 8, 2025 • 0 new comments
[generator] Close all open generators in compile_subgraph
#157149 commented on Jul 9, 2025 • 0 new comments
[contextlib] Fixes for CPython contextlib tests
#157148 commented on Jul 10, 2025 • 0 new comments
[DDP][FSDP2] Add unit test for DDP mixed precision with FSDP2 ignored params
#157140 commented on Jul 9, 2025 • 0 new comments
Documenting discrepancy for Numpy dependency versions
#157132 commented on Jul 11, 2025 • 0 new comments
updated path to requirements txt in docs
#157106 commented on Jul 11, 2025 • 0 new comments
[CI] add decorator for specifying H100-only tests
#156980 commented on Jul 10, 2025 • 0 new comments
[math] Trace `float.fromhex`
#156976 commented on Jul 9, 2025 • 0 new comments
[math] Raise exception in Dynamo if constant fold call fail
#156975 commented on Jul 9, 2025 • 0 new comments
Use `_get_object_coll_device` instead of deprecated API
#156878 commented on Jul 7, 2025 • 0 new comments
`fast-autotune`: Model Prediction of Triton Kernel Runtimes
#156851 commented on Jul 10, 2025 • 0 new comments
[TESTING] [DO NOT MERGE] Updated triton commit pin - upstream base
#156841 commented on Jul 7, 2025 • 0 new comments
Introduce a new API torch.accelerator.get_mem_info
#156812 commented on Jul 10, 2025 • 0 new comments
add tests for Thunk utility function
#156759 commented on Jul 7, 2025 • 0 new comments
Add back manywheel-py3_9-cuda12_4-build/test
#156753 commented on Jul 9, 2025 • 0 new comments
Stop parsing command line arguments every time common_utils is imported.
#156703 commented on Jul 10, 2025 • 0 new comments
ReplaceWithCopy graph pass
#156666 commented on Jul 7, 2025 • 0 new comments
Adds support for Nested Jagged Tensor in Multihead Attention
#156660 commented on Jul 8, 2025 • 0 new comments
Fix torch==2.6 broke nn.Module.dtype typing
#156631 commented on Jul 9, 2025 • 0 new comments
multi-kernel matmuls based on varying hint sizes
#156628 commented on Jul 12, 2025 • 0 new comments
[BE][15/16] fix typos in torch/ (torch/distributed/tensor/)
#156605 commented on Jul 12, 2025 • 0 new comments
[Inductor Dashboard] Enable deterministic algorithms for all models on ROCm
#156592 commented on Jul 10, 2025 • 0 new comments
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on Jul 8, 2025 • 0 new comments
[2/N] Remove FindPackageHandleStandardArgs.cmake
#156559 commented on Jul 11, 2025 • 0 new comments
[docs][typing] Document and type support for dim=None in torch.amin and torch.amax
#156510 commented on Jul 9, 2025 • 0 new comments
[Inductor] Fix epilogue fusion decision with 1 Triton caller as choice
#156500 commented on Jul 10, 2025 • 0 new comments
[ROCm][Windows] Fix finding ROCm/HIP version
#156486 commented on Jul 11, 2025 • 0 new comments
[iter] Wrap iter(..) call in a ObjectIteratorVariable
#156460 commented on Jul 10, 2025 • 0 new comments
Change t.is_cuda to t.device.type == 'cuda' in torch/utils/viz
#156418 commented on Jul 7, 2025 • 0 new comments
[iter] Add support for sequence protocol in `iter(..)`
#156371 commented on Jul 9, 2025 • 0 new comments
[BE][1/6] fix typos in test/
#157635 commented on Jul 12, 2025 • 0 new comments
[pruning] Implement Taylor expansion unstructured pruning
#157620 commented on Jul 12, 2025 • 0 new comments
[pruning] add more test cases for pruning
#157613 commented on Jul 7, 2025 • 0 new comments
[dynamo] Move skipIf decorator to class level in test_fx_graph_runnable
#157594 commented on Jul 10, 2025 • 0 new comments
[CUDA][NVTX] use `pytorch` nvtx domain for pytorch ranges
#157586 commented on Jul 8, 2025 • 0 new comments
Linux py 3.14 wheel builds
#157559 commented on Jul 9, 2025 • 0 new comments
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 commented on Jul 12, 2025 • 0 new comments
[inductor] pack linear for FP32 dynamic mode
#157542 commented on Jul 7, 2025 • 0 new comments
Add a test for checking that the CUDA stubs directory is not in libcaffe2_nvrts.so's RPATH or RUNPATH
#157437 commented on Jul 7, 2025 • 0 new comments
[build] bootstrap git repo for build for non-git-clone archive
#157432 commented on Jul 12, 2025 • 0 new comments
Add a flag "realized" in IRNode to enable tracking origin_nodes
#157423 commented on Jul 8, 2025 • 0 new comments
[CI] Fixes CI for CUDA Version > 12.9
#157385 commented on Jul 10, 2025 • 0 new comments
[inductor] Fix memory layout for concatenation of repeated input
#157380 commented on Jul 7, 2025 • 0 new comments
[cherry-pick] temporarily disabling generation of weblinks for torch v2.8 …
#157353 commented on Jul 7, 2025 • 0 new comments
Making input dynamically adjust.
#157324 commented on Jul 9, 2025 • 0 new comments
Using torch.accelerator in comm_mode_features_example.py and visualize_sharding_example.py
#157317 commented on Jul 10, 2025 • 0 new comments
Fix inconsistent pybind11 usage across ONNX and Tensorpipe during CMake build
#157309 commented on Jul 10, 2025 • 0 new comments
Test re-enabling ET test
#157298 commented on Jul 9, 2025 • 0 new comments
adding the ability to record aten arg vals and types
#157291 commented on Jul 8, 2025 • 0 new comments
[nativert] add memory overlap debug assertion
#157290 commented on Jul 7, 2025 • 0 new comments
Update docs dependencies
#157287 commented on Jul 12, 2025 • 0 new comments
[inductor][templates] Finalize all registered hooks
#157270 commented on Jul 7, 2025 • 0 new comments
Fix the Problems About Defining Static Variable in Inline Function
#157269 commented on Jul 9, 2025 • 0 new comments
Fix init CUDA preload: get correct versions (#147001)
#157264 commented on Jul 9, 2025 • 0 new comments
[distributed] build enum for Backend class
#157263 commented on Jul 7, 2025 • 0 new comments
[dynamo][fsdp] Consistent behavior of int attributes
#157262 commented on Jul 11, 2025 • 0 new comments
Updating default value of eps in RMSNorm documentation
#157223 commented on Jul 9, 2025 • 0 new comments
Adding bias argument to NN normalization methods
#157198 commented on Jul 11, 2025 • 0 new comments
[DO NOT MERGE] Test new MI300X capacity.
#157191 commented on Jul 12, 2025 • 0 new comments
[SymmMem] Enable NVL72
#157180 commented on Jul 10, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on Jul 10, 2025 • 0 new comments
Fix conversion of values in libtorch agnostic tests
#155115 commented on Jul 9, 2025 • 0 new comments
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on Jul 10, 2025 • 0 new comments
Avoid differing results in `linalg.(tensor_)solve`
#154983 commented on Jul 10, 2025 • 0 new comments
[CI] Removing --user flag from all pip install commands
#154900 commented on Jul 10, 2025 • 0 new comments
[BE]: Try to enable LTO
#154819 commented on Jul 5, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#154694 commented on Jul 12, 2025 • 0 new comments
Add `scale` complex type check in `quantize_per_tensor`
#154601 commented on Jul 10, 2025 • 0 new comments
Use official CUDAToolkit module in CMake
#154595 commented on Jul 7, 2025 • 0 new comments
Enable Leak Sanitizer
#154584 commented on Jul 7, 2025 • 0 new comments
Fixes Issue #154491
#154561 commented on Jul 11, 2025 • 0 new comments
implement MKLGenerator
#154199 commented on Jul 10, 2025 • 0 new comments
[cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt
#154170 commented on Jul 8, 2025 • 0 new comments
[BE]: Update pybind11 submodule to 3.0.0rc
#154115 commented on Jul 10, 2025 • 0 new comments
Fused RMSNorm implementation
#153666 commented on Jul 12, 2025 • 0 new comments
[BE]: Update CUTLASS submodule to 4.0.0
#153541 commented on Jul 9, 2025 • 0 new comments
[dynamo][compile-time] Cache frame summaries
#153434 commented on Jul 12, 2025 • 0 new comments
[PT2][Optimus][fp8 computation quantization] Add fallback logic
#153430 commented on Jul 12, 2025 • 0 new comments
[BE] Move `BUILD_AOT_INDUCTOR_TEST` to build stage
#153419 commented on Jul 12, 2025 • 0 new comments
defer to aot eager instead of skip frame
#153409 commented on Jul 12, 2025 • 0 new comments
Print correct variable names in cuda.cmake
#153402 commented on Jul 12, 2025 • 0 new comments
[ONNX] Cast before calling Softmax when dtype is specified
#153393 commented on Jul 12, 2025 • 0 new comments
Remove mut marker for fused_adagrad in native_functions.yaml
#153376 commented on Jul 12, 2025 • 0 new comments
[DEBUG] Remove has CUDA
#153349 commented on Jul 11, 2025 • 0 new comments
[Dynamo][TVM] Check TVM existence and version
#153338 commented on Jul 12, 2025 • 0 new comments
[don't merge] upgrade vs2022 to v17.13.6
#153322 commented on Jul 9, 2025 • 0 new comments
[DEBUG] only comment
#153320 commented on Jul 9, 2025 • 0 new comments
[DEBUG] only combined_traceback
#153319 commented on Jul 9, 2025 • 0 new comments
[DEBUG] dump combined_traceback
#153318 commented on Jul 9, 2025 • 0 new comments
[associative_scan] Autograd for additional inputs
#153317 commented on Jul 9, 2025 • 0 new comments
[iter] exhaust `ListIterator` when `unpack_var_sequence` is called
#156370 commented on Jul 9, 2025 • 0 new comments
[iter] Update some of the tests to not call pickle
#156369 commented on Jul 9, 2025 • 0 new comments
[WIP] Add a new API of allocator setting for accelerator
#156175 commented on Jul 10, 2025 • 0 new comments
[executorch hash update] update the pinned executorch hash
#156141 commented on Jul 12, 2025 • 0 new comments
[CUDA] Use runtime driver API for cuStreamWriteValue32
#156097 commented on Jul 12, 2025 • 0 new comments
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on Jul 12, 2025 • 0 new comments
Fix atleast_{1,2,3}d() with no arguments description
#156042 commented on Jul 10, 2025 • 0 new comments
[DRAFT][cuDNN][SDPA] Introduce `TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1`
#155958 commented on Jul 12, 2025 • 0 new comments
[dynamo] Add `-> bool` to functions named `is_*` or `_is_*`
#155923 commented on Jul 5, 2025 • 0 new comments
add sfdp pattern
#155792 commented on Jul 9, 2025 • 0 new comments
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on Jul 9, 2025 • 0 new comments
docs: clean up docstring for clarity and correctness
#155712 commented on Jul 9, 2025 • 0 new comments
[Optimus] add einsum_to_pointwise_pass pattern
#155666 commented on Jul 11, 2025 • 0 new comments
Make upsample accept list scale_factor
#155654 commented on Jul 9, 2025 • 0 new comments
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on Jul 8, 2025 • 0 new comments
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on Jul 11, 2025 • 0 new comments
[DRAFT] Evaluate feasibility of using FunctionalTensor for Example Value
#155606 commented on Jul 8, 2025 • 0 new comments
[aoti][mps] Enable test_aot_inductor.py tests
#155598 commented on Jul 10, 2025 • 0 new comments
[Misc] fix distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_i…
#155548 commented on Jul 8, 2025 • 0 new comments
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on Jul 10, 2025 • 0 new comments
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on Jul 10, 2025 • 0 new comments
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on Jul 10, 2025 • 0 new comments
[Dynamo] Enable torch function dispatch on HOPs
#155452 commented on Jul 10, 2025 • 0 new comments
Use unpack instructions for vec256 (de)interleave2
#155440 commented on Jul 9, 2025 • 0 new comments
[scan] Fix issues with scan on CPU and for autograd when implementing an RNN with multiple layers
#155422 commented on Jul 10, 2025 • 0 new comments
Convert onnx torchscript rst to md
#155390 commented on Jul 12, 2025 • 0 new comments
[einops] Ensure Dynamo can trace through einops
#155310 commented on Jul 8, 2025 • 0 new comments
Add UT for torch.accelerator memory-related API
#155200 commented on Jul 10, 2025 • 0 new comments
[dict] Implement `__eq__` for dict_items
#155154 commented on Jul 10, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on Jul 10, 2025 • 0 new comments
Export always give a value range with max length - 1
#156882 commented on Jul 9, 2025 • 0 new comments
DISABLED test_nn_module (__main__.TestGuardSerialization)
#153120 commented on Jul 9, 2025 • 0 new comments
DISABLED test_aot_autograd_runtime_wrapper_prologue_profiled (__main__.ReproTests)
#156678 commented on Jul 9, 2025 • 0 new comments
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 commented on Jul 9, 2025 • 0 new comments
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on Jul 9, 2025 • 0 new comments
DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)
#156693 commented on Jul 9, 2025 • 0 new comments
DISABLED test_relative_import (__main__.ReproTests)
#156679 commented on Jul 9, 2025 • 0 new comments
DISABLED test_vmap_exhaustive_mv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142631 commented on Jul 9, 2025 • 0 new comments
libtorch.so file size is very large
#34058 commented on Jul 9, 2025 • 0 new comments
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 1) (unhinted: Eq(u0, 1)). (Size-like symbols: none)
#157355 commented on Jul 9, 2025 • 0 new comments
Improve Error Message in MultiMarginLoss for Inconsistent Target Size
#106251 commented on Jul 9, 2025 • 0 new comments
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on Jul 9, 2025 • 0 new comments
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on Jul 9, 2025 • 0 new comments
DISABLED test_simple_multi_arch_embed_kernel_binary_True_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156930 commented on Jul 9, 2025 • 0 new comments
DISABLED test_relative_import_no_modulename (__main__.ReproTests)
#156691 commented on Jul 9, 2025 • 0 new comments
DISABLED test_bitwise_print_precedence (__main__.ReproTests)
#156736 commented on Jul 9, 2025 • 0 new comments
DISABLED test_empty_storage (__main__.CudaGraphTreeTests)
#156755 commented on Jul 9, 2025 • 0 new comments
we should graph break on nn.Parameter constructors
#157452 commented on Jul 8, 2025 • 0 new comments
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Compiler: cl is not found
#157458 commented on Jul 8, 2025 • 0 new comments
PyTorch 2.7.1 torch.compile will probably break with einops 0.8.2 or 0.9.0
#157601 commented on Jul 8, 2025 • 0 new comments
DISABLED test_ind_worker_queue (__main__.TestIndividualWorkerQueue)
#68643 commented on Jul 8, 2025 • 0 new comments
DISABLED test_addr_alpha_beta_out (__main__.ReproTests)
#156641 commented on Jul 8, 2025 • 0 new comments
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on Jul 8, 2025 • 0 new comments
DISABLED test_graph_break_unsupported_fake (__main__.ReproTests)
#156629 commented on Jul 8, 2025 • 0 new comments
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157428 commented on Jul 8, 2025 • 0 new comments
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 commented on Jul 8, 2025 • 0 new comments
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 commented on Jul 8, 2025 • 0 new comments
canUse32BitIndexMath set to False with efficient net
#155225 commented on Jul 8, 2025 • 0 new comments
magma builds should be part of the docker image builds
#148762 commented on Jul 8, 2025 • 0 new comments
Cuda-12.9 removed libnvToolsExt.so.* and is now purely header nvtx3
#152756 commented on Jul 8, 2025 • 0 new comments
[Release improvements] Have cherry-pick bot always add the current release to the PR
#152212 commented on Jul 8, 2025 • 0 new comments
[feature request] torch.mix function to generalize/symmetrize addcmul
#104849 commented on Jul 8, 2025 • 0 new comments
DISABLED test_matmul_small_brute_force_1d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#125276 commented on Jul 8, 2025 • 0 new comments
PT2E Quantization Migration Tracker
#157591 commented on Jul 8, 2025 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Jul 8, 2025 • 0 new comments
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 commented on Jul 9, 2025 • 0 new comments
DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)
#156735 commented on Jul 9, 2025 • 0 new comments
DISABLED test_dataclass_in_module (__main__.ReproTests)
#156776 commented on Jul 9, 2025 • 0 new comments
Add full support for NVIDIA RTX Pro 6000 (Blackwell – SM122 / Compute Capability 12.2)
#157549 commented on Jul 9, 2025 • 0 new comments
Allow creation of pseudo devices for testing purposes
#61654 commented on Jul 9, 2025 • 0 new comments
Restore CUDA 12.4 manylinux build and test in CI
#156747 commented on Jul 9, 2025 • 0 new comments
Importing xgboost before torch + openmp causes seg fault
#155201 commented on Jul 9, 2025 • 0 new comments
[CUDA][Complex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0`
#125198 commented on Jul 9, 2025 • 0 new comments
FakeTensorUpdater does not trace nodes correctly
#152548 commented on Jul 9, 2025 • 0 new comments
Unexpected, batch size and device dependent NaN propagation in Conv1d
#157237 commented on Jul 9, 2025 • 0 new comments
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on Jul 9, 2025 • 0 new comments
torch.compile on MPS progress tracker
#150121 commented on Jul 9, 2025 • 0 new comments
Export Huggingface models with StaticCache
#155862 commented on Jul 9, 2025 • 0 new comments
Add dlpack support for MPS device
#153789 commented on Jul 9, 2025 • 0 new comments
torch wheels are unusable if CUDA RPMs are installed on the system (was Import error in nvidia/cuda:12.6.3-cudnn-devel-rockylinux9)
#150399 commented on Jul 9, 2025 • 0 new comments
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on Jul 9, 2025 • 0 new comments
Tensor.nbytes() returns itemsize * numel for sparse tensors
#29734 commented on Jul 11, 2025 • 0 new comments
DISABLED test_TransformerDecoderLayer_gelu_activation_cuda_fp32 (__main__.TestNN)
#157121 commented on Jul 9, 2025 • 0 new comments
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157449 commented on Jul 9, 2025 • 0 new comments
DISABLED test_Transformer_multilayer_coder_cuda_fp32 (__main__.TestNN)
#157120 commented on Jul 9, 2025 • 0 new comments
Pytorch XPU Windows build failed in cmake rerun loop due to the source code deep path
#134956 commented on Jul 9, 2025 • 0 new comments
Matmul Triton Template with epilogue fusion can not speed up on XPU.
#146568 commented on Jul 9, 2025 • 0 new comments
[XPU][Inductor] Failed to run max-autotune in subprocess.
#149703 commented on Jul 9, 2025 • 0 new comments
optree package status in PyTorch
#152535 commented on Jul 9, 2025 • 0 new comments
[torch.export] Cannot export TorchVision fasterrcnn_mobilenet_v3_large_fpn
#146152 commented on Jul 9, 2025 • 0 new comments
Inductor error with Torch XPU optimizations to StableDiffusion3 Pipeline
#156303 commented on Jul 9, 2025 • 0 new comments
`RuntimeError: UR error` with XPU
#149953 commented on Jul 9, 2025 • 0 new comments
nn.rmsnorm is super slower than nn.layernorm
#157345 commented on Jul 9, 2025 • 0 new comments
Allow clamp to work on absolute values and preserve sign
#156956 commented on Jul 9, 2025 • 0 new comments
[inductor][triton] Block ptrs are being removed from Triton
#154025 commented on Jul 9, 2025 • 0 new comments
RuntimeError: each element in list of batch should be of equal size
#42654 commented on Jul 9, 2025 • 0 new comments
from_blob python api
#107112 commented on Jul 9, 2025 • 0 new comments
RuntimeError: Could not find libnvrtc.so. Please make sure CUDA is installed.
#155378 commented on Jul 9, 2025 • 0 new comments
get_ema_multi_avg_fn() equation is a little confused
#155551 commented on Jul 9, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on Jul 9, 2025 • 0 new comments
[feature request] Caching allocator diagnostics and memory allocation tracing/visualization
#1529 commented on Jul 7, 2025 • 0 new comments
[Upstream Triton] Handle user-specified triton.set_allocator function
#155584 commented on Jul 7, 2025 • 0 new comments
Incremental version of pca_lowrank
#40770 commented on Jul 7, 2025 • 0 new comments
Migrating existing backend-MAIA integration toward PrivateUse1 / openReg
#155864 commented on Jul 7, 2025 • 0 new comments
`bias: bool` argument for Batch, Instance, Group and RMSNorm
#157144 commented on Jul 7, 2025 • 0 new comments
Regression: torch.distributed.gather_object segfaults
#157627 commented on Jul 7, 2025 • 0 new comments
Einsum of 2 dtensors fails in inference mode
#157631 commented on Jul 7, 2025 • 0 new comments
[DTensor] Better communication cost model for redistribute
#157585 commented on Jul 7, 2025 • 0 new comments
`TORCH_DISTRIBUTED_DEBUG=DETAIL` causes DTensors to raise errors
#157622 commented on Jul 7, 2025 • 0 new comments
[Export] Non-strict mode can't handle conditionals on tensor subclass types
#153429 commented on Jul 7, 2025 • 0 new comments
cmake: add USE_SYSTEM_{KLEIDI,CUDNN_FRONTEND,CUTLASS,FMT} options to USE_SYSTEM_LIBS
#153863 commented on Jul 7, 2025 • 0 new comments
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 commented on Jul 7, 2025 • 0 new comments
MPS SDPA returns NaN when attention mask blocks all rows
#156707 commented on Jul 7, 2025 • 0 new comments
Flex Attention is incompatible with selective AC
#147879 commented on Jul 7, 2025 • 0 new comments
[dynamo] Improve trace rules reasoning
#150435 commented on Jul 7, 2025 • 0 new comments
`torch.combinations` exhibits excessive memory usage and hangs for moderate `n` and `r` due to `n^r`
#153337 commented on Jul 7, 2025 • 0 new comments
[complex] dropout and its variants should support complex tensors
#80256 commented on Jul 7, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157367 commented on Jul 8, 2025 • 0 new comments
RendezvousConnectionError when use C10d on multi nodes
#69197 commented on Jul 5, 2025 • 0 new comments
I want to calculate the matrix multiplication of two Boolean matrices, but torch.mm will report an error. Is there any more efficient alternative?
#107041 commented on Jul 5, 2025 • 0 new comments
Make tlparse able to show a summary of distinct graph breaks
#153669 commented on Jul 5, 2025 • 0 new comments
ROCm+gcc 15 asserts
#145608 commented on Jul 5, 2025 • 0 new comments
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 commented on Jul 6, 2025 • 0 new comments
ImportError: libcupti.so.11.2: cannot open shared object file: No such file or directory
#88802 commented on Jul 6, 2025 • 0 new comments
Add `is_outputs_batched` param to `autograd.grad`
#156616 commented on Jul 6, 2025 • 0 new comments
file_name is not correctly read in here
#157624 commented on Jul 7, 2025 • 0 new comments
Segmentation fault in torch.repeat_interleave
#157097 commented on Jul 7, 2025 • 0 new comments
Incorrect inference of the groups parameter type for channel_shuffle (int misclassified as Tensor)
#157603 commented on Jul 7, 2025 • 0 new comments
Deprecation of CUTLASS Python interface
#157456 commented on Jul 7, 2025 • 0 new comments
FlexAttention + int64 indexing
#157446 commented on Jul 7, 2025 • 0 new comments
nll_loss gives result when both input and target are 1D tensor
#157420 commented on Jul 7, 2025 • 0 new comments
Several `torch.*` functions raise uninformative `NotImplementedError`s when called with integer `dtype`
#157547 commented on Jul 7, 2025 • 0 new comments
DISABLED test_inductor_reduce_scatter_tensor_coalesced (__main__.CompileTest)
#147887 commented on Jul 7, 2025 • 0 new comments
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on Jul 7, 2025 • 0 new comments
[CI] Need better way to detect OOMs especially on pet instances
#157379 commented on Jul 7, 2025 • 0 new comments
[FSDP2] figure out the contract for mp_policy and tensor subclass extension
#157395 commented on Jul 7, 2025 • 0 new comments
Perf drop when running with FSDP and torch.compile
#156966 commented on Jul 8, 2025 • 0 new comments
NCCL out of memory error after updating to PyTorch 2.7
#152302 commented on Jul 8, 2025 • 0 new comments
`torch.compile` fails on `torch.vdot` with complex tensors
#157607 commented on Jul 8, 2025 • 0 new comments
Running dispatch modes on compile-disabled regions of a compiled model
#155825 commented on Jul 8, 2025 • 0 new comments
DDP+TP composition does not work as expected
#157445 commented on Jul 8, 2025 • 0 new comments
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on Jul 8, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on Jul 8, 2025 • 0 new comments
distributed/tensor/_op_schema has_symints does not check args_schema
#151106 commented on Jul 8, 2025 • 0 new comments
Shared `~/.cache/torch_extensions` needs to be pytorch version aware.
#68905 commented on Jul 8, 2025 • 0 new comments
[OSS tooling] pytorchbot fail to revert a PR
#156607 commented on Jul 8, 2025 • 0 new comments
RNN pseudocode wrong?
#157457 commented on Jul 8, 2025 • 0 new comments
Most requested ops for the MPS backend
#154052 commented on Jul 8, 2025 • 0 new comments
Is there some official method to extract the featuremap of each node in pt2 graph like the function torchvision.models.feature_extraction.create_feature_extractor()
#157625 commented on Jul 8, 2025 • 0 new comments
`version.txt` mismatch with tags in release branch
#151425 commented on Jul 8, 2025 • 0 new comments
FakeTensorUpdater doesn't support HOPs
#156819 commented on Jul 8, 2025 • 0 new comments
einops 0.6.1 x torch.compile broken in pytorch nightlies
#157417 commented on Jul 8, 2025 • 0 new comments
[Regression] The torchbench model resnet50_quantized_qat fail_to_run in Pytorch 2.8 but pass in PyTorch 2.7
#157434 commented on Jul 8, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Jul 7, 2025 • 0 new comments
Add the XPU item to pytorch.org/get-started
#156810 commented on Jul 7, 2025 • 0 new comments
functorch_maml_omniglot is a bad CPU performance smoketest model
#156511 commented on Jul 7, 2025 • 0 new comments
test_dtensor.py::test_dtensor_save_load_import conflicts with autoloader importing torch._dynamo
#157545 commented on Jul 7, 2025 • 0 new comments
Cannot copy data from one gpu to another using torch
#157398 commented on Jul 7, 2025 • 0 new comments
[WIP][RFC] Compilable flex_attention + Context Parallel
#157015 commented on Jul 7, 2025 • 0 new comments
[dynamo] using disable inside of compile always recompiles
#157399 commented on Jul 7, 2025 • 0 new comments
`torch.compile` fails on `prims.broadcast_in_dim` with alias annotation error
#157610 commented on Jul 7, 2025 • 0 new comments
Both DTensor TP and SP are missing the last collective in the backward pass
#157606 commented on Jul 7, 2025 • 0 new comments
`torch.compile` fails with `NotImplementedError: Unsupported for now if query, key, value are the same buffer.` in `flex_attention`
#157612 commented on Jul 7, 2025 • 0 new comments
`torch.export` ViT+flex attention: `Attempting to use FunctionalTensor on its own`
#140400 commented on Jul 8, 2025 • 0 new comments
Support reductions in FlexAttention's score_mod/mask_mod
#141627 commented on Jul 8, 2025 • 0 new comments
DISABLED test_cat_max_autotune_triton (__main__.TestMaxAutotune)
#145830 commented on Jul 8, 2025 • 0 new comments
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on Jul 8, 2025 • 0 new comments
DISABLED test_get_parameter_dtype (__main__.ReproTests)
#156598 commented on Jul 8, 2025 • 0 new comments
DISABLED test_add_sub_alpha_out (__main__.ReproTests)
#156597 commented on Jul 8, 2025 • 0 new comments
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 commented on Jul 8, 2025 • 0 new comments
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157413 commented on Jul 8, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on Jul 11, 2025 • 0 new comments
DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)
#156838 commented on Jul 11, 2025 • 0 new comments
[release 2.8-2.9] Delete support for Maxwell, Pascal, and Volta architectures for CUDA 12.8 and 12.9 builds
#157517 commented on Jul 11, 2025 • 0 new comments
vLLM tests failing in torch 2.8rc but passing with torch 2.7
#157461 commented on Jul 11, 2025 • 0 new comments
Inductor output code source nodes is missing nodes for backwards graphs
#130147 commented on Jul 11, 2025 • 0 new comments
[Feature Request] Experimental support to Moore Threads GPU MUSA
#151303 commented on Jul 11, 2025 • 0 new comments
[RFC] PT2-Friendly Traceable, Functional Collective Communication APIs
#93173 commented on Jul 11, 2025 • 0 new comments
document functional_collectives
#113669 commented on Jul 11, 2025 • 0 new comments
Updated Scaled_mm to support more scaling formats via CuBlas
#153555 commented on Jul 11, 2025 • 0 new comments
support side effects in HOPs?
#124866 commented on Jul 11, 2025 • 0 new comments
non-negative least squares solver feature request
#48972 commented on Jul 11, 2025 • 0 new comments
a log_softmax kernel get much worse perf with padding
#122840 commented on Jul 11, 2025 • 0 new comments
`torch.compile` doesn't consider the alias tensor created by `tensor[:]`
#94773 commented on Jul 11, 2025 • 0 new comments
libTorch cpp docs missing for Tensor::item()
#41213 commented on Jul 11, 2025 • 0 new comments
compile PixArt-sigma error
#128012 commented on Jul 11, 2025 • 0 new comments
Add capturable Adagrad implementation
#118715 commented on Jul 11, 2025 • 0 new comments
Lowering after pointwise cat can lead to uncontiguous memory accesses
#124002 commented on Jul 11, 2025 • 0 new comments
[torch.compile]: Enhanced Error Reporting and Performance Canary Mode
#126644 commented on Jul 10, 2025 • 0 new comments
torch.view_copy(x, dtype) diverges from eager when the destination dtype has fewer bytes than the origin
#129966 commented on Jul 10, 2025 • 0 new comments
torch.compile x custom ops: op that accepts float also accepts Tensor in eager-mode
#123470 commented on Jul 10, 2025 • 0 new comments
[RFC] Emit better Telemetry in PyTorch
#103173 commented on Jul 10, 2025 • 0 new comments
MPS Error on sequoia 15.3: NDArray dimension length > INT_MAX'
#146769 commented on Jul 11, 2025 • 0 new comments
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ with filesystem strategy
#153050 commented on Jul 11, 2025 • 0 new comments
[inductor] Incorrect handle of `autocast` results in type mismatch
#121631 commented on Jul 11, 2025 • 0 new comments
switch more test cases to use MultithreadTestCase
#108744 commented on Jul 11, 2025 • 0 new comments
Change `automatic_dynamic_shapes` to trigger on `cache_size_limit` recompiles but not `accumulated_cache_size_limit` recompiles.
#114516 commented on Jul 11, 2025 • 0 new comments
[Torch Inductor] Torch Inductor Better Support for GNN workload and Inductor Sparse Compiler
#113232 commented on Jul 11, 2025 • 0 new comments
[Feature] Taylor expansion pruning
#157218 commented on Jul 11, 2025 • 0 new comments
[rocm] HIP Graph (on AMD GPU) capture does not raise `operation not permitted` for illegal operation whereas CUDA Graph (Nvidia GPU) does
#155684 commented on Jul 11, 2025 • 0 new comments
DISABLED test_comprehensive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#148853 commented on Jul 11, 2025 • 0 new comments
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on Jul 11, 2025 • 0 new comments
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on Jul 11, 2025 • 0 new comments
Torch RPC examples from docs say usage is deprecated.
#149393 commented on Jul 11, 2025 • 0 new comments
Using Inductor always throws a warning
#154160 commented on Jul 11, 2025 • 0 new comments
[discussion] Analyzing a list of tensors stored as intermediate values / saved_for_backward in autograd graph
#91692 commented on Jul 11, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on Jul 11, 2025 • 0 new comments
DISABLED test_expanded_inputs (__main__.CudaGraphTreeTests)
#156886 commented on Jul 11, 2025 • 0 new comments
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on Jul 11, 2025 • 0 new comments
DISABLED test_fallback_to_eager_if_recompiling_too_many_times (__main__.CudaGraphTreeTests)
#130749 commented on Jul 11, 2025 • 0 new comments
DISABLED test_op_has_batch_rule___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157003 commented on Jul 11, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on Jul 11, 2025 • 0 new comments
DISABLED test_against_reference_multi_input_jacfwd_cuda (__main__.TestJacCUDA)
#156998 commented on Jul 11, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on Jul 11, 2025 • 0 new comments
DISABLED test_inplace_on_view_makes_base_require_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156209 commented on Jul 11, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on Jul 11, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on Jul 11, 2025 • 0 new comments
Move loop ordering after fusion
#126255 commented on Jul 11, 2025 • 0 new comments
[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build
#156181 commented on Jul 11, 2025 • 0 new comments
Triton kernel doing more work uses less registers
#126463 commented on Jul 11, 2025 • 0 new comments
[dynamo] Investigate interop issues with torch_scatter/torch_sparse/pyg_lib
#111223 commented on Jul 11, 2025 • 0 new comments
[DTensor] Improve `tensor_metadata` and `redistribute_cost` coverage for op strategies.
#157495 commented on Jul 11, 2025 • 0 new comments
FSDP offload doesn't prefetch param to GPU
#157209 commented on Jul 12, 2025 • 0 new comments
Tune whether to use mm or bmm for matmul in inductor max-autotune
#118774 commented on Jul 11, 2025 • 0 new comments
Torch compile fx graph is not removing constant propagation
#120057 commented on Jul 11, 2025 • 0 new comments
Use sys.settrace or torch function mode to compute how much of a model was not covered by Dynamo
#120079 commented on Jul 11, 2025 • 0 new comments
torch.compile Conv1d AssertionError
#123242 commented on Jul 11, 2025 • 0 new comments
Use Incremental Fake Tensor Updater more uniformly across torch.compile compilation
#120116 commented on Jul 11, 2025 • 0 new comments
[Inductor] Generate triton block pointers for discontiguous strided tensors
#125077 commented on Jul 11, 2025 • 0 new comments
opcheck should support TorchBind custom classes
#121162 commented on Jul 11, 2025 • 0 new comments
Match HuggingFace T5 SDPA pattern in Inductor
#121371 commented on Jul 11, 2025 • 0 new comments
Triton kernel unexpectedly gets 1.35x slower by more specialization
#120667 commented on Jul 11, 2025 • 0 new comments
[export] 14k models: AssertionError: graph-captured input # 2, of type <class 'torch.nn.parameter.Parameter'>, is not among original inputs of types
#111693 commented on Jul 11, 2025 • 0 new comments
[Inductor] Freezing Add support for Caching Parameter Conversions
#103990 commented on Jul 11, 2025 • 0 new comments
Rework Dynamic Benchmarks To Actually Vary Shapes
#113063 commented on Jul 11, 2025 • 0 new comments
Minifier doesn't work with dynamic shapes
#114296 commented on Jul 11, 2025 • 0 new comments
[pt2] Unable to trace LSTM with dynamic sequence length
#115092 commented on Jul 11, 2025 • 0 new comments
2 Dynamo test are failing with "Global state changed while dynamo tracing, please report a bug".
#120648 commented on Jul 11, 2025 • 0 new comments
Higher peak memory with torch.compile
#122512 commented on Jul 11, 2025 • 0 new comments
torch.compile doesn't convert all input scalar types to symbolic values
#119778 commented on Jul 11, 2025 • 0 new comments
Inconsistent Behavior of `torch.dsplit` with torch.compile
#118741 commented on Jul 11, 2025 • 0 new comments
Triton Kernel Rejects NamedTupleVariable Arguments
#148289 commented on Jul 10, 2025 • 0 new comments
Inconsistent export behavior for nonzero+grid_sample between CUDA and CPU/MPS backends
#152791 commented on Jul 10, 2025 • 0 new comments
DISABLED test_mm_plus_mm (__main__.TestPatternMatcher)
#145335 commented on Jul 10, 2025 • 0 new comments
[RFC] Offload collectives to NVSwitch when possible
#136567 commented on Jul 10, 2025 • 0 new comments
[Tracker] AutoParallel's feature request to DTensor
#156217 commented on Jul 10, 2025 • 0 new comments
SourcelessBuilder.create does not know how to wrap <class '__main__.InFlexData'>
#154009 commented on Jul 10, 2025 • 0 new comments
Is compilation caching for NumPy operators not supported in PyTorch 2.7.1?
#156943 commented on Jul 10, 2025 • 0 new comments
DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)
#156778 commented on Jul 10, 2025 • 0 new comments
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 commented on Jul 10, 2025 • 0 new comments
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on Jul 10, 2025 • 0 new comments
DISABLED test_dataclass_init_with_default_factory_with_inputs (__main__.ReproTests)
#156799 commented on Jul 10, 2025 • 0 new comments
FlopCounterMode doesn't support HOP
#134385 commented on Jul 10, 2025 • 0 new comments
Batched multi_dot / chain_matmul + let it accept a tensor instead of tuple
#55261 commented on Jul 10, 2025 • 0 new comments
`__getitem__` fails to vmap for one dimensional tensors
#124423 commented on Jul 10, 2025 • 0 new comments
DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)
#156801 commented on Jul 10, 2025 • 0 new comments
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on Jul 10, 2025 • 0 new comments
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 commented on Jul 10, 2025 • 0 new comments
Outdated install commands
#152213 commented on Jul 10, 2025 • 0 new comments
PyTorch 2.6 License Issues
#150118 commented on Jul 9, 2025 • 0 new comments
[XPU User Empathy Day][whisper][Arc770][Win]XPU performance is worse than CPU
#151985 commented on Jul 9, 2025 • 0 new comments
xpu: missing aten ops needed to support Huggingface quanto
#132947 commented on Jul 9, 2025 • 0 new comments
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on Jul 9, 2025 • 0 new comments
Install pytorch from pypi using local CUDA build
#150742 commented on Jul 9, 2025 • 0 new comments
☂️ Update submodule dependencies to supported version of Cmake
#150328 commented on Jul 9, 2025 • 0 new comments
Remove Direct Arm Compute Library (ACL) Integration for Quantized Matmuls: `qlinear`/`qlinear_dynamic`
#148902 commented on Jul 9, 2025 • 0 new comments
Multihead Attention does not work with jagged tensors due to __torch_function__
#153472 commented on Jul 9, 2025 • 0 new comments
[RFC] Experimental Wheel Variant Support
#155141 commented on Jul 9, 2025 • 0 new comments
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on Jul 9, 2025 • 0 new comments
[ONNX] Create a tutorial for exporting hf transformers model
#156258 commented on Jul 9, 2025 • 0 new comments
`bytes(...)` support of torch tensor does not match numpy + it would be nice to support tensor.tobytes() as alias
#108565 commented on Jul 9, 2025 • 0 new comments
cpp_extension.py expects an integer on CUDA_ARCH, failing with Grace Hopper.
#144037 commented on Jul 9, 2025 • 0 new comments
torch.nn.InstanceNorm2d throws "mixed dtype" error with track_running_stats set to True
#139140 commented on Jul 9, 2025 • 0 new comments
Failure with cub::TransformInputIterator in 12.9 periodic CI test
#157502 commented on Jul 9, 2025 • 0 new comments
Matmul with int32 parameters on Intel GPU leads to errors
#144766 commented on Jul 9, 2025 • 0 new comments
Dump bytecode of resumption frames in tlparse
#136038 commented on Jul 9, 2025 • 0 new comments
Regression in llama2 model export
#157323 commented on Jul 10, 2025 • 0 new comments
ROCm, 7900 XTX: Pytorch FLASH_ATTENTION SDPA is 2.5x slower than MATH (fp16, head_dim 256, seqlen 4360, 12 heads)
#152595 commented on Jul 10, 2025 • 0 new comments
MPS Memory Leak
#154329 commented on Jul 10, 2025 • 0 new comments
torch._dynamo.mark_static_address refuses to work with nn.Parameter
#157221 commented on Jul 10, 2025 • 0 new comments
PyTorch source code build failed on some Windows 11 environment caused by C++ protocol buffer compiler
#143795 commented on Jul 10, 2025 • 0 new comments
[inductor][dynamic shapes] hugging face models fail while creating error guard
#157330 commented on Jul 10, 2025 • 0 new comments
An error occurs when ‘max_split_size_mb ’and ‘expandable_segments ’ are enabled at the same time.
#123548 commented on Jul 10, 2025 • 0 new comments
Segmentation faults in test_ops.py tests with gcc13 on AArch64 (v1)
#157626 commented on Jul 10, 2025 • 0 new comments
[feature request] Native checkpointing to/from `s3://`
#155992 commented on Jul 10, 2025 • 0 new comments
ImportError: cannot import name 'scaled_mm_configs' from 'torch._inductor.kernel.mm_common
#157343 commented on Jul 10, 2025 • 0 new comments
[inductor][cpu]mobilenet_v2_quantized_qat float32 single thread static/dynamic shape CPP/default wrapper performance regression in 2024-04-28 nightly release
#125672 commented on Jul 10, 2025 • 0 new comments
torch.compile not compatible with multiprocessing pool
#97992 commented on Jul 10, 2025 • 0 new comments
Higher train loss and worse evaluation metrics when using `torch.compile()`
#113180 commented on Jul 10, 2025 • 0 new comments
Different behaviors in `torch.nn.functional.hinge_embedding_loss` between eagermode and torch.compile
#118175 commented on Jul 10, 2025 • 0 new comments
CompiledFxGraph.current_callable is not thread-safe
#138961 commented on Jul 10, 2025 • 0 new comments
In Inductor-wrapped tests, reset() before and after each test and turn off suppress_errors=True
#122804 commented on Jul 10, 2025 • 0 new comments
Out of bounds error with `nn.MultiMarginLoss`
#105597 commented on Jul 10, 2025 • 0 new comments
Undefined symbol: cuOccupancyMaxActiveClusters
#115075 commented on Jul 10, 2025 • 0 new comments
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 commented on Jul 10, 2025 • 0 new comments
DISABLED test_error_on_dealloc_use2 (__main__.CudaGraphTreeTests)
#156808 commented on Jul 10, 2025 • 0 new comments
DISABLED test_deferred_runtime_asserts (__main__.ReproTests)
#156817 commented on Jul 10, 2025 • 0 new comments
[torch.compile] tighten FX graph restrictions post-functionalization
#133250 commented on Jul 10, 2025 • 0 new comments
[proposal] "Name" string attribute for modules, parameters, buffers, tensors for more pleasant debugging (especially for graph printouts / export / studying compiled generated code)
#104247 commented on Jul 10, 2025 • 0 new comments
empty_cache does not work for CUDAPluggableAllocator + MemPool
#145168 commented on Jul 10, 2025 • 0 new comments
[ONNX] Flip `dynamo` default to True in torch.onnx.export
#151693 commented on Jul 10, 2025 • 0 new comments
[MPS] Migrate torch.sort to Metal shader
#155560 commented on Jul 10, 2025 • 0 new comments
StrideAPI caused regression in channels-last logic
#141836 commented on Jul 10, 2025 • 0 new comments
MPS Performance regressions on Sonoma 14.0
#111517 commented on Jul 10, 2025 • 0 new comments
MaxPool2D memory leakage on device MPS
#125217 commented on Jul 10, 2025 • 0 new comments
[MPS] BatchNorm2D produces incorrect results for column first tensors
#134580 commented on Jul 10, 2025 • 0 new comments
Inefficient 2D convolution compared to JAX
#157334 commented on Jul 10, 2025 • 0 new comments
Fix broken linalg unittests on ARM platform
#125438 commented on Jul 10, 2025 • 0 new comments
torch.compile bug when using resize
#155209 commented on Jul 10, 2025 • 0 new comments
Compilation issues with ROCm 6.4.1 on Debian 12
#155794 commented on Jul 10, 2025 • 0 new comments
PyTorch CPP Extensions fail when same kernel is compiled more than once on ROCm servers
#155344 commented on Jul 10, 2025 • 0 new comments
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on Jul 10, 2025 • 0 new comments