Insights: pytorch/pytorch
Overview
21 Pull requests merged by 7 people
-
[aarch64] Add sm_80 to CUDA SBSA build
#158118 merged
Jul 11, 2025 -
[user triton] AOT inductor support for device-side TMA
#157241 merged
Jul 11, 2025 -
Add sm_70 arch for linux cuda 12.8 and 12.9 builds
#157968 merged
Jul 11, 2025 -
Revert "Turn on compile with NVSHMEM (#154538)"
#158040 merged
Jul 11, 2025 -
Revert "Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS (#154568)"
#158039 merged
Jul 11, 2025 -
[cherry-pick] revert #156552
#156767 merged
Jul 10, 2025 -
cherrypick revert of #152932 for release 2.8
#158031 merged
Jul 10, 2025 -
[inductor][user triton] sanitize triple-quoted docstrings in kernel definitions
#157454 merged
Jul 9, 2025 -
[release] Triton pin update to 3.4
#157752 merged
Jul 8, 2025 -
Bump urllib3 from 2.2.2 to 2.5.0 in /tools/build/bazel
#156390 merged
Jul 8, 2025 -
[inductor][static launcher] Skip correctness test for test_floats
#157200 merged
Jul 7, 2025 -
[ONNX] Bump onnxscript api for torch 2.8
#157137 merged
Jul 7, 2025 -
Fix macOS build with `USE_MPS=OFF`
#156932 merged
Jul 7, 2025 -
[dynamo] do not issue lru_cache warning for functions in the top-level torch namespace
#157718 merged
Jul 7, 2025 -
[dynamo] Fix source for lru_cache method
#157308 merged
Jul 7, 2025 -
[cherry-pick] Organize BUCK for torch/standalone and Rename torch::standalone to headeronly
#157418 merged
Jul 7, 2025 -
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu
#157422 merged
Jul 7, 2025 -
[ONNX] Fix conversion of attention - 4D
#157509 merged
Jul 7, 2025 -
[dynamo] Fix bug in dict(mapping_proxy)
#157515 merged
Jul 7, 2025 -
[cherry-pick] [fake tensor] fix issue of no attribute tags (#156689)
#157519 merged
Jul 7, 2025 -
Add einops x torch.compile testing in PyTorch CI (#157416)
#157588 merged
Jul 7, 2025
185 Pull requests opened by 106 people
-
[pt2 event logging] add configurable prefix
#157678 opened
Jul 6, 2025 -
[Inductor][Float8] Add float8_e4m3fn into assertion dtype list.
#157684 opened
Jul 7, 2025 -
[BE] add `SHFMT` linter to format shell scripts
#157685 opened
Jul 7, 2025 -
[BE][1/4] format shell scripts with `SHFMT`
#157686 opened
Jul 7, 2025 -
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 opened
Jul 7, 2025 -
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 opened
Jul 7, 2025 -
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 opened
Jul 7, 2025 -
[inductor] initial triton static config lookup table
#157699 opened
Jul 7, 2025 -
wip, lookup table for reduction configs
#157700 opened
Jul 7, 2025 -
Enhance APoT quantizer
#157710 opened
Jul 7, 2025 -
Add support moviepy 2.x
#157712 opened
Jul 7, 2025 -
Avoid writing temporary modules to disk
#157713 opened
Jul 7, 2025 -
[c10d] Prototype of `group_split` for dist2 work
#157716 opened
Jul 7, 2025 -
Address NaNs if SDPA is called with all values masked from query
#157727 opened
Jul 7, 2025 -
Expose opt_einsum in torch.backends
#157740 opened
Jul 7, 2025 -
Fuse matmul
#157743 opened
Jul 7, 2025 -
Migrate pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc11 -> pytorch-linux-jammy-cuda12.8-cudnn9-py3-gcc13
#157748 opened
Jul 7, 2025 -
[export] Update docs
#157750 opened
Jul 8, 2025 -
Fix einops x torch.compile interaction
#157754 opened
Jul 8, 2025 -
[SymmMem] Find NVSHMEM from system installation
#157755 opened
Jul 8, 2025 -
[SymmMem] find_path does not search /usr/local/lib
#157765 opened
Jul 8, 2025 -
Ensure large tensor int32 -> int64 indexing is enabled
#157767 opened
Jul 8, 2025 -
[inductor][lookup_table] log entries
#157768 opened
Jul 8, 2025 -
Enable _int_mm on Intel GPU
#157769 opened
Jul 8, 2025 -
[BE]: Fix NVSHMEM builds, add missing 12.9 dependency and update to latest for 2.8RC
#157774 opened
Jul 8, 2025 -
[Test][Do Not Merge] Update Ideep to latest oneDNN commit
#157782 opened
Jul 8, 2025 -
S390X: fix detection of magic number placeholder in inductor
#157784 opened
Jul 8, 2025 -
[BE]: Reduce binary size 40% using aggressive fatbin compression.
#157791 opened
Jul 8, 2025 -
Add flag to fx.passes.split_module to normalize input names
#157793 opened
Jul 8, 2025 -
[PP] Allow schedules to run under torch.no_grad()
#157795 opened
Jul 8, 2025 -
[claude-code] Add top-level module doc for torch/distributed/tensor/_op_schema.py
#157804 opened
Jul 8, 2025 -
feat(dynamo): raise UnsupportedError for ndarray.astype(object)
#157810 opened
Jul 8, 2025 -
Fix broken docs requirements symlink
#157811 opened
Jul 8, 2025 -
Allow docker builds to deal with symlinks
#157812 opened
Jul 8, 2025 -
Make functorch notebook symlinks PEP 517 valid
#157813 opened
Jul 8, 2025 -
Improve MANIFEST.in for source distribution
#157814 opened
Jul 8, 2025 -
Add PEP 517 compliant Python source distribution to release process
#157815 opened
Jul 8, 2025 -
[canary] dedupe args + on by default
#157817 opened
Jul 8, 2025 -
[Inductor] [Triton] Enabling TMA for flex-attention for supported device types
#157822 opened
Jul 8, 2025 -
[WIP][Inductor][Intel GPU] Always use channel last for only freezing mode.
#157828 opened
Jul 8, 2025 -
[SymmMem] Avoid library mismatch in CMake search
#157836 opened
Jul 8, 2025 -
partial reads
#157838 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/ao/pruning/sparsifier/nearly_diagonal_sparsifier.py
#157846 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/_higher_order_ops/run_const_graph.py
#157847 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py
#157848 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/fx/passes/backends/cudagraphs.py
#157849 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/utils/tensorboard/_onnx_graph.py
#157850 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/nn/utils/_expanded_weights/linear_expanded_weights.py
#157851 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/ao/quantization/experimental/apot_utils.py
#157852 opened
Jul 8, 2025 -
remove allow-untyped-defs from torch/autograd/_functions/utils.py
#157853 opened
Jul 8, 2025 -
Make compiled mode broadcasting use 0 strides
#157854 opened
Jul 8, 2025 -
Add functions to setup PrivateUse1 as a python backend device.
#157859 opened
Jul 8, 2025 -
ScannedModule
#157864 opened
Jul 8, 2025 -
[Draft][CUDA][CI] Test B200 Runner with Nightly Inductor Perf Test
#157870 opened
Jul 9, 2025 -
[MPS] Improve performance of max_pool3d
#157875 opened
Jul 9, 2025 -
[MPS] Move max_pool2d to Metal
#157876 opened
Jul 9, 2025 -
[Inductor][Triton] Update TMA Compatibility Requirements
#157881 opened
Jul 9, 2025 -
use maybe_mark_dynamic instead of mark_dynamic for -dynamic-batch-only option
#157885 opened
Jul 9, 2025 -
Back out "[Inductor] Fix epilogue fusion decision with 1 Triton caller as choice"
#157887 opened
Jul 9, 2025 -
[profiler] update CUDA runtime kernel identification logic
#157890 opened
Jul 9, 2025 -
Partitioner: Fix to align partition node order with original graph
#157892 opened
Jul 9, 2025 -
[CPU] Support GQA for flash attention
#157893 opened
Jul 9, 2025 -
[Inductor] optimize welford reduction
#157902 opened
Jul 9, 2025 -
[inductor] [cpu] fix the dype hardcoded to int64 in store_reduction
#157904 opened
Jul 9, 2025 -
Fix logdet returning finite values for singular matrices on CUDA
#157910 opened
Jul 9, 2025 -
[autograd] Avoid creating and recording event when unnecessary
#157914 opened
Jul 9, 2025 -
Normalize placeholder names in AOTAutogradCache
#157916 opened
Jul 9, 2025 -
Add type assert for tensor_meta, based on real bug in autoparallel.
#157927 opened
Jul 9, 2025 -
[WIP] try compress-mode balance on CUDA12.6
#157928 opened
Jul 9, 2025 -
Reduce random reads for offset metadata when calling torch.load under FakeTensorMode
#157931 opened
Jul 9, 2025 -
Adding a change to kick off the theme pull
#157932 opened
Jul 9, 2025 -
[WIP] WindowsArm64 CI changes
#157935 opened
Jul 9, 2025 -
Add cuda 12.9 periodic tests
#157939 opened
Jul 9, 2025 -
Fix early return in `_EmptyStateDictLoadPlanner`
#157940 opened
Jul 9, 2025 -
Small fix to the release-feature-request.yml
#157941 opened
Jul 9, 2025 -
[WIP][dynamic shapes] unbacked-safe slicing
#157944 opened
Jul 9, 2025 -
[AOTI][CPP] add flag TORCHINDUCTOR_CPP_FORCE_INLINE_KERNEL
#157949 opened
Jul 9, 2025 -
Allow dynamic shapes for DTensor slice
#157953 opened
Jul 9, 2025 -
Refactor AOTInductorModelBase for Device Interface Abstraction
#157954 opened
Jul 9, 2025 -
Add method get_fqn_to_index_mapping_from_metadata for HF storage reader
#157956 opened
Jul 9, 2025 -
Add cuda 12.4 build in CI
#157958 opened
Jul 9, 2025 -
improve typing in torch/test
#157962 opened
Jul 9, 2025 -
[DCP][Prototype] Checkpoint replication via PGTransport
#157963 opened
Jul 9, 2025 -
Ck recipe
#157964 opened
Jul 9, 2025 -
[WIP]What does it take to avoid specializations in this program with out unbacked semantics,
#157965 opened
Jul 9, 2025 -
[do not review][benchmark] test all_gather communication time
#157967 opened
Jul 9, 2025 -
[dynamo] Support more basic output types for `nonstrict_trace`
#157969 opened
Jul 9, 2025 -
[Do not land] Test NVSHMEM build
#157970 opened
Jul 9, 2025 -
[dynamo, nested graph breaks] implement new resume frame stack/locals/cell layout convention
#157971 opened
Jul 9, 2025 -
[Dynamo][Hierarchical Compile] Allow parameters to be propagated to submodules
#157979 opened
Jul 9, 2025 -
Define header type directly instead of using DeviceStreamType
#157982 opened
Jul 10, 2025 -
[dynamo, docs] programming model dynamo core concepts
#157985 opened
Jul 10, 2025 -
[Dynamo] Use proper sources for constructing dataclass defaults
#157993 opened
Jul 10, 2025 -
[test][do not merge] Upgrade oneDNN to v3.9
#157994 opened
Jul 10, 2025 -
Slightly improve error message from repeat_interleave kernel
#157996 opened
Jul 10, 2025 -
[WIP][export] turn on backed_size_oblivious
#158004 opened
Jul 10, 2025 -
Bump requests from 2.32.2 to 2.32.4 in /tools/build/bazel
#158006 opened
Jul 10, 2025 -
Enable nightly PT2 benchmark on B200
#158011 opened
Jul 10, 2025 -
[inductor] consolidate common GEMM triton param retrieval
#158015 opened
Jul 10, 2025 -
remove unnecessary sync point in AveragedModel update
#158017 opened
Jul 10, 2025 -
Update upstream opinfo to generate appropriately scaled sample inputs
#158018 opened
Jul 10, 2025 -
[TESTING] Update pin to disable AMD triton buffer ops
#158019 opened
Jul 10, 2025 -
Add property and setter annotations for Tensor attribute stubs
#158020 opened
Jul 10, 2025 -
[build] remove `wheel` from build requirements
#158027 opened
Jul 10, 2025 -
[precompile][wip] Increment frame and add compile ids when loading packages
#158028 opened
Jul 10, 2025 -
Migrate c10/macros/cmake_macros.h.in to torch/headeronly
#158035 opened
Jul 10, 2025 -
Support DeepSeek-style blockwise scaling scaled-mm for fp8 on Hopper+
#158037 opened
Jul 10, 2025 -
Tag CPython test files with the commit or tag they were copied from.
#158038 opened
Jul 10, 2025 -
[1/N] support of replication fallback strategy
#158046 opened
Jul 10, 2025 -
Move AOTI static linkage header file generation to cpp_wrapper_cpu
#158047 opened
Jul 10, 2025 -
Still run TritonBundler with BundledAOTAutogradCache, save autotune results
#158048 opened
Jul 10, 2025 -
Enable tracing `LOAD_BUILD_CLASS` on CPython tests
#158049 opened
Jul 10, 2025 -
Documentation Fix: torch.empty_like memory preservation
#158050 opened
Jul 10, 2025 -
[DTensor] have split_strategy return OpStrategy instead of TupleStrategy
#158051 opened
Jul 10, 2025 -
[cutlass backend][BE] remove force disable cache in tests
#158053 opened
Jul 10, 2025 -
[docs, dynamo] add fullgraph=True, common graph breaks docs
#158055 opened
Jul 10, 2025 -
[dict] Support `dict.update()` with no args
#158061 opened
Jul 10, 2025 -
[simple_fsdp][inductor_collectives] rewrite reorder_collectives, sink_waits_iterative
#158062 opened
Jul 10, 2025 -
[WIP] Rewrite reorder collectives/sink_wait to preserve peak memory
#158063 opened
Jul 10, 2025 -
adding types to nn module init
#158065 opened
Jul 10, 2025 -
[hop] call mode.__dispatch__ when no mode reigstered for the hop
#158067 opened
Jul 10, 2025 -
c10d inductor tests: do not re initialize dist process group
#158068 opened
Jul 10, 2025 -
Add torch compile force disable caches alias
#158072 opened
Jul 10, 2025 -
[benchmarks] Add scalar loss as model output when training
#158074 opened
Jul 10, 2025 -
Grab bag of (mostly) typing improvements
#158075 opened
Jul 10, 2025 -
[hop] add supports_higher_order_operators flag to TorchDispatchMode
#158077 opened
Jul 10, 2025 -
Remove references to TorchScript in PyTorch docs.
#158079 opened
Jul 10, 2025 -
updated test cases to use MultithreadTestCase
#158082 opened
Jul 11, 2025 -
[pickle] Add polyfills for pickle
#158084 opened
Jul 11, 2025 -
Refactor and Improve the OpenReg Module
#158090 opened
Jul 11, 2025 -
[inductor] add template hashing for template lookup table
#158091 opened
Jul 11, 2025 -
[simplefsdp auto-bucketing] add ir node bucket helper function
#158097 opened
Jul 11, 2025 -
[simplefsdp auto-bucketing] add ir node reorder helper function
#158098 opened
Jul 11, 2025 -
cache sympy.expand in sizevars simplify
#158099 opened
Jul 11, 2025 -
try cache_on_self on ir.py Layout class __str__ method
#158100 opened
Jul 11, 2025 -
[DO NOT MERGE] Stress test #1 new capacity.
#158102 opened
Jul 11, 2025 -
[DO NOT MERGE] Stress test #2 new capacity.
#158103 opened
Jul 11, 2025 -
[build] pin `setuptools>=77` to enable PEP 639
#158104 opened
Jul 11, 2025 -
[BE][Easy] split build system `requirements.txt` to a separate file
#158111 opened
Jul 11, 2025 -
[DTensor][BE] imporve DTensor ops correctness check utils
#158112 opened
Jul 11, 2025 -
Fix AArch64 segfaults by disabling strict-aliasing in GridSamplerKernel for GCC 12 and above
#158117 opened
Jul 11, 2025 -
[inductor_collectives] brute force preserve peak memory for sink_waits
#158119 opened
Jul 11, 2025 -
Clarify that Store.check() was added in PyTorch v2.8
#158124 opened
Jul 11, 2025 -
Add sm_70 to windows 12.9 build
#158126 opened
Jul 11, 2025 -
[Optimus][fp8_activation_quantization] Only log when there's some node to be quantized
#158129 opened
Jul 11, 2025 -
[DTensor] Assert DTensorSpec has valid placements
#158133 opened
Jul 11, 2025 -
[Inductor] addmm + activation function fusion
#158137 opened
Jul 11, 2025 -
Standalone compile API in _Exporter
#158139 opened
Jul 11, 2025 -
[testing only] Layout optimization on
#158141 opened
Jul 11, 2025 -
[testing only] Layout optimization off
#158142 opened
Jul 11, 2025 -
Fix grouped MM output strides when compiled but not max-autotuned
#158143 opened
Jul 11, 2025 -
Use a fixed size of a buffer in ShufflerIterDataPipe to not use append() and len()
#158144 opened
Jul 11, 2025 -
[DO NOT MERGE] Add volta tests to periodic and pull
#158145 opened
Jul 11, 2025 -
dist2: add support for passing custom configs directly to PG
#158147 opened
Jul 11, 2025 -
[DRAFT][DELETE] Y do we have 7 build systems
#158148 opened
Jul 11, 2025 -
Avoid AOTAutogradCache.load in stack trace on cache miss path
#158149 opened
Jul 11, 2025 -
Inline dispatch_and_compile into its call site.
#158150 opened
Jul 11, 2025 -
Modify c10::complex for CUDA 12.9 Win OOM
#158151 opened
Jul 11, 2025 -
[CI] Update mobile build docker image
#158153 opened
Jul 11, 2025 -
For discussion
#158156 opened
Jul 11, 2025 -
[WIP] _convert_element_type_meta test what fails
#158157 opened
Jul 11, 2025 -
[cutlass backend] cache a few things for codegen and properties
#158158 opened
Jul 11, 2025 -
Fix torchrec multiprocess tests
#158159 opened
Jul 11, 2025 -
Add transpose to torch/csrc/stable
#158160 opened
Jul 11, 2025 -
[CI] Move main branch rocm binary builds to its own workflow
#158161 opened
Jul 11, 2025 -
[CI] Do not run inductor rocm on ciflow/inductor
#158162 opened
Jul 11, 2025 -
[CI][TD] Enable TD on all test configs
#158163 opened
Jul 11, 2025 -
[ROCm] Fix tensor.item() for ROCm
#158165 opened
Jul 11, 2025 -
[SymmMem] Fix NCCL Hang in NVSHMEM Triton Wait Until Test
#158167 opened
Jul 11, 2025 -
[scan][cse] avoid cse zeros like gradient buffers
#158168 opened
Jul 12, 2025 -
add eq function to NodeSource
#158170 opened
Jul 12, 2025 -
[dynamo] turn off sys.monitoring if eval_frame is set
#158171 opened
Jul 12, 2025 -
Pipeline _create_aot_dispatcher_function
#158173 opened
Jul 12, 2025 -
Add inputs and outputs in Triton Kernel FX Graph segment
#158174 opened
Jul 12, 2025 -
Hoist choose_dispatcher to top level, remove unnecessary returns
#158176 opened
Jul 12, 2025 -
Update pr_time_benchmarks/expected_results.csv
#158177 opened
Jul 12, 2025 -
[BE] Move repeated code into helper functions
#158178 opened
Jul 12, 2025 -
[MPS] Extend atomic operations to more int types
#158179 opened
Jul 12, 2025 -
don't error out in empty_cache under mempool context
#158180 opened
Jul 12, 2025 -
Reproduce issue from #156097
#158181 opened
Jul 12, 2025 -
[WIP][Inductor] Add cpu_max_other_dimension_decomposition for decompose_mm_pass
#158183 opened
Jul 12, 2025 -
Fix compilation and "import torch" issues for cpython 3.14
#158184 opened
Jul 12, 2025 -
Remove unnecessary CMake checks
#158185 opened
Jul 12, 2025
163 Issues closed by 39 people
-
Inconsistent behavior between eager and compiled mode for `F.conv_transpose2d`
#157909 closed
Jul 12, 2025 -
Question: Is it expected that `QuantStub` and `DeQuantStub` are skipped in `torch.compile`?
#157998 closed
Jul 12, 2025 -
Discrepancy between Numpy and PyTorch advanced indexing
#158134 closed
Jul 12, 2025 -
torch.compile with numpy code differs from numpy's behavior
#157569 closed
Jul 12, 2025 -
dynamo+numpy 2.0 issue: TypeError: 'numpy.bool' object cannot be interpreted as an integer
#157973 closed
Jul 12, 2025 -
Slicing of large tensors is wrong on MPS
#153560 closed
Jul 11, 2025 -
torch.compile produces incorrect output
#155690 closed
Jul 11, 2025 -
Bug in cmake/public/cuda.cmake: Incorrect use of set(${...}) leads to missing CUDA version in error message
#157354 closed
Jul 11, 2025 -
[inductor] TorchInductor does not correctly recognize the grad status of model code
#125474 closed
Jul 11, 2025 -
DISABLED test_op_has_batch_rule_linalg_vecdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142924 closed
Jul 11, 2025 -
DISABLED test_forward_backward (__main__.CudaGraphTreeTests)
#156957 closed
Jul 11, 2025 -
DISABLED test_sdpa_with_packed_in_proj_cuda_float32 (__main__.TestNestedTensorSubclassCUDA)
#120029 closed
Jul 11, 2025 -
DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
#157110 closed
Jul 11, 2025 -
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 closed
Jul 11, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bfloat16 (__main__.TestForeachCUDA)
#150932 closed
Jul 11, 2025 -
Disable bot spamming
#117132 closed
Jul 11, 2025 -
Optimizers should use learning rates passed as tensors directly
#106802 closed
Jul 11, 2025 -
Support TorchBind x PT2
#121266 closed
Jul 11, 2025 -
38 Dynamo test are failing with "BuiltinVariable.tensor_args() got multiple values for argument 'self'".
#120643 closed
Jul 11, 2025 -
Track the accurate check regress for DebertaForQuestionAnswering and nanogpt
#122987 closed
Jul 11, 2025 -
Tracking issue for PT2 dashboard training passrate
#120129 closed
Jul 11, 2025 -
[Dynamo] Prints, logging, and warnings
#93739 closed
Jul 11, 2025 -
torch.compile slower than eager on simple MLP
#119611 closed
Jul 11, 2025 -
Compilation Failure of torch.special.exp2 in torch.compile Optimized Mode
#112495 closed
Jul 11, 2025 -
Compilation Failure of torch.cumsum in torch.compile Optimized Mode
#112492 closed
Jul 11, 2025 -
Compile targts cuda:0 rather than the device the model is on
#97693 closed
Jul 11, 2025 -
explain() has confusing explanation of graph breaks
#93656 closed
Jul 11, 2025 -
[inductor] `aten.index_put_` runtime shape mismatch on H100 but not on A100
#126614 closed
Jul 11, 2025 -
AOT inductor should generate source code instead of a library
#115965 closed
Jul 11, 2025 -
DISABLED test_op_has_batch_rule_inner_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157090 closed
Jul 11, 2025 -
DISABLED test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once (__main__.CudaGraphTreeTests)
#156954 closed
Jul 11, 2025 -
Massive initial memory overhead GPU
#12873 closed
Jul 11, 2025 -
Error in Qwen inference: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/CUDACachingAllocator.cpp
#157535 closed
Jul 11, 2025 -
DISABLED test_sdpfa_xpu.(__main__.TestUnbackedSymintsXPU)
#158095 closed
Jul 11, 2025 -
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 closed
Jul 11, 2025 -
DISABLED test_dynamic_scalar_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156982 closed
Jul 11, 2025 -
DISABLED test_hessian_vectorize_correctness_multi_input_cuda (__main__.TestHessianCUDA)
#157059 closed
Jul 11, 2025 -
DISABLED test_op_has_batch_rule_dot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157069 closed
Jul 11, 2025 -
DISABLED test_corrcoef_cuda_float32 (__main__.TestTorchDeviceTypeCUDA)
#156987 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_non_contig_cpu (__main__.TestAutogradDeviceTypeCPU)
#156265 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_backprop_base_cpu (__main__.TestAutogradDeviceTypeCPU)
#156143 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_backprop_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156163 closed
Jul 11, 2025 -
DISABLED test_op_has_batch_rule_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157037 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_undefined_grad_output_cpu (__main__.TestAutogradDeviceTypeCPU)
#156363 closed
Jul 11, 2025 -
DISABLED test_against_reference_multi_input_multi_output_jacfwd_cuda (__main__.TestJacCUDA)
#157036 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_then_no_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156306 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156289 closed
Jul 11, 2025 -
DISABLED test_inplace_on_view_backprop_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156180 closed
Jul 11, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int8 (__main__.TestForeachCUDA)
#156960 closed
Jul 11, 2025 -
miss doc for torch.segment_reduce
#153138 closed
Jul 11, 2025 -
torch.compile with flex_attention: 'ShapeAsConstantBuffer' object has no attribute 'dtype'
#157833 closed
Jul 11, 2025 -
test_generate_tensor_from_list_of_numpy_primitive_type fails if run under pytest
#103439 closed
Jul 11, 2025 -
tensor.to_sparse() handling indices incorrectly under dynamo/fake tensor
#93493 closed
Jul 11, 2025 -
[inductor] [dynamic shape] 5 HF models fails with `Constraints violated` using transformers v4.31.0
#107200 closed
Jul 11, 2025 -
ShapeEnv.produce_guards uses a lot of (CPU) memory
#118222 closed
Jul 11, 2025 -
export does not support boolean tensor indexing
#91990 closed
Jul 11, 2025 -
Dynamo can not trace 'int(a_scalar_tensor.item())'
#93515 closed
Jul 11, 2025 -
_assert_bound_is_rational can fail
#109514 closed
Jul 11, 2025 -
dlrm and hf_T5_generate fails aot_eager with bfloat16+dynamic_shapes
#103760 closed
Jul 11, 2025 -
Attempt to use minifier on sam model fails
#104301 closed
Jul 11, 2025 -
mark_dynamic may error too aggressively
#102814 closed
Jul 11, 2025 -
[inductor] test_fft_real_inputs fails with dynamic shapes
#103194 closed
Jul 11, 2025 -
Dynamic shapes exhaustive tests should fail (not xfail) if data mismatch
#87576 closed
Jul 11, 2025 -
SymIntType gets translated to int when going through pybind
#91753 closed
Jul 11, 2025 -
Functionalization does something wrong with pad backward when it uses as_strided
#87575 closed
Jul 11, 2025 -
Symbolic tensors are not printable
#82517 closed
Jul 11, 2025 -
test_make_fx_symbolic_exhaustive should pass dynamic ints for shape arguments
#82318 closed
Jul 11, 2025 -
[Inductor] AOTInductorTestABICompatibleGpu.test_on_gpu_device1_cuda fails
#157737 closed
Jul 11, 2025 -
torch.logdet produces incorrect results for singular matrices on CUDA vs CPU
#154312 closed
Jul 11, 2025 -
[Inductor] StableDiffusion unet with `cudagraphs` backend raises fake tensor mismatch error
#114525 closed
Jul 10, 2025 -
Custom backend not called for compiling backward graph
#114189 closed
Jul 10, 2025 -
[inductor] Assert that Inductor preserves output strides if `TORCHINDUCTOR_LAYOUT_OPTIMIZATION=0`
#114070 closed
Jul 10, 2025 -
[inductor] [silent incorrectness] Multiple internal `torch.rand` can lead to inconsistent results with eager
#151524 closed
Jul 10, 2025 -
Conjugate bit not handled properly in wrapped subclasses
#130646 closed
Jul 10, 2025 -
Enable an MPS benchmark
#115201 closed
Jul 10, 2025 -
DISABLED test_circular_dependencies (__main__.TestImports)
#110040 closed
Jul 10, 2025 -
TransformerEncoderLayer precision loss when fast path is enabled
#158012 closed
Jul 10, 2025 -
DISABLED test_mps_event_module (__main__.TestMPS)
#145052 closed
Jul 10, 2025 -
[export] run_decompositions generates inefficient operations
#157289 closed
Jul 10, 2025 -
`FractionalMaxPool3d` INTERNAL ASSERT FAILED when computing `jacrev`
#96316 closed
Jul 10, 2025 -
CONTRIBUTING.md install command incorrect
#157680 closed
Jul 10, 2025 -
Dynamo guard source not implemented due to int specialization
#157992 closed
Jul 10, 2025 -
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_float16 (__main__.TestForeachCUDA)
#150509 closed
Jul 10, 2025 -
Wrong error message for wrong dtypes in `torch.binomial`
#157195 closed
Jul 10, 2025 -
Calling unbind on 2D NestedTensor throws RuntimeError
#157404 closed
Jul 10, 2025 -
XPU build failure with DLE 2025.1.0
#150047 closed
Jul 9, 2025 -
UNSTABLE rocm-mi300 / linux-jammy-rocm-py3.10-mi300 / test (default)
#156360 closed
Jul 9, 2025 -
ResNet Onnx export dynamic batch size exported as fixed batch size
#157621 closed
Jul 9, 2025 -
`FSDPModule.set_reduce_scatter_divide_factor` on subset of parameters is broken?
#157485 closed
Jul 9, 2025 -
MPSInductor (aka torch.compile for Apple GPUs)
#157957 closed
Jul 9, 2025 -
Triton pin update for PyTorch 2.8 / Triton 3.4
#154206 closed
Jul 9, 2025 -
High-performance LLM quantization on X86 CPU with native PyTorch
#155435 closed
Jul 9, 2025 -
[RFC][API-Unstable]A16W4 on XPU Device
#153019 closed
Jul 9, 2025 -
[RFC][API-Unstable] Support 3rd party SYCL kernels with CPP Extension API
#153265 closed
Jul 9, 2025 -
test_gradient_all Device Type test regression with Numpy >= 2.0.0
#132450 closed
Jul 9, 2025 -
`setup.py develop` command is disappearing soon from `setuptools`
#152276 closed
Jul 9, 2025 -
DISABLED test_foreach_l2_large_value_input__foreach_norm_cuda_bfloat16 (__main__.TestForeachCUDA)
#150467 closed
Jul 9, 2025 -
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency1 (__main__.CudaGraphTreeTests)
#157900 closed
Jul 9, 2025 -
Add RMS Norm layer
#128713 closed
Jul 9, 2025 -
[inductor] [cpu] cpu inductor incorrectly processes `.to(torch.uint8)`, resulting in numerical inconsistency
#156788 closed
Jul 9, 2025 -
Inductor throws UnicodeDecodeError when compiling a simple model on Windows with MSVC
#157673 closed
Jul 9, 2025 -
[autograd] Slowdown in backward after #151079
#157407 closed
Jul 9, 2025 -
return type of torch.nn.functional.interpolate not working
#129053 closed
Jul 9, 2025 -
Build error: unrecognizable insn with using gcc-14 on aarch64
#157842 closed
Jul 9, 2025 -
DISABLED test_comprehensive_logcumsumexp_xpu_float16 (__main__.TestInductorOpInfoXPU)
#157697 closed
Jul 9, 2025 -
aten._cdist_backward
#105561 closed
Jul 9, 2025 -
aten.multilabel_margin_loss_backward
#105562 closed
Jul 9, 2025 -
Reflection_pad1d
#105566 closed
Jul 9, 2025 -
Move Inductor-specific decompositions to general decomposition registrations.
#105568 closed
Jul 9, 2025 -
Lowering topk to reductions and pointwise when k is small
#105569 closed
Jul 9, 2025 -
Using scans
#105570 closed
Jul 9, 2025 -
Add color-coding to fx graph readable printouts :)
#105572 closed
Jul 9, 2025 -
TorchInductor Hack-a-Day on July 19th
#105328 closed
Jul 9, 2025 -
How to compose HSDP with CP?
#157393 closed
Jul 9, 2025 -
[aarch64] Inductor benchmark - fail with IDEEP update
#157785 closed
Jul 8, 2025 -
torch.ops._c10d_functional_autograd.all_to_all_single missing dynamic shapes support
#157479 closed
Jul 8, 2025 -
PyTorch fails to detect AVX through it's detected
#157538 closed
Jul 8, 2025 -
extern declaration of the entity XXX is treated as a static definition
#157674 closed
Jul 8, 2025 -
PyTorch wheel binary size increase ~80mb
#150647 closed
Jul 8, 2025 -
DISABLED test_progressive (__main__.TestSubprocess)
#157787 closed
Jul 8, 2025 -
AOTI: Failure in compile_fx.py with FakeScriptObject (with possible fix)
#157401 closed
Jul 8, 2025 -
MPS internal assertion with jacfwd and concatenation
#152701 closed
Jul 8, 2025 -
The opp is not compatible with compile mode="reduce-overhead" and linear layers for large inputs.
#157363 closed
Jul 8, 2025 -
UNSTABLE trunk / macos-py3-arm64 / test (mps)
#156833 closed
Jul 8, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu (__main__.CudaGraphTreeTests)
#157760 closed
Jul 8, 2025 -
Win32 Build crashes on startup (C++).
#146240 closed
Jul 8, 2025 -
DISABLED test_non_contiguous_input_mm_plus_mm (__main__.TestMaxAutotune)
#126867 closed
Jul 8, 2025 -
DISABLED test_add_complex_conj (__main__.ReproTests)
#156579 closed
Jul 8, 2025 -
Vmap error raised by mask_mod of FlexAttention
#157543 closed
Jul 8, 2025 -
DISABLED test_graph_partition_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157722 closed
Jul 8, 2025 -
Cannot compile a block that contains Flex attention without graph breaks
#143163 closed
Jul 8, 2025 -
[FlexAttention] Support the number of shared query heads in GQA to not be the power of 2
#143117 closed
Jul 8, 2025 -
FlexAttention gives me an INTERNAL_ASSERT_FAILED during mask_mod
#140363 closed
Jul 8, 2025 -
Dynamo's einops version check is bogus
#157451 closed
Jul 7, 2025 -
test_tensor_with_grad_to_scalar_warning failure
#157252 closed
Jul 7, 2025 -
Cannot create a mask for each sequence in a batch with Flex Attention
#157675 closed
Jul 7, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int64 (__main__.TestForeachCUDA)
#156612 closed
Jul 7, 2025 -
[RFC] Integrate NCCL scalable init API
#136539 closed
Jul 7, 2025 -
[FSDP2] set_reduce_scatter_divide_factor errors with non-trivial MixedPrecisionPolicy
#155223 closed
Jul 7, 2025 -
[CI] s390x-periodic tests broken with "No matching distribution found for cuda-bindings<13.0,>=12.0"
#157409 closed
Jul 7, 2025 -
pytorch
#157531 closed
Jul 7, 2025 -
DISABLED test_tracker_with_activation_checkpointing (__main__.TestTrackerFullyShard1DTrainingCompose)
#139814 closed
Jul 7, 2025 -
DISABLED test_tracker_non_root_forward_backward (__main__.TestTrackerFullyShard1DTrainingCore)
#129692 closed
Jul 7, 2025 -
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 closed
Jul 7, 2025 -
DISABLED test_graph_partition_forward_backward_not_called (__main__.CudaGraphTreeTests)
#157642 closed
Jul 7, 2025 -
Will the Metal4 update bring significant optimizations for future pytorch mps performance and compatibility?
#157660 closed
Jul 6, 2025 -
`torch.compile` fails with `UnicodeDecodeError` when model contains extreme value injection
#156451 closed
Jul 6, 2025 -
torch.utils.cpp_extension fails to parse clang version 20.1.7+libcxx
#157665 closed
Jul 6, 2025 -
Mispelled "paramter" in test_fully_shard_training.py
#157564 closed
Jul 5, 2025
119 Issues opened by 66 people
-
Flash Attention 2 + Dynamo + FSDP accelerate plugin + torch.compile error
#158186 opened
Jul 12, 2025 -
UNSTABLE rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default)
#158182 opened
Jul 12, 2025 -
`torch.fmin` has inconsistent overflow behavior on CPU and GPU
#158172 opened
Jul 12, 2025 -
Invalid `torch.linalg.lstsq` check during vmapped implementation
#158169 opened
Jul 12, 2025 -
`torch.nn.LPPool2d` throws inconsistent error on CPU and GPU
#158166 opened
Jul 11, 2025 -
[dynamo] Incompatibility with sys.monitoring
#158164 opened
Jul 11, 2025 -
`torch.native_channel_shuffle` throws FPE when `groups` > size of the second dimension of the input tensor
#158154 opened
Jul 11, 2025 -
DISABLED test_one_shot_all_reduce (__main__.SymmMemCollectiveTest)
#158138 opened
Jul 11, 2025 -
DISABLED test_cuda_nvlink_connectivity_detection (__main__.SymmetricMemoryTest)
#158136 opened
Jul 11, 2025 -
DISABLED test_allow_overlapping_devices (__main__.SymmetricMemoryTest)
#158135 opened
Jul 11, 2025 -
Eager success but inductor failed on torch.ops.aten._cudnn_rnn.default
#158131 opened
Jul 11, 2025 -
A "Merge-when-ready" label when ci finishes?
#158127 opened
Jul 11, 2025 -
index.Tensor doesn't properly account for boolean masks acting on multiple dimensions.
#158125 opened
Jul 11, 2025 -
[RFC] A Distributed CUDA Unified Memory Backend for PyTorch
#158122 opened
Jul 11, 2025 -
DISABLED test_gru (__main__.TestXNNPACKQuantizer)
#158116 opened
Jul 11, 2025 -
DISABLED test_conv_transpose_unary_fusion_ops (__main__.TestMkldnnFusion)
#158115 opened
Jul 11, 2025 -
DISABLED test_triton_signal_wait_until (__main__.NVSHMEMTritonTest)
#158114 opened
Jul 11, 2025 -
Combination of USE_MEM_EFF_ATTENTION and AOTRITON_INSTALLED_PREFIX misbehaves
#158109 opened
Jul 11, 2025 -
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158108 opened
Jul 11, 2025 -
DISABLED test_pool_method_subprocess (__main__.TestAsyncCompile)
#158107 opened
Jul 11, 2025 -
DISABLED test_triton_quiet (__main__.NVSHMEMTritonTest)
#158106 opened
Jul 11, 2025 -
Exporting ONNX Model with Operator Name as Output Name Produces Invalid Model When dynamo=True
#158094 opened
Jul 11, 2025 -
kvstore Documentation Inconsistency: check() Method Claimed in v2.7 Docs But Missing in Implementation
#158093 opened
Jul 11, 2025 -
`torch.compile` doesn't properly raise eager fake tensor exception
#158088 opened
Jul 11, 2025 -
`torch.compile` ignores nan check in `int(torch.tensor(torch.nan))`
#158087 opened
Jul 11, 2025 -
`torch.compile` on `.sum()` and `.item()` calls errors from `tensorify_python_scalars`
#158083 opened
Jul 11, 2025 -
`torch.compile` errors with inductor raised `GuardOnDataDependentSymNode` exception
#158081 opened
Jul 11, 2025 -
loaded storage won't be cached during deserialization if location is meta
#158080 opened
Jul 10, 2025 -
torch.compile on BFloat16 Segment Anything segfaults in cpp_CppMicroGemmRef_micro_gemm<false, false> on Mac
#158076 opened
Jul 10, 2025 -
Mutating a tensor while serializing with safetensors crashes free-threaded PyTorch
#158071 opened
Jul 10, 2025 -
InductorError: CppCompileError: C++ compile error on a function with a single `item` call
#158060 opened
Jul 10, 2025 -
`torch.compile` on `torch.arange` hard errors with `PendingUnbackedSymbolNotFound`
#158058 opened
Jul 10, 2025 -
DISABLED test_generalized_sq_cases (__main__.TestSolve)
#158054 opened
Jul 10, 2025 -
Add GoLU Activation Function 🚀
#158043 opened
Jul 10, 2025 -
[inductor] grouped_mm is autotuning under torch.compile default mode
#158042 opened
Jul 10, 2025 -
pow_test fails on Aarch64
#158041 opened
Jul 10, 2025 -
extension-cpp claims ROCM support with no additional changes; we should test this.
#158032 opened
Jul 10, 2025 -
NCCL + cudagraphs + expandable segments result in IMA
#158029 opened
Jul 10, 2025 -
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158025 opened
Jul 10, 2025 -
DISABLED test_pool_method_spawn (__main__.TestAsyncCompile)
#158024 opened
Jul 10, 2025 -
DISABLED test_triton_put_signal_set (__main__.NVSHMEMTritonTest)
#158023 opened
Jul 10, 2025 -
empty_like(tensor, memory_format=torch.preserve_format) does not preserve strides for views
#158022 opened
Jul 10, 2025 -
[2.8 regression] CUDAAllocator has BC-breaking changes
#158021 opened
Jul 10, 2025 -
dist.all_to_all_single, when input tensors of different shapes result in undefined output behavior
#158016 opened
Jul 10, 2025 -
DISABLED test_vmap_exhaustive_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#158010 opened
Jul 10, 2025 -
DISABLED test_pool_method_fork (__main__.TestAsyncCompile)
#158009 opened
Jul 10, 2025 -
DISABLED test_triton_put_signal_add (__main__.NVSHMEMTritonTest)
#158008 opened
Jul 10, 2025 -
`torch.compile` cannot compile a model with a basic `LSTM`, even on latest main
#158007 opened
Jul 10, 2025 -
BUG: numpy very slow after import torch
#158005 opened
Jul 10, 2025 -
[inuctor] [triton] `torch.cumsum` outputs inconsistent results when meeting large tensors
#158003 opened
Jul 10, 2025 -
[XPU] torch.xpu.mem_get_info() query failed on BMG
#157989 opened
Jul 10, 2025 -
DISABLED test_triton_put (__main__.NVSHMEMTritonTest)
#157986 opened
Jul 10, 2025 -
support DTensor's local_map in compile
#157976 opened
Jul 9, 2025 -
[Inductor] Online softmax disabled due to reduction split – Unexpected performance warning
#157975 opened
Jul 9, 2025 -
Torch compile don't work correctly with divide by scalar
#157959 opened
Jul 9, 2025 -
[RFC]: PyTorch Low-Precision GEMMs Public API
#157950 opened
Jul 9, 2025 -
Dim argument of torch.max can currently be only int or name, but documentation says it can be int or tuple
#157948 opened
Jul 9, 2025 -
torch.compile fails with InternalTorchDynamoError when slicing torch.linalg.svd results
#157945 opened
Jul 9, 2025 -
torch.compile() disables torch.distributions parameter validation globally
#157926 opened
Jul 9, 2025 -
TypeError: message must be a callable when calling grouped_mm with incompatible batch size for offsets
#157922 opened
Jul 9, 2025 -
BC-breaking change to symint range constraints from 2.7 -> 2.8
#157921 opened
Jul 9, 2025 -
tighten pt2 FX graph invariants
#157919 opened
Jul 9, 2025 -
[Precompile] Umbrella task
#157918 opened
Jul 9, 2025 -
tlparse symint guard user provenance slightly off
#157915 opened
Jul 9, 2025 -
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency1 (__main__.CudaGraphTreeTests)
#157901 opened
Jul 9, 2025 -
DISABLED test_triton_get_ring (__main__.NVSHMEMTritonTest)
#157898 opened
Jul 9, 2025 -
DISABLED test_reduce_scatter_float8 (__main__.ProcessGroupNCCLOpTest)
#157897 opened
Jul 9, 2025 -
DISABLED test_nccl_watchdog_cudagraph (__main__.ProcessGroupNCCLOpTest)
#157896 opened
Jul 9, 2025 -
DISABLED test_2d_reductions_mixed_indexing_reduction_op0_cpu (__main__.TritonBlockPointerTestCPU)
#157895 opened
Jul 9, 2025 -
Partitioner: Fix to align partition node order with original graph
#157891 opened
Jul 9, 2025 -
Way to customize HOPs overrided arguments selection
#157888 opened
Jul 9, 2025 -
DISABLED test_vmap_exhaustive_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157880 opened
Jul 9, 2025 -
DISABLED test_allreduce_float8 (__main__.ProcessGroupNCCLOpTest)
#157879 opened
Jul 9, 2025 -
DISABLED test_triton_template_generated_code_caching_mm_plus_mm (__main__.TestMaxAutotune)
#157878 opened
Jul 9, 2025 -
DISABLED test_triton_get (__main__.NVSHMEMTritonTest)
#157877 opened
Jul 9, 2025 -
DISABLED test_int64_index_intermediate (__main__.CudaReproTests)
#157872 opened
Jul 9, 2025 -
DISABLED test_graph_partition_reorder_custom_op_with_no_dependency (__main__.CudaGraphTreeTests)
#157871 opened
Jul 9, 2025 -
Exporting tensor.to("cuda") under FakeTensorMode doesn't work on a CPU-only machine
#157869 opened
Jul 8, 2025 -
Spurious "Grad strides do not match bucket view strides" warning for 1x1 convolution
#157862 opened
Jul 8, 2025 -
CUDA ec2 runner with no cuda runtime
#157844 opened
Jul 8, 2025 -
[FSDP2] should pass args as is instead of creating new ones
#157832 opened
Jul 8, 2025 -
sparse_csr bfloat16 matrix multiplication backward is 10x slower than float16
#157808 opened
Jul 8, 2025 -
[RFC] Replace setuptools build backend with scikit-build-core
#157807 opened
Jul 8, 2025 -
Consider changing AOTAutograd cache to hit on graphs with different input and node names
#157792 opened
Jul 8, 2025 -
DISABLED test_vmap_exhaustive_matmul_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157790 opened
Jul 8, 2025 -
DISABLED test_triton_fence (__main__.NVSHMEMTritonTest)
#157789 opened
Jul 8, 2025 -
DISABLED test_progressive (__main__.TestSubprocess)
#157788 opened
Jul 8, 2025 -
torch.distributed.checkpoint.state_dict.set_state_dict stucks with StateDictOptions(full_state_dict=True)
#157781 opened
Jul 8, 2025 -
test_optimizer_non_static_param got failed on Intel GPU
#157778 opened
Jul 8, 2025 -
Does FSDP works with DDP ?
#157917 opened
Jul 8, 2025 -
Support for _Float16/C++23 std::float16_t
#157776 opened
Jul 8, 2025 -
Fatal on torch.xpu
#157775 opened
Jul 8, 2025 -
Drop SSE4 support in oneDNN
#157764 opened
Jul 8, 2025 -
Nightly C++ docs build timeout in CI after 4 hours
#157763 opened
Jul 8, 2025 -
DISABLED test_vmap_exhaustive_linalg_vecdot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157762 opened
Jul 8, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu (__main__.CudaGraphTreeTests)
#157761 opened
Jul 8, 2025 -
[dynamo] Missing meta kernel for `aten::quantize_per_tensor.tensor_qparams`
#157729 opened
Jul 7, 2025 -
fill_ overflows on uint64 in range [2**63, 2**64) when profiler is engaged
#157728 opened
Jul 7, 2025 -
DISABLED test_vmap_exhaustive_inner_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157726 opened
Jul 7, 2025 -
DISABLED test_resnet (__main__.TestBlockStateAbsorption)
#157725 opened
Jul 7, 2025 -
DISABLED test_progressive (__main__.GPUTests)
#157724 opened
Jul 7, 2025 -
DISABLED test_graph_partition_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
#157723 opened
Jul 7, 2025 -
[Precompile] Caching precompile and presence of packages in general should be safe re: guard serialization
#157721 opened
Jul 7, 2025 -
[dynamo] `torch.compile` errors on numpy `astype("O")`
#157720 opened
Jul 7, 2025 -
AssertHandler::printMessage On Intel GPU
#157714 opened
Jul 7, 2025 -
/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found when IMPORT TORCH
#157709 opened
Jul 7, 2025 -
`torch.randint` raises a `RuntimeError` if `dtype=torch.uint64` and `high >= 2**63`
#157707 opened
Jul 7, 2025 -
Seg faults on macos / OSX
#157704 opened
Jul 7, 2025 -
[inductor][fuzzer] Compilation Error in complex64+toint
#157683 opened
Jul 7, 2025 -
Flex Attention breaks in certain cases when used with a learned bias
#157677 opened
Jul 6, 2025 -
Feedback about Getting Started on Intel GPU
#157672 opened
Jul 6, 2025 -
NCCL error caused due to use of NVLS in torch 2.7.1-cu128 on aarch64 gb200 cluster
#157668 opened
Jul 6, 2025 -
ConvNd ops in channel last layout (N,L,C) / (N,H,W,C) / (N,D,H,W,C)
#157663 opened
Jul 5, 2025 -
OffsetBasedRNGTracker's run_state_sync causes deadlock due to inconsistent broadcast order across ranks
#157662 opened
Jul 5, 2025
565 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
DDE-Free select with unbacked index.
#157605 commented on
Jul 12, 2025 • 16 new comments -
[Device] Add support for PrivateUse1 device type in parse_type function
#157609 commented on
Jul 12, 2025 • 13 new comments -
[HOP, map] Rework of map autograd to the new interface
#153343 commented on
Jul 11, 2025 • 11 new comments -
Enable `_lazy_clone` between CPU and MPS
#148408 commented on
Jul 11, 2025 • 9 new comments -
[cutlass backend] cache maybe_append_choices
#156781 commented on
Jul 12, 2025 • 7 new comments -
Update the signature and test of torch.hamming_window()
#152682 commented on
Jul 10, 2025 • 7 new comments -
NUMA Binding Integration with torchrun
#149334 commented on
Jul 12, 2025 • 6 new comments -
[nativert] Move ModelRunnerBase to oss.
#157633 commented on
Jul 7, 2025 • 5 new comments -
Add test for user-managed weights with load_state_dict
#157496 commented on
Jul 11, 2025 • 5 new comments -
[dynamo] [guard] Add caching for inside torch.compile.disable function to avoid unnecessary recompilation.
#157566 commented on
Jul 12, 2025 • 5 new comments -
Deprecate DataLoader pin_memory_device param
#146821 commented on
Jul 11, 2025 • 4 new comments -
CI for Windows Arm64
#148753 commented on
Jul 8, 2025 • 4 new comments -
[PT2][fusion] ban fusions with large accumulated reads
#157563 commented on
Jul 12, 2025 • 4 new comments -
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 commented on
Jul 12, 2025 • 4 new comments -
allow user to pass in custom partitioner function
#157580 commented on
Jul 8, 2025 • 3 new comments -
allow _size_of to return individual element's size
#157582 commented on
Jul 8, 2025 • 3 new comments -
[Draft][CUDA] Upgrade torch._scaled_grouped_mm to SM100+
#156806 commented on
Jul 9, 2025 • 3 new comments -
[cpp_wrapper] Build main and kernel code in separate threads
#154551 commented on
Jul 11, 2025 • 3 new comments -
[struct] Add `struct.pack` and `struct.unpack` polyfills
#156977 commented on
Jul 10, 2025 • 3 new comments -
Always disable ShardingPropagation cache if compiling
#156868 commented on
Jul 11, 2025 • 3 new comments -
Parameterized CUDA Graph Launch
#152622 commented on
Jul 11, 2025 • 3 new comments -
[NOT FOR MERGE] Exploratory work on AOTInductor training
#155877 commented on
Jul 11, 2025 • 3 new comments -
[Misc] skip the case test_foreach_add_different_mesh if world size is…
#155563 commented on
Jul 8, 2025 • 2 new comments -
Feature: Implement support for `cudnn_batch_norm_out` kernel to replace the autogen approach.
#123020 commented on
Jul 7, 2025 • 2 new comments -
[iter] support `iter(callable, sentinel)`
#156416 commented on
Jul 10, 2025 • 2 new comments -
[DTensor][FSDP2] necessary changes to FSDP and TP to unblock EP
#157216 commented on
Jul 11, 2025 • 2 new comments -
[BE] always use `uv pip` if possible in `pip_init.py` for `lintrunner init`
#157199 commented on
Jul 10, 2025 • 2 new comments -
Add DeviceAllocator as the base device allocator
#138222 commented on
Jul 11, 2025 • 2 new comments -
Do not checkout nccl for `USE_SYSTEM_LIBS`
#153807 commented on
Jul 9, 2025 • 2 new comments -
[Inductor] Set the default value of min_chunk_size to 512
#150762 commented on
Jul 12, 2025 • 2 new comments -
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on
Jul 9, 2025 • 2 new comments -
handling special case for pow(3) for GPU
#157537 commented on
Jul 11, 2025 • 2 new comments -
Add cascade sum support for Inductor CPP backend
#156296 commented on
Jul 11, 2025 • 2 new comments -
Deprecate c10::string_view
#156798 commented on
Jul 9, 2025 • 2 new comments -
Optimize scatter/gather kernel for ARM.
#156161 commented on
Jul 11, 2025 • 2 new comments -
[doc] Updates to distributed.md for XCCL backend
#155834 commented on
Jul 8, 2025 • 2 new comments -
Add cuda 12.9 periodic tests
#156900 commented on
Jul 10, 2025 • 1 new comment -
Add basic torch.hash_tensor op
#154149 commented on
Jul 12, 2025 • 1 new comment -
[Don't Review] Test CI
#139971 commented on
Jul 11, 2025 • 1 new comment -
Make open device registration tests standalone
#153855 commented on
Jul 9, 2025 • 1 new comment -
[PrivateUse1] Optimize 3rd party backend experiences
#155215 commented on
Jul 9, 2025 • 1 new comment -
[Easy] Show some clear error when torch.ops.load_library fails.
#157524 commented on
Jul 6, 2025 • 1 new comment -
[dynamo, docs] add dynamo programming model docs
#157527 commented on
Jul 10, 2025 • 1 new comment -
Enable TF32 as fp32 internal precision for matmul/linear/conv
#157520 commented on
Jul 9, 2025 • 1 new comment -
[BE][1/5] fix typos in aten/
#157550 commented on
Jul 12, 2025 • 1 new comment -
[Refactor][XPU] Refactor XPU quantization op and add header files.
#157430 commented on
Jul 8, 2025 • 1 new comment -
[BE][2/5] fix typos in aten/ (aten/src/ATen/native/)
#157551 commented on
Jul 12, 2025 • 1 new comment -
[BE][3/5] fix typos in aten/ (aten/src/ATen/native/)
#157552 commented on
Jul 12, 2025 • 1 new comment -
Preserve current stream in TestCuda::test_stream_compatibility
#157421 commented on
Jul 7, 2025 • 1 new comment -
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 commented on
Jul 12, 2025 • 1 new comment -
Fix doc issue 153531 by adding further explanation of STFT equation
#157595 commented on
Jul 8, 2025 • 1 new comment -
Add inductor lowerings for adaptive_avg_pool3d/adaptive_max_pool3d
#157331 commented on
Jul 9, 2025 • 1 new comment -
[BE] Replace `std::runtime_error` with `TORCH_CHECK` [2/N]
#152080 commented on
Jul 11, 2025 • 1 new comment -
enable windows inductor UT in CI
#151777 commented on
Jul 11, 2025 • 0 new comments -
Add adaptive_avg_pool2d input and output_size check
#151769 commented on
Jul 10, 2025 • 0 new comments -
[bazel] Fix unusual reference to cpuinfo workspace
#151578 commented on
Jul 9, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
Jul 9, 2025 • 0 new comments -
[Environment Variable] Use thread-safe getenv functions
#152609 commented on
Jul 12, 2025 • 0 new comments -
[2/N] Use std::filesystem
#152586 commented on
Jul 10, 2025 • 0 new comments -
[BE] Update numba versions
#152557 commented on
Jul 6, 2025 • 0 new comments -
Remove Conda Instructions
#152546 commented on
Jul 11, 2025 • 0 new comments -
[compile async] [cache] testing
#152523 commented on
Jul 6, 2025 • 0 new comments -
Horizontal
#151780 commented on
Jul 9, 2025 • 0 new comments -
fix: outdated contents in dynamo overview
#152382 commented on
Jul 6, 2025 • 0 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
Jul 8, 2025 • 0 new comments -
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 commented on
Jul 12, 2025 • 0 new comments -
Enable type promotions in slice_scatter (pytorch#147842)
#151911 commented on
Jul 11, 2025 • 0 new comments -
Avoid rounding errors in _get_total_norm for a dTensor by using torch Dynamo
#152234 commented on
Jul 11, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
Jul 8, 2025 • 0 new comments -
docs: add torch.e and torch.pi to constants table (#134964)
#151996 commented on
Jul 7, 2025 • 0 new comments -
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on
Jul 7, 2025 • 0 new comments -
[BE][3/6] fix typos in test/
#157637 commented on
Jul 12, 2025 • 0 new comments -
try relanding cublaslt autotuning support for TunableOp #
#153316 commented on
Jul 9, 2025 • 0 new comments -
[DEBUG] MTIA Module and Interface
#153308 commented on
Jul 9, 2025 • 0 new comments -
[DEBUG] memory profiler and combined trace
#153307 commented on
Jul 9, 2025 • 0 new comments -
[BE] fix skip_if_lt_x_gpu decorator and add test coverage
#153295 commented on
Jul 9, 2025 • 0 new comments -
[BE][FSDP] fix FSDP to skip tests where #GPUs < world_size before entering into init_pg
#153291 commented on
Jul 9, 2025 • 0 new comments -
test timm_efficientnet pass
#153290 commented on
Jul 12, 2025 • 0 new comments -
[MX] Add more ops to allowed set for e8
#153271 commented on
Jul 8, 2025 • 0 new comments -
[nocommit] bundled autograd cache test
#153269 commented on
Jul 11, 2025 • 0 new comments -
fix dtensor and tensor inconsistent compute mesh
#153268 commented on
Jul 7, 2025 • 0 new comments -
DEBUG PR Issue
#153267 commented on
Jul 9, 2025 • 0 new comments -
MXFP8 Fix broken bias support for mxfp8
#153254 commented on
Jul 8, 2025 • 0 new comments -
Fix integer overflow bug in triu/tril for large diagonal values
#153240 commented on
Jul 12, 2025 • 0 new comments -
Delete .github/workflows/docker-cache-mi300.yml
#153075 commented on
Jul 10, 2025 • 0 new comments -
Add CUDA support for Adagrad(fused=True)
#153038 commented on
Jul 10, 2025 • 0 new comments -
Allow zero sized dimensions in padding operations
#153037 commented on
Jul 11, 2025 • 0 new comments -
[WIP][dynamic shapes] unbacked safer cat, repeat
#153011 commented on
Jul 6, 2025 • 0 new comments -
[Pytorch] Add `torch.cuda.streams.Event` to save torch functions list
#152978 commented on
Jul 6, 2025 • 0 new comments -
[dtensor] Extend Partial partition of replicated tensor for min/max reduce
#152975 commented on
Jul 7, 2025 • 0 new comments -
docs: Improve documentation for NCCL timeout / watchdog variables
#152959 commented on
Jul 6, 2025 • 0 new comments -
[dtensor] add privateuse1 SDPA op support to DTensor
#152949 commented on
Jul 9, 2025 • 0 new comments -
[feature] Channel Wise Parallel API for Conv layers
#152937 commented on
Jul 6, 2025 • 0 new comments -
Allow Inductor backends to attest their own availability
#152933 commented on
Jul 5, 2025 • 0 new comments -
Add unified memory APIs for torch.accelerator
#152932 commented on
Jul 11, 2025 • 0 new comments -
Add overall tensor similarity comparison (#152647)
#152920 commented on
Jul 6, 2025 • 0 new comments -
[DO NOT MERGE] update build tools version
#152820 commented on
Jul 8, 2025 • 0 new comments -
Update CMakeLists.txt
#152786 commented on
Jul 6, 2025 • 0 new comments -
Pattern matcher support for mutable ops with view inputs
#152776 commented on
Jul 12, 2025 • 0 new comments -
Handle less functions than number of segments
#152753 commented on
Jul 6, 2025 • 0 new comments -
[Dynamo] Guard serialization for BUILTIN_MATCH
#152729 commented on
Jul 6, 2025 • 0 new comments -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on
Jul 9, 2025 • 0 new comments -
[WIP] Generalize device caching allocator
#151298 commented on
Jul 10, 2025 • 0 new comments -
[DO NOT MERGE] Update oneDNN to the latest main branch
#147073 commented on
Jul 8, 2025 • 0 new comments -
fake_tensor: Handle op errors more gracefully
#147049 commented on
Jul 7, 2025 • 0 new comments -
Porting Pytorch to AIX Operating System.
#146983 commented on
Jul 8, 2025 • 0 new comments -
Support contextlib.ExitStack
#146506 commented on
Jul 9, 2025 • 0 new comments -
Fix full_like decomposition to preserve strides
#144765 commented on
Jul 9, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
Jul 12, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on
Jul 12, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
Jul 12, 2025 • 0 new comments -
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on
Jul 10, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
Jul 7, 2025 • 0 new comments -
Support LOAD_BUILD_CLASS opcode in dynamo
#139561 commented on
Jul 10, 2025 • 0 new comments -
`has_triton`: Use the device interface for detecting Triton availability
#139171 commented on
Jul 7, 2025 • 0 new comments -
Add overflow check for negtive integer div_floor and div_trunc on CPU
#138684 commented on
Jul 12, 2025 • 0 new comments -
Always produce XML
#138513 commented on
Jul 7, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
Jul 9, 2025 • 0 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
Jul 9, 2025 • 0 new comments -
Help fix numpy detection in cross compiled layouts
#137084 commented on
Jul 8, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Jul 12, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
Jul 9, 2025 • 0 new comments -
[docs] URL and link format proposal to make function page URLs more concise
#106664 commented on
Jul 12, 2025 • 0 new comments -
Python 3.14 support for PyTorch
#156856 commented on
Jul 12, 2025 • 0 new comments -
Illegal Memory Access when Using Trainable Biases in Flex Attention
#144511 commented on
Jul 12, 2025 • 0 new comments -
addition of muon optimizer to torch.optim
#148819 commented on
Jul 12, 2025 • 0 new comments -
UNSTABLE rocm / linux-jammy-rocm-py3.10 / test (default)
#156098 commented on
Jul 12, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
Jul 12, 2025 • 0 new comments -
[v.2.8.0] Release Tracker
#156745 commented on
Jul 12, 2025 • 0 new comments -
[inductor] [silence] inconsistent swap wih eager when compiling `torch.rot90-torch.randn_like`
#147847 commented on
Jul 12, 2025 • 0 new comments -
Inductor can not fuse cat with a pointwise
#125075 commented on
Jul 12, 2025 • 0 new comments -
[FSDP2] document the contract for modifying DTensor model.parameters()
#157391 commented on
Jul 12, 2025 • 0 new comments -
mps and cpu backends produce different training results with FFT and Adam
#151740 commented on
Jul 8, 2025 • 0 new comments -
[dynamo] Avoid unnecessary `.detach()` call in `_make_subclass` polyfill
#151265 commented on
Jul 5, 2025 • 0 new comments -
NCCL: Fix cmake file when cross compiling.
#151234 commented on
Jul 8, 2025 • 0 new comments -
Implement MKLGenerator
#151218 commented on
Jul 10, 2025 • 0 new comments -
Fix `MaskedTensor` to device ignored mask
#151205 commented on
Jul 10, 2025 • 0 new comments -
[MPS] Get Vmap to work with mps backend
#151177 commented on
Jul 12, 2025 • 0 new comments -
Pin all root requirements to major versions
#150833 commented on
Jul 7, 2025 • 0 new comments -
[draft][distributed] add into 3d composability test at AMD CI test
#150694 commented on
Jul 8, 2025 • 0 new comments -
Make LazyModuleMixin materialize after load_state_dict
#150593 commented on
Jul 10, 2025 • 0 new comments -
Add differentiable ops hint message in Module docs
#150291 commented on
Jul 10, 2025 • 0 new comments -
Add cmake variable USE_ROCM_CK
#150245 commented on
Jul 5, 2025 • 0 new comments -
[DLPack] Add support for missing keyword-arguments.
#150218 commented on
Jul 9, 2025 • 0 new comments -
Add path used by pip's build isolation procedure to DLL search
#150013 commented on
Jul 11, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
Jul 11, 2025 • 0 new comments -
DRAFT: Add TMA opt for concat function target hopper and blackwell arch
#149893 commented on
Jul 6, 2025 • 0 new comments -
Configure `cuda.cmake` to ensure consistent behavior downstream
#149861 commented on
Jul 8, 2025 • 0 new comments -
[test] sccache docker build
#149536 commented on
Jul 7, 2025 • 0 new comments -
Fix unexpected keyword argument 'mode' when calling `CompileCounterWithBackend`
#149271 commented on
Jul 6, 2025 • 0 new comments -
Fix AttributeError for `_get_vc_env` with setuptools>=75.9.0
#148847 commented on
Jul 6, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
Jul 9, 2025 • 0 new comments -
Implement fast access to individual elements of jagged nested tensors
#148497 commented on
Jul 11, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
Jul 12, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
Jul 9, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
Jul 9, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
Jul 9, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
Jul 9, 2025 • 0 new comments -
Support `contextlib.suppress`
#147990 commented on
Jul 9, 2025 • 0 new comments -
[DO NOT MERGE] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn_linear and mkldnn_linear_backward
#147855 commented on
Jul 8, 2025 • 0 new comments -
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on
Jul 9, 2025 • 0 new comments -
[DO NOT MERGE][Inductor] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn._linear_pointwise and mkldnn._linear_pointwise.binary
#147360 commented on
Jul 8, 2025 • 0 new comments -
switch from deprecated `find_package(CUDA)` to `find_package(CUDAToolkit)`
#147300 commented on
Jul 8, 2025 • 0 new comments -
[nativert] libtorch kernel registry
#157150 commented on
Jul 8, 2025 • 0 new comments -
[generator] Close all open generators in compile_subgraph
#157149 commented on
Jul 9, 2025 • 0 new comments -
[contextlib] Fixes for CPython contextlib tests
#157148 commented on
Jul 10, 2025 • 0 new comments -
[DDP][FSDP2] Add unit test for DDP mixed precision with FSDP2 ignored params
#157140 commented on
Jul 9, 2025 • 0 new comments -
Documenting discrepancy for Numpy dependency versions
#157132 commented on
Jul 11, 2025 • 0 new comments -
updated path to requirements txt in docs
#157106 commented on
Jul 11, 2025 • 0 new comments -
[CI] add decorator for specifying H100-only tests
#156980 commented on
Jul 10, 2025 • 0 new comments -
[math] Trace `float.fromhex`
#156976 commented on
Jul 9, 2025 • 0 new comments -
[math] Raise exception in Dynamo if constant fold call fail
#156975 commented on
Jul 9, 2025 • 0 new comments -
Use `_get_object_coll_device` instead of deprecated API
#156878 commented on
Jul 7, 2025 • 0 new comments -
`fast-autotune`: Model Prediction of Triton Kernel Runtimes
#156851 commented on
Jul 10, 2025 • 0 new comments -
[TESTING] [DO NOT MERGE] Updated triton commit pin - upstream base
#156841 commented on
Jul 7, 2025 • 0 new comments -
Introduce a new API torch.accelerator.get_mem_info
#156812 commented on
Jul 10, 2025 • 0 new comments -
add tests for Thunk utility function
#156759 commented on
Jul 7, 2025 • 0 new comments -
Add back manywheel-py3_9-cuda12_4-build/test
#156753 commented on
Jul 9, 2025 • 0 new comments -
Stop parsing command line arguments every time common_utils is imported.
#156703 commented on
Jul 10, 2025 • 0 new comments -
ReplaceWithCopy graph pass
#156666 commented on
Jul 7, 2025 • 0 new comments -
Adds support for Nested Jagged Tensor in Multihead Attention
#156660 commented on
Jul 8, 2025 • 0 new comments -
Fix torch==2.6 broke nn.Module.dtype typing
#156631 commented on
Jul 9, 2025 • 0 new comments -
multi-kernel matmuls based on varying hint sizes
#156628 commented on
Jul 12, 2025 • 0 new comments -
[BE][15/16] fix typos in torch/ (torch/distributed/tensor/)
#156605 commented on
Jul 12, 2025 • 0 new comments -
[Inductor Dashboard] Enable deterministic algorithms for all models on ROCm
#156592 commented on
Jul 10, 2025 • 0 new comments -
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on
Jul 8, 2025 • 0 new comments -
[2/N] Remove FindPackageHandleStandardArgs.cmake
#156559 commented on
Jul 11, 2025 • 0 new comments -
[docs][typing] Document and type support for dim=None in torch.amin and torch.amax
#156510 commented on
Jul 9, 2025 • 0 new comments -
[Inductor] Fix epilogue fusion decision with 1 Triton caller as choice
#156500 commented on
Jul 10, 2025 • 0 new comments -
[ROCm][Windows] Fix finding ROCm/HIP version
#156486 commented on
Jul 11, 2025 • 0 new comments -
[iter] Wrap iter(..) call in a ObjectIteratorVariable
#156460 commented on
Jul 10, 2025 • 0 new comments -
Change t.is_cuda to t.device.type == 'cuda' in torch/utils/viz
#156418 commented on
Jul 7, 2025 • 0 new comments -
[iter] Add support for sequence protocol in `iter(..)`
#156371 commented on
Jul 9, 2025 • 0 new comments -
[BE][1/6] fix typos in test/
#157635 commented on
Jul 12, 2025 • 0 new comments -
[pruning] Implement Taylor expansion unstructured pruning
#157620 commented on
Jul 12, 2025 • 0 new comments -
[pruning] add more test cases for pruning
#157613 commented on
Jul 7, 2025 • 0 new comments -
[dynamo] Move skipIf decorator to class level in test_fx_graph_runnable
#157594 commented on
Jul 10, 2025 • 0 new comments -
[CUDA][NVTX] use `pytorch` nvtx domain for pytorch ranges
#157586 commented on
Jul 8, 2025 • 0 new comments -
Linux py 3.14 wheel builds
#157559 commented on
Jul 9, 2025 • 0 new comments -
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 commented on
Jul 12, 2025 • 0 new comments -
[inductor] pack linear for FP32 dynamic mode
#157542 commented on
Jul 7, 2025 • 0 new comments -
Add a test for checking that the CUDA stubs directory is not in libcaffe2_nvrts.so's RPATH or RUNPATH
#157437 commented on
Jul 7, 2025 • 0 new comments -
[build] bootstrap git repo for build for non-git-clone archive
#157432 commented on
Jul 12, 2025 • 0 new comments -
Add a flag "realized" in IRNode to enable tracking origin_nodes
#157423 commented on
Jul 8, 2025 • 0 new comments -
[CI] Fixes CI for CUDA Version > 12.9
#157385 commented on
Jul 10, 2025 • 0 new comments -
[inductor] Fix memory layout for concatenation of repeated input
#157380 commented on
Jul 7, 2025 • 0 new comments -
[cherry-pick] temporarily disabling generation of weblinks for torch v2.8 …
#157353 commented on
Jul 7, 2025 • 0 new comments -
Making input dynamically adjust.
#157324 commented on
Jul 9, 2025 • 0 new comments -
Using torch.accelerator in comm_mode_features_example.py and visualize_sharding_example.py
#157317 commented on
Jul 10, 2025 • 0 new comments -
Fix inconsistent pybind11 usage across ONNX and Tensorpipe during CMake build
#157309 commented on
Jul 10, 2025 • 0 new comments -
Test re-enabling ET test
#157298 commented on
Jul 9, 2025 • 0 new comments -
adding the ability to record aten arg vals and types
#157291 commented on
Jul 8, 2025 • 0 new comments -
[nativert] add memory overlap debug assertion
#157290 commented on
Jul 7, 2025 • 0 new comments -
Update docs dependencies
#157287 commented on
Jul 12, 2025 • 0 new comments -
[inductor][templates] Finalize all registered hooks
#157270 commented on
Jul 7, 2025 • 0 new comments -
Fix the Problems About Defining Static Variable in Inline Function
#157269 commented on
Jul 9, 2025 • 0 new comments -
Fix init CUDA preload: get correct versions (#147001)
#157264 commented on
Jul 9, 2025 • 0 new comments -
[distributed] build enum for Backend class
#157263 commented on
Jul 7, 2025 • 0 new comments -
[dynamo][fsdp] Consistent behavior of int attributes
#157262 commented on
Jul 11, 2025 • 0 new comments -
Updating default value of eps in RMSNorm documentation
#157223 commented on
Jul 9, 2025 • 0 new comments -
Adding bias argument to NN normalization methods
#157198 commented on
Jul 11, 2025 • 0 new comments -
[DO NOT MERGE] Test new MI300X capacity.
#157191 commented on
Jul 12, 2025 • 0 new comments -
[SymmMem] Enable NVL72
#157180 commented on
Jul 10, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on
Jul 10, 2025 • 0 new comments -
Fix conversion of values in libtorch agnostic tests
#155115 commented on
Jul 9, 2025 • 0 new comments -
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on
Jul 10, 2025 • 0 new comments -
Avoid differing results in `linalg.(tensor_)solve`
#154983 commented on
Jul 10, 2025 • 0 new comments -
[CI] Removing --user flag from all pip install commands
#154900 commented on
Jul 10, 2025 • 0 new comments -
[BE]: Try to enable LTO
#154819 commented on
Jul 5, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#154694 commented on
Jul 12, 2025 • 0 new comments -
Add `scale` complex type check in `quantize_per_tensor`
#154601 commented on
Jul 10, 2025 • 0 new comments -
Use official CUDAToolkit module in CMake
#154595 commented on
Jul 7, 2025 • 0 new comments -
Enable Leak Sanitizer
#154584 commented on
Jul 7, 2025 • 0 new comments -
Fixes Issue #154491
#154561 commented on
Jul 11, 2025 • 0 new comments -
implement MKLGenerator
#154199 commented on
Jul 10, 2025 • 0 new comments -
[cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt
#154170 commented on
Jul 8, 2025 • 0 new comments -
[BE]: Update pybind11 submodule to 3.0.0rc
#154115 commented on
Jul 10, 2025 • 0 new comments -
Fused RMSNorm implementation
#153666 commented on
Jul 12, 2025 • 0 new comments -
[BE]: Update CUTLASS submodule to 4.0.0
#153541 commented on
Jul 9, 2025 • 0 new comments -
[dynamo][compile-time] Cache frame summaries
#153434 commented on
Jul 12, 2025 • 0 new comments -
[PT2][Optimus][fp8 computation quantization] Add fallback logic
#153430 commented on
Jul 12, 2025 • 0 new comments -
[BE] Move `BUILD_AOT_INDUCTOR_TEST` to build stage
#153419 commented on
Jul 12, 2025 • 0 new comments -
defer to aot eager instead of skip frame
#153409 commented on
Jul 12, 2025 • 0 new comments -
Print correct variable names in cuda.cmake
#153402 commented on
Jul 12, 2025 • 0 new comments -
[ONNX] Cast before calling Softmax when dtype is specified
#153393 commented on
Jul 12, 2025 • 0 new comments -
Remove mut marker for fused_adagrad in native_functions.yaml
#153376 commented on
Jul 12, 2025 • 0 new comments -
[DEBUG] REmove has CUDA
#153349 commented on
Jul 11, 2025 • 0 new comments -
[Dynamo][TVM] Check TVM existence and version
#153338 commented on
Jul 12, 2025 • 0 new comments -
[don't merge] upgrade vs2022 to v17.13.6
#153322 commented on
Jul 9, 2025 • 0 new comments -
[DEBUG] only comment
#153320 commented on
Jul 9, 2025 • 0 new comments -
[DEBUG] only combined_traceback
#153319 commented on
Jul 9, 2025 • 0 new comments -
[DEBUG] dump combined_traceback
#153318 commented on
Jul 9, 2025 • 0 new comments -
[associative_scan] Autograd for additional inputs
#153317 commented on
Jul 9, 2025 • 0 new comments -
[iter] exhaust `ListIterator` when `unpack_var_sequence` is called
#156370 commented on
Jul 9, 2025 • 0 new comments -
[iter] Update some of the tests to not call pickle
#156369 commented on
Jul 9, 2025 • 0 new comments -
[WIP] Add a new API of allocator setting for accelerator
#156175 commented on
Jul 10, 2025 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#156141 commented on
Jul 12, 2025 • 0 new comments -
[CUDA] Use runtime driver API for cuStreamWriteValue32
#156097 commented on
Jul 12, 2025 • 0 new comments -
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on
Jul 12, 2025 • 0 new comments -
Fix atleast_{1,2,3}d() with no arguments description
#156042 commented on
Jul 10, 2025 • 0 new comments -
[DRAFT][cuDNN][SDPA] Introduce `TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1`
#155958 commented on
Jul 12, 2025 • 0 new comments -
[dynamo] Add `-> bool` to functions named `is_*` or `_is_*`
#155923 commented on
Jul 5, 2025 • 0 new comments -
add sfdp pattern
#155792 commented on
Jul 9, 2025 • 0 new comments -
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on
Jul 9, 2025 • 0 new comments -
docs: clean up docstring for clarity and correctness
#155712 commented on
Jul 9, 2025 • 0 new comments -
[Optimus] add einsum_to_pointwise_pass pattern
#155666 commented on
Jul 11, 2025 • 0 new comments -
Make upsample accept list scale_factor
#155654 commented on
Jul 9, 2025 • 0 new comments -
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on
Jul 8, 2025 • 0 new comments -
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on
Jul 11, 2025 • 0 new comments -
[DRAFT] Evaluate feasibility of using FunctionalTensor for Example Value
#155606 commented on
Jul 8, 2025 • 0 new comments -
[aoti][mps] Enable test_aot_inductor.py tests
#155598 commented on
Jul 10, 2025 • 0 new comments -
[Misc] fix distributed/_tools/test_sac_ilp.py::TestSACILP::test_sac_i…
#155548 commented on
Jul 8, 2025 • 0 new comments -
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on
Jul 10, 2025 • 0 new comments -
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on
Jul 10, 2025 • 0 new comments -
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on
Jul 10, 2025 • 0 new comments -
[Dynamo] Enable torch function dispatch on HOPs
#155452 commented on
Jul 10, 2025 • 0 new comments -
Use unpack instructions for vec256 (de)interleave2
#155440 commented on
Jul 9, 2025 • 0 new comments -
[scan] Fix issues with scan on CPU and for autograd when implementing an RNN with multiple layers
#155422 commented on
Jul 10, 2025 • 0 new comments -
Convert onnx torchscript rst to md
#155390 commented on
Jul 12, 2025 • 0 new comments -
[einops] Ensure Dynamo can trace through einops
#155310 commented on
Jul 8, 2025 • 0 new comments -
Add UT for torch.accelerator memory-related API
#155200 commented on
Jul 10, 2025 • 0 new comments -
[dict] Implement `__eq__` for dict_items
#155154 commented on
Jul 10, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on
Jul 10, 2025 • 0 new comments -
Export always give a value range with max length - 1
#156882 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_nn_module (__main__.TestGuardSerialization)
#153120 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_aot_autograd_runtime_wrapper_prologue_profiled (__main__.ReproTests)
#156678 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)
#156693 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_relative_import (__main__.ReproTests)
#156679 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_vmap_exhaustive_mv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142631 commented on
Jul 9, 2025 • 0 new comments -
libtorch.so file size is very large
#34058 commented on
Jul 9, 2025 • 0 new comments -
torch.fx.experimental.symbolic_shapes.GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 1) (unhinted: Eq(u0, 1)). (Size-like symbols: none)
#157355 commented on
Jul 9, 2025 • 0 new comments -
Improve Error Message in MultiMarginLoss for Inconsistent Target Size
#106251 commented on
Jul 9, 2025 • 0 new comments -
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on
Jul 9, 2025 • 0 new comments -
[RFC] Per-Parameter-Sharding FSDP
#114299 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_simple_multi_arch_embed_kernel_binary_True_cuda (__main__.AOTInductorTestABICompatibleGpu)
#156930 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_relative_import_no_modulename (__main__.ReproTests)
#156691 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_bitwise_print_precedence (__main__.ReproTests)
#156736 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_empty_storage (__main__.CudaGraphTreeTests)
#156755 commented on
Jul 9, 2025 • 0 new comments -
we should graph break on nn.Parameter constructors
#157452 commented on
Jul 8, 2025 • 0 new comments -
torch._dynamo.exc.InternalTorchDynamoError: RuntimeError: Compiler: cl is not found
#157458 commented on
Jul 8, 2025 • 0 new comments -
PyTorch 2.7.1 torch.compile will probably break with einops 0.8.2 or 0.9.0
#157601 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_ind_worker_queue (__main__.TestIndividualWorkerQueue)
#68643 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_addr_alpha_beta_out (__main__.ReproTests)
#156641 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_graph_break_unsupported_fake (__main__.ReproTests)
#156629 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_graph_partition_custom_op_dynamoc_shapes (__main__.CudaGraphTreeTests)
#157428 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 commented on
Jul 8, 2025 • 0 new comments -
canUse32BitIndexMath set to False with efficient net
#155225 commented on
Jul 8, 2025 • 0 new comments -
magma builds should be part of the docker image builds
#148762 commented on
Jul 8, 2025 • 0 new comments -
Cuda-12.9 removed libnvToolsExt.so.* and is now purely header nvtx3
#152756 commented on
Jul 8, 2025 • 0 new comments -
[Release improvements] Have cherry-pick bot always add the current release to the PR
#152212 commented on
Jul 8, 2025 • 0 new comments -
[feature request] torch.mix function to generalize/symmetrize addcmul
#104849 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_matmul_small_brute_force_1d_Nd_cuda_float32 (__main__.TestLinalgCUDA)
#125276 commented on
Jul 8, 2025 • 0 new comments -
PT2E Quantization Migration Tracker
#157591 commented on
Jul 8, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)
#156735 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_dataclass_in_module (__main__.ReproTests)
#156776 commented on
Jul 9, 2025 • 0 new comments -
Add full support for NVIDIA RTX Pro 6000 (Blackwell – SM122 / Compute Capability 12.2)
#157549 commented on
Jul 9, 2025 • 0 new comments -
Allow creation of pseudo devices for testing purposes
#61654 commented on
Jul 9, 2025 • 0 new comments -
Restore CUDA 12.4 manylinux build and test in CI
#156747 commented on
Jul 9, 2025 • 0 new comments -
Importing xgboost before torch + openmp causes seg fault
#155201 commented on
Jul 9, 2025 • 0 new comments -
[CUDA][Complex] `test_reference_numerics_large_jiterator_unary_cuda_complex64` broken after updating to `numpy >= 1.25.0`
#125198 commented on
Jul 9, 2025 • 0 new comments -
FakeTensorUpdater does not trace nodes correctly
#152548 commented on
Jul 9, 2025 • 0 new comments -
Unexpected, batch size and device dependent NaN propagation in Conv1d
#157237 commented on
Jul 9, 2025 • 0 new comments -
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on
Jul 9, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
Jul 9, 2025 • 0 new comments -
Export Huggingface models with StaticCache
#155862 commented on
Jul 9, 2025 • 0 new comments -
Add dlpack support for MPS device
#153789 commented on
Jul 9, 2025 • 0 new comments -
torch wheels are unusable if CUDA RPMs are installed on the system (was Import error in nvidia/cuda:12.6.3-cudnn-devel-rockylinux9)
#150399 commented on
Jul 9, 2025 • 0 new comments -
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on
Jul 9, 2025 • 0 new comments -
Tensor.nbytes() returns itemsize * numel for sparse tensors
#29734 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_TransformerDecoderLayer_gelu_activation_cuda_fp32 (__main__.TestNN)
#157121 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_graph_partition_custom_op_mutation (__main__.CudaGraphTreeTests)
#157449 commented on
Jul 9, 2025 • 0 new comments -
DISABLED test_Transformer_multilayer_coder_cuda_fp32 (__main__.TestNN)
#157120 commented on
Jul 9, 2025 • 0 new comments -
Pytorch XPU Windows build failed in cmake rerun loop due to the source code deep path
#134956 commented on
Jul 9, 2025 • 0 new comments -
Matmul Triton Template with epilogue fusion can not speed up on XPU.
#146568 commented on
Jul 9, 2025 • 0 new comments -
[XPU][Inductor] Failed to run max-autotune in subprocess.
#149703 commented on
Jul 9, 2025 • 0 new comments -
optree package status in PyTorch
#152535 commented on
Jul 9, 2025 • 0 new comments -
[torch.export] Cannot export TorchVision fasterrcnn_mobilenet_v3_large_fpn
#146152 commented on
Jul 9, 2025 • 0 new comments -
Inductor error with Torch XPU optimizations to StableDiffusion3 Pipeline
#156303 commented on
Jul 9, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
Jul 9, 2025 • 0 new comments -
nn.rmsnorm is super slower than nn.layernorm
#157345 commented on
Jul 9, 2025 • 0 new comments -
Allow clamp to work on absolute values and preserve sign
#156956 commented on
Jul 9, 2025 • 0 new comments -
[inductor][triton] Block ptrs are being removed from Triton
#154025 commented on
Jul 9, 2025 • 0 new comments -
RuntimeError: each element in list of batch should be of equal size
#42654 commented on
Jul 9, 2025 • 0 new comments -
from_blob python api
#107112 commented on
Jul 9, 2025 • 0 new comments -
RuntimeError: Could not find libnvrtc.so. Please make sure CUDA is installed.
#155378 commented on
Jul 9, 2025 • 0 new comments -
get_ema_multi_avg_fn() equation is a little confused
#155551 commented on
Jul 9, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
Jul 9, 2025 • 0 new comments -
[feature request] Caching allocator diagnostics and memory allocation tracing/visualization
#1529 commented on
Jul 7, 2025 • 0 new comments -
[Upstream Triton] Handle user-specified triton.set_allocator function
#155584 commented on
Jul 7, 2025 • 0 new comments -
Incremental version of pca_lowrank
#40770 commented on
Jul 7, 2025 • 0 new comments -
Migrating existing backend-MAIA integration toward PrivateUse1 / openReg
#155864 commented on
Jul 7, 2025 • 0 new comments -
`bias: bool` argument for Batch, Instance, Group and RMSNorm
#157144 commented on
Jul 7, 2025 • 0 new comments -
Regression: torch.distributed.gather_object segfaults
#157627 commented on
Jul 7, 2025 • 0 new comments -
Einsum of 2 dtensors fails in inference mode
#157631 commented on
Jul 7, 2025 • 0 new comments -
[DTensor] Better communication cost model for redistribute
#157585 commented on
Jul 7, 2025 • 0 new comments -
`TORCH_DISTRIBUTED_DEBUG=DETAIL` causes DTensors to raise errors
#157622 commented on
Jul 7, 2025 • 0 new comments -
[Export] Non-strict mode can't handle conditionals on tensor subclass types
#153429 commented on
Jul 7, 2025 • 0 new comments -
cmake: add USE_SYSTEM_{KLEIDI,CUDNN_FRONTEND,CUTLASS,FMT} options to USE_SYSTEM_LIBS
#153863 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 commented on
Jul 7, 2025 • 0 new comments -
MPS SDPA returns NaN when attention mask blocks all rows
#156707 commented on
Jul 7, 2025 • 0 new comments -
Flex Attention is incompatible with selective AC
#147879 commented on
Jul 7, 2025 • 0 new comments -
[dynamo] Improve trace rules reasoning
#150435 commented on
Jul 7, 2025 • 0 new comments -
`torch.combinations` exhibits excessive memory usage and hangs for moderate `n` and `r` due to `n^r`
#153337 commented on
Jul 7, 2025 • 0 new comments -
[complex] dropout and its variants should support complex tensors
#80256 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
#157367 commented on
Jul 8, 2025 • 0 new comments -
RendezvousConnectionError when use C10d on multi nodes
#69197 commented on
Jul 5, 2025 • 0 new comments -
I want to calculate the matrix multiplication of two Boolean matrices, but torch.mm will report an error. Is there any more efficient alternative?
#107041 commented on
Jul 5, 2025 • 0 new comments -
Make tlparse able to show a summary of distinct graph breaks
#153669 commented on
Jul 5, 2025 • 0 new comments -
ROCm+gcc 15 asserts
#145608 commented on
Jul 5, 2025 • 0 new comments -
Pipeline Parallelism Fails when stage input does not produce gradients in all stages.
#152827 commented on
Jul 6, 2025 • 0 new comments -
ImportError: libcupti.so.11.2: cannot open shared object file: No such file or directory
#88802 commented on
Jul 6, 2025 • 0 new comments -
Add `is_outputs_batched` param to `autograd.grad`
#156616 commented on
Jul 6, 2025 • 0 new comments -
file_name is not correctly read in here
#157624 commented on
Jul 7, 2025 • 0 new comments -
Segmentation fault in torch.repeat_interleave
#157097 commented on
Jul 7, 2025 • 0 new comments -
Incorrect inference of the groups parameter type for channel_stuffle (int misclassified as Tensor)
#157603 commented on
Jul 7, 2025 • 0 new comments -
Deprecation of CUTLASS Python interface
#157456 commented on
Jul 7, 2025 • 0 new comments -
FlexAttention + int64 indexing
#157446 commented on
Jul 7, 2025 • 0 new comments -
nll_loss gives result when both input and target are 1D tensor
#157420 commented on
Jul 7, 2025 • 0 new comments -
Several `torch.*` functions raise uninformative `NotImplementedError`s when called with integer `dtype`
#157547 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_coalesced (__main__.CompileTest)
#147887 commented on
Jul 7, 2025 • 0 new comments -
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on
Jul 7, 2025 • 0 new comments -
[CI] Need better way to detect OOMs especially on pet instances
#157379 commented on
Jul 7, 2025 • 0 new comments -
[FSDP2] figure out the contract for mp_policy and tensor subclass extension
#157395 commented on
Jul 7, 2025 • 0 new comments -
Perf drop when running with FSDP and torch.compile
#156966 commented on
Jul 8, 2025 • 0 new comments -
NCCL out of memory error after updating to PyTorch 2.7
#152302 commented on
Jul 8, 2025 • 0 new comments -
`torch.compile` fails on `torch.vdot` with complex tensors
#157607 commented on
Jul 8, 2025 • 0 new comments -
Running dispatch modes on compile-disabled regions of a compiled model
#155825 commented on
Jul 8, 2025 • 0 new comments -
DDP+TP composition does not work as expected
#157445 commented on
Jul 8, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
Jul 8, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Jul 8, 2025 • 0 new comments -
distributed/tensor/_op_schema has_symints does not check args_schema
#151106 commented on
Jul 8, 2025 • 0 new comments -
Shared `~/.cache/torch_extensions` needs to be pytorch version aware.
#68905 commented on
Jul 8, 2025 • 0 new comments -
[OSS tooling] pytorchbot fail to revert a PR
#156607 commented on
Jul 8, 2025 • 0 new comments -
RNN pseudocode wrong?
#157457 commented on
Jul 8, 2025 • 0 new comments -
Most requested ops for the MPS backend
#154052 commented on
Jul 8, 2025 • 0 new comments -
Is there some official method to extract the featuremap of each node in pt2 graph like the function torchvision.models.feature_extraction.create_feature_extractor()
#157625 commented on
Jul 8, 2025 • 0 new comments -
`version.txt` mismatch with tags in release branch
#151425 commented on
Jul 8, 2025 • 0 new comments -
FakeTensorUpdater doesn't support HOPs
#156819 commented on
Jul 8, 2025 • 0 new comments -
einops 0.6.1 x torch.compile broken in pytorch nightlies
#157417 commented on
Jul 8, 2025 • 0 new comments -
[Regression] The torchbench model resnet50_quantized_qat fail_to_run in Pytorch 2.8 but pass in PyTorch 2.7
#157434 commented on
Jul 8, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Jul 7, 2025 • 0 new comments -
Add the XPU item to pytorch.org/get-started
#156810 commented on
Jul 7, 2025 • 0 new comments -
functorch_maml_omniglot is a bad CPU performance smoketest model
#156511 commented on
Jul 7, 2025 • 0 new comments -
test_dtensor.py::test_dtensor_save_load_import conflicts with autoloader importing torch._dynamo
#157545 commented on
Jul 7, 2025 • 0 new comments -
Cannot copy data from one gpu to another using torch
#157398 commented on
Jul 7, 2025 • 0 new comments -
[WIP][RFC] Compilable flex_attention + Context Parallel
#157015 commented on
Jul 7, 2025 • 0 new comments -
[dynamo] using disable inside of compile always recompiles
#157399 commented on
Jul 7, 2025 • 0 new comments -
`torch.compile` fails on `prims.broadcast_in_dim` with alias annotation error
#157610 commented on
Jul 7, 2025 • 0 new comments -
Both DTensor TP and SP are missing the last collective in the backward pass
#157606 commented on
Jul 7, 2025 • 0 new comments -
`torch.compile` fails with `NotImplementedError: Unsupported for now if query, key, value are the same buffer.` in `flex_attention`
#157612 commented on
Jul 7, 2025 • 0 new comments -
`torch.export` ViT+flex attention: `Attempting to use FunctionalTensor on its own`
#140400 commented on
Jul 8, 2025 • 0 new comments -
Support reductions in FlexAttention's score_mod/mask_mod
#141627 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_cat_max_autotune_triton (__main__.TestMaxAutotune)
#145830 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_get_parameter_dtype (__main__.ReproTests)
#156598 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_add_sub_alpha_out (__main__.ReproTests)
#156597 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_graph_partition_custom_op (__main__.CudaGraphTreeTests)
#157413 commented on
Jul 8, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)
#156838 commented on
Jul 11, 2025 • 0 new comments -
[release 2.8-2.9] Delete support for Maxwell, Pascal, and Volta architectures for CUDA 12.8 and 12.9 builds
#157517 commented on
Jul 11, 2025 • 0 new comments -
vLLM tests failing in torch 2.8rc but passing with torch 2.7
#157461 commented on
Jul 11, 2025 • 0 new comments -
Inductor output code source nodes is missing nodes for backwards graphs
#130147 commented on
Jul 11, 2025 • 0 new comments -
[Feature Request] Experimental support to Moore Threads GPU MUSA
#151303 commented on
Jul 11, 2025 • 0 new comments -
[RFC] PT2-Friendly Traceable, Functional Collective Communication APIs
#93173 commented on
Jul 11, 2025 • 0 new comments -
document functional_collectives
#113669 commented on
Jul 11, 2025 • 0 new comments -
Updated Scaled_mm to support more scaling formats via CuBlas
#153555 commented on
Jul 11, 2025 • 0 new comments -
support side effects in HOPs?
#124866 commented on
Jul 11, 2025 • 0 new comments -
non-negative least squares solver feature request
#48972 commented on
Jul 11, 2025 • 0 new comments -
a log_softmax kernel get much worse perf with padding
#122840 commented on
Jul 11, 2025 • 0 new comments -
`torch.compile` doesn't consider the alias tensor created by `tensor[:]`
#94773 commented on
Jul 11, 2025 • 0 new comments -
libTorch cpp docs missing for Tensor::item()
#41213 commented on
Jul 11, 2025 • 0 new comments -
compile PixArt-sigma error
#128012 commented on
Jul 11, 2025 • 0 new comments -
Add capturable Adagrad implementation
#118715 commented on
Jul 11, 2025 • 0 new comments -
Lowering after pointwise cat can lead to uncontiguous memory accesses
#124002 commented on
Jul 11, 2025 • 0 new comments -
[torch.compile]: Enhanced Error Reporting and Performance Canary Mode
#126644 commented on
Jul 10, 2025 • 0 new comments -
torch.view_copy(x, dtype) diverges from eager when the destiny dtype has less bytes than the origin
#129966 commented on
Jul 10, 2025 • 0 new comments -
torch.compile x custom ops: op that accepts float also accepts Tensor in eager-mode
#123470 commented on
Jul 10, 2025 • 0 new comments -
[RFC] Emit better Telemetry in PyTorch
#103173 commented on
Jul 10, 2025 • 0 new comments -
MPS Error on sequoia 15.3: NDArray dimension length > INT_MAX'
#146769 commented on
Jul 11, 2025 • 0 new comments -
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ with filesystem strategy
#153050 commented on
Jul 11, 2025 • 0 new comments -
[inductor] Incorrect handle of `autocast` results in type mismatch
#121631 commented on
Jul 11, 2025 • 0 new comments -
switch more test cases to use MultithreadTestCase
#108744 commented on
Jul 11, 2025 • 0 new comments -
Change `automatic_dynamic_shapes` to trigger on `cache_size_limit` recompiles but not `accumulated_cache_size_limit` recompiles.
#114516 commented on
Jul 11, 2025 • 0 new comments -
[Torch Inductor] Torch Inductor Better Support for GNN workload and Inductor Sparse Compiler
#113232 commented on
Jul 11, 2025 • 0 new comments -
[Feature] Taylor expansion pruning
#157218 commented on
Jul 11, 2025 • 0 new comments -
[rocm] HIP Graph (on AMD GPU) capture does not raise `operation not permitted` for illegal operation whereas CUDA Graph (Nvidia GPU) does
#155684 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_comprehensive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#148853 commented on
Jul 11, 2025 • 0 new comments -
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on
Jul 11, 2025 • 0 new comments -
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on
Jul 11, 2025 • 0 new comments -
Torch RPC examples from docs say usage is deprecated.
#149393 commented on
Jul 11, 2025 • 0 new comments -
Using Inductor always throws a warning
#154160 commented on
Jul 11, 2025 • 0 new comments -
[discussion] Analyzing a list of tensors stored as intermediate values / saved_for_backward in autograd graph
#91692 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_expanded_inputs (__main__.CudaGraphTreeTests)
#156886 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_fallback_to_eager_if_recompiling_too_many_times (__main__.CudaGraphTreeTests)
#130749 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157003 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_against_reference_multi_input_jacfwd_cuda (__main__.TestJacCUDA)
#156998 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_inplace_on_view_makes_base_require_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156209 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on
Jul 11, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
Jul 11, 2025 • 0 new comments -
Move loop ordering after fusion
#126255 commented on
Jul 11, 2025 • 0 new comments -
[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build
#156181 commented on
Jul 11, 2025 • 0 new comments -
Triton kernel doing more work uses less registers
#126463 commented on
Jul 11, 2025 • 0 new comments -
[dynamo] Investigate interop issues with torch_scatter/torch_sparse/pyg_lib
#111223 commented on
Jul 11, 2025 • 0 new comments -
[DTensor] Improve `tensor_metadata` and `redistribute_cost` coverage for op strategies.
#157495 commented on
Jul 11, 2025 • 0 new comments -
FSDP offload doesn't prefetch param to GPU
#157209 commented on
Jul 12, 2025 • 0 new comments -
Tune whether to use mm or bmm for matmul in inductor max-autotune
#118774 commented on
Jul 11, 2025 • 0 new comments -
Torch compile fx graph is not removing constant propagation
#120057 commented on
Jul 11, 2025 • 0 new comments -
Use sys.settrace or torch function mode to compute how much of a model was not covered by Dynamo
#120079 commented on
Jul 11, 2025 • 0 new comments -
torch.compile Conv1d AssertionError
#123242 commented on
Jul 11, 2025 • 0 new comments -
Use Incremental Fake Tensor Updater more uniformly across torch.compile compilation
#120116 commented on
Jul 11, 2025 • 0 new comments -
[Inductor] Generate triton block pointers for discontiguous strided tensors
#125077 commented on
Jul 11, 2025 • 0 new comments -
opcheck should support TorchBind custom classes
#121162 commented on
Jul 11, 2025 • 0 new comments -
Match HuggingFace T5 SDPA pattern in Inductor
#121371 commented on
Jul 11, 2025 • 0 new comments -
Triton kernel unexpectedly gets 1.35x slower by more specialization
#120667 commented on
Jul 11, 2025 • 0 new comments -
[export] 14k models: AssertionError: graph-captured input # 2, of type <class 'torch.nn.parameter.Parameter'>, is not among original inputs of types
#111693 commented on
Jul 11, 2025 • 0 new comments -
[Inductor] Freezing Add support for Caching Parameter Conversions
#103990 commented on
Jul 11, 2025 • 0 new comments -
Rework Dynamic Benchmarks To Actually Vary Shapes
#113063 commented on
Jul 11, 2025 • 0 new comments -
Minifier doesn't work with dynamic shapes
#114296 commented on
Jul 11, 2025 • 0 new comments -
[pt2] Unable to trace LSTM with dynamic sequence length
#115092 commented on
Jul 11, 2025 • 0 new comments -
2 Dynamo tests are failing with "Global state changed while dynamo tracing, please report a bug".
#120648 commented on
Jul 11, 2025 • 0 new comments -
Higher peak memory with torch.compile
#122512 commented on
Jul 11, 2025 • 0 new comments -
torch.compile doesn't convert all input scalar types to symbolic values
#119778 commented on
Jul 11, 2025 • 0 new comments -
Inconsistent Behavior of `torch.dsplit` with torch.compile
#118741 commented on
Jul 11, 2025 • 0 new comments -
Triton Kernel Rejects NamedTupleVariable Arguments
#148289 commented on
Jul 10, 2025 • 0 new comments -
Inconsistent export behavior for nonzero+grid_sample between CUDA and CPU/MPS backends
#152791 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_mm_plus_mm (__main__.TestPatternMatcher)
#145335 commented on
Jul 10, 2025 • 0 new comments -
[RFC] Offload collectives to NVSwitch when possible
#136567 commented on
Jul 10, 2025 • 0 new comments -
[Tracker] AutoParallel's feature request to DTensor
#156217 commented on
Jul 10, 2025 • 0 new comments -
SourcelessBuilder.create does not know how to wrap <class '__main__.InFlexData'>
#154009 commented on
Jul 10, 2025 • 0 new comments -
Is compilation caching for NumPy operators not supported in PyTorch 2.7.1?
#156943 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)
#156778 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_dataclass_init_with_default_factory_with_inputs (__main__.ReproTests)
#156799 commented on
Jul 10, 2025 • 0 new comments -
FlopCounterMode doesn't support HOP
#134385 commented on
Jul 10, 2025 • 0 new comments -
Batched multi_dot / chain_matmul + let it accept a tensor instead of tuple
#55261 commented on
Jul 10, 2025 • 0 new comments -
`__getitem__` fails to vmap for one dimensional tensors
#124423 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)
#156801 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 commented on
Jul 10, 2025 • 0 new comments -
Outdated install commands
#152213 commented on
Jul 10, 2025 • 0 new comments -
PyTorch 2.6 License Issues
#150118 commented on
Jul 9, 2025 • 0 new comments -
[XPU User Empathy Day][whisper][Arc770][Win]XPU performance is worse than CPU
#151985 commented on
Jul 9, 2025 • 0 new comments -
xpu: missing aten ops needed to support Huggingface quanto
#132947 commented on
Jul 9, 2025 • 0 new comments -
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on
Jul 9, 2025 • 0 new comments -
Install pytorch from pypi using local CUDA build
#150742 commented on
Jul 9, 2025 • 0 new comments -
☂️ Update submodule dependencies to supported version of Cmake
#150328 commented on
Jul 9, 2025 • 0 new comments -
Remove Direct Arm Compute Library (ACL) Integration for Quantized Matmuls: `qlinear`/`qlinear_dynamic`
#148902 commented on
Jul 9, 2025 • 0 new comments -
Multihead Attention does not work with jagged tensors due to __torch_function__
#153472 commented on
Jul 9, 2025 • 0 new comments -
[RFC] Experimental Wheel Variant Support
#155141 commented on
Jul 9, 2025 • 0 new comments -
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on
Jul 9, 2025 • 0 new comments -
[ONNX] Create a tutorial for exporting hf transformers model
#156258 commented on
Jul 9, 2025 • 0 new comments -
`bytes(...)` support of torch tensor does not match numpy + it would be nice to support tensor.tobytes() as alias
#108565 commented on
Jul 9, 2025 • 0 new comments -
cpp_extension.py expects an integer on CUDA_ARCH, failing with Grace Hopper.
#144037 commented on
Jul 9, 2025 • 0 new comments -
torch.nn.InstanceNorm2d throws "mixed dtype" error with track_running_stats set to True
#139140 commented on
Jul 9, 2025 • 0 new comments -
Failure with cub::TransformInputIterator in 12.9 periodic CI test
#157502 commented on
Jul 9, 2025 • 0 new comments -
Matmul with int32 parameters on Intel GPU leads to errors
#144766 commented on
Jul 9, 2025 • 0 new comments -
Dump bytecode of resumption frames in tlparse
#136038 commented on
Jul 9, 2025 • 0 new comments -
Regression in llama2 model export
#157323 commented on
Jul 10, 2025 • 0 new comments -
ROCm, 7900 XTX: Pytorch FLASH_ATTENTION SDPA is 2.5x slower than MATH (fp16, head_dim 256, seqlen 4360, 12 heads)
#152595 commented on
Jul 10, 2025 • 0 new comments -
MPS Memory Leak
#154329 commented on
Jul 10, 2025 • 0 new comments -
torch._dynamo.mark_static_address refuses to work with nn.Parameter
#157221 commented on
Jul 10, 2025 • 0 new comments -
PyTorch source code build failed on some Windows 11 environment caused by C++ protocol buffer compiler
#143795 commented on
Jul 10, 2025 • 0 new comments -
[inductor][dynamic shapes] hugging face models fail while creating error guard
#157330 commented on
Jul 10, 2025 • 0 new comments -
An error occurs when ‘max_split_size_mb’ and ‘expandable_segments’ are enabled at the same time.
#123548 commented on
Jul 10, 2025 • 0 new comments -
Segmentation faults in test_ops.py tests with gcc13 on AArch64 (v1)
#157626 commented on
Jul 10, 2025 • 0 new comments -
[feature request] Native checkpointing to/from `s3://`
#155992 commented on
Jul 10, 2025 • 0 new comments -
ImportError: cannot import name 'scaled_mm_configs' from 'torch._inductor.kernel.mm_common
#157343 commented on
Jul 10, 2025 • 0 new comments -
[inductor][cpu]mobilenet_v2_quantized_qat float32 single thread static/dynamic shape CPP/default wrapper performance regression in 2024-04-28 nightly release
#125672 commented on
Jul 10, 2025 • 0 new comments -
torch.compile not compatible with multiprocessing pool
#97992 commented on
Jul 10, 2025 • 0 new comments -
Higher train loss and worse evaluation metrics when using `torch.compile()`
#113180 commented on
Jul 10, 2025 • 0 new comments -
Different behaviors in `torch.nn.functional.hinge_embedding_loss` between eagermode and torch.compile
#118175 commented on
Jul 10, 2025 • 0 new comments -
CompiledFxGraph.current_callable is not thread-safe
#138961 commented on
Jul 10, 2025 • 0 new comments -
In Inductor-wrapped tests, reset() before and after each test and turn off suppress_errors=True
#122804 commented on
Jul 10, 2025 • 0 new comments -
Out of bounds error with `nn.MultiMarginLoss`
#105597 commented on
Jul 10, 2025 • 0 new comments -
Undefined symbol: cuOccupancyMaxActiveClusters
#115075 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_error_on_dealloc_use2 (__main__.CudaGraphTreeTests)
#156808 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_deferred_runtime_asserts (__main__.ReproTests)
#156817 commented on
Jul 10, 2025 • 0 new comments -
[torch.compile] tighten FX graph restrictions post-functionalization
#133250 commented on
Jul 10, 2025 • 0 new comments -
[proposal] "Name" string attribute for modules, parameters, buffers, tensors for more pleasant debugging (especially for graph printouts / export / studying compiled generated code)
#104247 commented on
Jul 10, 2025 • 0 new comments -
empty_cache does not work for CUDAPluggableAllocator + MemPool
#145168 commented on
Jul 10, 2025 • 0 new comments -
[ONNX] Flip `dynamo` default to True in torch.onnx.export
#151693 commented on
Jul 10, 2025 • 0 new comments -
[MPS] Migrate torch.sort to Metal shader
#155560 commented on
Jul 10, 2025 • 0 new comments -
StrideAPI caused regression in channels-last logic
#141836 commented on
Jul 10, 2025 • 0 new comments -
MPS Performance regressions on Sonoma 14.0
#111517 commented on
Jul 10, 2025 • 0 new comments -
MaxPool2D memory leakage on device MPS
#125217 commented on
Jul 10, 2025 • 0 new comments -
[MPS] BatchNorm2D produces incorrect results for column first tensors
#134580 commented on
Jul 10, 2025 • 0 new comments -
Inefficient 2D convolution compared to JAX
#157334 commented on
Jul 10, 2025 • 0 new comments -
Fix broken linalg unittests on ARM platform
#125438 commented on
Jul 10, 2025 • 0 new comments -
torch.compile bug when using resize
#155209 commented on
Jul 10, 2025 • 0 new comments -
Compilation issues with ROCm 6.4.1 on Debian 12
#155794 commented on
Jul 10, 2025 • 0 new comments -
PyTorch CPP Extensions fail when same kernel is compiled more than once on ROCm servers
#155344 commented on
Jul 10, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
Jul 10, 2025 • 0 new comments