July 12, 2025 – July 19, 2025

Overview

328 Active pull requests

16 Active issues

221 Pull requests merged by 7 people

[XLA:GPU] Support control flow thunks in command buffer conversion pass. We only convert kWhile and kConditional thunks if all thunks in all branches are convertible.
#97183 merged Jul 19, 2025
#sdy Fix forward of making XLA C++ changes so we can fall back to GSPMD in JAX export if the loaded module was lowered for GSPMD.
#96368 merged Jul 19, 2025
[XLA] Add helper function GetIndicesSpecForDynamicSlice to get the indices spec for a dynamic slice fed by an all-gather. The spec includes the mapping from slice offsets to the corresponding partition IDs (flattened-id).
#96805 merged Jul 19, 2025
Internal, visibility only changes to public code.
#97207 merged Jul 19, 2025
Add visibility to hlo_input_output_format
#96758 merged Jul 19, 2025
Use a literal sentinel value for kernel init failure
#97193 merged Jul 19, 2025
Reduce redundancy between StringTo* enum functions.
#97201 merged Jul 19, 2025
[XLA:CPU] Refactor Intrinsic and use it in all math intrinsics.
#97000 merged Jul 19, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#97154 merged Jul 19, 2025
Update nccl_archive BUILD file to fix TF GPU wheel build.
#97206 merged Jul 19, 2025
[XLA:GPU] Add a verifier to the GPU compiler before post-scheduling pipeline.
#97150 merged Jul 19, 2025
Use host callback in the CopyToHostFuture method in Async PjRt.
#97203 merged Jul 18, 2025
Add function ExtractDynamicSliceFromCollectiveUser to extract a dynamic slice user from a collective.
#96802 merged Jul 18, 2025
no external change
#97100 merged Jul 18, 2025
Reverts 849435a30d0487e415126507953575358ed3c4eb
#97190 merged Jul 18, 2025
Reverts 2a45c5b0c326e20eafe833df055326b39edadcf2
#97071 merged Jul 18, 2025
Bump sqlite to 3.50.3
#97191 merged Jul 18, 2025
Typo fix "perferred" -> "preferred".
#97198 merged Jul 18, 2025
PR #28257: [XLA:GPU] Update ONEAPI crosstool compiler wrapper
#97149 merged Jul 18, 2025
Use ASSERT_THAT to check pass.Run() result
#97164 merged Jul 18, 2025
Update the XNNPack delegate README.
#97181 merged Jul 18, 2025
Annotate some XLA:GPU flags as stable i.e. they should provide 6 month deprecation notice.
#97134 merged Jul 18, 2025
[XLA:GPU] Add a test for DotForInt4vsIdentityBF16ReturnsCorrectResult.
#97064 merged Jul 18, 2025
PR #28985: [XLA:GPU] Add shared_memory_per_block_optin device info member
#97140 merged Jul 18, 2025
Update README.md
#96902 merged Jul 18, 2025
Update dependencies to XNNPACK.
#97177 merged Jul 18, 2025
[XLA:GPU] Move Dot strength reduction out of algebraic simplifier
#97166 merged Jul 18, 2025
[XLA:GPU] Remove CHECK-CSE since it is not used.
#97129 merged Jul 18, 2025
#sdy improve the error messaging when importing and exporting sharding custom calls.
#97041 merged Jul 18, 2025
Introduce new helper function that produces device lists for iota tile assignment. Apply it in xla_sharding_util.cc.
#97176 merged Jul 18, 2025
Introduce stable flags and associated deprecation policy for XLA debug options.
#97049 merged Jul 18, 2025
Use GetInPlaceInputOutputPairs from AliasInfo instead of HloDataflowAnalysis.
#97170 merged Jul 18, 2025
Remove ifdef from ir_emitter_unnested and fix various clang-tidy warnings
#97127 merged Jul 18, 2025
Add TmaMetadata serialization support
#97103 merged Jul 18, 2025
Automated Code Change
#97109 merged Jul 18, 2025
Move GetInPlaceInputOutputPairs and related code to AliasInfo class (NFC).
#97119 merged Jul 18, 2025
Automated Code Change
#97123 merged Jul 18, 2025
Fix tests paths and visibility issue for tflite/converter
#97147 merged Jul 18, 2025
Remove leftover logging
#97145 merged Jul 18, 2025
Automated Code Change
#97033 merged Jul 18, 2025
Propagate context to the waiter destruction sequence, so that all contained operations execute with the correct context.
#97143 merged Jul 18, 2025
Update PjRtCpuExecutable to not rely on any internals of PjRtCpuBuffer.
#97146 merged Jul 18, 2025
Handle V2 xla::OpSharding in ExtractInputsForLogicalDevices and ParseAndValidateOutputSharding.
#97136 merged Jul 18, 2025
Exclude tensorflow/lite/mlir/lite proto definitions when compiling under the LiteRT repo and enable LiteRT disable_tf_lite_py by default
#97137 merged Jul 18, 2025
Update version to 2.21.0
#97079 merged Jul 18, 2025
[XLA:TPU] In MSA, when removing instructions, we need to remove their scoped allocations from PresetAssignments.
#96945 merged Jul 17, 2025
Modified python bindings to enable passing a probe_instrumentation_dir to support interpreter ops in eval_module. Consistent with StableHLO interpreter usage from command line
#97091 merged Jul 17, 2025
[XLA][host offloading] Return AsyncValue from HostOffloadingExecutable.
#96915 merged Jul 17, 2025
#sdy update dump names and add index as prefix so they would be clearer for users
#97117 merged Jul 17, 2025
[Autotuner] Add block level emitter backend for Triton fusion (3).
#96798 merged Jul 17, 2025
[IFRT] Add UserContextScope
#97012 merged Jul 17, 2025
Add ReleaseDeviceMemoryOwnership implementation based on
#97144 merged Jul 17, 2025
Migrate uses of XLA_TEST_BACKEND macros to use utilities in xla_test_backend_predicates.h
#97135 merged Jul 17, 2025
Correctly identify async start and done ops in latency hiding scheduler.
#97089 merged Jul 17, 2025
Close output shardings to respect allow_spmd_sharding_propagation_to_output flag set to default {false} value. Added multiple test variants to test shardy, use_compile_options_from_model.
#97126 merged Jul 17, 2025
[xla:cpu] Make DotLibraryRewriter support greedy fusion mode.
#96319 merged Jul 17, 2025
Internal change only
#97065 merged Jul 17, 2025
Optimize BM_GlobalDecreasingSizeBestFitHeap benchmark by up to 3%.
#97075 merged Jul 17, 2025
Update release notes for TensorFlow 2.20.0
#97080 merged Jul 17, 2025
Relax the folding size threshold to 200 MiB.
#97078 merged Jul 17, 2025
Update CommonPjRtBufferImpl to have specialized versions for both cpu->device
#97085 merged Jul 17, 2025
#sdy define the utils that JAX jaxlib will use to allow for falling back to GSPMD when loading an old checkpoint.
#97130 merged Jul 17, 2025
[Autotuner] Add block level emitter backend for Triton fusion (2).
#96796 merged Jul 17, 2025
Use ASSERT_THAT(..., IsOkAndHolds(true)) for consistency and correctness
#97005 merged Jul 17, 2025
fix(dtensor): guard against nullptr from TF_TensorData in ExtractSmallTensorValue
#96866 merged Jul 17, 2025
Reverts 812bb86d50b1cee5cf32ccb1629a49687e924ea5
#97098 merged Jul 17, 2025
Simplify ShouldSkipForSideEffect function in zero_sized_hlo_elimination.
#97101 merged Jul 17, 2025
[XLA:GPU] Remove unused DotSparsityRewriter.
#97128 merged Jul 17, 2025
Automated Code Change
#97122 merged Jul 17, 2025
[XLA:GPU] additional logging in triton fusion numeric verifier
#97056 merged Jul 17, 2025
[xla:gpu][triton] triton-xla-squeeze-dims pass improvements.
#97099 merged Jul 17, 2025
Automated Code Change
#96959 merged Jul 17, 2025
PR #28073: [XLA:GPU][oneAPI] Enable Level_zero support
#97022 merged Jul 17, 2025
Remove deprecated HloAliasAnalysis::Run method
#97044 merged Jul 17, 2025
Add serialization and deserialization for the cuDNN thunk
#96914 merged Jul 17, 2025
no external change
#96942 merged Jul 17, 2025
[xla] Optimize ShapeUtil::ForEach traversals
#97063 merged Jul 17, 2025
Support INT16 for PRelu op
#96899 merged Jul 17, 2025
[xla:tf] Check if device shape is already a host shape
#97018 merged Jul 17, 2025
Add int16 kernel support for DIV op
#96934 merged Jul 17, 2025
Rollback https://github.com/openxla/xla/commit/cf3dfa9723c4cd4e2b25a606207a201a95fe71db
#97074 merged Jul 17, 2025
Fix //tflite/converter/tests/... MLIR tests by fixing .bzl rules and redirecting tensorflow submodule
#97003 merged Jul 16, 2025
Update release notes at HEAD
#97073 merged Jul 16, 2025
Enable --flaky_test_attempts in release branch
#97076 merged Jul 16, 2025
Move op name longest prefix logic from annotation.cc to somewhere upper level
#93906 merged Jul 16, 2025
Internal change only
#96928 merged Jul 16, 2025
Refactor optimized div for int8 and uint8
#96933 merged Jul 16, 2025
Add Hermetic C++ Toolchains for Linux x86_64 builds.
#96803 merged Jul 16, 2025
Migrate uses of XLA_TEST_BACKEND macros to use utilities in xla_test_backend_predicates.h
#97006 merged Jul 16, 2025
[JAX]: rollforward. Add ability to add a transfer server factory to override
#97069 merged Jul 16, 2025
Update dependencies to XNNPACK and cpuinfo.
#96990 merged Jul 16, 2025
Complete the CommonPjRtBufferImpl implementation.
#97001 merged Jul 16, 2025
[xla] Move xla::Shape functions that are used on a hot path to header file
#97057 merged Jul 16, 2025
Increase the size of __tensorflow_core_lib_core_legacy_lib_core_all_tests to deflake CI.
#97061 merged Jul 16, 2025
Support composite unpack and pack legalization with dynamic shape
#97062 merged Jul 16, 2025
Reverts e52a31e166af020e465c7494a6353f098a65155c
#97066 merged Jul 16, 2025
Rollback for missing header
#97067 merged Jul 16, 2025
#sdy Mark xla.sdy.LocalToGlobalShape custom call as side effecting so it isn't removed if unused.
#97037 merged Jul 16, 2025
Update 06 broken links in question_answer.md
#94881 merged Jul 16, 2025
Added PjrtClient::UpdateGlobalProcessInfo method.
#95611 merged Jul 16, 2025
[tf] Use non-owning ShapeTree to pass execution inputs to XLA
#97055 merged Jul 16, 2025
PR #28877: [XLA]Clamp num_workers to avoid partition overflow
#97046 merged Jul 16, 2025
[XLA:GPU] Disable horizontal loop fusion.
#96410 merged Jul 16, 2025
[tf] Use non-owning ShapeTree to pass execution inputs to XLA
#97053 merged Jul 16, 2025
[XLA] Be less aggressive about recursively updating metadata when inlining.
#97043 merged Jul 16, 2025
[XLA:GPU] Move IsIntermediate & FindHero to shared ir_emission_utils.
#97052 merged Jul 16, 2025
[XLA:ALGEBRAIC_SIMPLIFIER] If an optimization barrier has an unused side-effecting instruction, do not remove the optimization barrier
#97048 merged Jul 16, 2025
Move HloAliasAnalysis out of HloModuleGroupMetadata (NFC).
#97036 merged Jul 16, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run in tests (NFC).
#97039 merged Jul 16, 2025
[XLA:GPU] Update documentation for triton_xla.extract/insert.
#97038 merged Jul 16, 2025
[xla][gpu][triton] Temporarily disable triton squeeze dims pass, due to internal benchmark regression.
#97028 merged Jul 16, 2025
Remove unused HloAliasAnalysis instance (NFC).
#97024 merged Jul 16, 2025
Skip TreeReductionRewriter for Slinky.
#96968 merged Jul 16, 2025
[XLA:GPU] update triton test for generic emitter
#96994 merged Jul 16, 2025
Automated Code Change
#96966 merged Jul 16, 2025
[xla] Add benchmark for ShapeUtil::SubshapeCount
#97021 merged Jul 16, 2025
Reverts e74d259786b388e8ff7af90d426e665c84388229
#96991 merged Jul 16, 2025
Automated Code Change
#96965 merged Jul 16, 2025
Automated Code Change
#96957 merged Jul 16, 2025
[xla] Change the order of std::variant types in MaybeOwningDeviceMemory
#97007 merged Jul 16, 2025
The raw buffer CopyToMemorySpace doesn't seem to quite work yet cross client, so avoid
#97011 merged Jul 16, 2025
[xla] Optimize constructing ShapeTree
#97008 merged Jul 16, 2025
[JAX] Cache transfer server connections for cross-host device_put.
#97002 merged Jul 15, 2025
Update target define states before we update ready list.
#97004 merged Jul 15, 2025
Reverts e8964b7d937c027100b0b4aed68f02ac57ea0333
#96996 merged Jul 15, 2025
Optimize xla::GlobalDecreasingSizeBestFitHeap::MakeFreeChunks when using power-of-2 memory alignments, and add 1024B alignment test to benchmark.
#96925 merged Jul 15, 2025
Create xla::test::Empty for instantiating empty test suites.
#96997 merged Jul 15, 2025
Add ::GetReadyFuturePromise to be used in implementing
#96951 merged Jul 15, 2025
Add an option to do multiple executions of the same module to HloRunners.
#96752 merged Jul 15, 2025
[tf:xla] Avoid accidental copies of large Op attributes
#96952 merged Jul 15, 2025
Add deprecation message for TFLITE_XNNPACK_DELEGATE_FLAG_ENABLE_SUBGRAPH_RESHAPING
#96688 merged Jul 15, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run (NFC).
#96983 merged Jul 15, 2025
[XLA][Numerics][HLO Value Tracking] Add recovery modules when removing nested reshapes on TPU
#96503 merged Jul 15, 2025
Add CopyToMemorySpace which calls DirectCopyToMemorySpace or
#96947 merged Jul 15, 2025
#HLODiff Remove text diff summary
#96938 merged Jul 15, 2025
#HLODiff Update print progress at the end of matcher to show 100%.
#96937 merged Jul 15, 2025
[XLA:CPU] Don't expand tanh at the fusion level.
#96987 merged Jul 15, 2025
[IFRT] Do not set MHLO shardings if sdy partitioned
#96903 merged Jul 15, 2025
Adds a new rematerialization method that focuses on rematerializing only the highest memory usage peak in the module at any given remat pass (instead of rematerializing the first point at which the memory limit is reached). Should result in more monotonic rematerialization and avoid rematerializing unnecessary instructions. Usually not as efficient as regular rematerialization but can help in specific cases. The new mode is not enabled yet. Reworks Instruction List to use unique ptrs.
#96995 merged Jul 15, 2025
Handle GetDonatableInputIndices() errors
#96954 merged Jul 15, 2025
[XLA:CPU] Disable fusion level vectorization.
#96986 merged Jul 15, 2025
Add missing header.
#96878 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Set default alignment of vector load/store as that of the vector element type.
#96982 merged Jul 15, 2025
avoid failure when docstrings have been stripped (python -OO)
#96906 merged Jul 15, 2025
#sdy Clean up AddAxisOrMergeInserter in dedup_meshes
#96254 merged Jul 15, 2025
[ifrt] Fix spelling in CopyArraysOp description.
#96993 merged Jul 15, 2025
Disable failure_handler_test for Mac
#96943 merged Jul 15, 2025
PR #28716: [GPU] Make fabric info test compatible with lower CUDA driver versions
#96788 merged Jul 15, 2025
Remove MeshAttr builder that takes a single int
#96988 merged Jul 15, 2025
#sdy Mark xla.sdy.LocalToGlobalShape custom call as side effecting so it isn't removed if unused.
#96909 merged Jul 15, 2025
Migrate away from ArrayRef(std::nullopt_t)
#96989 merged Jul 15, 2025
[XLA:GPU] Implement tiling for dot.
#96676 merged Jul 15, 2025
PR #28728: Add Nvidia benchmarks
#96922 merged Jul 15, 2025
Make Thunk keep an instance of ThunkInfo directly (NFC)
#96910 merged Jul 15, 2025
Remove workarounds for missing ABSL_DEPRECATE_AND_INLINE
#96984 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Increase limit in number of iterations of UnswitchLoopsPass.
#96913 merged Jul 15, 2025
[xla:cpu] Add DotLibraryRewriter rewrite options for oneDNN and XNNPACK.
#96981 merged Jul 15, 2025
Automated Code Change
#96978 merged Jul 15, 2025
fix(proto_splitter): return error if FindFieldByNumber yields null field_desc in ProcessField
#96429 merged Jul 15, 2025
[xla:cpu] Tiny improvements for documentation and function names
#96976 merged Jul 15, 2025
Fix shardy_xla_pass_test that is failing
#96946 merged Jul 15, 2025
[XLA:CPU][XLA:GPU] Fix missing layout on emitted constants.
#96791 merged Jul 15, 2025
Automated Code Change
#96972 merged Jul 15, 2025
Remove dependency on KernelArguments from CudnnThunk
#96911 merged Jul 15, 2025
[XLA:GPU] Do not multi-output fuse sibling transposes with reductions.
#96774 merged Jul 15, 2025
Migrate away from ArrayRef(std::nullopt_t)
#96944 merged Jul 15, 2025
Fix incorrect per-channel scaling in fully_connected on Android
#96522 merged Jul 15, 2025
Automated Code Change
#96690 merged Jul 15, 2025
PR #28401: [ROCm] Fix PackedTranspose for adapting to warp size 64
#96971 merged Jul 15, 2025
PR #25914: [NVIDIA GPU] Add nvshmem communicator and runtime thunks
#96897 merged Jul 15, 2025
[XLA] Propagate op_names recursively in the CallInliner.
#96926 merged Jul 15, 2025
Fix test-case when NVML library is not available.
#96917 merged Jul 15, 2025
[xla:cpu] Mark cpu_function_runtime alignment as deprecated
#96732 merged Jul 15, 2025
initial implementation of send/recv static verification
#96517 merged Jul 15, 2025
Remove unused ExecutionProfile option.
#96680 merged Jul 15, 2025
[JAX] Use experimental DCN transfer library as a fallback for PjRt-IFRT cross-host device transfers when the PjRt plugin doesn't implement the cross-host transfer APIs.
#96817 merged Jul 15, 2025
Add HloAsyncStartInstruction::AddCallOperand to mirror HloCallInstruction::AddCallOperand.
#96742 merged Jul 15, 2025
[xla:codegen] Migrate Fptrunc to GetOrInsertDeclaration API
#96949 merged Jul 15, 2025
Check shape rank is less than XNN_MAX_TENSOR_DIMS for TRANSPOSE
#96813 merged Jul 15, 2025
[XLA] Refactoring Reduce Window Rewriter to reduce complexity
#96815 merged Jul 15, 2025
Migrate away from ArrayRef(std::nullopt_t)
#96940 merged Jul 15, 2025
[xla:codegen] Use Intrinsic::Type in Fptrunc::CreateDefinition
#96921 merged Jul 15, 2025
Add 'mode' attribute to AllReduce and ReduceScatter.
#96214 merged Jul 14, 2025
Reverts d41335bc8404d9a347913fa9c776ca4a1bb7e94a
#96939 merged Jul 14, 2025
Extract CheckUniformReplicaGroups to verify that all replica groups in a collective instruction are of the same size, which is a precondition for many collective optimizations.
#96801 merged Jul 14, 2025
[Efficiency]Cleanup unused metrics which track the pjrt compilation status.
#96736 merged Jul 14, 2025
[IFRT IR] Add pipeline for compiling IFRT IR programs
#96891 merged Jul 14, 2025
[XLA:GPU] update determinism test to use generic triton emitter
#96920 merged Jul 14, 2025
Allow biases with rank > 1 to be fused to FC
#96595 merged Jul 14, 2025
set layout assignment for the result correctly
#96421 merged Jul 14, 2025
Add CloneWithControlDependency which is used to implement
#96804 merged Jul 14, 2025
Automated Code Change
#96828 merged Jul 14, 2025
[XLA:GPU]: Enable two-shot all reduce implementation for usage.
#96484 merged Jul 14, 2025
Reverts e5982331c429fce7c9c4389f2d15eee5ff3e9791
#96751 merged Jul 14, 2025
set default layout when exporting dense constants from HLO to MLIR
#96737 merged Jul 14, 2025
Re-enable precompilation for some tests.
#96747 merged Jul 14, 2025
[XLA:GPU] enable nested fusion for autotuner test
#96916 merged Jul 14, 2025
quick fix for sigill on non-null device
#96748 merged Jul 14, 2025
Align AtLocation signature with Abseil LogMessage::AtLocation.
#96907 merged Jul 14, 2025
[XLA] Use "edge time indices" to skip some redundant calls to FindChunkCandidate.
#96744 merged Jul 14, 2025
[XLA:CPU] Move erf32 approximation to mathlib.
#96783 merged Jul 14, 2025
Removing stale function signature references from tensorflow that rely on old options of type variant<int, string>
#95416 merged Jul 14, 2025
[XLA:CPU] Add expm1 expansion.
#96782 merged Jul 14, 2025
[XLA:GPU]: Calculate rank_offset and rotated_ranks outside the kernel.
#95954 merged Jul 14, 2025
[XLA:CPU] Move passes from expand_float_ops that lower to math lib.
#96781 merged Jul 14, 2025
[XLA:GPU]: Calculate launch dimensions based on input size.
#95893 merged Jul 14, 2025
Pass proper AliasInfo to HloAliasAnalysis::Run in HostOffloader (NFC).
#96904 merged Jul 14, 2025
[XLA:GPU] Print fusion string when selecting the best result, instead of root string.
#96787 merged Jul 14, 2025
[xla][gpu][triton] Do not duplicate code in squeeze dims pass, re-enable the pass.
#96894 merged Jul 14, 2025
Disable NVSHMEM send-recv test-case due to flakiness.
#96892 merged Jul 14, 2025
PR #28295: [NVIDIA GPU] Do out of place allreduce for nvshmem
#96893 merged Jul 14, 2025
[XLA:GPU] Remove code for horizontal_input_fusion.
#96424 merged Jul 14, 2025
Update StreamExecutorGpuClientTest.PropagateError test to expect unpacked tuples
#96901 merged Jul 14, 2025
XLA:GPU: Fix method ambiguity on CUDA 12.4
#96877 merged Jul 14, 2025
Avoid using PointsToAnalysis in DFSMemoryScheduler (NFC).
#96718 merged Jul 14, 2025
Apply patch to fix compile error on windows (NFC).
#96896 merged Jul 14, 2025
Always stage transfers when doing d2h copy to avoid memory corruption issue.
#96821 merged Jul 14, 2025
[xla:codegen] Use Intrinsic::Type in Fptrunc::GetOrInsertDeclaration
#96858 merged Jul 13, 2025
Reverts 6503034148ab3c0469a32d20b9a3ea397457a8f8
#96829 merged Jul 13, 2025
#sdy If auto partitioning is enabled and there is no registered auto partitioner, register Alpa as the default.
#96860 merged Jul 12, 2025
Fix typo in xnn_fusion_thunk.cc.
#96823 merged Jul 12, 2025
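
One of the merged changes above (#96801) extracts a check that all replica groups in a collective instruction are the same size. The precondition itself can be sketched in a few lines of Python. This is only an illustration: the function name `has_uniform_replica_groups` is hypothetical and is not the XLA C++ API, and treating an empty group list as uniform is an assumption made here.

```python
from typing import Sequence


def has_uniform_replica_groups(replica_groups: Sequence[Sequence[int]]) -> bool:
    """Return True when every replica group has the same number of members.

    Hypothetical helper for illustration only; not the XLA implementation.
    An empty list of groups is treated as uniform (an assumption).
    """
    # Collect the distinct group sizes; uniform means at most one distinct size.
    sizes = {len(group) for group in replica_groups}
    return len(sizes) <= 1


if __name__ == "__main__":
    print(has_uniform_replica_groups([[0, 1], [2, 3]]))   # two groups of size 2
    print(has_uniform_replica_groups([[0, 1, 2], [3]]))   # sizes 3 and 1
```

A pass that assumes equal group sizes could run a check like this first and skip the optimization when it fails.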

107 Pull requests opened by 4 people

Automated Code Change
#96850 opened Jul 12, 2025
Automated Code Change
#96851 opened Jul 12, 2025
Automated Code Change
#96853 opened Jul 12, 2025
Automated Code Change
#96856 opened Jul 12, 2025
Automated Code Change
#96857 opened Jul 12, 2025
Automated Code Change
#96862 opened Jul 12, 2025
Automated Code Change
#96863 opened Jul 12, 2025
Automated Code Change
#96864 opened Jul 12, 2025
Automated Code Change
#96905 opened Jul 14, 2025
Migrate ListScheduler from TuplePointsToAnalysis to HloAliasAnalysis (NFC).
#96908 opened Jul 14, 2025
Automated Code Change
#96912 opened Jul 14, 2025
#sdy Fix forward of making JAX changes so we can fall back to GSPMD in JAX export if the loaded module was lowered for GSPMD.
#96918 opened Jul 14, 2025
Add sdy shardings in frontend_attributes alongside hlo shardings for extra wrapper main added in tf2xla bridge.
#96919 opened Jul 14, 2025
Automated Code Change
#96923 opened Jul 14, 2025
Automated Code Change
#96924 opened Jul 14, 2025
Fix cost analysis for output bytes accessed when the result is a tuple
#96927 opened Jul 14, 2025
Automated Code Change
#96929 opened Jul 14, 2025
Avoid crashing when LRU cache keys change.
#96930 opened Jul 14, 2025
Automated Code Change
#96931 opened Jul 14, 2025
test PR #28728: Add Nvidia benchmarks
#96941 opened Jul 14, 2025
There is nothing in this change going to 3rd party.
#96950 opened Jul 15, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#96953 opened Jul 15, 2025
[xla:gpu][triton] Add squeeze_dims of tt.descriptor_load rewrite.
#96979 opened Jul 15, 2025
Integrate LLVM at llvm/llvm-project@0d5325bb203f
#96998 opened Jul 15, 2025
lite: Add config option to enable benchmark_model
#96999 opened Jul 15, 2025
Fix tflite converter MLIR tests with copies of td_ops
#97009 opened Jul 16, 2025
Minor improvement to IfrtServingExecutable performance
#97010 opened Jul 16, 2025
Automated Code Change
#97013 opened Jul 16, 2025
Automated Code Change
#97025 opened Jul 16, 2025
Update deps:
#97035 opened Jul 16, 2025
PR #28735: [XLA:GPU] Enabling cuda graph concurrent mode by default
#97045 opened Jul 16, 2025
[XLA:GPU] Move the s4 unpacking sequence from llvm pass to int4->int8 pass
#97047 opened Jul 16, 2025
[XLA:CPU][XLA:GPU] Move concat fusion emitter to shared directory
#97050 opened Jul 16, 2025
[XLA:GPU][host offloading] Implement gpu host offloading allocator.
#97051 opened Jul 16, 2025
Allow the chaining of state across MetricHookInterface instantiations for multiple compilations.
#97054 opened Jul 16, 2025
[XLA:GPU][host offloading] Implement host offloading thunks.
#97059 opened Jul 16, 2025
[#HLODiff] Add support for manual node matching.
#97060 opened Jul 16, 2025
Avoid recomputation of `pjrt_buffer->memory_space()` in `MakeMemoryKindFromPjRtBuffer`.
#97068 opened Jul 16, 2025
No changes to 3rd party.
#97070 opened Jul 16, 2025
Add JAX tests for deadlock verifier
#97072 opened Jul 16, 2025
Added WatchJobState RPC to coordination service.
#97077 opened Jul 16, 2025
Remove `local_config_nvshmem` repository and corresponding macros.
#97082 opened Jul 17, 2025
Cache device on `PJRT_Buffer`.
#97083 opened Jul 17, 2025
[XLA:MSA] Add block allocations for program weights that are not aliased and single use.
#97084 opened Jul 17, 2025
Integrate LLVM at llvm/llvm-project@2910c24638fc
#97086 opened Jul 17, 2025
[XLA:CPU] Run CSE after inlining in fusion compiler.
#97114 opened Jul 17, 2025
PR #28883: [XLA:CPU][oneDNN] Add build flag to enable asynchronous support in oneDNN
#97115 opened Jul 17, 2025
Introduce --dump_tflite_model_dir to dump TFLite models in Delegate Test Suite (DTS)
#97118 opened Jul 17, 2025
Solve the problem number #97125
#97131 opened Jul 17, 2025
Add 10 Maxtext-derived HLO-based benchmarks
#97132 opened Jul 17, 2025
[XLA:GPU] Refactor tests of IndexingMap
#97138 opened Jul 17, 2025
Integrate LLVM at llvm/llvm-project@06ae0c2a1086
#97139 opened Jul 17, 2025
Avoid checking captured_tensors' usage when deciding if
#97141 opened Jul 17, 2025
Add Metal LiteRt Tensor Buffer support
#97142 opened Jul 17, 2025
[XLA][Numerics][HLO Value Tracking] Track original values through propagation of shardy annotation
#97148 opened Jul 17, 2025
Enable N-dimensional sparse tensor in tpu_embedding_v3.py
#97151 opened Jul 17, 2025
Remove redundant string conversion.
#97152 opened Jul 17, 2025
[IFRT] Define `user_context()` in `Value` and `LoadedExecutable`
#97153 opened Jul 17, 2025
Optimize `HasCombinableReplicaGroup` and `xla::CheckReplicaGroups`.
#97155 opened Jul 18, 2025
[IFRT] Support XLA GPU flag overrides.
#97156 opened Jul 18, 2025
Change `PjRtClient::LazyToLiteral` to take a generator that returns a future of the literal
#97162 opened Jul 18, 2025
[XLA:MSA] Reduce available memory bandwidth for instructions that are overlapped with bandwidth-limiting asynchronous instructions.
#97163 opened Jul 18, 2025
Automated Code Change
#97168 opened Jul 18, 2025
Automated Code Change
#97169 opened Jul 18, 2025
Automated Code Change
#97171 opened Jul 18, 2025
Remove LLVM dependency from KernelThunk
#97178 opened Jul 18, 2025
Automated Code Change
#97179 opened Jul 18, 2025
Remove HLO and Autotuner dependency from CublasLtMatmulThunk
#97180 opened Jul 18, 2025
Use `mdformat` on the XNNPack delegate readme.
#97182 opened Jul 18, 2025
Removed optimized batch_matmul to redirect to XNNPACK. Also performed tiny refactoring along the way.
#97184 opened Jul 18, 2025
[XLA] Use sort instead of btree in MakeFreeChunks.
#97185 opened Jul 18, 2025
pass in scheduling group id when adding some new ops from ops which have id.
#97186 opened Jul 18, 2025
Automated Code Change
#97188 opened Jul 18, 2025
Introduces a new utility function, `MatchPermutedSliceAndPartitionOffset`, to detect a pattern where a `DynamicSlice` consumes the output of an `AllGather` with a permuted set of offsets. This pattern is equivalent to a `CollectivePermute` and can be optimized accordingly.
#97189 opened Jul 18, 2025
LatencyHidingScheduler: Only recalculate when we've touched an already-scheduled computation, and use computation-specific peak rather than module peak in statistics.
#97192 opened Jul 18, 2025
Remove AbstractCpuBuffer. All subclasses can be replaced with CommonPjRtBufferImpl and removed.
#97194 opened Jul 18, 2025
Internal change only.
#97195 opened Jul 18, 2025
Remove unused aliases and dependencies.
#97196 opened Jul 18, 2025
IFRT proxy logging fix: Do not log error when Executable is destroyed before its metadata is queried by the server (and sent over to the client).
#97197 opened Jul 18, 2025
Handle negative permutations in IsTransposeTrivial.
#97199 opened Jul 18, 2025
Add SparseCore documentation
#97200 opened Jul 18, 2025
Reverts 849435a30d0487e415126507953575358ed3c4eb
#97202 opened Jul 18, 2025
Optimize BatchMatmul to Fully Connected when RHS is reshaped after dequantization.
#97205 opened Jul 18, 2025
Give better error in run_hlo_module if HLO has collectives.
#97208 opened Jul 19, 2025
Automated Code Change
#97209 opened Jul 19, 2025
Change int64 to int64_t in Pow operations for type consistency
#97210 opened Jul 19, 2025
Determine collective support based on #partitions
#97211 opened Jul 19, 2025
Automated Code Change
#97212 opened Jul 19, 2025
Automated Code Change
#97213 opened Jul 19, 2025
Automated Code Change
#97214 opened Jul 19, 2025
Automated Code Change
#97215 opened Jul 19, 2025
Automated Code Change
#97216 opened Jul 19, 2025
Automated Code Change
#97217 opened Jul 19, 2025
Automated Code Change
#97218 opened Jul 19, 2025
Automated Code Change
#97219 opened Jul 19, 2025
Automated Code Change
#97220 opened Jul 19, 2025
Automated Code Change
#97221 opened Jul 19, 2025
Automated Code Change
#97222 opened Jul 19, 2025
[xla:gpu][triton] In squeeze-dims pass, keep at least two dimensions.
#97223 opened Jul 19, 2025
Automated Code Change
#97224 opened Jul 19, 2025
Automated Code Change
#97225 opened Jul 19, 2025
Automated Code Change
#97226 opened Jul 19, 2025
Automated Code Change
#97228 opened Jul 19, 2025
[XLA:GPU][Tiling] Use SmallVector<OneDimTile> to store tiling info.
#97229 opened Jul 19, 2025
Avoid heap allocation for the sub buffer address
#97230 opened Jul 19, 2025
Add a scratch implementation of muon
#97231 opened Jul 19, 2025
Utils to add sdy shardings in frontend_attributes alongside hlo shardings for extra wrapper main added in tf2xla bridge.
#97232 opened Jul 19, 2025
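
One of the opened changes above (#97189) describes a `DynamicSlice` that consumes an `AllGather` output with a permuted set of offsets, and notes that this pattern is equivalent to a `CollectivePermute`. The toy simulation below illustrates that equivalence with plain Python lists standing in for per-partition buffers. All function names are hypothetical, and the sketch assumes equal-sized shards and offsets that are exact multiples of the shard size; it does not reproduce the XLA utility.

```python
from typing import List, Sequence


def all_gather(shards: Sequence[Sequence[int]]) -> List[int]:
    """Concatenate every partition's shard; each partition sees the same result."""
    return [value for shard in shards for value in shard]


def slice_after_all_gather(
    shards: Sequence[Sequence[int]], offsets: Sequence[int]
) -> List[List[int]]:
    """Partition p takes a shard-sized slice of the gathered data at offsets[p]."""
    gathered = all_gather(shards)
    size = len(shards[0])  # assumes all shards have the same length
    return [gathered[offsets[p]:offsets[p] + size] for p in range(len(shards))]


def collective_permute(
    shards: Sequence[Sequence[int]], source_for_target: Sequence[int]
) -> List[List[int]]:
    """Partition p receives the shard owned by source_for_target[p]."""
    return [list(shards[source_for_target[p]]) for p in range(len(shards))]


if __name__ == "__main__":
    shards = [[10, 11], [20, 21], [30, 31]]
    permutation = [2, 0, 1]  # partition p reads the shard of permutation[p]
    offsets = [src * len(shards[0]) for src in permutation]
    print(slice_after_all_gather(shards, offsets) == collective_permute(shards, permutation))
```

Because each partition keeps only one shard's worth of the gathered data, sending that single shard directly avoids materializing the full gathered buffer on every partition.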

5 Issues closed by 3 people

Inconsistent NotEqual broadcasting behavior between CPU and GPU (CPU fails silently, GPU raises error)
#97227 closed Jul 19, 2025
graph execution error bug with tfm.nlp.layers.MultiHeadRelativeAttention
#94599 closed Jul 18, 2025
tensorflow crashes when run with python -OO
#96900 closed Jul 15, 2025
Unable to install TensorFlow: No matching distribution found for TensorFlow!
#79349 closed Jul 13, 2025
Broken Link: Microsoft Visual C++ Redistributable (pip install)
#93826 closed Jul 12, 2025

11 Issues opened by 5 people

Inconsistent behavior for `tf.raw_ops.NotEqual` between CPU and GPU with non-broadcastable shapes
#97204 opened Jul 18, 2025
could you add support of the new optimizer: Muon
#97187 opened Jul 18, 2025
`tf.nn.depthwise_conv2d` crashes with large `strides` values when ONEDNN is enabled
#97165 opened Jul 18, 2025
`tf.pow` returns inconsistent value on CPU vs GPU
#97125 opened Jul 17, 2025
`tf.nn.local_response_normalization` returns incorrect output
#97105 opened Jul 17, 2025
`tf.linalg.matrix_rank` produces inconsistent output on CPU vs GPU with `tol=6`
#97102 opened Jul 17, 2025
`tf.math.argmax` throws `InvalidArgumentError` with valid `axis` of `int16` dtype
#97096 opened Jul 17, 2025
Tensorflow 2.19 fails to load after Pyside6
#97058 opened Jul 16, 2025
`tf.experimental.numpy.cumsum` handles overflow inconsistently on CPU and GPU
#97042 opened Jul 16, 2025
Core Dump When Training
#97016 opened Jul 16, 2025
Will TF support Triton in the future
#96876 opened Jul 13, 2025

53 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Stable delegate python api
#93850 commented on Jul 16, 2025 • 1 new comment
Update Docs to latest release
#89084 commented on Jul 12, 2025 • 0 new comments
`tf.nn.conv2d_transpose` crashes with "Illegal instruction (core dumped)"
#93733 commented on Jul 18, 2025 • 0 new comments
[Compatibility][Upgrade] TensorFlow 2.x to 2.15.0: Dependency Conflict and Version Downgrade Issue
#96694 commented on Jul 19, 2025 • 0 new comments
Build Error While Compiling TensorFlow Lite Using CMake
#96654 commented on Jul 19, 2025 • 0 new comments
Memory leak in tf.data when iterating over Dataset.from_generator
#65675 commented on Jul 19, 2025 • 0 new comments
tfLite. Select fastest available GPU
#88039 commented on Jul 14, 2025 • 0 new comments
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#94954 commented on Jul 18, 2025 • 0 new comments
io_utils: prevent `input()` crash in non-interactive mode
#95525 commented on Jul 16, 2025 • 0 new comments
Update Protobuf to 6.31.1
#95873 commented on Jul 14, 2025 • 0 new comments
Add metadata for CUDA and libtpu versions
#95903 commented on Jul 15, 2025 • 0 new comments
Remove LiteRT modules from TF python deps.
#95991 commented on Jul 18, 2025 • 0 new comments
Add a Reflection Map to `emitc` class
#96263 commented on Jul 18, 2025 • 0 new comments
[XLA] Remove dead argument in ProtoToHumanReadableJson
#96582 commented on Jul 14, 2025 • 0 new comments
PR #19067: [XLA:CPU][oneDNN] Move simplification pass before oneDNN pass
#96617 commented on Jul 19, 2025 • 0 new comments
#sdy Remove MHLO shardings from round-trip export
#96640 commented on Jul 17, 2025 • 0 new comments
Use RPG's solution as a hint to CP-SAT
#96674 commented on Jul 17, 2025 • 0 new comments
[XLA:benchmarks] Test Nvidia benchmarks from https://github.com/openxla/xla/pull/28728
#96678 commented on Jul 15, 2025 • 0 new comments
[XLA] Add stack trace breakdown to `HloLiveRange::ToString` for peak memory usage
#96754 commented on Jul 13, 2025 • 0 new comments
Add an option to do multiple executions of the same module to HloRunners.
#96807 commented on Jul 15, 2025 • 0 new comments
Add Hermetic C++ Toolchains for Linux x86_64 builds.
#96820 commented on Jul 17, 2025 • 0 new comments
Automated Code Change
#96833 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#96838 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#96839 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#96841 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#96842 commented on Jul 14, 2025 • 0 new comments
Automated Code Change
#96844 commented on Jul 14, 2025 • 0 new comments
GPU Not Detected by TensorFlow Despite Proper System Setup
#96707 commented on Jul 14, 2025 • 0 new comments
Multiple segmentation faults and aborted in some modules
#96209 commented on Jul 14, 2025 • 0 new comments
Some sorting related ops produce results inconsistent with NumPy when tensor contains NaN
#95235 commented on Jul 14, 2025 • 0 new comments
tf.data.experimental.prefetch_to_device has no effect inside tf.distribute.Strategy.distribute_datasets_from_function.
#94735 commented on Jul 14, 2025 • 0 new comments
TensorFlow Docker `tensorflow/tensorflow:latest-gpu` fails to detect GPU due to CUDA/cuDNN mismatch
#94593 commented on Jul 14, 2025 • 0 new comments
TensorFlow disables SwiftUI Previews
#95106 commented on Jul 14, 2025 • 0 new comments
xprof compilation fails with gcc 14.2
#94035 commented on Jul 14, 2025 • 0 new comments
Psycopg crashes using OpenSSL if tensorflow is imported beforehand
#93969 commented on Jul 14, 2025 • 0 new comments
lib new version not support 16kb pages in android
#96602 commented on Jul 14, 2025 • 0 new comments
how to use libtensorflowlite_c.so C API and delegate gpu opencl correctly?
#95795 commented on Jul 15, 2025 • 0 new comments
TensorFlow Java documentation is outdated
#96799 commented on Jul 15, 2025 • 0 new comments
Dataset.ragged_batch does not produce correct specs with tf.py_function and tf.numpy_function
#60710 commented on Jul 15, 2025 • 0 new comments
`tf.split` or `tf.transpose` cause errors for quantize-aware training with `quantize_apply`
#60714 commented on Jul 15, 2025 • 0 new comments
Incorrect gradient in divide_no_nan and reciprocal_no_nan when divide by 0
#60715 commented on Jul 15, 2025 • 0 new comments
failed to build branch r2.13
#60716 commented on Jul 15, 2025 • 0 new comments
Remove or update zh-cn translation from installation instructions
#62245 commented on Jul 15, 2025 • 0 new comments
Convolution: CPU memory increase with growing number of different sequence lengths
#62441 commented on Jul 15, 2025 • 0 new comments
tf.strings.to_number cannot convert positive integers prefixed with "+" when out_type is tf.int32 or tf.int64
#62191 commented on Jul 15, 2025 • 0 new comments
Arch Linux x86 RTX 3050 source compilation Error in fail: tensorflow/core:stream_executor_headers_lib cannot depend on tensorflow/core:lib
#96445 commented on Jul 15, 2025 • 0 new comments
Depth anything V2 Tflite outputs constants on qualcomm gpus
#93476 commented on Jul 15, 2025 • 0 new comments
how to get profile of per operation that delegate gpu opencl like cpu enable_op_profiling result, rather than only TfLiteGpuDelegateV2 ?
#96239 commented on Jul 16, 2025 • 0 new comments
crash when inference use libtensorflowlite_c.so and config threadnum >1 backend cpu
#96347 commented on Jul 16, 2025 • 0 new comments
Prebuilt binaries do not work with CPUs that do not have AVX instruction sets.
#19584 commented on Jul 16, 2025 • 0 new comments
TensorFlow DLL failed to load with newer version of TF
#91656 commented on Jul 17, 2025 • 0 new comments
YoloX different Model Output for Python and Android
#95489 commented on Jul 17, 2025 • 0 new comments
Fail to build libtensorflow_framework.so.2.20.0
#96569 commented on Jul 18, 2025 • 0 new comments
