Insights: NVIDIA/Fuser
Overview
24 Pull requests merged by 8 people
-
Instructions for using cpuprofile
#4089 merged
Mar 18, 2025 -
Update SDPA flash attention API
#4065 merged
Mar 17, 2025 -
Mark supports_segmentation=False in test_issue1273
#4088 merged
Mar 17, 2025 -
Move debugging instructions from Wiki to main
#4084 merged
Mar 17, 2025 -
Mark is_clonable=False to work around a memory corruption
#4085 merged
Mar 17, 2025 -
Rename TorchGather to Gather
#4082 merged
Mar 15, 2025 -
Another test cleanup
#4081 merged
Mar 15, 2025 -
Safe resize vectorization
#3906 merged
Mar 15, 2025 -
Cleaning up test_gpu_indexing_ops.cpp
#4080 merged
Mar 14, 2025 -
Run a DeepSeek-V3 transformer block from Hugging Face
#4009 merged
Mar 14, 2025 -
add register count checks for warp specialization with register sharing
#4061 merged
Mar 14, 2025 -
Add warning about `kir::IfThenElse`
#4073 merged
Mar 14, 2025 -
Make Hopper mma tests sparse
#4060 merged
Mar 13, 2025 -
Add missing block sync for TMem
#4026 merged
Mar 13, 2025 -
Fix C++23 backport of `zip` and `enumerate`
#4068 merged
Mar 13, 2025 -
Enable `TMABankConflictFreeTranspose`
#4027 merged
Mar 13, 2025 -
[CudaIpc 3/3]: p2p get-Zcopy
#3911 merged
Mar 12, 2025 -
Expose `MultiDeviceExecutor` and overlapped AG+matmul to python API
#3923 merged
Mar 12, 2025 -
Globally use `TensorIndexer` if TMem is used
#4030 merged
Mar 12, 2025 -
Translate MatmulOp and LinearOp on Hopper without AxisMapping
#3986 merged
Mar 12, 2025 -
register sharing, add launch bound, disable tests with illegal paras
#4059 merged
Mar 12, 2025 -
Indexing for TMem ld and st
#4017 merged
Mar 11, 2025 -
Tensor memory 32x32b data path pattern matching
#4015 merged
Mar 11, 2025 -
Enable hard-coded index for LdMatrix and create basic copy tutorial
#4039 merged
Mar 11, 2025
18 Pull requests opened by 7 people
-
Tensor-parallelize the DeepSeek V3 transformer layer
#4062 opened
Mar 12, 2025 -
Adding IndexPutAccumulateOp
#4063 opened
Mar 12, 2025 -
Add a backprop test
#4064 opened
Mar 12, 2025 -
indexAccumulate python api
#4066 opened
Mar 12, 2025 -
Print local sizes and strides of each nn.Linear
#4067 opened
Mar 12, 2025 -
Load Epilogue Inputs with LdMatrix in Hopper Matmul Scheduler
#4069 opened
Mar 13, 2025 -
TMem check the stride of outer dims
#4070 opened
Mar 13, 2025 -
Align smem buffer for TMA store at 128B
#4071 opened
Mar 13, 2025 -
[WIP] Translate MmaOp patterns properly on Hopper
#4072 opened
Mar 13, 2025 -
fixing max_vect_factor initialization
#4076 opened
Mar 14, 2025 -
[RFC] Create a basic binding for CPP Fusion in python frontend using Gemini
#4077 opened
Mar 14, 2025 -
[WIP] Recompute non-cacheable unmappable tensors
#4078 opened
Mar 14, 2025 -
Add Blackwell MMA macros
#4079 opened
Mar 14, 2025 -
WAR for false ComputeAtLogicalDomainMap results
#4083 opened
Mar 15, 2025 -
Issue 4086
#4087 opened
Mar 17, 2025 -
cache unmappable buffer in smem
#4090 opened
Mar 17, 2025 -
Update asan instructions to work with .so
#4092 opened
Mar 18, 2025 -
Revert "add register count checks for warp specialization with register sharing"
#4093 opened
Mar 18, 2025
1 Issue closed by 1 person
-
Vectorization analysis returns wrong Vectorization Factor
#3640 closed
Mar 15, 2025
5 Issues opened by 4 people
-
INTERNAL ASSERT FAILED: Inconsistent parallelization found between `TV59` and `TV2`
#4091 opened
Mar 18, 2025 -
TestNvFuserFrontend::test_broadcast_mixing corrupts memory.
#4086 opened
Mar 17, 2025 -
max_persistent_buffer_size may be smaller than total_reduction_numel
#4075 opened
Mar 13, 2025 -
Persistent buffer with broadcast results in inconsistent parallelization
#4074 opened
Mar 13, 2025 -
Refactor `IndexLowering::handle(const LoadStoreOp* ldst)`
#4058 opened
Mar 11, 2025
25 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
supporting vectorized load on IndexSelectOp
#4048 commented on
Mar 17, 2025 • 11 new comments -
Add benchmarks for cross entropy loss
#3924 commented on
Mar 18, 2025 • 7 new comments -
Check that warps are only accessing the subpartition of TMem that it can access
#4016 commented on
Mar 14, 2025 • 6 new comments -
Concretization replays loop transforms
#3950 commented on
Mar 14, 2025 • 6 new comments -
project persistent buffer to its producer if the buffer is the output of an upcast
#4051 commented on
Mar 14, 2025 • 5 new comments -
DID loop split for reshape without pre-sharding reshape propagation
#3953 commented on
Mar 17, 2025 • 5 new comments -
Reimplement isResharding using index calculation
#3482 commented on
Mar 18, 2025 • 3 new comments -
s/reshape/set when no transforms are applied
#4056 commented on
Mar 11, 2025 • 2 new comments -
Simplify selfAllocationReplay
#4057 commented on
Mar 13, 2025 • 0 new comments -
getOutputShardings checks all TVs to decide single-GPU vs multi-GPU
#4046 commented on
Mar 12, 2025 • 0 new comments -
[WIP] simple L2 model for setting grid swizzle and cta order
#4044 commented on
Mar 12, 2025 • 0 new comments -
Don't revert upcast for persistent schedulers to avoid increasing persistent buffer sizes
#4040 commented on
Mar 12, 2025 • 0 new comments -
revise smem usage estimation
#4033 commented on
Mar 11, 2025 • 0 new comments -
WIP
#4006 commented on
Mar 16, 2025 • 0 new comments -
Remove MmaOp::AxisMapping
#3995 commented on
Mar 12, 2025 • 0 new comments -
redo register sharing PR-3972
#3993 commented on
Mar 17, 2025 • 0 new comments -
Expose backend type to python
#3928 commented on
Mar 12, 2025 • 0 new comments -
[CudaIpc 2/3]: Ipc handle exchange
#3910 commented on
Mar 12, 2025 • 0 new comments -
Enable resize scheduler by default
#3848 commented on
Mar 17, 2025 • 0 new comments -
[WIP] update propagateSharding preseg pass for DID loop split
#3838 commented on
Mar 14, 2025 • 0 new comments -
[WIP] IdModel-based indexing
#2238 commented on
Mar 14, 2025 • 0 new comments -
test_python_frontend.py::test_issue1273 corrupts memory
#3856 commented on
Mar 17, 2025 • 0 new comments -
RuntimeError from normalization_utils.cpp, could not resolve persistent buffer
#4020 commented on
Mar 15, 2025 • 0 new comments -
Allow allocation to be a split and different from loop.
#3479 commented on
Mar 13, 2025 • 0 new comments -
internal assert failure from sync_information.cpp
#4052 commented on
Mar 13, 2025 • 0 new comments