🚀 The feature, motivation and pitch
NVLink SHARP is an engine in NVSwitch that can perform collectives (e.g. all-reduce).
This feature reduces GPU SM consumption by as much as 6x (from 24 SMs down to 4), while boosting performance by 2x (its mechanism is similar to a one-shot all-reduce, hence the 2x theoretical speedup).
To leverage this feature, please see this doc:
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/bufferreg.html#nvlink-sharp-buffer-registration
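As a rough illustration (not PyTorch code), the registration flow described in that doc looks like the following at the NCCL level. This is a minimal sketch: the communicator is assumed to be created elsewhere (e.g. via ncclCommInitRank), NCCL >= 2.19 is assumed, and error checking is omitted.

```cpp
#include <cuda_runtime.h>
#include <nccl.h>

// Sketch of the NVLink SHARP user-buffer-registration flow from the NCCL doc.
// Assumes `comm` and `stream` were set up elsewhere.
void allreduce_with_registered_buffers(ncclComm_t comm, cudaStream_t stream,
                                       size_t count) {
  void* sendbuf = nullptr;
  void* recvbuf = nullptr;

  // Buffers must come from ncclMemAlloc (not cudaMalloc) to be eligible
  // for NVLink SHARP offload.
  ncclMemAlloc(&sendbuf, count * sizeof(float));
  ncclMemAlloc(&recvbuf, count * sizeof(float));

  // Register the buffers with the communicator once, then reuse them.
  void* sendHandle = nullptr;
  void* recvHandle = nullptr;
  ncclCommRegister(comm, sendbuf, count * sizeof(float), &sendHandle);
  ncclCommRegister(comm, recvbuf, count * sizeof(float), &recvHandle);

  // Collectives on registered ncclMemAlloc buffers can be offloaded to the
  // NVSwitch SHARP engine, freeing GPU SMs.
  ncclAllReduce(sendbuf, recvbuf, count, ncclFloat, ncclSum, comm, stream);
  cudaStreamSynchronize(stream);

  // Deregister before freeing.
  ncclCommDeregister(comm, sendHandle);
  ncclCommDeregister(comm, recvHandle);
  ncclMemFree(sendbuf);
  ncclMemFree(recvbuf);
}
```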
As shown in the sketch above, it requires the input / output buffers to be allocated through a NCCL API -- ncclMemAlloc. Such allocation is now enabled on the PyTorch side by a stack of PRs that allow the CUDACachingAllocator to use different memory allocation backends. See:
original RFC: #124807 and
PR impl: #133603.
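One possible way to hook this up -- a minimal sketch assuming the pluggable-allocator path, rather than whatever backend integration the PRs above ultimately expose -- is a small shared library that forwards the allocator's alloc/free hooks to ncclMemAlloc / ncclMemFree. The library name (libnccl_alloc.so) and function names below are hypothetical, chosen for illustration:

```cpp
// Sketch of a pluggable-allocator shim that routes CUDA allocations through
// ncclMemAlloc / ncclMemFree. Compile into a shared library, e.g.:
//   nvcc -shared -Xcompiler -fPIC nccl_alloc.cpp -lnccl -o libnccl_alloc.so
// The extern "C" alloc/free signatures follow the hooks expected by
// torch.cuda.memory.CUDAPluggableAllocator.
#include <cuda_runtime.h>
#include <nccl.h>
#include <sys/types.h>

extern "C" {

void* nccl_alloc(ssize_t size, int device, cudaStream_t stream) {
  void* ptr = nullptr;
  int prev_device = 0;
  cudaGetDevice(&prev_device);
  cudaSetDevice(device);
  // Allocate through NCCL so the buffer is eligible for NVLink SHARP
  // registration later (e.g. via ncclCommRegister inside ProcessGroupNCCL).
  ncclMemAlloc(&ptr, static_cast<size_t>(size));
  cudaSetDevice(prev_device);
  return ptr;
}

void nccl_free(void* ptr, ssize_t size, int device, cudaStream_t stream) {
  ncclMemFree(ptr);
}

}  // extern "C"
```

On the Python side, such a library could then be loaded with torch.cuda.memory.CUDAPluggableAllocator("libnccl_alloc.so", "nccl_alloc", "nccl_free") and either installed globally via torch.cuda.memory.change_current_allocator(...) or scoped to specific allocations through the backend/pool mechanism introduced by the PRs above.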
Target use
A first target for the feature could be DDP (in cases where we manage the gradient buckets internally).
A second target would be TP -- for example, "async-tp", though we'd need to confirm whether "async-tp" actually performs an all-reduce. Otherwise, if "general" TP is in Inductor's hands, we could ask Inductor to allocate dedicated memory for the result of the matmul.
Cc: @syed-ahmed
Alternatives
No response
Additional context
No response
cc @XilunWu @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @ptrblck @msaroufim