[Graph Partition] use pinned memory and foreach when moving cpu scalar tensor to gpu #155360
Labels
module: inductor
oncall: pt2
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Graph partition automatically moves CPU scalar tensors to the GPU when possible (#154464). It would be better to stage these tensors in pinned memory and copy them with non_blocking=True, so the host-to-device transfer does not block the host. This depends on #155121. More context in this issue.
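For illustration, here is a minimal eager-mode sketch of the pinned-memory, non-blocking copy pattern described above. It is not the code Inductor generates for graph partition. The helper name `move_scalars_to_device` is hypothetical, and the sketch copies tensors one at a time rather than batching them with a foreach op as the title suggests.

```python
import torch


def move_scalars_to_device(scalars, device="cuda"):
    """Hypothetical helper: copy 0-dim CPU tensors to `device`.

    When the target is a usable CUDA device, each tensor is first placed in
    pinned (page-locked) host memory so the copy can be issued with
    non_blocking=True. Otherwise it falls back to a plain .to().
    """
    target = torch.device(device)
    if target.type != "cuda" or not torch.cuda.is_available():
        # Pinned memory only helps host-to-GPU transfers; nothing to do here.
        return [s.to(target) for s in scalars]

    outs = []
    for s in scalars:
        pinned = s.pin_memory()  # page-locked staging copy on the host
        # Asynchronous with respect to the host; ordered on the current stream.
        outs.append(pinned.to(target, non_blocking=True))
    return outs
```

Kernels launched later on the same CUDA stream see the copied values. Host code that reads the results back (for example with `.item()`) synchronizes implicitly.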
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov