musa: upgrade musa sdk to 4.2.0 #14498


Draft: 6 commits to be merged into base: master

@yeahdongcn (Collaborator) commented Jul 2, 2025


This PR upgrades the MUSA SDK from rc4.0.1 to 4.2.0.

Key updates

  1. Bump the MUSA Docker image tags to 4.2.0 and switch back to the non-muDNN variant
  2. Add two CMake options, both OFF by default: GGML_MUSA_GRAPHS enables MUSA graphs, and GGML_MUSA_MUDNN_COPY enables muDNN copy acceleration
  3. Align with the cuBLAS API
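
Because both new options default to OFF, it can be useful to confirm what a given build directory was actually configured with. The following is a minimal sketch, assuming the standard `NAME:TYPE=VALUE` layout of `CMakeCache.txt`; the helper name `cmake_cached_value` is hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the cached value of a CMake
# variable from a CMakeCache.txt file, or "unset" if it has no cache entry.
cmake_cached_value() {
    cache_file="$1"   # path to CMakeCache.txt
    name="$2"         # variable name, e.g. GGML_MUSA_GRAPHS
    # Cache entries look like NAME:TYPE=VALUE; strip the "NAME:TYPE=" prefix.
    value="$(sed -n "s/^${name}:[A-Z]*=//p" "$cache_file" | head -n 1)"
    printf '%s\n' "${value:-unset}"
}

# Report both options for the default build directory, if it exists.
if [ -f build/CMakeCache.txt ]; then
    for opt in GGML_MUSA_GRAPHS GGML_MUSA_MUDNN_COPY; do
        printf '%s=%s\n' "$opt" "$(cmake_cached_value build/CMakeCache.txt "$opt")"
    done
fi
```

The helper only reads the cache, so it is safe to run against any existing build directory.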

Why muDNN is disabled by default

Enabling muDNN can accelerate contiguous device memory copies. However, MUSA SDK 4.2.0 no longer includes the fat binary and instead provides per-architecture binaries. In addition, musaMemcpyAsync is faster in this release, so the performance gap is now minimal.

Testing Done

docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
root@7bd6a1e5dcc0:/ws# cmake -B build -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=21 -DGGML_MUSA_GRAPHS=ON
root@7bd6a1e5dcc0:/ws# cmake --build build -j $(nproc) --config Release
root@7bd6a1e5dcc0:/ws# cmake -B build -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=21 -DGGML_MUSA_MUDNN_COPY=ON
root@7bd6a1e5dcc0:/ws# cmake --build build -j $(nproc) --config Release
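
The two configure runs above share one build directory. CMake keeps `-D` values in its cache, so the second configure would still see GGML_MUSA_GRAPHS=ON from the first. The sketch below shows one way to keep the two configurations isolated, assuming a separate build directory per option is acceptable. The helper name `musa_configure_cmd` and the `build-<option>` directory naming are hypothetical and not part of this PR. It only prints the commands (a dry run) rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): print the configure command for
# one optional MUSA feature, using a dedicated build directory per option so
# that cached -D values from one configuration do not leak into the other.
musa_configure_cmd() {
    option="$1"       # e.g. GGML_MUSA_GRAPHS or GGML_MUSA_MUDNN_COPY
    arch="${2:-21}"   # MUSA_ARCHITECTURES value; 21 is what the PR tested
    # Derive a directory name such as build-ggml_musa_graphs.
    dir="build-$(printf '%s' "$option" | tr '[:upper:]' '[:lower:]')"
    printf 'cmake -B %s -DGGML_MUSA=ON -DMUSA_ARCHITECTURES=%s -D%s=ON\n' \
        "$dir" "$arch" "$option"
}

# Dry run: print, rather than execute, one configure line per option.
for opt in GGML_MUSA_GRAPHS GGML_MUSA_MUDNN_COPY; do
    musa_configure_cmd "$opt"
done
```

Each printed line can then be followed by the matching `cmake --build <dir> -j $(nproc) --config Release` step.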

@github-actions (bot) added the labels "Nvidia GPU" and "ggml" on Jul 2, 2025
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 39146cc to 314e40d on July 15, 2025 00:28
@github-actions (bot) added the labels "documentation" and "devops" on Jul 15, 2025
@yeahdongcn changed the title from "MUSA: upgrade musa sdk to <<TBD>>" to "MUSA: upgrade musa sdk to 4.2.0" on Jul 15, 2025
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 314e40d to 1bf073d on July 15, 2025 03:42
@yeahdongcn changed the title from "MUSA: upgrade musa sdk to 4.2.0" to "musa: upgrade musa sdk to 4.2.0" on Jul 15, 2025
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn force-pushed the xd/musa_sdk_upgrade branch from 1bf073d to 9e7ccf3 on July 17, 2025 02:57
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>