Add LLaDA 8b Diffusion model #14771
Conversation
Force-pushed from e4b7346 to 5644f2f
I would like to avoid adding a second diffusion example - we are increasing the maintenance effort for no significant benefit. The diffusion architecture is not yet well established. We can think about extending the existing example instead.
Yeah, agree. I initially wrote them as one example. However, passing arguments via CLI for two separate sets of sampling parameters/algorithms was quite confusing to me and would be even more so for the end-user, so for the sake of clarity I wrote them separately.
@ggerganov would having them in the same example, with extra CLI args per model, be acceptable?
Yes, merging the examples into a single example would be better. |
llama: fix llama-model fixup working
Made everything into a single example; please have another look when you have the time.
I think the example can be improved by not branching between "llada" and "dream" and instead having common logic for any diffusion model. This would make it much easier to scale with more diffusion models in the future. Otherwise, the way you've implemented it now, you have to add new structs, sampling types, generation functions, etc. for each new architecture, and this seems a bit unnecessary.
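For illustration, here is a rough sketch (names are hypothetical, not the PR's code) of the shape this suggests: a single generation routine that every diffusion model goes through, with the per-model differences reduced to parameter values:

```cpp
// Sketch only: one entry point for all diffusion models.
// `diffusion_params` stands for a flat, architecture-agnostic parameter
// struct (see the later comments on merging the per-model structs).
static void diffusion_generate(llama_context * ctx,
                               const llama_token * input_tokens,
                               llama_token * output_tokens,
                               int32_t n_input,
                               const diffusion_params & params) {
    for (int32_t step = 0; step < params.steps; ++step) {
        // 1. decode the current, partially masked sequence
        // 2. sample candidate tokens for the masked positions
        // 3. decide what to remask based on params.algorithm
        //    (confidence, entropy, random, ...), not on the architecture
    }
}
```

A new diffusion model would then only need to map its defaults onto `diffusion_params`, rather than adding new structs and generation functions.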
```cpp
).set_examples({ LLAMA_EXAMPLE_DIFFUSION }));

add_opt(common_arg(
    { "--diffusion--dream-eps" }, "F",
```
{ "--diffusion--dream-eps" }, "F", | |
{ "--diffusion-dream-eps" }, "F", |
```cpp
add_opt(common_arg(
    { "--diffusion-llada-algorithm" }, "N",
    string_format("llada remasking algorithm: 0=LOW_CONFIDENCE, 1=RANDOM (default: %d)", params.diffusion.remasking),
    [](common_params & params, int value) { params.diffusion.remasking = value; }
```
The argument names should not be associated with the models. This should be simply `--diffusion-algorithm`.
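If the flag is made model-agnostic as suggested, the registration could look like this (a sketch mirroring the existing `common_arg` pattern from this file, with the help text kept generic):

```cpp
add_opt(common_arg(
    { "--diffusion-algorithm" }, "N",
    string_format("remasking algorithm: 0=LOW_CONFIDENCE, 1=RANDOM (default: %d)", params.diffusion.remasking),
    [](common_params & params, int value) { params.diffusion.remasking = value; }
).set_examples({ LLAMA_EXAMPLE_DIFFUSION }));
```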
```cpp
    int32_t max_length;
    int32_t block_length;
    float   cfg_scale;
    enum diffusion_algorithm_llada algorithm;
```
Optional keywords `enum`, `struct`, `class` should be omitted in C++ code:

```diff
-    enum diffusion_algorithm_llada algorithm;
+    diffusion_algorithm_llada algorithm;
```
```cpp
// For LLaDA models, forcefully add BOS token at the beginning. TODO: check why
if (arch == "llada") {
    llama_token bos_token = llama_vocab_bos(vocab);
    if (bos_token != LLAMA_TOKEN_NULL && (input_tokens.empty() || input_tokens[0] != bos_token)) {
        input_tokens.insert(input_tokens.begin(), bos_token);
    }
}
```
This should be handled by the metadata in the GGUF model. There is a boolean field indicating whether BOS is needed or not.
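A sketch of what that could look like, using the `llama_vocab_get_add_bos()` accessor, which reflects the `tokenizer.ggml.add_bos_token` GGUF field:

```cpp
// Let the model metadata decide, instead of branching on the architecture.
if (llama_vocab_get_add_bos(vocab)) {
    llama_token bos_token = llama_vocab_bos(vocab);
    if (bos_token != LLAMA_TOKEN_NULL && (input_tokens.empty() || input_tokens[0] != bos_token)) {
        input_tokens.insert(input_tokens.begin(), bos_token);
    }
}
```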
```cpp
char arch_str[128];
GGML_ASSERT(llama_model_meta_val_str(model, "general.architecture", arch_str, 128) >= 0);

std::string arch = std::string(arch_str);

if (arch != "dream" && arch != "llada") {
    LOG_ERR("error: unsupported model architecture '%s' for diffusion. Expected 'dream' or 'llada'\n", arch_str);
    llama_model_free(model);
    return 1;
}
```
Can't we check if the model is diffusion using the new API call?
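Assuming the API call referred to here is `llama_model_is_diffusion()`, the architecture-string comparison could reduce to:

```cpp
// Replaces the string comparison against "dream"/"llada".
if (!llama_model_is_diffusion(model)) {
    LOG_ERR("error: model is not a diffusion model\n");
    llama_model_free(model);
    return 1;
}
```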
```cpp
// Dream remasking algorithms
enum diffusion_algorithm_dream {
    ORIGIN       = 0,
    MASKGIT_PLUS = 1,
    TOPK_MARGIN  = 2,
    ENTROPY      = 3,
};

// LLaDA remasking types
enum diffusion_algorithm_llada {
    LOW_CONFIDENCE = 0,
    RANDOM         = 1,
};
```
Is this separation necessary? For example, can we use "RANDOM" sampling with Dream?
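For illustration, a merged enum (hypothetical naming, not the PR's code) would make every remasking strategy selectable for every model:

```cpp
// One shared algorithm space; whether e.g. RANDOM works well with Dream
// becomes a quality question, not a compile-time restriction.
enum diffusion_algorithm {
    DIFFUSION_ALG_ORIGIN         = 0,
    DIFFUSION_ALG_MASKGIT_PLUS   = 1,
    DIFFUSION_ALG_TOPK_MARGIN    = 2,
    DIFFUSION_ALG_ENTROPY        = 3,
    DIFFUSION_ALG_LOW_CONFIDENCE = 4,
    DIFFUSION_ALG_RANDOM         = 5,
};
```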
```cpp
struct dream_diffusion_params : diffusion_params {
    float   eps;
    float   top_p;
    int32_t top_k;
    enum diffusion_algorithm_dream algorithm;
    float   alg_temp;
};

struct llada_diffusion_params : diffusion_params {
    int32_t max_length;
    int32_t block_length;
```
I don't think this separation of the diffusion parameters per architecture is necessary. It should be a single flat `struct diffusion_params` for all models.
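A sketch of such a flat struct (field names taken from the two existing structs, defaults illustrative; assumes a merged `diffusion_algorithm` enum as discussed above):

```cpp
struct diffusion_params {
    int32_t steps        = 64;
    float   eps          = 1e-3f;  // timestep epsilon (used by Dream)
    float   top_p        = 0.95f;
    int32_t top_k        = 0;
    float   alg_temp     = 0.0f;
    int32_t max_length   = 0;      // generation length (used by LLaDA)
    int32_t block_length = 0;      // block schedule (used by LLaDA)
    float   cfg_scale    = 0.0f;   // classifier-free guidance; 0 => off
    diffusion_algorithm algorithm = DIFFUSION_ALG_LOW_CONFIDENCE;
};
```

Fields that a given model does not use would simply keep their defaults.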
Continuing on #14644, this PR adds another diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct, which has different semantics compared to the dream-7b model and overall seems to perform better.
There are very few similarities between how they generate tokens, so for now I've created two different examples: `llama-diffusion-dream-cli` (for the earlier model) and `llama-diffusion-llada-cli` (for running the new LLaDA model). I've added a README as well, and uploaded a GGUF.
Example command:

```
./build/bin/llama-diffusion-llada-cli -m llada-8b.gguf -p "Lily can run 12 kilometers per hour for 4 hours. After that, she runs 6 kilometers per hour. How many kilometers can she run in 8 hours?" --diffusion_steps 128 -ngl 99 --temp 0 -ub 128 --diffusion-visual
```
Also, I would like to add this to the server, but I'm not sure what API would be acceptable, so I'm hoping to have a discussion on that as well.