Insights: ggml-org/llama.cpp
Overview
43 Releases published by 1 person
- b5834 published Jul 6, 2025
- b5835 published Jul 6, 2025
- b5836 published Jul 7, 2025
- b5837 published Jul 7, 2025
- b5838 published Jul 7, 2025
- b5839 published Jul 8, 2025
- b5840 published Jul 8, 2025
- b5841 published Jul 8, 2025
- b5843 published Jul 8, 2025
- b5844 published Jul 8, 2025
- b5845 published Jul 8, 2025
- b5846 published Jul 8, 2025
- b5847 published Jul 8, 2025
- b5848 published Jul 8, 2025
- b5849 published Jul 8, 2025
- b5851 published Jul 9, 2025
- b5852 published Jul 9, 2025
- b5853 published Jul 9, 2025
- b5854 published Jul 9, 2025
- b5855 published Jul 9, 2025
- b5856 published Jul 9, 2025
- b5857 published Jul 9, 2025
- b5858 published Jul 10, 2025
- b5859 published Jul 10, 2025
- b5860 published Jul 10, 2025
- b5861 published Jul 10, 2025
- b5862 published Jul 10, 2025
- b5863 published Jul 10, 2025
- b5864 published Jul 10, 2025
- b5865 published Jul 10, 2025
- b5866 published Jul 10, 2025
- b5867 published Jul 10, 2025
- b5868 published Jul 11, 2025
- b5869 published Jul 11, 2025
- b5870 published Jul 11, 2025
- b5872 published Jul 11, 2025
- b5873 published Jul 11, 2025
- b5874 published Jul 12, 2025
- b5875 published Jul 12, 2025
- b5876 published Jul 12, 2025
- b5880 published Jul 12, 2025
- b5882 published Jul 12, 2025
- b5884 published Jul 12, 2025
49 Pull requests merged by 23 people
- test-backend-ops : cover lfm2 cases in test_ssm_conv (#14651, merged Jul 12, 2025)
- readme : add LFM2 to models section (#14650, merged Jul 12, 2025)
- CUDA: add set rows for f32 and f16 (#14551, merged Jul 12, 2025)
- sync : ggml (#14648, merged Jul 12, 2025)
- sync : ggml (#14647, merged Jul 12, 2025)
- server : fix pooled embedding output (#14645, merged Jul 12, 2025)
- vulkan: support SET_ROWS (#14587, merged Jul 12, 2025)
- vulkan: optimizations for deepseek prompt processing (#14555, merged Jul 12, 2025)
- llama : support LiquidAI LFM2 hybrid model family (#14620, merged Jul 11, 2025)
- HIP: Add HIP 7.0+ compatibility for hipBLAS compute types (#14634, merged Jul 11, 2025)
- readme : add hot PRs (#14636, merged Jul 11, 2025)
- CUDA: 4D FlashAttention support (#14628, merged Jul 11, 2025)
- llama : move enum llama_vocab_pre_type to implementation (#14631, merged Jul 11, 2025)
- model : add Midm-2.0 model (#14626, merged Jul 11, 2025)
- Granite Four (#13550, merged Jul 11, 2025)
- OpenCL: add tiled mul_mat_f16_f32 (#14535, merged Jul 10, 2025)
- opencl: add set_rows for f16 and f32 (#14547, merged Jul 10, 2025)
- Smoldocling support (#14597, merged Jul 10, 2025)
- Docs: script to auto-generate ggml operations docs (#14598, merged Jul 10, 2025)
- cmake : do not search for curl libraries by ourselves (#14613, merged Jul 10, 2025)
- SYCL: Initial set_rows kernel implementation (#14562, merged Jul 10, 2025)
- llama : minor coding style fix for smollm3 (#14605, merged Jul 10, 2025)
- cmake : bump llguidance version to v1.0.1 (#14609, merged Jul 10, 2025)
- cmake : llguidance build parser library only (#14608, merged Jul 10, 2025)
- cuda : support Falcon-H1 state size for SSM_SCAN (#14602, merged Jul 10, 2025)
- llama : remove llm_graph_input_one (#14603, merged Jul 9, 2025)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, merged Jul 9, 2025)
- ggml : add ggml_scale_bias (#14417, merged Jul 9, 2025)
- ggml : prevent integer overflow in tensor size calculation (#14595, merged Jul 9, 2025)
- model : add skt/A.X-4.0 model vocabulary (#14589, merged Jul 9, 2025)
- llama : remove unintended whitespace (#14592, merged Jul 9, 2025)
- llama: add initial support for Falcon-H1 model family (#14534, merged Jul 9, 2025)
- convert : fix smollm3 jinja template (#14586, merged Jul 9, 2025)
- vulkan: optimize flash attention split_k_reduce (#14554, merged Jul 8, 2025)
- model : fix hunyuan moe chat template (#14584, merged Jul 8, 2025)
- model : add SmolLM3 (#14581, merged Jul 8, 2025)
- memory : fix broken batch splits for recurrent cache (#14575, merged Jul 8, 2025)
- vulkan: fix rope with partial rotation and non-cont src (#14582, merged Jul 8, 2025)
- server: Add ability to mount server at prefix (#14544, merged Jul 8, 2025)
- model : add hunyuan moe (#14425, merged Jul 8, 2025)
- vulkan: increase timeout for CI (#14574, merged Jul 8, 2025)
- cuda : fix rope with partial rotation and non-cont src (#14580, merged Jul 8, 2025)
- CUDA: add bilinear interpolation for upscale (#14563, merged Jul 8, 2025)
- musa: fix build warnings (unused variable) (#14561, merged Jul 7, 2025)
- llama : fix incorrect minicpm3 v_states shape (#14571, merged Jul 7, 2025)
- llama : remove ggml_cont where possible (#14568, merged Jul 7, 2025)
- CUDA: add bf16 and i32 to getrows (#14529, merged Jul 7, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, merged Jul 6, 2025)
- vulkan: fix rms_norm+mul fusion (#14545, merged Jul 6, 2025)
20 Pull requests opened by 16 people
- model : add PLaMo-2 model (#14560, opened Jul 7, 2025)
- metal : reuse graphs (#14570, opened Jul 7, 2025)
- quantize: fix minor logic flaw in --tensor-type (#14572, opened Jul 7, 2025)
- docker : add cann build pipeline (#14591, opened Jul 9, 2025)
- metal : fuse add (#14596, opened Jul 9, 2025)
- kv-cache : opt mask set input (#14600, opened Jul 9, 2025)
- sycl: Batched mulmat rework for oneDNN dispatch (#14617, opened Jul 10, 2025)
- SYCL: use 1D kernel for set_rows (#14618, opened Jul 10, 2025)
- webui: Change Download function to download the full text of the conversation (#14619, opened Jul 10, 2025)
- tool: add conversion of text/parquet to custom format (#14622, opened Jul 10, 2025)
- HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624, opened Jul 10, 2025)
- graph : refactor context to not pass gf explicitly (#14629, opened Jul 11, 2025)
- Add EXAONE 4.0 model architecture (#14630, opened Jul 11, 2025)
- OpenCL: add `mul_mat_f16_f32_image` kernel (#14635, opened Jul 11, 2025)
- common: add config presets for falcon (#14638, opened Jul 11, 2025)
- Add CUDA non-contiguous Unary Ops support (#14639, opened Jul 11, 2025)
- Support diffusion models: Add Dream 7B (#14644, opened Jul 12, 2025)
- webui : add a preset feature to the settings (#14649, opened Jul 12, 2025)
- vulkan: add RTE variants for glu/add/sub/mul/div (#14653, opened Jul 12, 2025)
- Model : Add support for Kimi-K2 (#14654, opened Jul 12, 2025)
40 Issues closed by 11 people
- Misc. bug: Embedding/pooling: I receive 10xvector not 1xvector (#14543, closed Jul 12, 2025)
- Eval bug: llama-mtmd-cli doesn't support system prompts (#13454, closed Jul 12, 2025)
- Feature Request: video support in mtmd-cli / server (#13754, closed Jul 12, 2025)
- Feature Request: Set default of --numa to distribute (#13850, closed Jul 12, 2025)
- Eval bug: Embeddings Always returned as non (#13854, closed Jul 12, 2025)
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture (#13856, closed Jul 12, 2025)
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228, closed Jul 11, 2025)
- Misc. bug: Quantize error (#14621, closed Jul 11, 2025)
- Feature Request: (webui) add import / export function for ALL conversations (#11718, closed Jul 11, 2025)
- Feature Request: add per-request "reasoning" options in llama-server (#13272, closed Jul 11, 2025)
- webui: First user prompt sometimes disappears after sending (#13622, closed Jul 11, 2025)
- Misc. bug: llama-cli.exe stopped working on Windows Server 10 (#13767, closed Jul 11, 2025)
- Eval bug: seed seems to be locked to a single value 4294967295 (#13823, closed Jul 11, 2025)
- Misc. bug: CANNOT CONVERT THE MODEL (#14610, closed Jul 11, 2025)
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641, closed Jul 10, 2025)
- Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables (#13801, closed Jul 10, 2025)
- ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported (#13802, closed Jul 10, 2025)
- ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported (#13805, closed Jul 10, 2025)
- Eval bug: Mistral-Small not working on vulkan (#14550, closed Jul 9, 2025)
- Support for Jamba (JambaForCausalLM) (#6372, closed Jul 9, 2025)
- Feature Request: Falcon-H1 (#13681, closed Jul 9, 2025)
- something with llama_server? slow vs llama_cli (#13560, closed Jul 9, 2025)
- Feature Request: Hunyuan-A13B model support (#14415, closed Jul 8, 2025)
- CI: fix ubuntu-22-cmake-vulkan (#14569, closed Jul 8, 2025)
- open source dataset for low bit quantization? (#13736, closed Jul 8, 2025)
- Feature Request: Add keep_alive function for llama-server (#13748, closed Jul 8, 2025)
- Misc. bug: segfault in test-gbnf-validator (#13762, closed Jul 8, 2025)
- Eval bug: error loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent (#14559, closed Jul 7, 2025)
- Compile bug: CMAKE_CUDA_COMPILER-NOTFOUND (#14558, closed Jul 7, 2025)
- Misc. bug: OOM, the process does not exit (#14458, closed Jul 7, 2025)
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727, closed Jul 7, 2025)
- Eval bug: Incoherence in Mistral 7B Q8_0 on Vulkan backend (#14540, closed Jul 6, 2025)
- Feature Request: support for image input in llama-server (and web ui) (#12792, closed Jul 6, 2025)
- Eval bug: swa_full = true is slower than false (#13683, closed Jul 6, 2025)
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700, closed Jul 6, 2025)
- Eval bug: Server Returns Empty Responses Under High Load (#13703, closed Jul 6, 2025)
20 Issues opened by 19 people
- Feature Request: Add Explicit Context Reset for llama-cli or llama-server (#14652, opened Jul 12, 2025)
- Feature Request: Support Kimi K2 (#14642, opened Jul 12, 2025)
- Eval bug: The content returned by the model is very strange (#14641, opened Jul 12, 2025)
- Eval bug: server: unnecessary prompt re-processing with Jamba models (#14625, opened Jul 11, 2025)
- Misc. bug: llama-perplexity PPL score is too high for Falcon H1 TQ1_0 model (#14616, opened Jul 10, 2025)
- Misc. bug: mtmd cannot decode an image provided through valid OpenAI API request (#14615, opened Jul 10, 2025)
- Feature Request: Improve Sampling API: Expose Top‑K/Top‑P Candidate Token Lists in C API (#14612, opened Jul 10, 2025)
- Feature Request: Built-in Token Probability Output for Inference API (#14611, opened Jul 10, 2025)
- Eval bug: Gemma 3n incoherent with HIP when prompt length > ubatch (#14604, opened Jul 9, 2025)
- Eval bug: thinking not working if "tool_choice" is "required" for Qwen models (QwQ, Qwen3, etc.) (#14599, opened Jul 9, 2025)
- Misc. bug: CLIP mmproj quantization is broken since May 14 (#14588, opened Jul 9, 2025)
- Eval bug: [5808-] qwen3 30B vulkan run with GGG (#14583, opened Jul 8, 2025)
- Eval bug: qwen3 with <think> infer Incomplete (#14578, opened Jul 8, 2025)
- Eval bug: ROCm error: batched GEMM not supported (#14576, opened Jul 8, 2025)
- Misc. bug: OpenAI HTTP interface returns "HTTP-200" with error details in streamed chunk (#14566, opened Jul 7, 2025)
- Misc. bug: parameter passing for argument of type (#14564, opened Jul 7, 2025)
- Feature Request: add tool calling for deepseek-r1-0528 (#14557, opened Jul 7, 2025)
- Misc. bug: crash on vulkan with new max mem alloc size calculations since b5703 (#14553, opened Jul 6, 2025)
- Support for Ovis2 models (#14552, opened Jul 6, 2025)
68 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- compare-commits.sh: support both llama-bench and test-backend-ops (#14392, commented on Jul 10, 2025; 18 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 11, 2025; 12 new comments)
- ggml: Add initial WebGPU backend (#14521, commented on Jul 12, 2025; 10 new comments)
- llama : reuse compute graphs (#14482, commented on Jul 12, 2025; 9 new comments)
- model : jina-embeddings-v3 support (#13693, commented on Jul 10, 2025; 9 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 12, 2025; 6 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 12, 2025; 5 new comments)
- imatrix : use GGUF to store importance matrices (#9400, commented on Jul 12, 2025; 2 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jul 12, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jul 12, 2025; 0 new comments)
- Compile bug: cannot compile get_rows_iq1_m (#14542, commented on Jul 12, 2025; 0 new comments)
- Feature Request: Gemma3n multimodal support (#14429, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jul 12, 2025; 0 new comments)
- ggml : add WebGPU backend (#7773, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: [SYCL] llama-cli built by Visual Studio 2022 is not working (#14086, commented on Jul 12, 2025; 0 new comments)
- Vulkan Runner Frequent Crashing under workload (#14105, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes (#14113, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage (#14123, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: llama-server drops multi-part content for final assistant message (#14137, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 11, 2025; 0 new comments)
- changelog : `libllama` API (#9289, commented on Jul 11, 2025; 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 8, 2025; 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 11, 2025; 0 new comments)
- llama : support qwen3 rerank and embeddings (#14029, commented on Jul 9, 2025; 0 new comments)
- tests : add test-model-random (#14139, commented on Jul 8, 2025; 0 new comments)
- Mtmd: add a way to select device for vision encoder (#14236, commented on Jul 7, 2025; 0 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jul 12, 2025; 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 11, 2025; 0 new comments)
- ggml : add pointer to attach user data (#14397, commented on Jul 8, 2025; 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 11, 2025; 0 new comments)
- Added CI with RISC-V RVV1.0 Hardware (#14439, commented on Jul 7, 2025; 0 new comments)
- Allow truncation when embedding (#14493, commented on Jul 6, 2025; 0 new comments)
- train: add simple loading already tokenized data from parquet dataset (#14522, commented on Jul 10, 2025; 0 new comments)
- common: detect and prefer big cores on AArch64 hybrid CPU on linux (#14532, commented on Jul 8, 2025; 0 new comments)
- ggml : add ANE backend (#10453, commented on Jul 9, 2025; 0 new comments)
- Feature Request: Ability to pack multiple GGUFs into single one (#13028, commented on Jul 9, 2025; 0 new comments)
- [How to serve lookahead decoding Qwen 3] (#14057, commented on Jul 9, 2025; 0 new comments)
- Eval bug: Model produces gibberish or repeated output when using `-sm row` on CUDA (#14075, commented on Jul 9, 2025; 0 new comments)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, commented on Jul 8, 2025; 0 new comments)
- Cmake minor bug: Confusing ggml-cpu: -march=native log message when using explicit -march flags and LLAMA_NATIVE=OFF (#14058, commented on Jul 8, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jul 7, 2025; 0 new comments)
- Eval bug: OpenAI streaming API changed/broken (#14268, commented on Jul 7, 2025; 0 new comments)
- Compile bug: Looking for C++ include rocwmma/rocwmma.hpp - not found (#14538, commented on Jul 7, 2025; 0 new comments)
- Feature Request: Improve model load time when using the RPC backend (#12954, commented on Jul 7, 2025; 0 new comments)
- bug: ValueError: Architecture qwen3 not supported (#13157, commented on Jul 7, 2025; 0 new comments)
- Feature Request: add support for length_penalty (#14053, commented on Jul 7, 2025; 0 new comments)
- Feature Request: s390x CI (#13243, commented on Jul 6, 2025; 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on Jul 6, 2025; 0 new comments)
- Feature Request: support FP8 data type in llama.cpp (#14020, commented on Jul 6, 2025; 0 new comments)
- Feature Request: add a new repo for conversion of gguf (#14027, commented on Jul 6, 2025; 0 new comments)
- Misc. minor bug: llama-server: model parameter labels in webui settings are not shown in Android Chrome browser when not set in "desktop site" mode (#14036, commented on Jul 6, 2025; 0 new comments)
- Feature Request: Add Ernie4.5MoE support (#14465, commented on Jul 11, 2025; 0 new comments)
- Feature Request: (webui) read data from /props endpoint and use it on the webui (#11717, commented on Jul 11, 2025; 0 new comments)
- Support Hybrid Models (#12331, commented on Jul 11, 2025; 0 new comments)
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (#13978, commented on Jul 11, 2025; 0 new comments)
- Eval bug: MiniCPM4 0.5B run failed (#14094, commented on Jul 11, 2025; 0 new comments)
- Eval bug: Gemma3 decode and update_slots fail with parallel slots (#14097, commented on Jul 11, 2025; 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, commented on Jul 10, 2025; 0 new comments)
- Eval bug: build with backend-cpu, run llama-server reports load_backend: failed to find ggml_backend_init in /home/ubutnu/llama.cpp-master/build/bin/libggml-cpu.so (#14160, commented on Jul 10, 2025; 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: llama-server --batch-size always set to 64 (#14046, commented on Jul 10, 2025; 0 new comments)
- Can you add an example of running the model using the llama-cpp-python Python binding for quick start? (#14066, commented on Jul 10, 2025; 0 new comments)
- Eval bug: Ollama Runner uses only 1 CPU for its threads, in guaranteed mode in pod when 8 CPUs are allocated to it (#14089, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used) (#14084, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Server tests /health race conditions (#14092, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, commented on Jul 9, 2025; 0 new comments)
- Eval bug: Tools crash and/or fail for deepseek r1/v3 unsloth dynamic quantization (#14406, commented on Jul 9, 2025; 0 new comments)