Insights: ggml-org/llama.cpp
Overview
43 Releases published by 1 person
- b5834 published Jul 6, 2025
- b5835 published Jul 6, 2025
- b5836 published Jul 7, 2025
- b5837 published Jul 7, 2025
- b5838 published Jul 7, 2025
- b5839 published Jul 8, 2025
- b5840 published Jul 8, 2025
- b5841 published Jul 8, 2025
- b5843 published Jul 8, 2025
- b5844 published Jul 8, 2025
- b5845 published Jul 8, 2025
- b5846 published Jul 8, 2025
- b5847 published Jul 8, 2025
- b5848 published Jul 8, 2025
- b5849 published Jul 8, 2025
- b5851 published Jul 9, 2025
- b5852 published Jul 9, 2025
- b5853 published Jul 9, 2025
- b5854 published Jul 9, 2025
- b5855 published Jul 9, 2025
- b5856 published Jul 9, 2025
- b5857 published Jul 9, 2025
- b5858 published Jul 10, 2025
- b5859 published Jul 10, 2025
- b5860 published Jul 10, 2025
- b5861 published Jul 10, 2025
- b5862 published Jul 10, 2025
- b5863 published Jul 10, 2025
- b5864 published Jul 10, 2025
- b5865 published Jul 10, 2025
- b5866 published Jul 10, 2025
- b5867 published Jul 10, 2025
- b5868 published Jul 11, 2025
- b5869 published Jul 11, 2025
- b5870 published Jul 11, 2025
- b5872 published Jul 11, 2025
- b5873 published Jul 11, 2025
- b5874 published Jul 12, 2025
- b5875 published Jul 12, 2025
- b5876 published Jul 12, 2025
- b5880 published Jul 12, 2025
- b5882 published Jul 12, 2025
- b5884 published Jul 12, 2025
49 Pull requests merged by 23 people
- test-backend-ops : cover lfm2 cases in test_ssm_conv (#14651, merged Jul 12, 2025)
- readme : add LFM2 to models section (#14650, merged Jul 12, 2025)
- CUDA: add set rows for f32 and f16 (#14551, merged Jul 12, 2025)
- sync : ggml (#14648, merged Jul 12, 2025)
- sync : ggml (#14647, merged Jul 12, 2025)
- server : fix pooled embedding output (#14645, merged Jul 12, 2025)
- vulkan: support SET_ROWS (#14587, merged Jul 12, 2025)
- vulkan: optimizations for deepseek prompt processing (#14555, merged Jul 12, 2025)
- llama : support LiquidAI LFM2 hybrid model family (#14620, merged Jul 11, 2025)
- HIP: Add HIP 7.0+ compatibility for hipBLAS compute types (#14634, merged Jul 11, 2025)
- readme : add hot PRs (#14636, merged Jul 11, 2025)
- CUDA: 4D FlashAttention support (#14628, merged Jul 11, 2025)
- llama : move enum llama_vocab_pre_type to implementation (#14631, merged Jul 11, 2025)
- model : add Midm-2.0 model (#14626, merged Jul 11, 2025)
- Granite Four (#13550, merged Jul 11, 2025)
- OpenCL: add tiled mul_mat_f16_f32 (#14535, merged Jul 10, 2025)
- opencl: add set_rows for f16 and f32 (#14547, merged Jul 10, 2025)
- Smoldocling support (#14597, merged Jul 10, 2025)
- Docs: script to auto-generate ggml operations docs (#14598, merged Jul 10, 2025)
- cmake : do not search for curl libraries by ourselves (#14613, merged Jul 10, 2025)
- SYCL: Initial set_rows kernel implementation (#14562, merged Jul 10, 2025)
- llama : minor coding style fix for smollm3 (#14605, merged Jul 10, 2025)
- cmake : bump llguidance version to v1.0.1 (#14609, merged Jul 10, 2025)
- cmake : llguidance build parser library only (#14608, merged Jul 10, 2025)
- cuda : support Falcon-H1 state size for SSM_SCAN (#14602, merged Jul 10, 2025)
- llama : remove llm_graph_input_one (#14603, merged Jul 9, 2025)
- llama : support Jamba hybrid Transformer-Mamba models (#7531, merged Jul 9, 2025)
- ggml : add ggml_scale_bias (#14417, merged Jul 9, 2025)
- ggml : prevent integer overflow in tensor size calculation (#14595, merged Jul 9, 2025)
- model : add skt/A.X-4.0 model vocabulary (#14589, merged Jul 9, 2025)
- llama : remove unintended whitespace (#14592, merged Jul 9, 2025)
- llama: add initial support for Falcon-H1 model family (#14534, merged Jul 9, 2025)
- convert : fix smollm3 jinja template (#14586, merged Jul 9, 2025)
- vulkan: optimize flash attention split_k_reduce (#14554, merged Jul 8, 2025)
- model : fix hunyuan moe chat template (#14584, merged Jul 8, 2025)
- model : add SmolLM3 (#14581, merged Jul 8, 2025)
- memory : fix broken batch splits for recurrent cache (#14575, merged Jul 8, 2025)
- vulkan: fix rope with partial rotation and non-cont src (#14582, merged Jul 8, 2025)
- server: Add ability to mount server at prefix (#14544, merged Jul 8, 2025)
- model : add hunyuan moe (#14425, merged Jul 8, 2025)
- vulkan: increase timeout for CI (#14574, merged Jul 8, 2025)
- cuda : fix rope with partial rotation and non-cont src (#14580, merged Jul 8, 2025)
- CUDA: add bilinear interpolation for upscale (#14563, merged Jul 8, 2025)
- musa: fix build warnings (unused variable) (#14561, merged Jul 7, 2025)
- llama : fix incorrect minicpm3 v_states shape (#14571, merged Jul 7, 2025)
- llama : remove ggml_cont where possible (#14568, merged Jul 7, 2025)
- CUDA: add bf16 and i32 to getrows (#14529, merged Jul 7, 2025)
- vulkan: unpack more values at a time for iquants mat mul (#14485, merged Jul 6, 2025)
- vulkan: fix rms_norm+mul fusion (#14545, merged Jul 6, 2025)
20 Pull requests opened by 16 people
- model : add PLaMo-2 model (#14560, opened Jul 7, 2025)
- metal : reuse graphs (#14570, opened Jul 7, 2025)
- quantize: fix minor logic flaw in --tensor-type (#14572, opened Jul 7, 2025)
- docker : add cann build pipeline (#14591, opened Jul 9, 2025)
- metal : fuse add (#14596, opened Jul 9, 2025)
- kv-cache : opt mask set input (#14600, opened Jul 9, 2025)
- sycl: Batched mulmat rework for oneDNN dispatch (#14617, opened Jul 10, 2025)
- SYCL: use 1D kernel for set_rows (#14618, opened Jul 10, 2025)
- webui: Change Download function to download the full text of the conversation (#14619, opened Jul 10, 2025)
- tool: add conversion of text/parquet to custom format (#14622, opened Jul 10, 2025)
- HIP: Enable Matrix cores for MMQ Kernels, Enable stream-K for CDNA 3 (#14624, opened Jul 10, 2025)
- graph : refactor context to not pass gf explicitly (#14629, opened Jul 11, 2025)
- Add EXAONE 4.0 model architecture (#14630, opened Jul 11, 2025)
- OpenCL: add `mul_mat_f16_f32_image` kernel (#14635, opened Jul 11, 2025)
- common: add config presets for falcon (#14638, opened Jul 11, 2025)
- Add CUDA non-contiguous Unary Ops support (#14639, opened Jul 11, 2025)
- Support diffusion models: Add Dream 7B (#14644, opened Jul 12, 2025)
- webui : add a preset feature to the settings (#14649, opened Jul 12, 2025)
- vulkan: add RTE variants for glu/add/sub/mul/div (#14653, opened Jul 12, 2025)
- Model : Add support for Kimi-K2 (#14654, opened Jul 12, 2025)
40 Issues closed by 11 people
- Misc. bug: Embedding/pooling: I receive 10xvector not 1xvector (#14543, closed Jul 12, 2025)
- Eval bug: llama-mtmd-cli doesn't support system prompts (#13454, closed Jul 12, 2025)
- Feature Request: video support in mtmd-cli / server (#13754, closed Jul 12, 2025)
- Feature Request: Set default of --numa to distribute (#13850, closed Jul 12, 2025)
- Eval bug: Embeddings Always returned as non (#13854, closed Jul 12, 2025)
- Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture (#13856, closed Jul 12, 2025)
- Eval bug: Command-A generates a single repeating token when using split mode row on P40 (#14228, closed Jul 11, 2025)
- Misc. bug: Quantize error (#14621, closed Jul 11, 2025)
- Feature Request: (webui) add import / export function for ALL conversations (#11718, closed Jul 11, 2025)
- Feature Request: add per-request "reasoning" options in llama-server (#13272, closed Jul 11, 2025)
- webui: First user prompt sometimes disappears after sending (#13622, closed Jul 11, 2025)
- Misc. bug: llama-cli.exe stopped working on Windows Server 10 (#13767, closed Jul 11, 2025)
- Eval bug: seed seems to be locked to a single value 4294967295 (#13823, closed Jul 11, 2025)
- Misc. bug: CANNOT CONVERT THE MODEL (#14610, closed Jul 11, 2025)
- Feature Request: Add support of convert.py for model Qwen2.5-Omni-7B (#12641, closed Jul 10, 2025)
- Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables (#13801, closed Jul 10, 2025)
- ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported (#13802, closed Jul 10, 2025)
- ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported (#13805, closed Jul 10, 2025)
- Eval bug: Mistral-Small not working on vulkan (#14550, closed Jul 9, 2025)
- Support for Jamba (JambaForCausalLM) (#6372, closed Jul 9, 2025)
- Feature Request: Falcon-H1 (#13681, closed Jul 9, 2025)
- something with llama_server? slow vs llama_cli (#13560, closed Jul 9, 2025)
- Feature Request: Hunyuan-A13B model support (#14415, closed Jul 8, 2025)
- CI: fix ubuntu-22-cmake-vulkan (#14569, closed Jul 8, 2025)
- open source dataset for low bit quantization? (#13736, closed Jul 8, 2025)
- Feature Request: Add keep_alive function for llama-server (#13748, closed Jul 8, 2025)
- Misc. bug: segfault in test-gbnf-validator (#13762, closed Jul 8, 2025)
- Eval bug: error loading model: vk::PhysicalDevice::createDevice: ErrorExtensionNotPresent (#14559, closed Jul 7, 2025)
- Compile bug: CMAKE_CUDA_COMPILER-NOTFOUND (#14558, closed Jul 7, 2025)
- Misc. bug: OOM, the process does not exit (#14458, closed Jul 7, 2025)
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727, closed Jul 7, 2025)
- Eval bug: Incoherence in Mistral 7B Q8_0 on Vulkan backend (#14540, closed Jul 6, 2025)
- Feature Request: support for image input in llama-server (and web ui) (#12792, closed Jul 6, 2025)
- Eval bug: swa_full = true is slower than false (#13683, closed Jul 6, 2025)
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700, closed Jul 6, 2025)
- Eval bug: Server Returns Empty Responses Under High Load (#13703, closed Jul 6, 2025)
20 Issues opened by 19 people
- Feature Request: Add Explicit Context Reset for llama-cli or llama-server (#14652, opened Jul 12, 2025)
- Feature Request: Support Kimi K2 (#14642, opened Jul 12, 2025)
- Eval bug: The content returned by the model is very strange (#14641, opened Jul 12, 2025)
- Eval bug: server: unnecessary prompt re-processing with Jamba models (#14625, opened Jul 11, 2025)
- Misc. bug: llama-perplexity PPL score is too high for Falcon H1 TQ1_0 model (#14616, opened Jul 10, 2025)
- Misc. bug: mtmd cannot decode an image provided through valid OpenAI API request (#14615, opened Jul 10, 2025)
- Feature Request: Improve Sampling API: Expose Top‑K/Top‑P Candidate Token Lists in C API (#14612, opened Jul 10, 2025)
- Feature Request: Built-in Token Probability Output for Inference API (#14611, opened Jul 10, 2025)
- Eval bug: Gemma 3n incoherent with HIP when prompt length > ubatch (#14604, opened Jul 9, 2025)
- Eval bug: thinking not working if "tool_choice" is "required" for Qwen models (QwQ, Qwen3, etc.) (#14599, opened Jul 9, 2025)
- Misc. bug: CLIP mmproj quantization is broken since May 14 (#14588, opened Jul 9, 2025)
- Eval bug: [5808-] qwen3 30B vulkan run with GGG (#14583, opened Jul 8, 2025)
- Eval bug: qwen3 with <think> infer Incomplete (#14578, opened Jul 8, 2025)
- Eval bug: ROCm error: batched GEMM not supported (#14576, opened Jul 8, 2025)
- Misc. bug: OpenAI HTTP interface returns "HTTP-200" with error details in streamed chunk (#14566, opened Jul 7, 2025)
- Misc. bug: parameter passing for argument of type (#14564, opened Jul 7, 2025)
- Feature Request: add tool calling for deepseek-r1-0528 (#14557, opened Jul 7, 2025)
- Misc. bug: crash on vulkan with new max mem alloc size calculations since b5703 (#14553, opened Jul 6, 2025)
- Support for Ovis2 models (#14552, opened Jul 6, 2025)
68 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- compare-commits.sh: support both llama-bench and test-backend-ops (#14392, commented on Jul 10, 2025; 18 new comments)
- finetune.cpp command-line arg (#13873, commented on Jul 11, 2025; 12 new comments)
- ggml: Add initial WebGPU backend (#14521, commented on Jul 12, 2025; 10 new comments)
- llama : reuse compute graphs (#14482, commented on Jul 12, 2025; 9 new comments)
- model : jina-embeddings-v3 support (#13693, commented on Jul 10, 2025; 9 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on Jul 12, 2025; 6 new comments)
- llama : add high-throughput mode (#14363, commented on Jul 12, 2025; 5 new comments)
- imatrix : use GGUF to store importance matrices (#9400, commented on Jul 12, 2025; 2 new comments)
- PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp (#12326, commented on Jul 12, 2025; 0 new comments)
- [WIP] backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs (#12063, commented on Jul 12, 2025; 0 new comments)
- Compile bug: cannot compile get_rows_iq1_m (#14542, commented on Jul 12, 2025; 0 new comments)
- Feature Request: Gemma3n multimodal support (#14429, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: linux/arm64 does not exist for the server docker image (#13891, commented on Jul 12, 2025; 0 new comments)
- ggml : add WebGPU backend (#7773, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: [SYCL] llama-cli built by Visual Studio 2022 is not working (#14086, commented on Jul 12, 2025; 0 new comments)
- Vulkan Runner Frequent Crashing under workload (#14105, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: --cache-reuse no longer seems to be caching prompt prefixes (#14113, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: "llama_context_params::swa_full = true" causes very large RAM/VRAM usage (#14123, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: llama-server drops multi-part content for final assistant message (#14137, commented on Jul 12, 2025; 0 new comments)
- Misc. bug: ROCm images cannot be found (#11913, commented on Jul 11, 2025; 0 new comments)
- changelog : `libllama` API (#9289, commented on Jul 11, 2025; 0 new comments)
- llama-server : implement universal assisted decoding (#12635, commented on Jul 8, 2025; 0 new comments)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on Jul 11, 2025; 0 new comments)
- llama : support qwen3 rerank and embeddings (#14029, commented on Jul 9, 2025; 0 new comments)
- tests : add test-model-random (#14139, commented on Jul 8, 2025; 0 new comments)
- Mtmd: add a way to select device for vision encoder (#14236, commented on Jul 7, 2025; 0 new comments)
- ggml: adds CONV_2D op and direct GEMM Vulkan implementation (#14316, commented on Jul 12, 2025; 0 new comments)
- Q2k interleaving implementation - x86/x64 SIMD (#14373, commented on Jul 11, 2025; 0 new comments)
- ggml : add pointer to attach user data (#14397, commented on Jul 8, 2025; 0 new comments)
- OpenCL: add conv2d kernel (#14403, commented on Jul 11, 2025; 0 new comments)
- Added CI with RISC-V RVV1.0 Hardware (#14439, commented on Jul 7, 2025; 0 new comments)
- Allow truncation when embedding (#14493, commented on Jul 6, 2025; 0 new comments)
- train: add simple loading already tokenized data from parquet dataset (#14522, commented on Jul 10, 2025; 0 new comments)
- common: detect and prefer big cores on AArch64 hybrid CPU on linux (#14532, commented on Jul 8, 2025; 0 new comments)
- ggml : add ANE backend (#10453, commented on Jul 9, 2025; 0 new comments)
- Feature Request: Ability to pack multiple GGUFs into single one (#13028, commented on Jul 9, 2025; 0 new comments)
- [How to serve lookahead decoding Qwen 3] (#14057, commented on Jul 9, 2025; 0 new comments)
- Eval bug: Model produces gibberish or repeated output when using `-sm row` on CUDA (#14075, commented on Jul 9, 2025; 0 new comments)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, commented on Jul 8, 2025; 0 new comments)
- Cmake minor bug: Confusing ggml-cpu: -march=native log message when using explicit -march flags and LLAMA_NATIVE=OFF (#14058, commented on Jul 8, 2025; 0 new comments)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, commented on Jul 7, 2025; 0 new comments)
- Eval bug: OpenAI streaming API changed/broken (#14268, commented on Jul 7, 2025; 0 new comments)
- Compile bug: Looking for C++ include rocwmma/rocwmma.hpp - not found (#14538, commented on Jul 7, 2025; 0 new comments)
- Feature Request: Improve model load time when using the RPC backend (#12954, commented on Jul 7, 2025; 0 new comments)
- bug: ValueError: Architecture qwen3 not supported (#13157, commented on Jul 7, 2025; 0 new comments)
- Feature Request: add support for length_penalty (#14053, commented on Jul 7, 2025; 0 new comments)
- Feature Request: s390x CI (#13243, commented on Jul 6, 2025; 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on Jul 6, 2025; 0 new comments)
- Feature Request: support FP8 data type in llama.cpp (#14020, commented on Jul 6, 2025; 0 new comments)
- Feature Request: add a new repo for conversion of gguf (#14027, commented on Jul 6, 2025; 0 new comments)
- Misc. minor bug: llama-server: model parameter labels in webui settings are not shown in Android Chrome browser when not set in "desktop site" mode (#14036, commented on Jul 6, 2025; 0 new comments)
- Feature Request: Add Ernie4.5MoE support (#14465, commented on Jul 11, 2025; 0 new comments)
- Feature Request: (webui) read data from /props endpoint and use it on the webui (#11717, commented on Jul 11, 2025; 0 new comments)
- Support Hybrid Models (#12331, commented on Jul 11, 2025; 0 new comments)
- Eval bug: Unusual high RAM usage on Windows when running DeepSeek V3 Q2_K_XL/IQ2_XXS, on Hybrid CPU+GPU (#13978, commented on Jul 11, 2025; 0 new comments)
- Eval bug: MiniCPM4 0.5B run failed (#14094, commented on Jul 11, 2025; 0 new comments)
- Eval bug: Gemma3 decode and update_slots fail with parallel slots (#14097, commented on Jul 11, 2025; 0 new comments)
- Feature Request: Granite 4 Support (#13275, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: convert_hf_to_gguf.py not working on qwen3-embedding and qwen3-embedding lora tuned models (#14459, commented on Jul 10, 2025; 0 new comments)
- Eval bug: build with backend-cpu, run llama-server reports load_backend: failed to find ggml_backend_init in /home/ubutnu/llama.cpp-master/build/bin/libggml-cpu.so (#14160, commented on Jul 10, 2025; 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: llama-server --batch-size always set to 64 (#14046, commented on Jul 10, 2025; 0 new comments)
- Can you add an example of running the model using the llama-cpp-python Python binding for quick start? (#14066, commented on Jul 10, 2025; 0 new comments)
- Eval bug: Ollama Runner uses only 1 CPU for its threads, in guaranteed mode in pod when 8 CPUs are allocated to it (#14089, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Qwen3-Embedding-0.6B-GGUF doesn't work for 32768 context size (too much memory used) (#14084, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Server tests /health race conditions (#14092, commented on Jul 10, 2025; 0 new comments)
- Misc. bug: Complex tool calling schema causes an "Unrecognized Schema" exception (#14227, commented on Jul 9, 2025; 0 new comments)
- Eval bug: Tools crash and/or fail for deepseek r1/v3 unsloth dynamic quantization (#14406, commented on Jul 9, 2025; 0 new comments)