[WIP] [GSOC] KV Caching for LLM inference #27205

nklskyoy · 2025-04-07T19:01:29Z

This PR introduces a new dnn::ArgKind, DNN_ARG_CACHED, which can be used to store a cache of dynamic size that persists between runs of a net.

Cache management (allocation, growth, writing) is intended to be performed directly by the layers.
The memory is organized into pages, where each page is a Mat.
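To illustrate the paged layout, here is a minimal sketch under stated assumptions. It is not the PR's PagedCacheManager. The class name PagedCacheSketch is hypothetical, and each page is modeled as a std::vector<float> of pageTokens x rowWidth values standing in for one cv::Mat. Appending a token row allocates a new page only when the last page is full. A logical row index is mapped to a (page, slot) pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a paged cache (not the PR's PagedCacheManager).
// Each page stands in for one cv::Mat: pageTokens rows of rowWidth floats.
class PagedCacheSketch {
public:
    PagedCacheSketch(std::size_t pageTokens, std::size_t rowWidth)
        : pageTokens_(pageTokens), rowWidth_(rowWidth) {
        if (pageTokens == 0 || rowWidth == 0)
            throw std::invalid_argument("page size and row width must be positive");
    }

    // Write one row (e.g. one token's key vector); grow by one page when full.
    void append(const std::vector<float>& row) {
        if (row.size() != rowWidth_)
            throw std::invalid_argument("row width mismatch");
        if (size_ == pages_.size() * pageTokens_)
            pages_.emplace_back(pageTokens_ * rowWidth_, 0.0f);  // allocate a new page
        const std::size_t page = size_ / pageTokens_;
        const std::size_t slot = size_ % pageTokens_;
        std::copy(row.begin(), row.end(), pages_[page].begin() + slot * rowWidth_);
        ++size_;
    }

    // Read element j of logical row i by translating i into (page, slot).
    float at(std::size_t i, std::size_t j) const {
        if (i >= size_ || j >= rowWidth_) throw std::out_of_range("index out of range");
        return pages_[i / pageTokens_][(i % pageTokens_) * rowWidth_ + j];
    }

    std::size_t size() const { return size_; }
    std::size_t numPages() const { return pages_.size(); }

private:
    std::size_t pageTokens_, rowWidth_;
    std::size_t size_ = 0;                    // number of rows written so far
    std::vector<std::vector<float>> pages_;   // fixed-size pages, never reallocated
};
```

One advantage of paging, as sketched here, is that growing the cache never copies rows that were already written. Existing pages stay where they are, and only a new fixed-size page is added.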

Caching is particularly useful for LLM inference, where the keys and values computed for earlier tokens are reused when generating each subsequent token. KV caching will be added to the attention layer.
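The reuse described above can be sketched for a single attention head. This is a minimal illustration assuming plain std::vector<float> rows instead of cv::Mat, and a hypothetical KVCacheSketch type that is not part of this PR or the OpenCV API. At each decoding step, only the new token's query, key, and value are supplied. The key and value are appended to the cache, and the query attends over every cached position as softmax(q·k_i / sqrt(d)) applied to the v_i.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical single-head KV cache: keys/values of all previously processed
// tokens are kept, so each step only supplies the new token's q, k, v.
struct KVCacheSketch {
    std::vector<std::vector<float>> keys, values;

    // Append this step's key/value, then attend from the new query to every
    // cached position: output = sum_i softmax_i(q.k_i / sqrt(d)) * v_i.
    std::vector<float> step(const std::vector<float>& q,
                            const std::vector<float>& k,
                            const std::vector<float>& v) {
        if (q.size() != k.size()) throw std::invalid_argument("q/k size mismatch");
        if (!values.empty() && v.size() != values.front().size())
            throw std::invalid_argument("value size mismatch");
        keys.push_back(k);
        values.push_back(v);

        const float scale = 1.0f / std::sqrt(static_cast<float>(q.size()));
        std::vector<float> scores(keys.size());
        float maxScore = -std::numeric_limits<float>::infinity();
        for (std::size_t i = 0; i < keys.size(); ++i) {
            float dot = 0.0f;
            for (std::size_t j = 0; j < q.size(); ++j) dot += q[j] * keys[i][j];
            scores[i] = dot * scale;
            maxScore = std::max(maxScore, scores[i]);
        }
        // Numerically stable softmax over the cached positions.
        float sum = 0.0f;
        for (float& s : scores) { s = std::exp(s - maxScore); sum += s; }

        std::vector<float> out(v.size(), 0.0f);
        for (std::size_t i = 0; i < values.size(); ++i)
            for (std::size_t j = 0; j < out.size(); ++j)
                out[j] += (scores[i] / sum) * values[i][j];
        return out;
    }
};
```

Without the cache, each step would have to recompute keys and values for the whole prefix. With the cache, a step only computes the projections for one new token and then reads the stored rows.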

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

Reference: #27176

modules/dnn/include/opencv2/dnn/dnn.hpp

modules/dnn/src/net_impl.hpp

PagedCacheManager and DNN_ARG_CACHED

ce5db2b

asmorkalov added feature category: dnn labels Apr 8, 2025

nklskyoy added 2 commits April 8, 2025 10:18

drop readNetFromGGUF

bd6c746

PagedCacheManager

156e7b9

asmorkalov reviewed Apr 22, 2025

modules/dnn/include/opencv2/dnn/dnn.hpp (outdated)

modules/dnn/src/net_impl.hpp (outdated)

asmorkalov added the pr: needs test New functionality requires minimal tests set label Apr 22, 2025

fengyuentau self-requested a review April 23, 2025 07:39

nklskyoy added 3 commits April 27, 2025 23:28

DNN_ARG_CACHED description

d1f8891

PagedCacheManager: basic implementation

b10cbc7

add reference to vLLM

65efb30

asmorkalov added this to the 5.0-release milestone Apr 28, 2025

nklskyoy added 5 commits May 9, 2025 23:11

bind cache to netimpl

b79c0ab

solve compiler errors

f692afb

Merge branch '5.x' into arg-cached

591928f

resolve compiler warnings

410f25f

no designated initializer

3335a20
