LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3
-
Updated
Jan 15, 2025 - Python