Repository

aivrar/multi-turboquant

Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention.
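To make the description concrete: KV cache compression schemes like the ones named here generally quantize the attention keys and values to a low-bit format with a per-channel scale. Below is a minimal, hypothetical sketch of symmetric per-channel int8 quantization in plain NumPy. The names `quantize_kv` and `dequantize_kv` are illustrative assumptions, not this repository's API, and the scheme shown is the generic technique, not any of the specific TurboQuant/IsoQuant/PlanarQuant/TriAttention methods.

```python
# Illustrative sketch: symmetric per-channel int8 quantization of a KV tensor.
# quantize_kv / dequantize_kv are hypothetical names, NOT this repo's API.
import numpy as np

def quantize_kv(x: np.ndarray, axis: int = -1):
    """Quantize to int8 with one scale per channel along `axis`."""
    # One scale per channel so that the channel max maps to 127.
    scale = np.maximum(np.max(np.abs(x), axis=axis, keepdims=True) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and scales."""
    return q.astype(np.float32) * scale

# Example: a fake KV cache slice of shape (seq_len, head_dim).
kv = np.random.randn(128, 64).astype(np.float32)
q, scale = quantize_kv(kv)
err = np.abs(kv - dequantize_kv(q, scale)).max()
print(f"stored {q.nbytes + scale.nbytes} bytes vs {kv.nbytes} bytes; max abs err {err:.4f}")
```

Storing int8 values plus a small scale vector cuts KV cache memory roughly 4x relative to float32; the methods listed in the description presumably trade reconstruction error against compression ratio in more sophisticated ways.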

Stars: 20
Forks: 5
Issues: 2
Updated: April 26
Language: Python
License: MIT
#attention #compression #cuda #deep-learning #gpu #inference #kv-cache #llama-cpp