Repository
aivrar/multi-turboquant
Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention.
- Stars: 20
- Forks: 5
- Issues: 2
- Updated: April 26
- Language: Python
- License: MIT
#attention #compression #cuda #deep-learning #gpu #inference #kv-cache #llama-cpp
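
For readers unfamiliar with the topic, below is a minimal NumPy sketch of what KV cache quantization involves in general: per-channel symmetric int8 quantization of a key/value tensor, the basic operation that methods like those named in the description build on. This is a hypothetical illustration, not code from `aivrar/multi-turboquant`; the function names, the `(tokens, heads, head_dim)` layout, and the int8 scheme are all assumptions for the sketch.

```python
import numpy as np

def quantize_kv(x: np.ndarray, bits: int = 8):
    """Per-channel symmetric quantization of a KV cache tensor.

    x: (num_tokens, num_heads, head_dim) float array.
    Returns integer codes plus per-channel scales for dequantization.
    """
    qmax = 2 ** (bits - 1) - 1
    # One scale per (head, channel) pair, computed over the token axis.
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    codes = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize_kv(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float KV tensor from codes and scales."""
    return codes.astype(np.float32) * scale

# Round-trip a synthetic KV cache and check the reconstruction error.
kv = np.random.randn(128, 8, 64).astype(np.float32)
codes, scale = quantize_kv(kv)
err = np.abs(dequantize_kv(codes, scale) - kv).max()
print(f"max abs error: {err:.4f}")  # small relative to the value range
```

At int8 this already cuts KV cache memory roughly 4x versus float32; the methods listed in the description target more aggressive compression than this baseline.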