GitHub Trending repo, Apr 10

rookiemann/multi-turboquant: Unified KV cache compression for LLM inference — TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress KV cache 5-80x to run bigger models, longer context, more agents on your GPU.

by rookiemann

AI Hardware LLM

Metadata

Stars: 6
Forks: 2
Language: Python
Watchers: 6
Open Issues: 1
License: MIT
