KV Cache
5 mentions across all digests
KV Cache is an inference optimization that stores the key and value tensors produced by attention layers so language models avoid recomputing them at every token generation step; recent research also extends the cache to zero-token knowledge injection.
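A minimal sketch of the mechanism, assuming single-head attention and toy dimensions: keys and values for past tokens are kept in a cache, so each decode step only projects the newest token and attends over what is already stored.

```python
# Minimal sketch (single head, toy sizes) of how a KV cache avoids recomputing
# keys/values for past tokens at each decode step.
import numpy as np

d = 8                                   # head dimension
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

k_cache, v_cache = [], []               # grows by one entry per generated token

def decode_step(x_t):
    """Attend from the newest token; reuse cached K/V for all earlier tokens."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)           # compute K/V only for the new token
    v_cache.append(x_t @ W_v)
    K = np.stack(k_cache)               # (t, d) cached keys
    V = np.stack(v_cache)               # (t, d) cached values
    attn = softmax(q @ K.T / np.sqrt(d))
    return attn @ V

for _ in range(5):                      # one forward pass per new token
    out = decode_step(np.random.randn(d))
print(out.shape)                        # (8,)
```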
TurboQuant: A First-Principles Walkthrough
TurboQuant compresses LLM KV caches to 2–4 bits per coordinate using training-free random rotation, enabling practical memory efficiency gains without calibration overhead.
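A hedged sketch of the rotate-then-quantize idea, not TurboQuant's actual quantizer: a random orthogonal rotation requires no training or calibration data, and a plain per-vector uniform quantizer stands in for the paper's 2–4-bit scheme.

```python
# Illustrative rotate-then-quantize sketch. The quantizer and bit allocation
# below are assumptions for demonstration, not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
d, bits = 64, 4
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal rotation

def quantize(x, bits):
    """Per-vector uniform quantizer: store a scale, an offset, and integer codes."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes * scale + lo

kv = rng.standard_normal(d)                        # one cached key/value vector
rotated = Q @ kv                                   # rotation spreads out outliers
codes, lo, scale = quantize(rotated, bits)
recovered = Q.T @ dequantize(codes, lo, scale)     # inverse rotation (Q is orthogonal)
print(np.abs(recovered - kv).max())                # reconstruction error at 4 bits
```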
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction
Entropy-aware KV cache summarization reduces VRAM overhead for million-token LLM contexts while preserving semantic fidelity through low-rank reconstruction, enabling longer context windows without pruning.
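A sketch of the low-rank reconstruction piece only, under assumed toy dimensions; the entropy-aware token scoring described in the paper is not reproduced here. Real key/value matrices tend to have low effective rank, which is what makes this kind of factorization pay off.

```python
# Store a cached key matrix as two thin SVD factors and rebuild it on demand.
# Random data is used only to show the mechanics and the storage savings.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d, rank = 1024, 64, 16
K = rng.standard_normal((seq_len, d))               # cached keys for one head

U, S, Vt = np.linalg.svd(K, full_matrices=False)
K_compressed = (U[:, :rank] * S[:rank], Vt[:rank])  # keep two thin factors

def reconstruct(factors):
    US, Vt_r = factors
    return US @ Vt_r                                # (seq_len, d) approximation

K_hat = reconstruct(K_compressed)
orig_floats = K.size
stored_floats = K_compressed[0].size + K_compressed[1].size
print(f"compression ratio: {orig_floats / stored_floats:.1f}x")   # ~3.8x here
```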
Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit
Researchers propose probabilistic language tries for KV cache compression that exceed theoretical per-vector limits, potentially reducing inference memory footprint and compute costs for LLM deployment.
Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection
Knowledge Packs inject external knowledge into language models through KV cache without consuming tokens, reducing inference costs for knowledge-augmented tasks.
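A hedged sketch of the injection idea: precompute key/value tensors for a knowledge passage offline, then prepend them to the live cache so the model can attend to that knowledge without spending prompt tokens on it. The function names and shapes below are illustrative assumptions, not the paper's Knowledge Pack format.

```python
# Toy illustration of prepending precomputed K/V to a live cache.
import numpy as np

d = 8
W_k, W_v = np.random.randn(d, d), np.random.randn(d, d)

def build_knowledge_pack(passage_embeddings):
    """Offline: turn passage token embeddings into cached K/V tensors."""
    return passage_embeddings @ W_k, passage_embeddings @ W_v

def inject(pack, live_k, live_v):
    """Online: prepend the pack so attention treats it as already-processed context."""
    pack_k, pack_v = pack
    return np.concatenate([pack_k, live_k]), np.concatenate([pack_v, live_v])

pack = build_knowledge_pack(np.random.randn(20, d))     # 20 "knowledge" positions
live_k, live_v = np.random.randn(3, d), np.random.randn(3, d)
K, V = inject(pack, live_k, live_v)
print(K.shape, V.shape)                                 # (23, 8) (23, 8)
```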
Understanding and Coding the KV Cache in LLMs from Scratch
KV caches explained: the memory-vs-latency tradeoff that powers efficient LLM inference, from conceptual foundations to working Python code.
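A back-of-envelope helper for the memory side of that tradeoff; the model shape below is an assumption (roughly 7B-scale in fp16), not taken from the article.

```python
# How much memory a full KV cache occupies: 2x (keys and values) per layer,
# per KV head, per position.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

gb = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1) / 1e9
print(f"{gb:.1f} GB")   # ~2.1 GB at 4k context in fp16 for this assumed config
```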