Welcome to TOKENBURN — Your source for AI news

Streaming experts

Streaming expert weights from SSD on every token lets giant Mixture-of-Experts models run on consumer devices. The trillion-parameter Kimi K2.5 runs on an M2 Max with 96GB of RAM, and Qwen3.5-397B runs on an iPhone, helped along by rapid community-driven optimization.

Tuesday, March 24, 2026, 12:00 PM UTC | 2 min read | Source: Simon Willison | By sys://pipeline

Streaming experts is a technique for running very large Mixture-of-Experts (MoE) models on consumer hardware. Instead of holding the whole model in RAM, it reads from SSD only the expert weights needed for each token. With it, the 1-trillion-parameter Kimi K2.5 (32B active parameters per token) now runs on an M2 Max MacBook Pro with 96GB of RAM, and Qwen3.5-397B-A17B runs on an iPhone, albeit at just 0.6 tok/s. The technique appears to be improving rapidly through community-driven autoresearch optimization loops.
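To make the idea concrete, here is a minimal sketch of a toy MoE layer written with NumPy. The small router stays in RAM. The expert matrices live in a memory-mapped `.npy` file that stands in for the SSD, and each token copies in only the top-k experts the router picks. All names and sizes (`StreamingMoELayer`, `save_experts`, `NUM_EXPERTS`, `TOP_K`, `D_MODEL`) are hypothetical. This is not the code behind the Kimi K2.5 or Qwen3.5 results, and it leaves out much of what a real runtime would need, such as quantized weights and batching.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy dimensions, chosen only for illustration.
NUM_EXPERTS = 8
TOP_K = 2
D_MODEL = 16


def save_experts(path: str, rng: np.random.Generator) -> None:
    """Write every expert's weight matrix to one .npy file on disk (the "SSD")."""
    weights = rng.standard_normal((NUM_EXPERTS, D_MODEL, D_MODEL)).astype(np.float32) * 0.1
    np.save(path, weights)


class StreamingMoELayer:
    """Toy MoE layer that keeps only the router resident in RAM.

    Expert weights stay in a memory-mapped file. For each token, only the
    TOP_K experts selected by the router are copied into memory.
    """

    def __init__(self, path: str, router: np.ndarray) -> None:
        # mmap_mode="r" maps the file lazily instead of reading it all into RAM.
        self.experts = np.load(path, mmap_mode="r")
        self.router = router  # shape (D_MODEL, NUM_EXPERTS), always in RAM
        self.bytes_loaded = 0

    def forward_token(self, x: np.ndarray) -> tuple[np.ndarray, list[int]]:
        logits = x @ self.router
        chosen = np.argsort(logits)[-TOP_K:][::-1]  # indices of the top-k experts
        gates = np.exp(logits[chosen] - logits[chosen].max())
        gates /= gates.sum()  # softmax over the selected experts only

        out = np.zeros_like(x)
        for gate, e in zip(gates, chosen):
            w = np.array(self.experts[e])  # copy just this expert's slice off disk
            self.bytes_loaded += w.nbytes
            out += gate * np.maximum(x @ w, 0.0)  # ReLU expert, weighted by its gate
        return out, [int(e) for e in chosen]


rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "experts.npy")
    save_experts(path, rng)
    router = rng.standard_normal((D_MODEL, NUM_EXPERTS)).astype(np.float32)
    layer = StreamingMoELayer(path, router)
    token = rng.standard_normal(D_MODEL).astype(np.float32)
    y, used = layer.forward_token(token)
    loaded = layer.bytes_loaded
    total = NUM_EXPERTS * D_MODEL * D_MODEL * 4  # float32 = 4 bytes per weight
    del layer  # release the memory map before the temp directory is removed

print(f"read {loaded} of {total} expert bytes for this token")
# → read 2048 of 8192 expert bytes for this token
```

Only `TOP_K` of `NUM_EXPERTS` experts are read per token, so disk traffic scales with active parameters rather than total parameters. The same ratio explains the headline result: a 1T-parameter model with 32B active parameters touches only about 3% of its weights for each token.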
