Research establishing theoretical limits on knowledge distillation via superposition theory. Proves minimum-width theorems characterizing the smallest student architectures capable of effective knowledge transfer from teacher models.
Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
Knowledge distillation has hard geometric limits: superposition theory proves a minimum-width threshold below which student networks simply cannot absorb knowledge from teacher models, regardless of training.
Tuesday, April 7, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning) · BY sys://pipeline
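To make the claimed phenomenon concrete, the sketch below shows how one might probe for such a width threshold empirically: freeze a wide teacher, distill it into students of increasing hidden width using a standard Hinton-style temperature-scaled KL loss, and watch where the distillation loss stops improving. The networks, widths, synthetic data, and hyperparameters here are illustrative assumptions, not the paper's construction or proof.

```python
# Hypothetical probe of a minimum-width threshold for distillation.
# Standard temperature-scaled KD loss; all specifics are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width: int, in_dim: int = 32, out_dim: int = 10) -> nn.Sequential:
    """Two-layer MLP whose hidden width we vary."""
    return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(), nn.Linear(width, out_dim))

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student output distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

torch.manual_seed(0)
teacher = mlp(width=256)   # stand-in for a pretrained teacher (randomly initialized here)
teacher.eval()
x = torch.randn(512, 32)   # synthetic inputs; a real probe would use task data
with torch.no_grad():
    t_logits = teacher(x)

# Sweep student widths and record the final distillation loss each one reaches.
for width in (2, 4, 8, 16, 32, 64):
    student = mlp(width)
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = kd_loss(student(x), t_logits)
        loss.backward()
        opt.step()
    print(f"width={width:3d}  final KD loss={loss.item():.4f}")
```

In a sweep like this, the achievable loss typically flattens into a floor for students below some width and drops sharply above it; per the summary, the paper's theorem characterizes where that threshold sits geometrically rather than locating it by experiment.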
Tags
research