Research establishing theoretical limits on knowledge distillation via superposition theory. Proves minimum-width theorems characterizing the smallest student architectures capable of effective knowledge transfer from teacher models.
Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory
Knowledge distillation has hard geometric limits: superposition theory proves a minimum-width threshold below which student networks simply cannot absorb knowledge from teacher models, regardless of training.
Tuesday, April 7, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: arXiv CS.LG (Machine Learning) · BY sys://pipeline
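To make the claimed phenomenon concrete, the sketch below shows how one might probe for such a width threshold empirically: freeze a wide teacher, distill it into students of increasing hidden width using a standard Hinton-style temperature-scaled KL loss, and watch where the distillation loss stops improving. The networks, widths, synthetic data, and hyperparameters here are illustrative assumptions, not the paper's construction or proof.

```python
# Hypothetical probe of a minimum-width threshold for distillation.
# Standard temperature-scaled KD loss; all specifics are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(width: int, in_dim: int = 32, out_dim: int = 10) -> nn.Sequential:
    """Two-layer MLP whose hidden width we vary."""
    return nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(), nn.Linear(width, out_dim))

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between teacher and student output distributions."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

torch.manual_seed(0)
teacher = mlp(width=256)   # stand-in for a pretrained teacher (randomly initialized here)
teacher.eval()
x = torch.randn(512, 32)   # synthetic inputs; a real probe would use task data
with torch.no_grad():
    t_logits = teacher(x)

# Sweep student widths and record the final distillation loss each one reaches.
for width in (2, 4, 8, 16, 32, 64):
    student = mlp(width)
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = kd_loss(student(x), t_logits)
        loss.backward()
        opt.step()
    print(f"width={width:3d}  final KD loss={loss.item():.4f}")
```

In a sweep like this, the achievable loss typically flattens into a floor for students below some width and drops sharply above it; per the summary, the paper's theorem characterizes where that threshold sits geometrically rather than locating it by experiment.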
Tags
research