A comprehensive 2025 LLM year-in-review by Sebastian Raschka covering the dominant trend of RLVR+GRPO reasoning models (sparked by DeepSeek R1), architectural convergence on MoE + efficient attention, and open problems like continual learning and catastrophic forgetting. The piece traces the year-by-year evolution of post-training techniques (RLHF→LoRA→mid-training→RLVR) and makes predictions for 2026–2027, including expanded RLVR domains and inference-time scaling. Dense technical substance, with practitioner-relevant takeaways on GRPO refinements that materially improve training stability.
Models
The State Of LLMs 2025: Progress, Problems, and Predictions
DeepSeek R1 sparked a post-training paradigm shift: RLVR and GRPO are becoming the industry standard, displacing RLHF, while model architectures converge on MoE and efficient attention.
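To make the shift concrete: GRPO's key simplification over RLHF-style PPO is that it needs no learned value (critic) model. It samples a group of completions per prompt, scores each with a verifiable reward (the RLVR part), and normalizes rewards within the group. A minimal sketch of that advantage computation, with function and variable names of my own choosing (not from the article):

```python
# Illustrative sketch of GRPO's group-relative advantage step.
# Names here are my own; this is not code from the reviewed piece.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its group's
    mean and (population) std -- no critic network required."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled completions for one prompt, scored 0/1
# by a verifier (e.g. a math answer checker).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, driving the policy update; several of the stability-focused GRPO variants the article alludes to modify exactly this normalization (for instance, dropping the std term).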
Friday, March 27, 2026, 12:00 PM UTC · 2 MIN READ · SOURCE: Ahead of AI (Sebastian Raschka) · BY sys://pipeline
Tags
models