DeepSeek releases the V4 model series, featuring two Mixture-of-Experts variants: DeepSeek-V4-Pro (1.6T total parameters, 49B activated) and DeepSeek-V4-Flash (284B total, 13B activated), both supporting million-token context. Key technical innovations include a Hybrid Attention Architecture combining Compressed Sparse Attention and Heavily Compressed Attention; as a result, V4-Pro requires only 27% of the inference FLOPs and 10% of the KV cache of V3.2. The models are trained with the Muon optimizer and undergo two-stage post-training that combines domain-specific expert cultivation with unified consolidation via on-policy distillation.
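The per-token compute savings of an MoE design follow directly from the ratio of activated to total parameters. A quick back-of-envelope sketch using only the figures quoted above (the ratios are derived here, not stated in the announcement):

```python
# Figures from the V4 announcement; "active fraction" is derived arithmetic.
models = {
    # name: (total_params_in_billions, activated_params_in_billions)
    "DeepSeek-V4-Pro":   (1600, 49),   # 1.6T total, 49B activated
    "DeepSeek-V4-Flash": (284, 13),
}

for name, (total, active) in models.items():
    frac = active / total
    print(f"{name}: {active}B / {total}B = {frac:.1%} of weights active per token")

# Claimed V4-Pro inference costs relative to V3.2 (from the announcement):
flops_ratio = 0.27     # 27% of inference FLOPs
kv_cache_ratio = 0.10  # 10% of the KV cache
print(f"V4-Pro vs V3.2: {flops_ratio:.0%} FLOPs, {kv_cache_ratio:.0%} KV cache")
```

Under these numbers, V4-Pro touches roughly 3% of its weights per token and V4-Flash roughly 5%, which is where the bulk of the quoted FLOPs reduction comes from; the KV-cache reduction is attributed separately to the compressed attention mechanisms.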
Models
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
DeepSeek-V4 slashes inference costs to 27% of its predecessor while scaling to million-token context, demonstrating major efficiency gains for practical long-context LLMs.
Friday, April 24, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: Hacker News · BY sys://pipeline
Tags
models
/// RELATED
Research · 3d ago
Reducing ML-KEM-768 encapsulation key sizes by 24 octets
Bit-packing optimization trims ML-KEM-768 post-quantum cryptography encapsulation keys by 24 octets, enabling better UDP packet alignment for practical PQC deployment.
Infrastructure · 4d ago
The AI scaffolding layer is collapsing. LlamaIndex's CEO explains what survives.
As the AI scaffolding layer consolidates, LlamaIndex's CEO reveals which infrastructure platforms and tool categories survive the market shakeout.