DeepSeek releases the V4 model series, featuring two Mixture-of-Experts variants: DeepSeek-V4-Pro (1.6T total parameters, 49B activated) and DeepSeek-V4-Flash (284B total, 13B activated), both supporting million-token context. Key technical innovations include a Hybrid Attention Architecture combining Compressed Sparse Attention and Heavily Compressed Attention; as a result, V4-Pro requires only 27% of the inference FLOPs and 10% of the KV cache of V3.2. The models are trained with the Muon optimizer and undergo two-stage post-training that combines domain-specific expert cultivation with unified consolidation via on-policy distillation.
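The per-token compute savings of an MoE design follow directly from the ratio of activated to total parameters. A quick back-of-envelope sketch using only the figures quoted above (the ratios are derived here, not stated in the announcement):

```python
# Figures from the V4 announcement; "active fraction" is derived arithmetic.
models = {
    # name: (total_params_in_billions, activated_params_in_billions)
    "DeepSeek-V4-Pro":   (1600, 49),   # 1.6T total, 49B activated
    "DeepSeek-V4-Flash": (284, 13),
}

for name, (total, active) in models.items():
    frac = active / total
    print(f"{name}: {active}B / {total}B = {frac:.1%} of weights active per token")

# Claimed V4-Pro inference costs relative to V3.2 (from the announcement):
flops_ratio = 0.27     # 27% of inference FLOPs
kv_cache_ratio = 0.10  # 10% of the KV cache
print(f"V4-Pro vs V3.2: {flops_ratio:.0%} FLOPs, {kv_cache_ratio:.0%} KV cache")
```

Under these numbers, V4-Pro touches roughly 3% of its weights per token and V4-Flash roughly 5%, which is where the bulk of the quoted FLOPs reduction comes from; the KV-cache reduction is attributed separately to the compressed attention mechanisms.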
Models
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence
DeepSeek-V4 slashes inference costs to 27% of its predecessor while scaling to million-token context, demonstrating major efficiency gains for practical long-context LLMs.
Friday, April 24, 2026 12:00 PM UTC · 2 MIN READ · SOURCE: Hacker News · BY sys://pipeline
Tags
models
/// RELATED
Research · 3d ago
Reducing ML-KEM-768 encapsulation key sizes by 24 octets
Bit-packing optimization trims ML-KEM-768 post-quantum cryptography encapsulation keys by 24 octets, enabling better UDP packet alignment for practical PQC deployment.
Infrastructure · 4d ago
The AI scaffolding layer is collapsing. LlamaIndex's CEO explains what survives.
As the AI scaffolding layer consolidates, LlamaIndex's CEO reveals which infrastructure platforms and tool categories survive the market shakeout.