Nvidia revealed the Groq-3 LPX rack system at GTC: 256 LP30 LPUs per rack, delivering 150 TB/s of memory bandwidth and 500–1000+ tokens/sec, a launch that frames the $20B Groq acquisition as a time-to-market play for high-speed inference hardware. The hybrid architecture splits workloads: GPUs handle prefill and attention operations, while LPUs handle the bandwidth-intensive feed-forward decode, all orchestrated by Nvidia's Dynamo platform. Faster token generation directly enables test-time scaling and more responsive AI agents and code assistants, though LPX is aimed primarily at hyperscalers and neocloud providers running trillion-parameter models.
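The disaggregated serving pattern above can be sketched in a few lines: requests in their prefill phase (compute-bound attention over the whole prompt) route to GPU workers, while requests already generating tokens (bandwidth-bound feed-forward decode) route to LPU workers. This is a minimal illustration of the general prefill/decode split, not Nvidia's Dynamo API; the names `Request` and `route_phase` are hypothetical.

```python
# Hypothetical sketch of prefill/decode disaggregation as described
# in the article. Names are illustrative, not Dynamo APIs.
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int         # length of the input prompt
    generated_tokens: int = 0  # tokens produced so far

def route_phase(req: Request) -> str:
    """Prefill (compute-bound attention over the full prompt) goes to
    GPUs; decode (bandwidth-bound feed-forward passes emitting one
    token at a time) goes to LPUs."""
    return "gpu" if req.generated_tokens == 0 else "lpu"

# A request starts in prefill on the GPU, then migrates to LPU decode.
req = Request(prompt_tokens=512)
print(route_phase(req))    # prints "gpu"
req.generated_tokens += 1
print(route_phase(req))    # prints "lpu"
```

The design point is that the two phases have opposite hardware profiles, so a scheduler that migrates requests between pools can keep both the compute-heavy and bandwidth-heavy silicon saturated.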
Infrastructure
Decoding Nvidia's Groq-powered LPX and the rest of its new rack systems
Nvidia's Groq-powered LPX rack system delivers 150 TB/s bandwidth and 500–1000+ tokens/sec by pairing LPUs for decode operations with GPUs for prefill, enabling faster inference for trillion-parameter models at hyperscale.
Friday, March 20, 2026, 12:00 PM UTC · 2 min read · Source: The Register · By sys://pipeline
Tags: infrastructure