vLLM
10 mentions across all digests
vLLM is an open-source LLM inference and serving engine with day-0 support for new models such as Gemma 4; it underpins production serving stacks, including NVIDIA Dynamo, for datacenter-scale deployment.
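For context, a minimal sketch of vLLM's offline generation API, which is what "day-0 support" plugs into: a new checkpoint that loads here is immediately servable. The model ID below is a stand-in, since the digest doesn't name Gemma 4's checkpoint.

```python
from vllm import LLM, SamplingParams

# Stand-in model ID; swap in whatever checkpoint you have access to.
llm = LLM(model="google/gemma-2-9b-it")

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)
prompts = ["Summarize why day-0 model support matters for serving stacks."]

# generate() batches prompts through vLLM's paged-attention engine.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```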
The moat or the commons
Open-source and Chinese models now replicate frontier AI capabilities within 6–12 months at 10–30x lower cost, forcing the $1 trillion U.S. capex bet to abandon margin-based monopolies and pursue regulatory and vertical lock-in instead.
[AINews] GPT 5.5 and OpenAI Codex Superapp
OpenAI's GPT-5.5 matches Claude Opus's capabilities at 1/4 the cost while bundling autonomous agent features, but immediately faces competition from DeepSeek's aggressive open-source 1.6T-parameter V4 model.
Accelerating PayPal's Commerce Agent with Speculative Decoding: An Empirical Study on EAGLE3 with Fine-Tuned Nemotron Models
PayPal cuts GPU inference costs by 50% using speculative decoding with EAGLE3, letting one H100 do the work of two while boosting Commerce Agent throughput 22–49% and cutting latency 18–33%.
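The economics come from the verification step: the target model scores a whole block of drafted tokens in a single forward pass and keeps the prefix that survives an accept/reject test, so fewer target passes are needed per generated token. Below is a minimal sketch of that standard acceptance rule, not PayPal's code; EAGLE3's contribution is a stronger fine-tuned draft head, while the verification logic sketched here is the generic one.

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_draft(draft_probs, target_probs, drafted):
    """Standard speculative-decoding verification.

    draft_probs:  (k, V)   draft model's distribution at each drafted position
    target_probs: (k+1, V) target model's distributions from one batched pass
    drafted:      length-k list of token ids proposed by the draft model

    Returns the accepted tokens. The target runs one forward pass regardless
    of how many drafted tokens survive, which is where the speedup comes from.
    """
    out = []
    for i, tok in enumerate(drafted):
        p, q = target_probs[i][tok], draft_probs[i][tok]
        if rng.random() < min(1.0, p / q):
            # Accept with probability min(1, p_target / p_draft).
            out.append(tok)
        else:
            # Reject: resample from the residual max(0, p_target - p_draft),
            # which keeps the overall output distribution exactly the target's.
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            out.append(rng.choice(residual.size, p=residual / residual.sum()))
            return out
    # All k drafts accepted: take a free token from the target's last position.
    out.append(rng.choice(target_probs[-1].size, p=target_probs[-1]))
    return out
```

The better the drafter, the longer the accepted prefix per target pass, which is how a fine-tuned draft model translates into the throughput and latency gains cited above.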
Our eighth generation TPUs: two chips for the agentic era
Google's TPU-8 chips (8t for training, 8i for inference) deliver 2x the power efficiency of Ironwood and are purpose-built for agentic AI workloads, with Boardfly topology and bare-metal framework support.
The most interesting startups showcased at Google Cloud Next 2026
Google commits $750M to fund Cloud partners building enterprise AI agents, partnering with startups such as Lovable, Notion, and Gamma to distribute Gemini-powered tools across the market.