Open-source CUDA kernel optimization project achieving 207 tok/s for Qwen3.5-27B on RTX 3090 through hand-written kernels, speculative decoding, and quantization, with megakernel implementation for small models.
We got 207 tok/s with Qwen3.5-27B on an RTX 3090
Hand-written CUDA kernels and speculative decoding achieve 207 tok/s for Qwen3.5-27B on a consumer RTX 3090, showing that open-source optimization can rival commercial inference systems on commodity hardware.
Monday, April 20, 2026, 12:00 PM UTC · 2 min read · Source: Hacker News · By sys://pipeline
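The article names speculative decoding as one of the key techniques but includes no code. As a minimal sketch of the idea (not the project's actual implementation), a cheap draft model proposes several tokens per step and the expensive target model verifies them in one pass, accepting each with probability min(1, p_target / p_draft). The toy `draft_model` and `target_model` below are hypothetical stand-ins over a four-token vocabulary, purely for illustration:

```python
import random

random.seed(0)

# Toy stand-ins for a small draft model and a large target model.
# Each "model" maps a context (token list) to a next-token distribution.
VOCAB = [0, 1, 2, 3]

def draft_model(context):
    # Hypothetical cheap model: uniform over the vocabulary.
    return {t: 0.25 for t in VOCAB}

def target_model(context):
    # Hypothetical expensive model: strongly prefers one token.
    probs = {t: 0.1 for t in VOCAB}
    probs[len(context) % 4] = 0.7
    return probs

def speculative_step(context, k=4):
    """Draft k tokens greedily with the cheap model, then verify each
    against the target model using the standard rejection rule."""
    # Phase 1: draft k candidate tokens.
    draft, ctx = [], list(context)
    for _ in range(k):
        probs = draft_model(ctx)
        tok = max(probs, key=probs.get)  # greedy draft
        draft.append(tok)
        ctx.append(tok)

    # Phase 2: verify drafts; accept with prob min(1, p_target / p_draft).
    accepted, ctx = [], list(context)
    for tok in draft:
        p_t = target_model(ctx)[tok]
        p_d = draft_model(ctx)[tok]
        if random.random() < min(1.0, p_t / p_d):
            accepted.append(tok)
            ctx.append(tok)
        else:
            # On rejection, fall back to the target model's choice and stop.
            fallback = max(target_model(ctx), key=target_model(ctx).get)
            accepted.append(fallback)
            ctx.append(fallback)
            break
    return accepted

out = speculative_step([0, 1], k=4)
print(out)
```

The speedup comes from the verification pass: the large model scores all k drafted tokens in a single forward pass, so each accepted draft token costs roughly one cheap-model step instead of one expensive-model step.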
Tags
infrastructure
/// RELATED
Models · Apr 24
DeepSeek's new models are so efficient they'll run on a toaster ... by which we mean Huawei's NPUs
DeepSeek's open-weights V4 matches frontier model performance while slashing inference costs through novel efficiency techniques, now optimized for Huawei's Ascend NPUs—a major competitive threat to proprietary incumbents.
War · 4d ago
Fast16 Malware
Fast16, a newly uncovered pre-Stuxnet US state-sponsored malware, sabotaged Iranian computational research by silently corrupting high-precision physics simulations, revealing early sophistication in cyber-warfare tooling aimed at critical academic and research infrastructure.