Research paper characterizing WebGPU dispatch overhead for LLM inference across four GPU vendors and multiple browsers. Reveals that dispatch overhead (24–71 μs, depending on backend) is the primary bottleneck, introduces torch-webgpu (a PyTorch backend for WebGPU), and provides an open-source benchmarking methodology that corrects prior overestimates by ~20×.
Research
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers
WebGPU dispatch overhead (24–71 μs) is the true LLM inference bottleneck in browsers, not compute. torch-webgpu provides a PyTorch backend for WebGPU, and the paper's open-source benchmarking methodology shows that prior benchmarks overestimated dispatch costs by roughly 20×.
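The paper's actual harness is open source and not reproduced in this summary. As a rough illustration only, below is a minimal TypeScript sketch of one common way to probe per-dispatch overhead in a browser: submit many no-op compute dispatches, each in its own command buffer, and average the wall-clock time. The no-op shader, dispatch count, and one-submit-per-dispatch strategy are all illustrative assumptions, not the paper's methodology.

```ts
// Hypothetical sketch (not the paper's harness): estimate average per-dispatch
// cost by timing n no-op compute dispatches, each submitted individually.
async function estimateDispatchOverheadUs(n = 1000): Promise<number> {
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("WebGPU not available");
  const device = await adapter.requestDevice();

  // No-op shader: measured time is dominated by dispatch cost, not compute.
  const module = device.createShaderModule({
    code: "@compute @workgroup_size(1) fn main() {}",
  });
  const pipeline = device.createComputePipeline({
    layout: "auto",
    compute: { module, entryPoint: "main" },
  });

  const runOnce = () => {
    const encoder = device.createCommandEncoder();
    const pass = encoder.beginComputePass();
    pass.setPipeline(pipeline);
    pass.dispatchWorkgroups(1);
    pass.end();
    device.queue.submit([encoder.finish()]);
  };

  // Warm up once so pipeline compilation is excluded from the timed region.
  runOnce();
  await device.queue.onSubmittedWorkDone();

  const t0 = performance.now();
  for (let i = 0; i < n; i++) runOnce();
  await device.queue.onSubmittedWorkDone();
  const elapsedMs = performance.now() - t0;

  return (elapsedMs * 1000) / n; // average microseconds per dispatch
}
```

Note that this wall-clock approach conflates CPU-side encoding, queue submission, and GPU execution; the paper's methodology, which corrects the ~20× overestimates in prior work, presumably separates these components, so treat this sketch as an order-of-magnitude probe at best.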
Monday, April 6, 2026, 12:00 PM UTC · 2 min read · Source: arXiv CS.LG (Machine Learning)
Tags
research