ATLAS (Adaptive Test-time Learning and Autonomous Specialization) reports 74.6% pass@1 on LiveCodeBench v5 using a frozen Qwen3-14B-Q4_K_M model on a $500 RTX 5060 Ti 16GB GPU, with zero fine-tuning and no API calls; the project claims this exceeds Claude Sonnet. The gains come entirely from inference-time scaffolding: PlanSearch, constraint-driven generation, Lens routing, and self-verified iterative repair (PR-CoT), with Phase 3 alone adding 7.3 percentage points. One methodological caveat matters: the reported score uses best-of-3 candidate selection plus iterative repair, so it is closer to pass@3 than to true single-shot pass@1.
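To make the pass@1 versus pass@3 distinction concrete, here is a minimal sketch using the standard unbiased pass@k estimator (for a problem with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k)). The per-problem counts in `results` are made up for illustration and are not ATLAS data; the function names are likewise hypothetical and not taken from the ATLAS code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical results for four problems, three candidates each:
# (samples drawn, samples that pass all tests). Illustrative only.
results = [(3, 0), (3, 1), (3, 3), (3, 2)]

single_shot = mean_pass_at_k(results, k=1)  # 0.5 (up to float rounding): one random candidate
any_of_three = mean_pass_at_k(results, k=3)  # 0.75: at least one of three candidates passes

print(f"pass@1 = {single_shot:.3f}, pass@3 = {any_of_three:.3f}")
```

A system that generates three candidates and submits one chosen by its own verifier lands somewhere between these two numbers: a perfect selector reaches the pass@3 figure, while a selector no better than chance falls back to pass@1. That is why a best-of-3 score is hard to compare directly with a single-shot pass@1 baseline.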
Research
$500 GPU outperforms Claude Sonnet on coding benchmarks using open-source AI system
Inference-time optimization lets a $500 GPU reportedly beat Claude Sonnet on coding benchmarks: ATLAS suggests that test-time techniques like PlanSearch and iterative repair can rival fine-tuning, though best-of-3 selection complicates the single-shot comparison.
Friday, March 27, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: Lobsters | BY sys://pipeline
Tags
research