METR
8 mentions across all digests
METR is an AI evaluation organization whose time-horizon methodology is used to measure long-horizon agent capability; recent research applying it shows frontier AI models matching human expert performance on 3+ hour offensive security tasks.
The Most Important Charts In The World
METR's scaling analysis shows exponential growth in the length of tasks AI agents can complete autonomously, raising the prospect of recursive self-improvement if current capability trends continue.
The West Forgot How to Make Things. Now It's Forgetting How to Code
Raytheon's four-year Stinger production restart and Europe's 50% artillery shortfall foreshadow a similar capacity crisis in software engineering: the West is trading human talent development for AI substitutes, leaving no foundation to rebuild from when the shortcuts fail.
Are the costs of AI agents also rising exponentially? (2025)
Over seven years, roughly 4,000x parameter growth and the efficiency gains that came with it have been offset by exponentially rising costs, potentially leaving cutting-edge AI agents less cost-competitive with human labor.
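As a quick sanity check on the arithmetic in that claim, the sketch below computes the annual growth factor implied by 4,000x parameter growth over seven years and compares it against a hypothetical exponential cost curve; the cost-doubling period is an illustrative assumption, not a figure from the article.

```python
# Arithmetic sketch of the growth-vs-cost claim.
# Figures taken from the summary: ~4,000x parameter growth over 7 years.
# The cost-doubling period below is a placeholder assumption for illustration.

param_growth_total = 4000
years = 7
annual_param_growth = param_growth_total ** (1 / years)  # ~3.27x per year

cost_doubling_months = 12  # assumed, not from the article
annual_cost_growth = 2 ** (12 / cost_doubling_months)    # 2x per year under that assumption

print(f"implied annual parameter growth: {annual_param_growth:.2f}x")
print(f"assumed annual cost growth:      {annual_cost_growth:.2f}x")
# If costs grow at a comparable exponential rate, the net cost advantage of
# agents over human labor shrinks rather than widens.
```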
Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment
Claude Opus 4.6 autonomously reimplements 16,000-line bioinformatics tools in the MirrorCode benchmark, while researchers catalog attack vectors against agents and policymakers organize 48 governance proposals for transformative AI.
Offensive Cybersecurity Time Horizons
AI task horizons in offensive cybersecurity have been doubling roughly every 5.7 months since 2024, with Opus 4.6 and GPT-5.3 Codex now matching human expert performance on multi-hour hacking tasks.
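A minimal sketch of the doubling-time extrapolation behind that figure, assuming a clean exponential fit with a 5.7-month doubling period; the starting horizon and projection points are illustrative placeholders, not numbers from the post.

```python
# Minimal doubling-time extrapolation (illustrative assumptions only):
#   horizon(t) = h0 * 2 ** (t / doubling_months)
# h0 and the projection window below are placeholders, not METR data.

doubling_months = 5.7   # doubling period cited in the summary
h0_hours = 3.0          # assumed starting horizon (~3h expert-level tasks)

for months_ahead in (0, 6, 12, 24):
    horizon = h0_hours * 2 ** (months_ahead / doubling_months)
    print(f"+{months_ahead:2d} months: ~{horizon:.1f} h task horizon")
```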