SWE-bench Verified
2 mentions across all digests
SWE-bench Verified is a benchmark that evaluates AI coding agents on their ability to resolve real GitHub issues in open-source repositories. OpenAI announced that it has stopped using it as a primary evaluation benchmark, which may signal concerns that the benchmark is saturated.
/// Stats
First Seen: 2026-03-24
Last Seen: 2026-03-24
Total Mentions: 2
Subject Mentions: 1
Last 7 Days: 0
Sources: 1
Peak Relevance: 4/5
Active Predictions: 0