SWE-Bench - AI News | GogoAI News

0% Pass Rate: New Benchmark Stumps All AI Models

2026-05-07 research 👁 9

The creators of SWE-Bench release a devastating new benchmark where Claude, GPT, and Gemini all score zero, exposing fun…

2026-05-07 llm 👁 12

Google's Gemini 2.5 Pro claims the top spot on major coding benchmarks, showcasing advanced agentic capabilities that re…

2026-05-06 research 👁 9

Meta, Stanford, and Harvard launch ProgramBench, a brutal new benchmark that asks AI to build software from scratch. GPT…

2026-05-06 llm 👁 9

Anthropic's Claude 4.5 Sonnet sets a new state-of-the-art on SWE-Bench Full, outperforming GPT-4o and Gemini in real-wor…