benchmark - AI News | GogoAI News

Claude 4.5 Sonnet Tops SWE-Bench Full Benchmark

2026-05-06 llm

Anthropic's Claude 4.5 Sonnet sets a new state-of-the-art on SWE-Bench Full, outperforming GPT-4o and Gemini in real-wor…

2026-05-06 research

Meta's FAIR lab releases a comprehensive new benchmark framework designed to evaluate the safety of autonomous AI agents…

2026-05-06 industry

SPEC releases CPU 2026, a major update to its industry-standard benchmark suite, expanding from 43 to 52 tests with AI a…

2026-05-04 industry

AMD's first commercial 3D V-Cache desktop processor appears in PassMark database, revealing key specs ahead of official …