Claude 4 Opus Smashes Graduate-Level Benchmark Records
Anthropic's Claude 4 Opus sets new state-of-the-art scores on GPQA and other graduate-level reasoning benchmarks, outpac…
Anthropic's Claude 4 sets a new state-of-the-art score on graduate-level science benchmarks, outpacing GPT-4o and Gemini…