Claude Opus 4 Shatters Scientific Reasoning Benchmarks
Anthropic's Claude Opus 4 achieves record scores across multiple scientific reasoning benchmarks, outperforming GPT-4o a…
3 articles about 'GPQA Diamond'
Anthropic's Claude Opus 4 achieves unprecedented scores on graduate-level science benchmarks, surpassing GPT-4o and Gemi…
Anthropic's Claude Opus 4 achieves state-of-the-art results on GPQA Diamond, outperforming OpenAI and Google on PhD-leve…