OpenAI Launches GPT-5 Turbo with Multimodal Reasoning
OpenAI unveils GPT-5 Turbo, featuring native multimodal reasoning across text, image, audio, and video inputs.
38 articles about 'AI benchmarks'
OpenAI unveils GPT-5 Turbo, featuring native multimodal reasoning across text, image, audio, and video inputs.
Anthropic's Claude 4 Opus achieves record scores across multiple scientific reasoning benchmarks, outperforming GPT-4o a…
Google's Gemini 2.5 Ultra achieves state-of-the-art results across major multimodal reasoning benchmarks, outperforming …
The creators of SWE-Bench release a devastating new benchmark where Claude, GPT, and Gemini all score zero, exposing fun…
Google DeepMind launches Gemini 2.5 Ultra, its most powerful AI model featuring native multimodal reasoning across text,…
Anthropic's Claude 4 Opus achieves unprecedented scores on graduate-level science benchmarks, surpassing GPT-4o and Gemi…
Anthropic's Claude 4 sets a new state-of-the-art score on graduate-level science benchmarks, outpacing GPT-4o and Gemini…
Elon Musk's xAI releases Grok 3 with math reasoning scores rivaling OpenAI's GPT-4o, intensifying the LLM competition.
OpenAI's next-generation GPT-5 model reportedly achieves PhD-level reasoning on internal benchmarks, signaling a major l…
Google DeepMind's Gemini Ultra 2 achieves human-level scores on graduate-level mathematics exams, marking a major milest…
Anthropic's Claude 4 sets new records on GPQA and other graduate-level science benchmarks, outpacing GPT-4o and Gemini U…
Meta, Stanford, and Harvard launch ProgramBench, a brutal new benchmark that asks AI to build software from scratch. GPT…