AI benchmarks - AI News

OpenAI Launches GPT-5 Turbo with Multimodal Reasoning

2026-05-10 llm 👁 9

OpenAI unveils GPT-5 Turbo, featuring native multimodal reasoning across text, image, audio, and video inputs.

2026-05-08 llm 👁 10

Anthropic's Claude 4 Opus achieves record scores across multiple scientific reasoning benchmarks, outperforming GPT-4o a…

2026-05-07 llm 👁 11

Google's Gemini 2.5 Ultra achieves state-of-the-art results across major multimodal reasoning benchmarks, outperforming …

2026-05-07 research 👁 9

The creators of SWE-Bench release a devastating new benchmark where Claude, GPT, and Gemini all score zero, exposing fun…

2026-05-07 llm 👁 9

Google DeepMind launches Gemini 2.5 Ultra, its most powerful AI model featuring native multimodal reasoning across text,…

2026-05-07 llm 👁 10

Anthropic's Claude 4 Opus achieves unprecedented scores on graduate-level science benchmarks, surpassing GPT-4o and Gemi…

2026-05-07 llm 👁 10

Anthropic's Claude 4 sets a new state-of-the-art score on graduate-level science benchmarks, outpacing GPT-4o and Gemini…

2026-05-07 llm 👁 9

Elon Musk's xAI releases Grok 3 with math reasoning scores rivaling OpenAI's GPT-4o, intensifying the LLM competition.

2026-05-07 llm 👁 9

OpenAI's next-generation GPT-5 model reportedly achieves PhD-level reasoning on internal benchmarks, signaling a major l…

2026-05-07 llm 👁 10

Google DeepMind's Gemini Ultra 2 achieves human-level scores on graduate-level mathematics exams, marking a major milest…

2026-05-06 llm 👁 9

Anthropic's Claude 4 sets new records on GPQA and other graduate-level science benchmarks, outpacing GPT-4o and Gemini U…

2026-05-06 research 👁 9

Meta, Stanford, and Harvard launch ProgramBench, a brutal new benchmark that asks AI to build software from scratch. GPT…