ai benchmarks - AI News

Microsoft Phi-4 Rivals GPT With Just 14B Parameters

2026-05-05 llm 👁 7

Microsoft Research unveils Phi-4, a 14-billion parameter small language model that matches or exceeds GPT-4 on key bench…

2026-05-05 llm 👁 11

Google DeepMind unveils Gemini 2.5 Ultra, a frontier AI model purpose-built for accelerating scientific research across …

2026-05-05 llm 👁 11

Anthropic's Claude 4 sets new records on major mathematical reasoning benchmarks, outperforming GPT-4o and Gemini Ultra.

2026-05-05 llm 👁 8

Google's Gemini 2.5 Pro claims the #1 spot across all categories on the LMSYS Chatbot Arena leaderboard, beating OpenAI …

2026-05-05 research 👁 10

Stanford's Human-Centered AI Institute launches a new benchmark designed to measure how well AI agents complete real-wor…

2026-05-05 llm 👁 11

DeepSeek's open-source R2 model matches or exceeds GPT-5 on key reasoning tasks, shaking up the AI competitive landscape…

2026-05-05 llm 👁 9

Mistral AI releases Codestral 2.0, a code-focused LLM that outperforms GPT-4o and Claude 4 Sonnet across major coding be…

2026-05-05 llm 👁 10

Google DeepMind's Gemini 2.5 Ultra tops benchmarks across text, vision, code, and math, raising the bar for frontier AI …

2026-05-05 llm 👁 10

Google DeepMind unveils Gemini 2.5 Ultra, its most powerful AI model yet, featuring a 1-million-token context window and…

2026-05-04 llm 👁 235

NIST's CAISI evaluation of DeepSeek V4 Pro offers rare independent benchmarking, challenging vendor self-reported perfor…

2026-05-04 llm 👁 36

NIST's CAISI evaluation of DeepSeek V4 Pro highlights the growing gap between self-reported AI benchmarks and independen…

2026-05-03 opinion 👁 8

A US government benchmark claims China trails in AI development, but independent evidence and DeepSeek's cost advantage …