AI Benchmarks - AI News

Stanford HAI Finds AI Benchmarks Hitting Ceiling

2026-05-06 research

Stanford's HAI 2025 AI Index reveals that leading AI models now saturate most major benchmarks, raising urgent questions…

2026-05-06 llm

Hugging Face debuts an open leaderboard for evaluating agentic AI systems, bringing transparency to one of AI's fastest-…

2026-05-06 llm

Google DeepMind launches Gemini 2.5 Ultra, its most powerful AI model yet, featuring a million-token context window and …

2026-05-06 llm

OpenAI releases GPT-5.5 Instant with 52.5% fewer hallucinations and major math gains, replacing GPT-5.3 Instant for all …

2026-05-06 llm

Anthropic's Claude 4 sets new records on GPQA and other graduate-level evaluations, outperforming GPT-4o and Gemini Ultr…

2026-05-06 llm

Elon Musk's xAI releases Grok 3.5, which outperforms OpenAI's GPT-5 across major mathematical reasoning benchmarks.

2026-05-06 llm

Anthropic's Claude Opus 4 achieves state-of-the-art results on GPQA Diamond, outperforming OpenAI and Google on PhD-leve…

2026-05-06 llm

OpenAI's GPT-5 Turbo achieves breakthrough scores on complex reasoning benchmarks, outpacing rivals by significant margi…

2026-05-06 opinion

A growing debate among researchers and industry leaders questions whether popular AI benchmarks reflect genuine intellig…

2026-05-05 llm

Mistral AI releases Codestral 2.0, a code-focused LLM matching or exceeding GPT-5 on key coding benchmarks.

2026-05-05 opinion

The context window race is effectively over. The real competition now shifts to reasoning depth, efficiency, and archite…

2026-05-05 research

Carnegie Mellon researchers unveil new techniques to enhance reasoning capabilities in vision-language models, closing k…