benchmarks - AI News | GogoAI News

Anthropic Pivots: Claude Drops Benchmarks for Agent Autonomy

2026-05-19 llm 👁 11

Anthropic shifts focus from benchmark scores to developing autonomous AI agents with distinct personalities and reasonin…

2026-05-07 research 👁 11

OpenAI unveils a new visual reasoning benchmark designed to stress-test multimodal AI systems on complex perception task…

2026-05-07 llm 👁 9

Leading open-source LLMs from Meta, Mistral, and others now match or exceed proprietary systems across major benchmarks,…

2026-05-05 research 👁 11

Hugging Face proposes a comprehensive benchmark framework to standardize how AI agents are evaluated across real-world t…

2026-05-03 llm 👁 11

xAI releases Grok 4.3 as a pragmatic upgrade — cheaper and faster, but still trailing GPT-5.5 and Claude Opus 4.7 in key…