Microsoft Phi-4 Rivals GPT With Just 14B Parameters
Microsoft Research unveils Phi-4, a 14-billion-parameter small language model that matches or exceeds GPT-4 on key bench…
38 articles about 'ai benchmarks'
Google DeepMind unveils Gemini 2.5 Ultra, a frontier AI model purpose-built for accelerating scientific research across …
Anthropic's Claude 4 sets new records on major mathematical reasoning benchmarks, outperforming GPT-4o and Gemini Ultra.
Google's Gemini 2.5 Pro claims the #1 spot across all categories on the LMSYS Chatbot Arena leaderboard, beating OpenAI …
Stanford's Human-Centered AI Institute launches a new benchmark designed to measure how well AI agents complete real-wor…
DeepSeek's open-source R2 model matches or exceeds GPT-5 on key reasoning tasks, shaking up the AI competitive landscape…
Mistral AI releases Codestral 2.0, a code-focused LLM that outperforms GPT-4o and Claude 4 Sonnet across major coding be…
Google DeepMind's Gemini 2.5 Ultra tops benchmarks across text, vision, code, and math, raising the bar for frontier AI …
Google DeepMind unveils Gemini 2.5 Ultra, its most powerful AI model yet, featuring a 1-million-token context window and…
NIST's CAISI evaluation of DeepSeek V4 Pro offers rare independent benchmarking, challenging vendor self-reported perfor…
NIST's CAISI evaluation of DeepSeek V4 Pro highlights the growing gap between self-reported AI benchmarks and independen…
A US government benchmark claims China trails in AI development, but independent evidence and DeepSeek's cost advantage …