Benchmarking - AI News

MathNet Brings 30K Competition Problems to AI Benchmarking

2026-05-04 research

MathNet introduces 30,000 competition-level math problems to rigorously test AI mathematical reasoning, raising the bar …

2026-04-30 llm

The developer community has launched a new benchmarking tool specifically designed to evaluate whether large language mo…

2026-04-27 research

A research team has released the AgentSearchBench benchmark, designed to address the challenge of finding the right AI a…

2026-04-27 opinion

DeepSeek released its V4 model with characteristically modest self-assessments, but hands-on testing of its long-context…