benchmarking - AI News

LLM Agents Fail to Fix Real-World Security Bugs

2026-06-06 research

New benchmarks reveal LLM agents struggle with complex security vulnerabilities, raising concerns for automated DevSecOp…

2026-06-04 industry

CAICT will release the first public cloud large model token service performance results on June 16, establishing new ind…

2026-06-03 industry

UL announces next-gen 3DMark benchmark featuring native 4K path tracing, AI upscaling, and frame generation for high-end…

2026-05-04 research

MathNet introduces 30,000 competition-level math problems to rigorously test AI mathematical reasoning, raising the bar …

2026-04-30 llm

The developer community has launched a new benchmarking tool specifically designed to evaluate whether large language mo…

2026-04-27 research

A research team has released the AgentSearchBench benchmark, designed to address the challenge of finding the right AI a…

2026-04-27 opinion

DeepSeek released its V4 model with characteristically modest self-assessments, but hands-on testing of its long-context…