LLM benchmarks - AI News

ProgramBench Tests If LLMs Can Rebuild Code

2026-05-07 research

A new benchmark called ProgramBench challenges language models to reconstruct entire programs from specifications, revea…

2026-05-06 llm

Hugging Face releases open-weight reasoning models that match proprietary systems from OpenAI and Google on key benchmar…

2026-05-03 research

A bizarre thought experiment from China's Zhihu platform reveals both the power and limits of AI-driven scientific reaso…