🏷️ Language Model Evaluation

3 articles about 'Language Model Evaluation'

CL-bench Life: Can Language Models Learn from Real-Life Scenarios?

2026-05-01 research

A new study introduces CL-bench Life, a benchmark that systematically evaluates the ability of large language models to …

2026-05-01 research

A research team has released BatteryPass-12K, the first publicly available benchmark dataset for Digital Battery Passpor…

2026-04-30 research

A systematic study covering 115 large language models has released the DenialBench benchmark, quantitatively analyzing h…