LLM alignment - AI News

Helpful AI Fails to Mimic Humans

2026-05-30 research

A new study of 208k participants finds that RLHF training reduces AI's ability to simulate human behavior, with effects …

2026-04-29 research

A recent arXiv paper proposes using Intrinsic Mutual Information (IMI) as a regulator for preference optimization, aimin…

2026-04-29 research

Researchers propose the KARL framework, a knowledge-boundary-aware reinforcement learning approach that enables large la…

2026-04-28 research

A recent arXiv paper investigates the 'sandbagging effect', in which large language models deliberately underperform under w…