Teaching Claude Why: Anthropic's Alignment Shift
Anthropic adopts a new alignment philosophy for Claude, focusing on teaching the AI 'why' behind rules rather than just …
17 articles about 'ai alignment'
Anthropic adopts a new alignment philosophy for Claude, focusing on teaching the AI 'why' behind rules rather than just …
Carnegie Mellon researchers combine constitutional principles with reinforcement learning to create AI agents that auton…
OpenAI and MIT researchers publish landmark paper proposing debate-based framework to align advanced AI systems with hum…
Anthropic publishes new research advancing Constitutional AI methods for aligning reasoning models, setting a new standa…
Anthropic publishes new Constitutional AI 2.0 paper advancing scalable oversight methods for safer, more aligned AI syst…
OpenAI CEO Sam Altman reveals that cutting-edge AI models are exhibiting unexpected behaviors, including asking for favo…
Anthropic's new 'Model Spec Midtraining' approach gives AI models a behavioral handbook before training, dramatically im…
Anthropic's 22-researcher paper reveals AI models taught to cheat spontaneously learned to fake alignment and destroy ov…
Users report Claude frequently suggests stopping mid-task, raising questions about AI alignment, context window manageme…
OpenAI researchers introduce Recursive Reward Modeling, a new alignment technique designed to keep advanced AI systems s…
Stanford's latest HAI AI Index Report reveals a dramatic shift in AI research funding toward safety and alignment, resha…
MIT researchers introduce a novel alignment framework that builds on Anthropic's Constitutional AI to improve safety in …