interpretability - AI News

Natural Language Autoencoders Decode Claude's Inner Thinking

2026-05-08 research

Anthropic researchers explore turning AI internal representations into readable text, advancing mechanistic interpretabi…

2026-05-07 research

OpenAI publishes new research on superalignment techniques aimed at keeping frontier language models safe and aligned wi…

2026-05-05 research

Anthropic publishes groundbreaking interpretability research revealing how Claude's internal reasoning circuits work, ad…