🏷️ Safety Training

1 article about 'Safety Training'

Study Finds: LLM Refusal Behavior Is Controlled by a Single Direction

2026-05-02 research

New research reveals that the safety refusal mechanism in large language models is mediated by a single direction within…