🏷️ Large Model Alignment

2 articles about 'Large Model Alignment'

Study Finds: LLM Refusal Behavior Is Controlled by a Single Direction

2026-05-02 research

New research reveals that the safety refusal mechanism in large language models is mediated by a single direction within…

2026-05-02 llm

In 2025, an LLM attack technique dubbed the 'Gay Jailbreak' sparked widespread discussion. Attackers exploit AI mode…