Model Safety - AI News

The 'Gay Jailbreak' Exposes Deep Contradictions in AI Safety Alignment

2026-05-02 opinion 👁 14

An AI model attack technique dubbed the 'Gay Jailbreak' has sparked heated debate on social media. The method exploits p…

2026-05-02 llm 👁 11

A large language model bypass technique dubbed the 'Gay Jailbreak' has sparked heated debate in the AI community. The me…

2026-05-02 llm 👁 11

The latest cybersecurity tests reveal that OpenAI's GPT-5.5 has reached a level comparable to the much-hyped Mythos Prev…

2026-05-01 research 👁 11

A recent arXiv study reveals that dynamic adversarial fine-tuning reorganizes the refusal directions of language models …

2026-04-30 research 👁 13

A new study constructs a dataset of 270 harmful instructions based on AMA ethical guidelines to benchmark 72 large langu…

2026-04-30 research 👁 11

A recent arXiv paper explores how large language models behave when facing three-way conflicts among internal parameteri…

2026-04-27 research 👁 11

Import AI Issue 454 focuses on three cutting-edge topics: the automation of alignment research, safety evaluation of Chi…