The 'Gay Jailbreak' Exposes Deep Contradictions in AI Safety Alignment
An AI model attack technique dubbed the 'Gay Jailbreak' has sparked heated debate on social media. The method exploits p…
7 articles about 'Model Safety'
A large language model bypass technique dubbed the 'Gay Jailbreak' has sparked heated debate in the AI community. The me…
The latest cybersecurity tests reveal that OpenAI's GPT-5.5 has reached a level comparable to the much-hyped Mythos Prev…
A recent arXiv study reveals that dynamic adversarial fine-tuning reorganizes the refusal directions of language models …
A new study constructs a dataset of 270 harmful instructions based on AMA ethical guidelines to benchmark 72 large langu…
A recent arXiv paper explores how large language models behave when facing three-way conflicts among internal parameteri…
Import AI Issue 454 focuses on three cutting-edge topics: the automation of alignment research, safety evaluation of Chi…