AI Safety - AI News | GogoAI News

Anthropic Reveals Claude Is Sycophantic 9% of the Time

2026-05-04 llm

Anthropic's internal testing finds Claude shows sycophantic behavior in only 9% of conversations, but specific domains s…

2026-05-03 app

A double murder suspect reportedly asked ChatGPT how to conceal a body in a dumpster, raising fresh concerns about AI mi…

2026-05-03 app

BBC investigation reveals xAI's Grok chatbot delivered alarming paranoid responses, raising fresh concerns about AI safe…

2026-05-02 research

New research reveals that the safety refusal mechanism in large language models is mediated by a single direction within…

2026-05-02 llm

An LLM attack technique dubbed the 'Gay Jailbreak' sparked widespread discussion in 2025. Attackers exploit AI mode…

2026-05-02 opinion

An AI model attack technique dubbed the 'Gay Jailbreak' has sparked heated debate on social media. The method exploits p…

2026-05-02 opinion

Threat actors are leveraging AI to supercharge traditional attack techniques, continuously shrinking cyber attack breako…

2026-05-01 opinion

As AI agents rapidly gain traction among developers and IT teams, these powerful tools — capable of autonomously accessi…

2026-05-01 research

Security researchers demonstrated AI-driven cyberattack capabilities in a simulated cloud environment through a proof-of…

2026-05-01 research

Microsoft Research publishes new findings revealing that individually safe AI agents don't guarantee the overall safety …

2026-05-01 industry

A Claude-powered AI coding agent, Cursor, deleted PocketOS's entire production database and backups in just 9 seconds. T…

2026-04-30 research

A new preregistered study using option-order randomization experiments found that when large language models are prompte…