Same Prompt, Different Morals: AI Models Split on Ethics

💡 A new benchmark testing 100 ethical scenarios reveals stark divergence among leading AI models on moral reasoning.

Frontier AI models give dramatically different answers when faced with the same ethical dilemmas, according to a new benchmark that puts leading language models through 100 real-world moral scenarios. The findings raise a fundamental question that the AI industry has yet to answer: whose ethics should an AI follow?

The benchmark, first reported by The Decoder, spans everyday situations from data misuse in sales departments to protocol violations in oncology wards. Despite receiving identical prompts, models from OpenAI, Anthropic, Google, and Meta frequently land on opposing moral conclusions.

100 Scenarios Expose Deep Disagreements

The test suite doesn't rely on abstract philosophy puzzles like the classic trolley problem. Instead, it draws from realistic workplace and professional contexts where ethical lines blur — exactly the kind of situations users bring to AI assistants every day.

Scenarios include questions like whether a salesperson should use improperly obtained customer data to close a deal, or whether a medical professional should deviate from established protocol to pursue an unconventional treatment. These are not edge cases — they reflect genuine decisions that professionals face regularly.

What makes the results striking is not just that models disagree, but how they disagree. Some models consistently lean toward rule-following and institutional compliance, while others weigh individual outcomes more heavily.
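To make "how they disagree" concrete, here is a minimal sketch of how such divergence could be quantified. It is not the benchmark's actual scoring method, which the report does not describe. The model names, the two verdict labels ("rule" and "outcome"), and the four toy verdicts per model are all invented for illustration.

```python
from itertools import combinations

# Hypothetical verdicts, one per scenario, in the same scenario order for every model.
# "rule"    = the model sides with the rule or institution
# "outcome" = the model sides with the individual outcome
# These labels and values are made up; they are not benchmark data.
verdicts = {
    "model_a": ["rule", "rule", "outcome", "rule"],
    "model_b": ["outcome", "rule", "outcome", "outcome"],
    "model_c": ["rule", "rule", "rule", "rule"],
}


def disagreement_rate(a, b):
    """Fraction of scenarios on which two models give different verdicts."""
    if len(a) != len(b) or not a:
        raise ValueError("verdict lists must be non-empty and cover the same scenarios")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def rule_following_share(labels):
    """Fraction of scenarios on which a model sides with the rule."""
    if not labels:
        raise ValueError("verdict list must be non-empty")
    return labels.count("rule") / len(labels)


# Pairwise disagreement between every pair of models.
pairwise = {
    (m1, m2): disagreement_rate(verdicts[m1], verdicts[m2])
    for m1, m2 in combinations(sorted(verdicts), 2)
}

# Each model's overall lean toward rule-following.
lean = {name: rule_following_share(labels) for name, labels in verdicts.items()}

for pair, rate in pairwise.items():
    print(pair, rate)
for name, share in lean.items():
    print(name, share)
```

With real data, the pairwise rates would show which models split most often, and the per-model share would show whether a model leans consistently toward compliance or toward outcomes. Reducing a free-text answer to a single label is itself a judgment call that a real evaluation would need to document.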

Where the Models Diverge

The divergence patterns reveal distinct 'moral personalities' baked into each model through training and alignment processes. Key areas of disagreement include:

  • Workplace ethics — Models split on whether loyalty to an employer outweighs transparency with customers
  • Medical protocols — Some models rigidly defend established procedures, while others allow for clinical judgment
  • Data privacy — Responses range from strict prohibition to nuanced cost-benefit reasoning about using questionable data
  • Whistleblowing scenarios — Models disagree on when breaking confidentiality becomes morally obligatory
  • Competing stakeholder interests — No consensus emerges when patient welfare, institutional rules, and individual autonomy conflict

These differences aren't random. They most likely trace back to choices made during alignment training, such as reinforcement learning from human feedback (RLHF) or, in Anthropic's case, constitutional AI, where each company's team encodes its own ethical priorities into model behavior.

The Alignment Problem Nobody Talks About

Most public discussion of AI alignment focuses on preventing catastrophic outcomes — stopping models from helping build weapons or generating harmful content. But this benchmark highlights a subtler, arguably more immediate alignment challenge: everyday moral reasoning.

Millions of users already consult AI models for advice on workplace conflicts, medical decisions, and personal dilemmas. When ChatGPT and Claude give opposite moral guidance on the same scenario, users effectively receive different ethical frameworks depending on which product they happen to use.

This creates what researchers describe as a 'moral lottery' — the ethical guidance a person receives depends not on philosophical consensus but on which company's alignment team made the training decisions.

Who Decides What AI Is Allowed to Do?

The benchmark forces an uncomfortable question into the open. Every AI company makes value-laden decisions during training, but few publish detailed accounts of which ethical frameworks they prioritize or why.

Anthropic has been the most transparent, publishing its constitutional AI principles. OpenAI has shared some of its content policy reasoning. Google and Meta have released model cards and responsible AI documentation. But none of these disclosures fully explain how models are trained to navigate the gray areas this benchmark tests.

The lack of transparency matters because these models increasingly function as moral advisors at scale. A single model's ethical lean could influence millions of decisions daily across healthcare, finance, education, and law.

Cultural Bias Adds Another Layer

The problem compounds when considering global deployment. Ethical norms vary significantly across cultures, yet most frontier models are primarily aligned using feedback from English-speaking, Western-educated evaluators.

A model trained to prioritize individual autonomy — a value dominant in American ethical thinking — may give advice that conflicts with communitarian values prevalent in East Asian or African contexts. The benchmark's 100 scenarios were themselves designed from a particular cultural vantage point, meaning even the questions embed assumptions about what counts as an ethical dilemma.

What Comes Next for AI Ethics Benchmarks

This research points toward several developments the industry will need to address:

First, standardized ethical benchmarking could become as routine as performance benchmarks like MMLU or HumanEval. Companies may face pressure to disclose how their models perform on moral reasoning tests.

Second, the results could accelerate demand for user-configurable ethical frameworks — allowing individuals or organizations to adjust a model's moral reasoning to match their own values or professional codes of conduct.

Third, regulators in the EU and US are already watching. The EU AI Act's requirements for transparency in high-risk AI systems could eventually extend to documenting how models handle ethical reasoning.

The benchmark doesn't claim to identify which model gets ethics 'right.' That may be the most important finding of all — there is no single right answer, and pretending otherwise is itself an ethical choice. As AI models become embedded in more consequential decisions, the question of whose morals they encode will only grow louder.