Study Reveals 'Persuadability' Risks of LLMs as Legal Adjudication Tools

A new study examines the "persuadability" problem of large language models in legal decision-making. It finds that adversarial argumentation easily sways LLMs into changing their positions, a serious warning for their prospects as judicial assistants or first-instance adjudicators.

When AI Takes the Judge's Bench: A Risk Too Important to Ignore

As large language models (LLMs) spread into judicial and administrative work worldwide, from legal consultation assistants to first-instance adjudication tools, AI is being given growing authority over legal decisions. A new paper posted on arXiv (arXiv:2604.26233v1) sounds the alarm. The researchers systematically examine the "persuadability" of LLMs in legal decision-making: when confronted with contentious legal issues, models are prone to shifting their rulings according to the argumentation strategies the parties employ.

This finding carries significant implications for the global push toward "AI judiciary" practices.

Core Finding: LLMs Are Easily 'Persuaded' by Arguments

A defining feature of legal decision-making is that adjudicators must respond to arguments from both disputing parties while maintaining independent judgment amid adversarial argumentation. The research team designed a series of experiments targeting this characteristic to test how mainstream LLMs perform in legal scenarios.

The study found that when facing complex or ambiguous legal questions, LLM rulings are highly dependent on the wording, order, and rhetorical force of the input arguments. Specifically:

  • Argument order effects: Arguments presented later tend to exert greater influence on the model's final judgment, exhibiting a clear "recency bias"
  • Rhetorical sensitivity: Even when the legal substance remains unchanged, more persuasive language packaging can significantly shift the model's inclination
  • Positional instability: In gray-area cases that are difficult to adjudicate, models lack stable legal reasoning anchors and tend to vacillate between different directions
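The argument order effect lends itself to a simple measurement: present the same two arguments in both orders and count how often the ruling flips. The sketch below is only an illustration under stated assumptions, not the study's protocol. The function `order_flip_rate` and both toy judges are hypothetical stand-ins; in a real test, the judge would wrap a call to an actual LLM.

```python
from typing import Callable, List, Tuple

# A presented argument is (party, text); a ruling names the winning party.
Presented = List[Tuple[str, str]]
Judge = Callable[[Presented], str]


def order_flip_rate(judge: Judge, cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases whose ruling changes when argument order is reversed."""
    if not cases:
        raise ValueError("need at least one case")
    flips = 0
    for arg_a, arg_b in cases:
        ruling_ab = judge([("A", arg_a), ("B", arg_b)])  # A argues first
        ruling_ba = judge([("B", arg_b), ("A", arg_a)])  # B argues first
        if ruling_ab != ruling_ba:
            flips += 1
    return flips / len(cases)


def recency_biased_judge(presented: Presented) -> str:
    """Hypothetical toy judge: always sides with whoever argued last."""
    return presented[-1][0]


def order_invariant_judge(presented: Presented) -> str:
    """Hypothetical toy judge: sides with the longer argument, ignoring order."""
    return max(presented, key=lambda p: (len(p[1]), p[0]))[0]


cases = [
    ("The contract was never signed.", "Performance implies acceptance."),
    ("Notice was given late.", "The delay caused no prejudice to anyone."),
]

print(order_flip_rate(recency_biased_judge, cases))   # 1.0
print(order_flip_rate(order_invariant_judge, cases))  # 0.0
```

A flip rate near zero does not show that the rulings are correct, only that they do not depend on presentation order. The second toy judge is order-invariant yet still decides on a legally irrelevant feature.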

These characteristics differ fundamentally from human judges' "biases." While human judges are also influenced by rhetoric, legal training typically equips them with the ability to distinguish between argument quality and rhetorical technique — an area where LLMs appear to have structural deficiencies.

From a technical perspective, the persuadability problem of LLMs is rooted in their underlying architecture. As probabilistic models trained on massive text corpora, LLMs are essentially predicting "the most likely next token" rather than engaging in genuine legal reasoning. This means:

First, models lack normative positions. Legal adjudication requires decision-makers to form independent judgments within a legal framework, but LLMs have no inherent legal value system. Their outputs are highly dependent on input context. When arguments from opposing sides are asymmetric, the model tends to lean toward the party whose argumentation is more "fluent" rather than the party with stronger legal standing.

Second, adversarial scenarios amplify vulnerabilities. In ordinary Q&A scenarios, an LLM's "compliance," its tendency to go along with whatever the input argues, may be only a minor issue. In adversarial legal settings, however, parties can exploit this trait strategically. It is foreseeable that once LLMs are widely used for legal adjudication, "AI-targeted argument optimization" will emerge as a new litigation strategy.

Third, fairness risks cannot be overlooked. If LLMs are more easily persuaded by particular styles of argumentation, then parties who can afford professional "AI argument optimization" services will gain a systematic advantage — running directly counter to the fundamental principles of judicial fairness.

Global Practice and Controversies

Notably, multiple countries and regions have already explored integrating AI into judicial processes to varying degrees. Estonia was reported to be planning to use AI for handling small claims disputes, some Chinese courts are already using AI-assisted sentencing recommendations, and risk assessment tools like COMPAS have been used in parts of the United States criminal justice system for years.

The value of this study lies not only in identifying the problem but also in providing a concrete, quantifiable dimension for evaluating LLM legal decision-making capabilities — "persuadability." Previous evaluations largely focused on models' legal knowledge coverage and reasoning accuracy, while overlooking whether models can maintain judgment stability and independence in real-world adversarial scenarios.

Future Outlook: AI in the Judiciary Requires a More Cautious Path

This research sounds an important note of caution for AI applications in the legal field. In the short term, the findings suggest at least the following:

  1. LLMs should not directly assume adjudicative roles, especially in scenarios involving adversarial argumentation. Their role should be strictly limited to that of an auxiliary reference rather than a final decision-maker.
  2. Dedicated evaluation frameworks are needed that incorporate "persuadability" into standardized testing for LLM legal applications, rather than focusing solely on legal knowledge and logical reasoning.
  3. Technical improvements may include strengthening models' ability to assess argument quality, so that they can distinguish rhetorical persuasiveness from the strength of a legal argument.
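The second point can be made concrete with a minimal sketch of an evaluation that reports stability alongside accuracy. Everything here is an assumption for illustration: the `Case` structure, the `evaluate` function, and the keyword-driven `toy_judge` are hypothetical and do not come from the paper. Each case carries several phrasings of the same legal substance, with the first treated as the neutral baseline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    correct_ruling: str   # reference outcome for the case
    variants: List[str]   # same legal substance; variants[0] is the neutral phrasing


def evaluate(judge: Callable[[str], str], cases: List[Case]) -> Dict[str, float]:
    """Report accuracy on the neutral phrasing and stability across all phrasings."""
    if not cases:
        raise ValueError("need at least one case")
    accurate = 0
    stable = 0
    for case in cases:
        rulings = [judge(text) for text in case.variants]
        if rulings[0] == case.correct_ruling:
            accurate += 1   # right answer on the baseline wording
        if len(set(rulings)) == 1:
            stable += 1     # same ruling for every wording
    n = len(cases)
    return {"accuracy": accurate / n, "stability": stable / n}


def toy_judge(text: str) -> str:
    """Hypothetical stand-in for an LLM that is swayed by one emphatic word."""
    return "defendant" if "clearly" in text.lower() else "plaintiff"


cases = [
    Case("plaintiff", [
        "Plaintiff paid on time; defendant did not deliver.",
        "Plaintiff paid on time; defendant, clearly blameless, did not deliver.",
    ]),
    Case("plaintiff", [
        "Tenant gave written notice as the lease required.",
        "Tenant gave the written notice that the lease required.",
    ]),
]

print(evaluate(toy_judge, cases))  # {'accuracy': 1.0, 'stability': 0.5}
```

In this toy run the judge scores perfectly on the neutral wording, yet half of the cases change outcome under a rhetoric-only rewrite. An accuracy-only benchmark would miss exactly this kind of gap.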

In the long run, this study reminds us that when pushing AI into high-stakes decision-making scenarios, we cannot rush deployment simply based on impressive performance in standardized tests. The essence of legal decision-making is the exercise of power, and the exercise of power demands accountability, independence, and stability — precisely the qualities that current LLMs most sorely lack.

The prospects for combining AI and law are vast, but the road to that future requires far more prudence and restraint than we might imagine.