AI Asked to Calculate Carbohydrates 27,000 Times — and Never Gave the Same Answer Twice

💡 A researcher asked an AI model to estimate the carbohydrate content of the same food image 27,000 times and found that it never produced the same answer twice. The experiment exposes a fundamental weakness of large language models on precise numerical tasks.

One Simple Question, 27,000 Different Answers

When you show an AI model a photo of food and ask, "How many grams of carbohydrates are in this meal?", you probably expect a stable, reliable number. One persistent researcher showed otherwise: given the same food image and the same question, the AI did not produce two identical answers across 27,000 attempts.

The experiment quickly went viral on social media, shocking users who rely on AI for dietary management and once again thrusting the "consistency problem" of large language models (LLMs) into the spotlight.

Experiment Details: A Systematic Test Exposes a Systemic Problem

The researcher's experimental design was remarkably straightforward — select one food image and repeatedly send the exact same prompt to the AI model, accumulating over 27,000 requests. The results showed that the carbohydrate gram count returned by the AI varied every single time, sometimes fluctuating slightly after the decimal point, and sometimes diverging by tens of grams.

This means that if a diabetic patient were relying on AI to estimate the carbohydrate content of a meal to calculate insulin dosage, they could receive different recommendations every time they open the app. For the same meal, the AI might say 45 grams of carbs today, 62 grams tomorrow, and 38 grams the day after.
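The arithmetic behind that risk is simple. The sketch below assumes the common insulin-to-carbohydrate ratio method, where meal dose = grams of carbohydrate ÷ grams covered by one unit. It uses a hypothetical ratio of 1 unit per 10 g; real ratios are set individually by a clinician.

```python
# Hypothetical insulin-to-carbohydrate ratio: 1 unit of insulin covers 10 g of carbs.
# This value is an illustrative assumption, not a clinical recommendation.
GRAMS_PER_UNIT = 10.0


def meal_bolus_units(carbs_g: float, grams_per_unit: float = GRAMS_PER_UNIT) -> float:
    """Return the meal dose implied by a carbohydrate estimate (carbs / ratio)."""
    return carbs_g / grams_per_unit


# The three example answers for the same meal from the paragraph above.
estimates = [45, 62, 38]
doses = [meal_bolus_units(c) for c in estimates]
for carbs, dose in zip(estimates, doses):
    print(f"{carbs} g carbs -> {dose:.1f} units")

spread = max(doses) - min(doses)
print(f"Dose spread for one meal: {spread:.1f} units")
```

Under that assumption, the three answers for one meal become doses of 4.5, 6.2, and 3.8 units. That is a spread of 2.4 units from nothing more than asking the same question again.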

This inconsistency is not confined to a single model. It is shared by today's mainstream large language models as they are typically deployed. Whether it is the GPT series, Claude, or another model, the default generation process samples from a probability distribution, and that sampling introduces randomness into the output.

Why Can't AI Give a Consistent Answer?

To understand this phenomenon, we need to revisit how large language models work.

First, probabilistic sampling. A large language model is essentially a "next-token predictor." At each step it assigns a probability to every candidate token and then samples one; the "temperature" parameter controls how sharp or flat that distribution is. Even at a low temperature, a near-tie between two candidates can be resolved differently from run to run. Because every later token is conditioned on the earlier ones, a single different digit early on can lead to a completely different final number.
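The sketch below shows temperature sampling using the standard softmax-with-temperature formulation. The candidate tokens and logit values are invented for illustration and do not come from any real model. The point is that the first digit of a carbohydrate estimate can differ between runs when the temperature is above zero, while greedy decoding (temperature 0) always picks the same token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Greedy (argmax) when temperature == 0, otherwise draw from the distribution."""
    if temperature == 0:
        return tokens[max(range(len(tokens)), key=lambda i: logits[i])]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical logits for the first digit of a carb estimate.
tokens = ["4", "5", "6", "3"]
logits = [2.0, 1.8, 1.2, 0.9]

# Lower temperature sharpens the distribution toward the top candidate.
for t in (1.0, 0.7, 0.2):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)
sampled = [sample_token(tokens, logits, 0.7, rng) for _ in range(20)]
greedy = [sample_token(tokens, logits, 0, rng) for _ in range(20)]
print("T=0.7:", "".join(sampled))
print("T=0:  ", "".join(greedy))
```

Lowering the temperature concentrates probability on the top candidate ("4" here). As long as it stays above zero, though, the other digits keep a nonzero chance of being drawn.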

Second, the lack of true "computation" capability. When AI says a serving of rice contains 50 grams of carbohydrates, it hasn't actually weighed or calculated anything. It is producing a plausible number based on statistical patterns in its training data, and that guess can shift from one run to the next.

Third, the ambiguity of visual recognition. In scenarios involving image analysis, the model's judgment of food portion sizes is inherently uncertain. It cannot precisely determine whether the rice on the plate weighs 150 grams or 200 grams, and this upstream ambiguity further amplifies downstream numerical fluctuations.
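To see how portion uncertainty carries through, here is a minimal calculation for the 150 g versus 200 g rice example. It assumes cooked white rice has roughly 28 g of carbohydrate per 100 g, an approximate figure used here only for illustration.

```python
# Approximate carbohydrate density of cooked white rice, in grams of carbs per gram of rice.
# Roughly 28 g per 100 g; treat this as an illustrative assumption.
CARBS_PER_GRAM_RICE = 0.28


def rice_carbs(portion_g: float) -> float:
    """Carbohydrates (g) implied by a portion-size estimate (g)."""
    return portion_g * CARBS_PER_GRAM_RICE


low, high = rice_carbs(150), rice_carbs(200)
print(f"150 g portion -> {low:.0f} g carbs")
print(f"200 g portion -> {high:.0f} g carbs")
print(f"Spread from portion uncertainty alone: {high - low:.0f} g")
```

The carbohydrate figure scales linearly with the portion. A one-third uncertainty in the portion (200 g is a third more than 150 g) therefore becomes a one-third uncertainty in the carbohydrate count, here roughly 42 g versus 56 g, before any other source of error is added.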

Fourth, floating-point arithmetic and inference infrastructure. In large-scale deployments, different GPU nodes, batching strategies, and even the timing of a request can change the order in which the same numbers are added. Floating-point addition is not associative, so a different order can produce slightly different results. When two candidate tokens are nearly tied, that tiny difference is enough to change the output.
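The sketch below illustrates the floating-point mechanism. It assumes, for illustration, that a serving stack may group or order the same additions differently depending on batch size. Whether any particular deployment does so is not established here, and the competing token's logit is a made-up value.

```python
import random


def naive_sum(xs):
    """Left-to-right summation with no compensation, like a simple reduction loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


# 1. Floating-point addition is not associative: grouping changes the rounding.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a, b, a == b)  # 0.6000000000000001 0.6 False

# 2. The same values summed in two orders, standing in for a reduction that a
#    kernel might split differently for different batch sizes (an assumption).
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(10_000)]
print(naive_sum(values) - naive_sum(list(reversed(values))))  # often not exactly 0.0


# 3. If two candidate tokens are nearly tied, such a difference can flip greedy decoding.
def greedy_pick(logit_a, logit_b):
    """Return the token with the higher logit; ties go to A."""
    return "A" if logit_a >= logit_b else "B"


logit_b = 0.6000000000000001       # hypothetical logit of competing token B
node_1 = greedy_pick(a, logit_b)   # token A's logit computed with one grouping
node_2 = greedy_pick(b, logit_b)   # the same logit computed with another grouping
print(node_1, node_2)  # A B
```

The loop in `naive_sum` is written out explicitly because recent Python versions use compensated summation in the built-in `sum()` for floats, which would hide much of the order dependence.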

What Does This Mean for AI Health Applications?

In recent years, AI-powered diet tracking apps have been growing rapidly. Users simply snap a photo of their food, and AI "automatically" analyzes calories, carbohydrates, protein, and other nutritional components. These apps have attracted millions of users, including diabetic patients and fitness enthusiasts who need precise carbohydrate control.

However, this experiment sounds a clear alarm:

  • For general users, nutritional data provided by AI should only be treated as a rough reference, not as the basis for precise dietary decisions.
  • For patients with chronic conditions, relying on AI carb counting to adjust medication dosages poses real health risks.
  • For app developers, products should clearly show the uncertainty range of AI estimates rather than a single, seemingly precise number.

A Deeper Reflection: The "False Precision" Trap of AI

The significance of this experiment extends far beyond carbohydrate counting. It reveals a pervasive trap in current AI applications — "false precision."

AI excels at delivering specific numbers in a confident tone, such as "This salad contains 23.5 grams of carbohydrates." This decimal-point precision creates an illusion that makes users believe the AI truly "knows" the answer. In reality, that number could fluctuate randomly between 15 and 35, and the supposed decimal-point accuracy is nothing more than a mirage.

This is closely related to the well-known "hallucination" problem of large language models. Models rarely volunteer "I'm not sure." They tend to deliver a plausible-sounding answer, even when that answer changes every time.

How Is the Industry Responding?

Several approaches are already being explored to mitigate this problem:

  1. Multiple sampling with averaging: Running several inferences on the same input and taking the average (or median) as the final result reduces run-to-run variation, although it cannot correct an error the model makes consistently.
  2. Hybrid architectures: Combining the visual recognition capabilities of LLMs with traditional nutrition databases — using AI to identify food types and portions, then querying authoritative databases for precise nutritional information.
  3. Uncertainty quantification: Providing confidence intervals alongside outputs, such as "Estimated carbohydrate content: 40–55 grams," rather than a single number.
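  4. Setting temperature to zero: This typically makes decoding greedy (always picking the most likely token) and removes sampling randomness. In real-world deployments, however, infrastructure variations such as batching and floating-point effects can still prevent fully deterministic outputs.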
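Approaches 1 to 3 can be combined in a small amount of code. The sketch below is one possible arrangement, not how any particular app works. The repeated-run estimates are invented, and the nutrition table is a hypothetical placeholder standing in for an authoritative database. It aggregates repeated answers, reports a range instead of a single number, and computes carbohydrates from a model-supplied food label and portion via a lookup.

```python
import math
import statistics

# Hypothetical carbohydrate estimates (g) from repeated runs on the same photo.
runs = [45.0, 62.0, 38.0, 51.0, 47.5, 55.0, 41.0, 49.0]


def summarize(estimates):
    """Aggregate repeated estimates into central values and the observed range."""
    return {
        "mean": statistics.mean(estimates),
        "median": statistics.median(estimates),
        "low": min(estimates),
        "high": max(estimates),
    }


def format_range(summary, step=5):
    """Report a range widened to the nearest `step` g, not a falsely precise point."""
    low = step * math.floor(summary["low"] / step)
    high = step * math.ceil(summary["high"] / step)
    return f"Estimated carbohydrate content: {low}–{high} grams"


# Hypothetical nutrition table: grams of carbohydrate per 100 g of food.
# Illustrative placeholders, not values from an authoritative database.
CARBS_PER_100G = {"cooked white rice": 28.0, "apple": 14.0}


def hybrid_estimate(food_label, portion_g):
    """The model supplies the label and portion; the table supplies nutrient density."""
    density = CARBS_PER_100G.get(food_label)
    if density is None:
        raise KeyError(f"no database entry for {food_label!r}")
    return portion_g * density / 100.0


summary = summarize(runs)
print(f"mean={summary['mean']:.2f} g, median={summary['median']:.2f} g")
print(format_range(summary))
print(hybrid_estimate("cooked white rice", 150), "g from the lookup table")
```

The median is reported alongside the mean because a single outlying run moves it less. The range is rounded outward (floor for the low end, ceiling for the high end) so that rounding never makes the estimate look tighter than the observed spread.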

Looking Ahead: Precision Is the Gateway to AI in Mission-Critical Scenarios

This experiment delivers an important wake-up call to the entire AI industry: On the road from "useful" to "reliable," consistency and precision are unavoidable core challenges.

When AI is used for casual conversation, writing, or creative generation, output randomness is actually an advantage. But when AI enters fields like healthcare, financial calculations, and engineering design — domains with rigid demands for precision — "a different answer every time" is no longer a feature; it's a flaw.

Future AI systems will need to learn to distinguish between "tasks that require creativity" and "tasks that require determinism," and establish verifiable, reproducible output mechanisms for the latter. Until that day arrives, users should follow one simple principle: treat the numbers AI gives you as rough estimates, not measurements.