10x Token Spend Yields Only 2x Output, CFOs Crack Down
Heavy AI Token Usage Delivers Diminishing Returns, New Data Shows
A new study from engineering management platform Jellyfish reveals a stark inefficiency in enterprise AI spending: the top 10% of Claude Code users consume roughly 10 times more AI tokens than the median developer, yet produce only about twice the output. The finding is fueling a growing movement among chief financial officers to impose strict controls on runaway AI expenditures across their organizations.
The research, published in early May 2025, draws on behavioral data from hundreds of thousands of software engineers across hundreds of companies. It arrives at a critical inflection point for the tech industry, as enterprises shift from an 'AI at all costs' mindset toward demanding measurable return on investment from their AI tooling budgets.
Key Takeaways
- Top 10% of Claude Code users consume approximately 10x more tokens than the median developer
- Output gains from that heavy usage amount to only about 2x, so each unit of output costs roughly 5x as many tokens
- CFOs are now requiring expense justification for AI tool usage across teams
- Token-based billing (typically priced per million tokens) makes overconsumption directly visible in operating expenses
- Jellyfish tracks coding behavior data from hundreds of companies and hundreds of thousands of engineers
- The finding challenges the assumption that more AI usage automatically equals better productivity
The Token Efficiency Gap Explained
AI tokens are the small text units that large language models use to process inputs and generate outputs. Every prompt, every code suggestion, and every AI-assisted refactor costs tokens — and those tokens cost real money. Most AI providers, including Anthropic (maker of Claude) and OpenAI, bill on a per-million-token basis, with prices varying by model tier.
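To make per-million-token billing concrete, here is a minimal Python sketch. It assumes input and output tokens are priced separately, as is common among providers; the function name `token_cost_usd` and the prices used are hypothetical placeholders, not figures from any provider's price list or from the Jellyfish study.

```python
def token_cost_usd(input_tokens, output_tokens,
                   input_price_per_million, output_price_per_million):
    """Return the USD cost of a request under per-million-token pricing."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical request: 200k input tokens and 20k output tokens,
# priced at made-up rates of $3 and $15 per million tokens.
cost = token_cost_usd(200_000, 20_000,
                      input_price_per_million=3.0,
                      output_price_per_million=15.0)
print(f"${cost:.2f}")  # $0.90
```

A single request is cheap under these assumed rates; the bill grows when long sessions resend large contexts many times a day.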
The Jellyfish data paints a clear picture of diminishing returns. When a developer burns through 10x the tokens but only delivers 2x the results, the cost-per-unit-of-output balloons by a factor of 5. For an individual developer, this might translate to hundreds of extra dollars per month. Across a 500-person engineering organization, the waste can quickly reach six or even seven figures annually.
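The arithmetic behind the factor of 5 can be sketched in a few lines of Python. The sketch assumes spend is proportional to tokens consumed. The 50 heavy users (the top 10% of a 500-person organization) come from the figures above, while the $100 median monthly spend is a purely hypothetical number chosen for illustration.

```python
def cost_per_output_multiplier(token_multiple, output_multiple):
    """How much more each unit of output costs relative to the median developer."""
    return token_multiple / output_multiple


def annual_excess_spend(heavy_users, median_monthly_spend,
                        token_multiple, output_multiple):
    """Yearly spend above what the same output would cost at median efficiency.

    Assumes spend scales linearly with tokens (an assumption of this sketch).
    """
    excess_per_user = median_monthly_spend * (token_multiple - output_multiple)
    return heavy_users * excess_per_user * 12


multiplier = cost_per_output_multiplier(10, 2)
excess = annual_excess_spend(heavy_users=50, median_monthly_spend=100.0,
                             token_multiple=10, output_multiple=2)
print(multiplier)  # 5.0
print(excess)      # 480000.0
```

Under these assumed inputs the excess lands in the six-figure range; a higher median spend or a larger organization pushes it toward seven figures.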
Nicholas Arcolano, Jellyfish's head of AI and research, described this pattern as 'extreme token abuse' and warned it is not a sustainable usage strategy. His assessment underscores a growing consensus in the industry: simply throwing more AI compute at a problem does not scale results linearly.
CFOs Step In to Control AI Spending
The financial implications are impossible to ignore. Arcolano noted in a recent interview that 'CFOs have already started strictly managing these expenditures.' He emphasized that in most enterprises today, employees must provide expense justification for AI-related costs, just as they would for travel or software subscriptions.
This shift represents a maturation of enterprise AI adoption. In 2023 and early 2024, many companies gave teams open-ended budgets to experiment with AI tools like GitHub Copilot, ChatGPT Enterprise, Claude for Business, and various coding assistants. The prevailing logic was simple: any productivity boost justified the spend.
Now, with 18+ months of real-world usage data, finance teams are asking harder questions:
- What is the actual ROI per dollar spent on AI tokens?
- Which teams are generating measurable output gains?
- Are there usage patterns that indicate waste rather than productivity?
- Can we set per-developer or per-team token budgets without hurting performance?
Arcolano acknowledged that clients 'want to accelerate their business and are willing to invest in AI,' but added a critical caveat: 'the prerequisite is proving that spending is justified and generates real value.'
Why More Tokens Don't Equal Better Code
The Jellyfish findings raise a fundamental question: why does heavy token consumption fail to scale output proportionally? Industry experts point to several contributing factors.
First, prompt inefficiency plays a major role. Less experienced AI users often submit verbose, poorly structured prompts that consume large numbers of tokens while generating mediocre results. They may iterate repeatedly on the same task, burning through tokens with each attempt rather than refining their approach.
Second, there is the problem of over-reliance on AI for tasks it handles poorly. Developers who use AI coding assistants for complex architectural decisions or nuanced debugging often end up in lengthy back-and-forth conversations that consume massive token volumes without reaching satisfactory conclusions. A senior developer might solve the same problem in 15 minutes of manual work.
Third, context window stuffing — the practice of feeding entire codebases or lengthy documentation into an AI model — dramatically inflates token usage. While sometimes necessary, this approach is frequently used indiscriminately, pushing costs up without proportional gains.
Finally, the data suggests that the most productive developers are not necessarily the heaviest AI users. Instead, they tend to be strategic about when and how they deploy AI assistance, using it for well-defined tasks like boilerplate generation, test writing, and documentation rather than open-ended exploration.
Industry Context: The AI ROI Reckoning
The Jellyfish research fits into a broader pattern of AI spending scrutiny that has intensified throughout 2025. Gartner predicted in July 2024 that at least 30% of generative AI projects would be abandoned after the proof-of-concept stage by the end of 2025, citing unclear business value as one of the main reasons.
Microsoft, which has invested over $13 billion in OpenAI, reported in its latest earnings call that GitHub Copilot now generates over $1 billion in annual recurring revenue. However, the company also acknowledged that enterprise customers are increasingly demanding usage analytics and cost controls for AI tools deployed across their engineering organizations.
Amazon Web Services and Google Cloud have both introduced more granular billing dashboards for AI services in recent months, responding to customer demand for better visibility into token-level spending. This infrastructure-level shift suggests that the cost management trend is industry-wide, not limited to any single platform.
Compared to the early days of cloud computing — when companies similarly struggled to control spend before tools like CloudHealth and Spot.io emerged — the AI cost management cycle appears to be compressing. Enterprises are demanding accountability faster, likely because AI costs can scale more unpredictably than traditional cloud infrastructure.
What This Means for Developers and Engineering Leaders
For individual developers, the message is clear: strategic AI usage outperforms brute-force consumption. Engineers who invest time in learning effective prompting techniques, understanding model strengths and limitations, and choosing the right tool for each task will deliver better results at lower cost.
Engineering leaders should consider implementing several practical measures:
- Set team-level token budgets with monthly caps and alerting thresholds
- Track output metrics alongside consumption to identify high-efficiency users and share their practices
- Invest in prompt engineering training to help developers get better results from fewer tokens
- Evaluate model selection — not every task requires the most expensive, most capable model
- Establish usage policies that distinguish between productive experimentation and wasteful overconsumption
- Benchmark teams internally using efficiency ratios rather than raw output numbers
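The budget and benchmarking measures in the list above could be prototyped along these lines. This is a minimal sketch under stated assumptions: the `TeamUsage` record, the 80% alert threshold, the team names, and the choice of output unit (for example, merged pull requests) are all hypothetical, and none of them come from the Jellyfish study.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    name: str
    tokens_used: int    # tokens consumed so far this month
    monthly_cap: int    # team-level token budget
    output_units: int   # hypothetical output metric, e.g. merged pull requests


def budget_status(team, alert_fraction=0.8):
    """Classify a team as 'ok', 'alert', or 'over_cap' against its monthly cap."""
    if team.tokens_used >= team.monthly_cap:
        return "over_cap"
    if team.tokens_used >= alert_fraction * team.monthly_cap:
        return "alert"
    return "ok"


def output_per_million_tokens(team):
    """Efficiency ratio: output units delivered per million tokens consumed."""
    if team.tokens_used == 0:
        return 0.0
    return team.output_units / (team.tokens_used / 1_000_000)


teams = [
    TeamUsage("platform", 450_000_000, 500_000_000, 90),
    TeamUsage("payments", 100_000_000, 500_000_000, 50),
]

# Rank by efficiency rather than raw output, so the smaller consumer can lead.
ranked = sorted(teams, key=output_per_million_tokens, reverse=True)
for team in ranked:
    print(team.name, budget_status(team), round(output_per_million_tokens(team), 2))
```

In this made-up data, the platform team ships more in absolute terms but ranks second on efficiency and has crossed the alert threshold, which is the kind of signal raw output numbers would hide.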
The Jellyfish data also implies that companies should resist the temptation to simply cut AI budgets across the board. The median users in the study were still generating meaningful productivity gains — the problem lies specifically with unmanaged, excessive consumption at the top end of the distribution.
Looking Ahead: AI Spending Enters Its Rational Phase
The era of unchecked AI spending appears to be ending. As CFOs gain better visibility into token-level costs and output metrics, expect a new generation of AI FinOps tools to emerge — platforms specifically designed to optimize, monitor, and govern enterprise AI spending.
Several startups are already building in this space, and established players like Datadog and New Relic are adding AI cost observability features to their platforms. The market for AI spend management could mirror the trajectory of cloud cost optimization, which grew into a multi-billion-dollar industry over the past decade.
For AI providers like Anthropic, OpenAI, and Google, this trend creates both risk and opportunity. On one hand, tighter budgets could slow revenue growth. On the other, providers who offer better efficiency tools, tiered pricing, and usage analytics may win long-term enterprise loyalty.
The bottom line from the Jellyfish research is deceptively simple but profoundly important: in enterprise AI, efficiency beats volume. The companies that master this principle will gain a genuine competitive advantage. Those that continue to equate higher token consumption with better outcomes will find their CFOs — and eventually their boards — asking uncomfortable questions about where all that money went.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/10x-token-spend-yields-only-2x-output-cfos-crack-down