Turing Winner: LLMs Write SQL at 0% Accuracy
Turing Award Winner Declares LLMs 'Not Remotely Qualified' for SQL
Mike Stonebraker, the legendary database researcher behind Ingres and Postgres and recipient of the 2014 Turing Award, has delivered a blistering assessment of the AI industry's claims about large language models and database technology. In a wide-ranging interview, Stonebraker (whose Chinese nickname '石破天' roughly translates to 'heaven-shattering') argued that LLMs achieve effectively 0% accuracy on real-world SQL tasks, called out Oracle, Google, and AWS by name, and made a startling admission: he is not sure he would advise an 18-year-old starting out today to study computer science.
The comments represent one of the most pointed critiques of AI-driven database tooling from a figure of Stonebraker's stature. At a time when vendors from Microsoft to Databricks are racing to embed natural-language-to-SQL features into their products, his warning carries significant weight.
Key Takeaways
- LLM-generated SQL fails catastrophically on production-grade queries, achieving what Stonebraker characterizes as 0% accuracy on complex, real-world workloads
- Oracle's early success was built on selling features that did not yet exist — a strategy Stonebraker attributes directly to Larry Ellison
- Google's MapReduce and eventual consistency were 'stupid' detours that the company itself corrected with Spanner
- AWS maintains roughly 15 database products when only about 3 are genuinely needed, according to Stonebraker
- Computer science may no longer be a growth industry, making it a risky default career recommendation
- Traditional database fundamentals — transactions, ACID compliance, strong consistency — remain irreplaceable despite decades of attempts to circumvent them
Why LLMs Fail at Real SQL: The Accuracy Problem
The headline claim — 0% accuracy — sounds provocative, but Stonebraker's argument is more nuanced than a soundbite. LLMs can handle simple, textbook-style queries reasonably well. The problem emerges when models face production-scale schemas with hundreds of tables, complex joins, business-specific naming conventions, and edge cases that require deep domain understanding.
Most benchmarks used to evaluate text-to-SQL capabilities, such as Spider and BIRD, rely on relatively clean, well-documented schemas with limited complexity. Real enterprise databases look nothing like these benchmarks. They contain legacy naming conventions, undocumented relationships, and implicit business logic that no amount of pattern matching can reliably infer.
Stonebraker's critique aligns with growing skepticism among database engineers. While tools like GitHub Copilot and ChatGPT can generate syntactically valid SQL, the semantic correctness of those queries — whether they actually return the right data — remains deeply unreliable for anything beyond trivial use cases. For a hospital querying patient records or a bank calculating risk exposure, 'almost right' SQL is arguably worse than no SQL at all.
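The gap between syntactically valid and semantically correct SQL can be illustrated with a small sketch. The schema, data, and queries below are hypothetical and do not come from the interview; they only show how a plausible-looking join can run without error and still return the wrong number.

```python
import sqlite3

# Hypothetical toy schema, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1, 'acme', 100.0), (2, 'acme', 50.0);
-- Order 1 was shipped in two parcels, so it has two shipment rows.
INSERT INTO shipments VALUES (1, 1), (2, 1), (3, 2);
""")

# Looks reasonable and executes cleanly, but the join repeats order 1
# once per shipment row, so its amount is counted twice.
naive_sql = """
SELECT o.customer, SUM(o.amount)
FROM orders o JOIN shipments s ON s.order_id = o.id
GROUP BY o.customer
"""

# Counts each shipped order exactly once.
correct_sql = """
SELECT customer, SUM(amount)
FROM orders
WHERE id IN (SELECT order_id FROM shipments)
GROUP BY customer
"""

naive_total = conn.execute(naive_sql).fetchone()[1]
correct_total = conn.execute(correct_sql).fetchone()[1]
print(naive_total, correct_total)
```

Both queries are valid SQL, and neither raises an error. Only someone who knows that one order can have several shipments would spot that the first total is inflated.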
Stonebraker Takes Aim at Oracle, Google, and AWS
The interview was not limited to AI criticism. Stonebraker systematically dismantled what he sees as decades of misguided decisions by the industry's biggest players.
Oracle: 'Selling Lies as Features'
Stonebraker accused Larry Ellison of building Oracle's early market dominance by telling customers that unfinished features already worked. The strategy, as Stonebraker describes it, was straightforward: promise functionality, close the deal, and then let paying customers serve as unpaid beta testers and debuggers. While aggressive sales tactics are hardly unusual in enterprise software, Stonebraker frames this as outright deception — and argues it set a toxic precedent for the industry.
Google: 'MapReduce Was Stupid'
Perhaps the most technically pointed criticism was directed at Google's MapReduce paradigm and its embrace of eventual consistency. Stonebraker argues that the broader industry adopted these ideas not because they were sound, but because 'Google is smart' and people assumed the company must know what it was doing.
The result was the Hadoop ecosystem — a sprawling, resource-intensive framework that Stonebraker considers grotesquely inefficient compared to well-designed relational databases. When Google eventually released Spanner, its globally distributed database with full ACID transaction support, the company effectively conceded Stonebraker's point: you cannot escape the need for strong consistency and proper transactions.
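What "proper transactions" buy in practice can be shown with a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer function are hypothetical examples, not anything Stonebraker described; the point is only that a failed multi-step update leaves no partial state behind.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint forbids overdrafts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 150)  # would overdraw alice
after_failed = balances(conn)
succeeded = transfer(conn, "alice", "bob", 40)
after_success = balances(conn)
```

In the failed transfer, the credit to bob has already been applied when the debit is rejected. The rollback undoes it, which is the guarantee an eventually consistent store leaves to the application.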
AWS: 15 Databases When 3 Would Suffice
Amazon Web Services maintains approximately 15 distinct database services, including DynamoDB, Aurora, Redshift, Neptune, DocumentDB, Timestream, and others. Stonebraker views this as bloat driven by market positioning rather than genuine technical need. In his estimation, most workloads could be served by roughly 3 well-designed database systems — a relational OLTP engine, an analytical data warehouse, and possibly a document store.
The proliferation of specialized databases, he suggests, creates unnecessary complexity for customers and fragments engineering talent within Amazon itself.
'Computer Science Is No Longer a Growth Industry'
Stonebraker's most consequential claim may have nothing to do with databases at all. His suggestion that computer science is no longer a reliable growth career challenges one of the most deeply held assumptions in modern education and workforce planning.
For the past two decades, computer science has been the default 'safe' major, the field virtually guaranteed to produce high-paying jobs. University enrollment in CS programs has surged, coding bootcamps have proliferated, and governments worldwide have invested billions in STEM education pipelines.
Stonebraker's reasoning appears to center on AI-driven automation of traditional software engineering tasks. If LLMs can generate boilerplate code, write unit tests, and handle routine development work, the demand for entry-level and mid-level programmers could contract significantly. The jobs that remain would require deep specialization — precisely the kind of expertise that takes decades to build and that AI is furthest from replicating.
This view echoes concerns raised by other prominent technologists. Jensen Huang, CEO of NVIDIA, made waves in early 2024 by suggesting that children might not need to learn to code because AI would handle programming. While many in the industry pushed back on Huang's comments, Stonebraker's version of the argument carries additional credibility given his 50-year track record in systems research.
The Broader Implications for AI-Database Integration
Stonebraker's criticisms arrive at a pivotal moment for the intersection of AI and database technology. Several major trends are converging:
- Natural-language-to-SQL tools are being embedded into products from Microsoft (Copilot in Power BI), Google (Gemini in BigQuery), and dozens of startups
- Vector databases like Pinecone, Weaviate, and Milvus have attracted hundreds of millions in venture funding to support AI workloads
- RAG (Retrieval-Augmented Generation) architectures increasingly depend on database queries to ground LLM outputs in factual data
- Autonomous database management — using AI to tune, optimize, and repair databases without human DBAs — is a growing research area
If Stonebraker is right that LLMs fundamentally cannot handle complex SQL, the implications ripple across all of these categories. RAG systems that rely on AI-generated queries to retrieve context could silently return wrong information. Natural-language interfaces could give business users false confidence in answers derived from incorrect queries. And the dream of fully autonomous database operations would need significant re-scoping.
What This Means for Developers and Businesses
For practitioners, Stonebraker's message boils down to several actionable points:
- Do not trust LLM-generated SQL for mission-critical queries without rigorous human review and testing against known-correct outputs
- Invest in database fundamentals — understanding relational algebra, query optimization, and transaction isolation levels remains essential and may become more valuable, not less
- Be skeptical of vendor hype around AI-powered database features; demand benchmarks that reflect real-world schema complexity, not sanitized academic datasets
- Consolidate database infrastructure where possible; running fewer, well-understood systems is almost always better than managing a zoo of specialized engines
- Recognize that AI augments but does not replace deep domain expertise in data engineering
Companies building products on top of text-to-SQL capabilities should pay particular attention. The gap between demo-quality performance and production-quality reliability is enormous — and it is precisely the gap that Stonebraker is highlighting.
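The first recommendation above, testing generated SQL against known-correct outputs, can be sketched as a simple execution-match check. This is a minimal illustration, not an established tool: the results_match helper, the employees fixture, and the candidate queries are all hypothetical, and a real harness would also need representative data and edge cases.

```python
import sqlite3
from collections import Counter


def results_match(conn, candidate_sql, reference_sql):
    """Return True only if both queries yield the same multiset of rows.

    Row order is ignored; a candidate that fails to execute counts as a mismatch.
    """
    expected = Counter(conn.execute(reference_sql).fetchall())
    try:
        actual = Counter(conn.execute(candidate_sql).fetchall())
    except sqlite3.Error:
        return False
    return actual == expected


# Hypothetical fixture with a small, hand-checked data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("ann", "eng", 120), ("bob", "eng", 90), ("cy", "ops", 80)],
)

# Human-written, reviewed query for "headcount per department".
reference = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"

# Stand-ins for machine-generated candidates.
equivalent = "SELECT dept, COUNT(name) FROM employees GROUP BY dept ORDER BY dept DESC"
wrong_filter = "SELECT dept, COUNT(*) FROM employees WHERE salary > 100 GROUP BY dept"
bad_table = "SELECT dept, COUNT(*) FROM employes GROUP BY dept"

checks = {
    "equivalent": results_match(conn, equivalent, reference),
    "wrong_filter": results_match(conn, wrong_filter, reference),
    "bad_table": results_match(conn, bad_table, reference),
}
print(checks)
```

A check like this catches only the mismatches that the fixture data exposes. A candidate can agree with the reference on a small data set and still diverge in production, which is why human review remains part of the recommendation.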
Looking Ahead: Where Database Technology Goes From Here
Stonebraker has been right before — often decades before the rest of the industry caught up. His criticisms of the relational monoculture in the 2000s anticipated the NoSQL movement. His subsequent critique of NoSQL's abandonment of transactions anticipated the 'NewSQL' correction. His work on column-store databases (C-Store, later commercialized as Vertica) predated the modern analytics warehouse boom by years.
Whether his current skepticism about LLMs and SQL proves equally prescient remains to be seen. The technology is evolving rapidly, and newer approaches — including fine-tuned models trained on specific enterprise schemas, agentic SQL generation with self-correction loops, and hybrid systems that combine LLM suggestions with formal verification — may eventually close the accuracy gap.
But Stonebraker's core insight is unlikely to be invalidated anytime soon: databases are hard, the problems they solve are fundamental, and there are no shortcuts to correctness. For an industry perpetually chasing the next paradigm shift, that is a message worth hearing — especially from someone who has spent 50 years proving it.
The question is not whether AI will transform how we interact with data. It will. The question is whether the industry will repeat the mistakes Stonebraker has spent his career cataloging — mistaking novelty for progress, hype for capability, and demo magic for production readiness.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/turing-winner-llms-write-sql-at-0-accuracy
⚠️ Please credit GogoAI when republishing.