New Tool 'Mainline' Stops AI Agents From Breaking Code They Don't Understand

💡 A staff engineer built Mainline to solve a recurring problem: AI coding agents make plausible but wrong changes based on outdated context buried in PRs and Slack.

AI Agents Write 'Reasonable but Wrong' Code — One Engineer Built a Fix

A staff engineer has built a new developer tool called Mainline that aims to solve one of the most insidious problems with AI-powered coding: agents that produce code that looks perfectly reasonable but is based on outdated or incorrect assumptions. The tool is currently in dogfooding and early beta, with the creator seeking testers from the developer community.

The problem Mainline addresses isn't about AI writing buggy code. It's about AI writing plausible code that violates decisions the team already made — decisions that live in PR comments, Slack threads, and engineers' heads, not in the codebase itself.

Key Takeaways

  • AI coding agents like Claude Code and Cursor frequently make changes based on stale artifacts in repositories
  • The tool addresses 'context gaps' — team decisions that exist outside the codebase
  • Existing solutions fall short: AGENTS.md files only cover decisions someone anticipated, and Architecture Decision Records (ADRs) are rarely kept up
  • Mainline is currently in early beta and seeking testers
  • The creator is a staff engineer who wrote internal AI coding guidelines for their team
  • The problem grows worse as codebases accumulate technical debt and abandoned experiments

The Redis Problem: A Case Study in AI Misunderstanding

The creator shared a specific example that perfectly illustrates the issue. Their repository contained a half-finished Redis queue implementation — a redis.go file, TODO comments, and a Redis service configured in docker-compose.yml. When Claude Code encountered this code, it did something entirely logical: it tried to complete the implementation.

The problem? The team had abandoned Redis three weeks earlier. Replication latency was causing duplicate billing events, a serious production issue that led to the architectural decision to drop Redis entirely. But that decision wasn't documented in the code. It was scattered across a PR comment thread, a few Slack messages, and the institutional memory of several engineers.

Code search tools can find the Redis files. They can surface the TODO comments. What they cannot find is the decision to stop using Redis. This is the fundamental gap that Mainline attempts to bridge.

Why Existing Solutions Fall Short

The developer tried several established approaches before building Mainline, and each revealed significant limitations in practice.

AGENTS.md and CLAUDE.md files (instruction files placed in the repository root by convention) can certainly include directives like 'don't use Redis.' But this approach only works for problems you can anticipate. New architectural decisions happen continuously, and nobody consistently updates these files. The maintenance burden grows with the pace of decision-making, which in active teams can mean multiple significant choices per week.

Architecture Decision Records (ADRs) are another commonly suggested solution. ADRs are structured documents that capture the context, decision, and consequences of architectural choices. In theory, they fit this problem well. In practice, adoption is often low. Many engineering teams start writing ADRs with enthusiasm and abandon the practice within months, because the overhead of formally documenting every decision doesn't fit into fast-moving development cycles.
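
To make the three parts of an ADR concrete, here is a minimal Python sketch of one. It is an illustration only, not a standard ADR tool or anything Mainline is known to use: the DecisionRecord class, its field names, and the to_markdown helper are hypothetical, and the record text paraphrases the Redis example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical minimal ADR: context, decision, consequences."""

    title: str
    context: str
    decision: str
    consequences: str

    def to_markdown(self) -> str:
        """Render the record as a short markdown document."""
        sections = [
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        lines = [f"# {self.title}", ""]
        for heading, body in sections:
            lines += [f"## {heading}", "", body, ""]
        # Normalize to exactly one trailing newline.
        return "\n".join(lines).rstrip() + "\n"


# Illustrative record paraphrasing the Redis example (wording is assumed).
redis_adr = DecisionRecord(
    title="Drop the Redis queue",
    context="Replication latency caused duplicate billing events.",
    decision="Stop using Redis; do not finish the queue in redis.go.",
    consequences="redis.go and the Redis service in docker-compose.yml are dead code.",
)

print(redis_adr.to_markdown())
```

A record this small takes a minute to write, which may lower the overhead that causes teams to abandon ADRs, though it still depends on someone remembering to write it.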

Other approaches the community has explored include:

  • Embedding context directly in code comments (doesn't scale, quickly becomes stale)
  • Using AI-generated summaries of PR discussions (misses Slack and verbal decisions)
  • Creating comprehensive onboarding documents (static, rarely updated)
  • Relying on code review to catch AI mistakes (reactive, not preventive)

Each of these methods addresses a slice of the problem but none captures the full picture of team knowledge that lives outside the repository.

The Deeper Problem: Institutional Knowledge Is Invisible to AI

This challenge highlights a fundamental limitation in how current AI coding assistants interact with codebases. Tools like GitHub Copilot, Cursor, Claude Code, and Amazon Q Developer are remarkably capable at understanding code syntax, patterns, and even architectural conventions. But they operate almost exclusively on what's visible in the repository.

The reality of software development is that a significant portion of critical context never makes it into the code. Research from the University of Victoria found that up to 50% of architectural knowledge in software projects exists only as tacit knowledge — understood by team members but never formally documented. This 'dark knowledge' includes abandoned approaches, performance constraints discovered in production, compliance requirements discussed in meetings, and vendor-specific limitations learned through painful experience.

When a human engineer joins a team, they gradually absorb this context through code reviews, pair programming, team discussions, and occasionally making the same mistakes. An AI agent has none of these channels. It sees the repository as a static snapshot and makes inferences based solely on what the code appears to be doing.

Why This Matters Now More Than Ever

The timing of Mainline's development coincides with a massive surge in AI agent adoption across the software industry. According to GitHub's 2024 developer survey, over 70% of developers now use AI coding tools in some capacity. More importantly, the trend is shifting from simple autocomplete toward autonomous agents that can plan and execute multi-file changes independently.

This shift dramatically amplifies the risk of context-blind modifications:

  • Autocomplete suggests a line — low risk, easy to review
  • Chat-based AI generates a function — moderate risk, reviewable in isolation
  • Autonomous agents modify multiple files across a codebase — high risk, harder to audit
  • Multi-step agents chain decisions together — compounding risk if early assumptions are wrong

As agents become more autonomous, the cost of incorrect assumptions multiplies. A wrong premise in step 1 of a 10-step plan doesn't just produce one bad file — it produces a coherent but fundamentally flawed implementation across the entire feature.

Companies like Anthropic, OpenAI, and Google DeepMind are actively working on improving agent reasoning capabilities. But better reasoning doesn't help when the premises themselves are wrong. An agent that reasons perfectly from flawed assumptions will produce even more convincingly wrong code.

The Growing Market for AI Development Guardrails

Mainline enters a nascent but rapidly growing category of tools designed to add guardrails around AI coding agents. The market includes several adjacent solutions:

Greptile offers codebase-aware AI that indexes repositories to provide better context. Sourcegraph Cody similarly uses code intelligence to ground AI responses in actual codebase patterns. LinearB and Swarmia focus on engineering metrics that could potentially flag when AI-generated code diverges from team norms.

However, none of these tools specifically address the problem of capturing and surfacing negative decisions — the things a team chose NOT to do and why. This is the specific niche Mainline appears to target.

The broader AI developer tools market reached an estimated $5.1 billion in 2024, according to Gartner, with projections suggesting it could exceed $22 billion by 2028. Within this market, tools that improve AI code quality and safety represent a growing subsegment as enterprises move past initial adoption and confront the maintenance challenges of AI-generated code.

What This Means for Development Teams

For engineering teams currently using or evaluating AI coding agents, the problem Mainline highlights deserves immediate attention regardless of whether they adopt this specific tool. Several practical steps can mitigate context-blind AI modifications:

  • Audit your repository for 'ghost artifacts' — dead code, abandoned experiments, and half-finished features that could mislead AI agents
  • Establish a lightweight decision log — even a simple markdown file tracking 'what we decided not to do and why' provides valuable signal
  • Configure agent constraints proactively — use AGENTS.md files as a starting point, even if they require ongoing maintenance
  • Review AI-generated code with 'context skepticism' — ask not just 'is this code correct?' but 'does this code reflect our current architectural direction?'
  • Clean up dead code aggressively — the presence of abandoned code is one of the strongest misleading signals for AI agents
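
The first two steps can be combined into a small check: keep a log of rejected technologies and scan the repository for leftover references to them. The Python sketch below assumes a hypothetical log format (a REJECTED dictionary mapping a keyword to the reason it was dropped) and a hypothetical find_ghost_artifacts helper that works on in-memory file contents. It does not represent how Mainline works, which the creator has not described here, and the file contents are invented for the example.

```python
from typing import NamedTuple

# Hypothetical decision log: keyword -> why the team rejected it.
REJECTED = {
    "redis": "Replication latency caused duplicate billing events.",
}


class GhostArtifact(NamedTuple):
    path: str
    line_number: int
    keyword: str
    reason: str


def find_ghost_artifacts(
    files: dict[str, str], rejected: dict[str, str]
) -> list[GhostArtifact]:
    """Return every line that still mentions a rejected keyword.

    `files` maps a path to its text content. Matching is a plain
    case-insensitive substring check per line, so expect false positives.
    """
    findings = []
    for path, text in sorted(files.items()):
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            for keyword, reason in rejected.items():
                if keyword in lowered:
                    findings.append(GhostArtifact(path, number, keyword, reason))
    return findings


# Invented stand-in for the repository in the Redis example.
repo = {
    "redis.go": "package queue\n// TODO: finish Redis consumer\n",
    "docker-compose.yml": "services:\n  redis:\n    image: redis\n",
    "billing/charge.go": "package billing\n",
}

for finding in find_ghost_artifacts(repo, REJECTED):
    print(
        f"{finding.path}:{finding.line_number}: "
        f"mentions '{finding.keyword}' ({finding.reason})"
    )
```

Each finding pairs the stale artifact with the reason it was abandoned, which is the piece a plain code search cannot supply. A team could run a check like this in CI or paste its output into an agent's instructions, but the keyword list still has to be maintained by hand.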

The fundamental insight from the Mainline project is that code quality for AI agents isn't just about the code that exists — it's about making the absence of code equally informative. When a team decides to remove a technology from their stack, the removal itself needs to be documented as clearly as the original addition.

Looking Ahead: Context-Aware AI Development

The challenge Mainline addresses will likely intensify as AI agents grow more capable and autonomous. The industry is moving toward agents that can operate across entire codebases for hours at a time, making dozens of interconnected changes. In that setting, a single piece of missing context can invalidate every change built on top of it, so the cost of a gap grows with the length of the agent's run.

Several trends suggest this problem space will attract significant attention in 2025 and beyond. First, enterprise adoption of AI coding agents is accelerating, bringing these issues to larger and more complex codebases. Second, regulatory pressure around AI-generated code — particularly in healthcare, finance, and defense — will demand better provenance and decision tracking. Third, the sheer volume of AI-generated code in repositories will create a feedback loop where agents increasingly build on top of other agents' potentially flawed work.

Mainline is currently seeking beta testers, particularly engineers working on teams that have experienced the 'reasonable but wrong' pattern with AI coding tools. The tool represents an early attempt to solve what may become one of the defining challenges of the AI-assisted development era: ensuring that artificial intelligence doesn't just understand what code does, but understands why the team chose to write it that way — and what they deliberately chose not to build.