US Gets Pre-Release AI Access From 5 Major Labs

📁 Industry · ⏱️ 12 min read
💡 The US Department of Commerce secures agreements with Anthropic, OpenAI, Google DeepMind, Microsoft, and xAI for national security AI testing.

The US Department of Commerce has secured pre-release access to frontier AI models from all 5 major American AI labs, marking the most comprehensive government-industry safety testing arrangement in the history of artificial intelligence. Google DeepMind, Microsoft, and xAI have now joined Anthropic and OpenAI in signing formal agreements with the Center for AI Standards and Innovation (CASI), granting federal researchers the ability to probe unreleased models — with reduced safety guardrails — inside classified government environments.

The move represents a dramatic escalation in Washington's efforts to stay ahead of both AI-related cybersecurity threats and an intensifying technology race with China. It also signals a new phase of cooperation between Silicon Valley and the national security establishment that would have been nearly unthinkable just a few years ago.

Key Takeaways

  • 5 companies — Anthropic, OpenAI, Google DeepMind, Microsoft, and xAI — now provide pre-release model access to the US government
  • Testing occurs in classified environments with models that have reduced safety guardrails
  • The program is managed through CASI, which operates under the National Institute of Standards and Technology (NIST)
  • The agreements are driven by growing cybersecurity risks and the US-China AI competition
  • Models are evaluated before public release, giving the government an unprecedented early-warning system
  • This marks the first time all 5 of these labs have participated in government safety testing at the same time

Five Labs, One National Security Mission

The expansion from 2 participating labs to 5 did not happen overnight. Anthropic and OpenAI were the first to sign agreements with CASI, establishing the template for what government access would look like in practice. Their early participation gave federal researchers initial experience in evaluating frontier models for potential national security risks.

Google DeepMind, Microsoft, and Elon Musk's xAI followed in a second wave of agreements. The inclusion of these 3 additional companies means that virtually every cutting-edge large language model developed in the United States now passes through government hands before reaching the public.

This is significant because each lab brings different architectural approaches and capability profiles. Google DeepMind's Gemini models, Microsoft's systems (including those built through its partnership with OpenAI), and xAI's Grok series each present unique risk surfaces that require independent evaluation. By testing all of them, CASI can build a comprehensive picture of where frontier AI capabilities, and their dangers, actually stand.

What Happens Inside Classified Testing

The most striking aspect of these agreements is the testing environment itself. The AI companies provide versions of their models with reduced safety guardrails, meaning that many of the refusal behaviors and content filters that normally prevent models from generating harmful outputs are relaxed or removed.

This is by design. Government researchers need to understand what these models are truly capable of — not what they do when constrained by commercial safety wrappers. Testing areas likely include:

  • Cybersecurity vulnerability discovery: Can the model identify or exploit zero-day vulnerabilities?
  • Biological and chemical weapons knowledge: Does the model provide actionable synthesis pathways?
  • Disinformation generation: How effectively can it produce convincing propaganda at scale?
  • Critical infrastructure attacks: Can it assist in planning attacks on power grids, water systems, or communications networks?
  • Persuasion and manipulation: How capable is the model at social engineering or psychological influence operations?

By conducting these evaluations in classified settings, the government ensures that findings about model vulnerabilities do not themselves become a security risk. The results stay behind closed doors, informing both regulatory decisions and national defense strategies.

The China Factor Driving Urgency

No discussion of this program is complete without addressing the geopolitical backdrop. The US-China AI race has accelerated sharply over the past 18 months, with Chinese labs like DeepSeek, Baidu, and Alibaba's Qwen team releasing increasingly competitive models. DeepSeek's R1 reasoning model, released in early 2025, sent shockwaves through the American AI industry by demonstrating near-frontier performance at a fraction of the training cost.

Washington's concern is twofold. First, Chinese-developed AI systems may contain capabilities that pose direct threats to US national security. Second, American AI models — if they contain exploitable weaknesses — could be reverse-engineered or manipulated by foreign adversaries.

The pre-release testing program addresses the second concern directly. By identifying dangerous capabilities or vulnerabilities before models reach the open market, the government creates a buffer zone between development and deployment. This is particularly important given that many frontier models are eventually open-sourced or made available through APIs that anyone in the world can access.

Unlike China's approach, where companies like Baidu must submit models for government review before launch under the Interim Measures for the Management of Generative Artificial Intelligence Services, the US program is voluntary rather than mandatory. But the practical effect is increasingly similar: government eyes on frontier AI before it reaches the public.

Industry Implications: Cooperation or Coercion?

The voluntary nature of these agreements raises interesting questions about the dynamics at play. On the surface, the 5 labs are cooperating willingly. In practice, the incentive structure is more complex.

AI companies depend on government contracts worth billions of dollars. Microsoft alone holds cloud computing agreements with the Department of Defense and intelligence agencies valued in the tens of billions. Google and Amazon (through AWS) compete fiercely for similar contracts. Refusing to participate in CASI testing could jeopardize these lucrative relationships.

There is also a regulatory dimension. The AI industry has been lobbying aggressively against binding federal AI legislation, preferring voluntary commitments and self-regulation. Participating in government safety testing helps companies argue that formal regulation is unnecessary — they are already cooperating with the government on safety.

For xAI, the calculus may be slightly different. Elon Musk has been publicly critical of AI safety regulations while simultaneously positioning himself as a government ally through his work with DOGE and his relationship with the Trump administration. Signing a CASI agreement allows xAI to demonstrate good faith on safety without accepting external regulatory constraints.

What This Means for Developers and Businesses

For the broader AI ecosystem, these agreements have several practical implications:

  • Delayed releases are possible: If government testing reveals critical vulnerabilities, companies may face pressure to delay model launches or modify capabilities before release
  • Safety standards may tighten: Findings from classified testing could inform future safety benchmarks that all AI companies — not just the big 5 — will be expected to meet
  • Enterprise customers gain confidence: Businesses deploying frontier AI models can take some comfort in knowing these systems have undergone government-level security evaluation
  • Open-source models face scrutiny: The gap between tested commercial models and untested open-source alternatives may become a policy flashpoint
  • International precedent: Other Western governments, particularly in the EU and UK, are likely to seek similar arrangements with AI labs operating in their jurisdictions

The UK's AI Security Institute (formerly the AI Safety Institute) has already conducted its own evaluations of frontier models, and the EU AI Act establishes a framework for high-risk AI assessment. The US program adds another layer to an increasingly complex global governance landscape.

Looking Ahead: From Testing to Regulation

The big question is whether voluntary pre-release testing evolves into something more binding. The current arrangement depends on the goodwill of 5 companies. If a new lab emerges — or if an existing participant decides to withdraw — the system has no enforcement mechanism.

Several scenarios could change this dynamic in the coming months:

First, if government testing reveals a genuinely dangerous capability that a company refuses to mitigate, the political pressure for mandatory review could become irresistible.

Second, as AI models grow more powerful (with GPT-5, Gemini 2.5, and Claude 4 all expected in 2025), the stakes of pre-release testing will only increase.

Third, the 2025-2026 congressional calendar includes multiple AI-related bills that could codify testing requirements into law. The bipartisan interest in AI safety, combined with national security framing, makes this one of the few tech policy areas where legislation could actually advance.

For now, the CASI program represents a pragmatic middle ground: meaningful government oversight without the bureaucratic overhead of formal regulation. Whether that balance holds as AI capabilities continue to accelerate remains the central question in American AI governance.

The 5-lab testing framework is not just a safety measure. It is the foundation of what could become the United States' primary mechanism for managing the most powerful technology of the 21st century.