NVIDIA Blackwell Ultra GPUs Ship to Cloud Giants
NVIDIA has officially begun shipping its Blackwell Ultra GPU accelerators to major cloud providers, marking a pivotal moment in the race to build next-generation AI infrastructure. The B300-series chips — which deliver up to 1.5x the inference performance of the standard Blackwell B200 — are now en route to data centers operated by Microsoft Azure, Google Cloud, Amazon Web Services, and Oracle Cloud Infrastructure.
This rollout positions NVIDIA to capture an even larger share of the estimated $150 billion AI accelerator market in 2025, as hyperscalers scramble to deploy hardware capable of training and serving increasingly massive foundation models.
Key Facts at a Glance
- Blackwell Ultra B300 GPUs deliver up to 1.5x inference throughput compared to the B200, NVIDIA's previous flagship
- The chips feature 288 GB of HBM3e memory per GPU, a 50% increase over the B200's 192 GB
- Major recipients include Microsoft, Google, Amazon, Meta, and Oracle
- NVIDIA expects Blackwell Ultra to generate billions in revenue during Q3 and Q4 of fiscal year 2026
- The GB300 NVL72 rack-scale system packs 72 Blackwell Ultra GPUs into a single liquid-cooled enclosure
- Shipments arrive roughly 6 months ahead of the anticipated Rubin architecture, expected in 2026
Blackwell Ultra Delivers a Generational Leap in AI Memory
The most significant upgrade in the Blackwell Ultra lineup is its HBM3e memory capacity. Each B300 GPU ships with 288 GB of high-bandwidth memory, up from 192 GB on the standard B200. The 50% increase is more than an incremental bump: it raises the ceiling on the model size, context length, and batch size a single GPU can hold.
Larger memory means AI models with hundreds of billions of parameters can reside on fewer chips. For cloud providers running inference workloads at scale, this translates directly into lower cost per query and higher throughput. A single GB300 NVL72 rack now delivers over 20 TB of aggregate HBM3e memory (72 GPUs at 288 GB each), enough to hold even the largest commercially deployed language models inside one NVLink-connected rack rather than spreading them across several.
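The memory figures above come down to simple arithmetic. The short Python sketch below reproduces them from the numbers quoted in this article; the function names are illustrative, and decimal units (1 TB = 1,000 GB) are an assumption.

```python
# Figures quoted in the article. Decimal units (1 TB = 1000 GB) are assumed.
B200_HBM_GB = 192
B300_HBM_GB = 288
GPUS_PER_NVL72_RACK = 72


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


def rack_memory_tb(gpus: int, gb_per_gpu: int) -> float:
    """Aggregate HBM across a rack, in decimal terabytes."""
    return gpus * gb_per_gpu / 1000


if __name__ == "__main__":
    gain = percent_increase(B200_HBM_GB, B300_HBM_GB)
    total = rack_memory_tb(GPUS_PER_NVL72_RACK, B300_HBM_GB)
    print(f"B300 vs. B200 memory: +{gain:.0f}%")
    print(f"GB300 NVL72 aggregate HBM3e: {total:.1f} TB")
```

The result is a 50% per-GPU increase and roughly 20.7 TB per rack, consistent with the "over 20 TB" figure.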
NVIDIA CEO Jensen Huang has repeatedly emphasized that inference, not training, will drive the majority of AI compute demand going forward. The Blackwell Ultra architecture appears purpose-built for this reality, with enhanced FP4 and FP8 precision support that accelerates token generation for large language models such as OpenAI's GPT series and Google's Gemini family.
Cloud Providers Race to Deploy First
The competitive dynamics among hyperscalers are intensifying. Microsoft Azure is expected to be among the first to offer Blackwell Ultra instances to enterprise customers, likely through its existing ND-series GPU virtual machines. The company has deepened its NVIDIA partnership significantly, driven in part by its need to power OpenAI's increasingly compute-hungry models.
Google Cloud is taking a dual-track approach, deploying Blackwell Ultra alongside its own TPU v6 (Trillium) chips. This gives Google customers the flexibility to choose between NVIDIA's CUDA ecosystem and Google's proprietary JAX/XLA stack. Industry analysts expect Google to price Blackwell Ultra instances competitively to attract workloads that are already optimized for CUDA.
Amazon Web Services, the world's largest cloud provider by revenue, plans to integrate Blackwell Ultra into its P-series EC2 instances. AWS also continues to invest in its custom Trainium2 chips, but NVIDIA GPUs remain the default choice for the vast majority of AI startups and enterprises building on the platform.
Other confirmed recipients include:
- Oracle Cloud Infrastructure (OCI) — which has emerged as a surprising AI infrastructure contender
- Meta — for internal model training, including Llama 4 and beyond
- CoreWeave — the NVIDIA-backed GPU cloud startup valued at over $35 billion
- Lambda Labs — a specialized AI cloud provider popular with researchers
Technical Breakdown: B300 vs. B200 vs. H100
Understanding where Blackwell Ultra sits in NVIDIA's product stack is essential for developers and infrastructure architects making purchasing decisions.
| Specification | H100 (Hopper) | B200 (Blackwell) | B300 (Blackwell Ultra) |
|---|---|---|---|
| HBM Memory | 80 GB HBM3 | 192 GB HBM3e | 288 GB HBM3e |
| FP8 Performance | 3,958 TFLOPS | ~9,000 TFLOPS | ~13,500 TFLOPS (est.) |
| FP4 Support | No | Yes | Yes (enhanced) |
| NVLink Bandwidth | 900 GB/s | 1,800 GB/s | 1,800 GB/s |
| TDP | 700W | 1,000W | 1,200W |
The B300 represents a roughly 3.4x improvement in FP8 compute over the H100, which remains the workhorse GPU in most production AI data centers today. For organizations still running H100 clusters, the upgrade path to Blackwell Ultra offers dramatic efficiency gains — potentially reducing the number of GPUs needed for a given workload by 60% or more.
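As a rough check on these ratios, the sketch below recomputes them from the FP8 row of the table. It assumes a workload that scales perfectly with peak FP8 throughput, which real deployments rarely achieve, so the GPU-count reduction it prints is best read as an upper bound. The function names are illustrative.

```python
# Peak FP8 figures from the comparison table (the B300 value is an estimate).
FP8_TFLOPS = {"H100": 3958, "B200": 9000, "B300": 13500}


def speedup(new: str, old: str) -> float:
    """Ratio of peak FP8 throughput between two GPUs."""
    return FP8_TFLOPS[new] / FP8_TFLOPS[old]


def gpu_reduction_pct(new: str, old: str) -> float:
    """Percent fewer GPUs needed, ASSUMING perfect scaling with peak FP8."""
    return (1 - 1 / speedup(new, old)) * 100


if __name__ == "__main__":
    print(f"B300 vs. H100 FP8 speedup: {speedup('B300', 'H100'):.2f}x")
    print(f"Ideal GPU-count reduction: {gpu_reduction_pct('B300', 'H100'):.1f}%")
```

Under that idealized assumption the reduction is about 71%, which is in line with the "60% or more" estimate once real-world overheads are allowed for.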
Power consumption is a growing concern, however. Each B300 draws approximately 1,200 watts, requiring advanced liquid cooling infrastructure that many existing data centers lack. This has sparked a parallel boom in data center construction and retrofit projects, with companies like Equinix, Digital Realty, and Vantage Data Centers racing to build liquid-cooled facilities.
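To put the power numbers in context, the sketch below derives peak FP8 throughput per watt and the GPU-only power draw of a 72-GPU rack from the table values. It deliberately ignores CPUs, switches, and cooling overhead, so actual rack power will be higher; the function names are illustrative.

```python
# (peak FP8 TFLOPS, TDP in watts) taken from the comparison table.
SPECS = {"H100": (3958, 700), "B200": (9000, 1000), "B300": (13500, 1200)}


def tflops_per_watt(gpu: str) -> float:
    """Peak FP8 TFLOPS delivered per watt of TDP."""
    tflops, tdp_w = SPECS[gpu]
    return tflops / tdp_w


def rack_gpu_power_kw(gpus: int, tdp_w: int) -> float:
    """GPU-only power for a rack, in kW (excludes CPUs, networking, cooling)."""
    return gpus * tdp_w / 1000


if __name__ == "__main__":
    for name in SPECS:
        print(f"{name}: {tflops_per_watt(name):.2f} TFLOPS/W")
    print(f"72 x B300 (GPUs only): {rack_gpu_power_kw(72, 1200):.1f} kW")
```

Per-watt efficiency roughly doubles from the H100 to the B300 on these figures, yet 72 GPUs alone still draw about 86 kW, which is why liquid cooling becomes a requirement.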
Pricing Signals and Supply Constraints
NVIDIA has not publicly disclosed per-unit pricing for the B300, but industry sources suggest the GB300 NVL72 rack — the primary configuration for cloud deployments — carries a price tag between $2.5 million and $3.5 million. This represents a premium of roughly 20-30% over the equivalent B200-based GB200 NVL72 system.
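The reported ranges can be turned into two rough derived figures: the per-GPU share of the rack price and the GB200 NVL72 price range implied by a 20-30% premium. The sketch below is back-of-envelope arithmetic on the unconfirmed industry estimates quoted above. The rack price also covers CPUs, networking, and cooling, so the per-GPU share is not a standalone GPU price.

```python
GPUS_PER_RACK = 72
GB300_RACK_PRICE_USD = (2_500_000, 3_500_000)  # reported range, not confirmed
PREMIUM_OVER_GB200 = (0.20, 0.30)              # reported premium range


def per_gpu_share(rack_price: float, gpus: int = GPUS_PER_RACK) -> float:
    """Rack price divided evenly across its GPUs (includes non-GPU hardware)."""
    return rack_price / gpus


def implied_baseline_range(price_range, premium_range):
    """Widest GB200 rack price range consistent with the quoted premium."""
    low_price, high_price = price_range
    low_prem, high_prem = premium_range
    return low_price / (1 + high_prem), high_price / (1 + low_prem)


if __name__ == "__main__":
    low, high = (per_gpu_share(p) for p in GB300_RACK_PRICE_USD)
    print(f"Per-GPU share of rack price: ${low:,.0f} to ${high:,.0f}")
    base_low, base_high = implied_baseline_range(
        GB300_RACK_PRICE_USD, PREMIUM_OVER_GB200
    )
    print(f"Implied GB200 NVL72 price: ${base_low:,.0f} to ${base_high:,.0f}")
```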
Supply remains a critical bottleneck. TSMC, which fabricates NVIDIA's chips on its advanced 4NP process node, is reportedly allocating the majority of its CoWoS advanced packaging capacity to NVIDIA. Despite this prioritization, demand continues to outstrip supply.
Key supply chain dynamics include:
- TSMC CoWoS capacity expanded by roughly 60% in 2025, but remains insufficient for total market demand
- SK Hynix and Micron are the main suppliers of HBM3e memory for NVIDIA, with SK Hynix holding an estimated 70% market share
- Lead times for Blackwell Ultra systems are currently estimated at 12-16 weeks for priority customers
- Smaller cloud providers and enterprises may face 6-9 month wait times for allocation
This scarcity has created a secondary market dynamic where GPU cloud providers with early access to Blackwell Ultra can command premium pricing, reinforcing the competitive advantage of hyperscalers who placed orders early.
What This Means for Developers and Businesses
For AI developers, Blackwell Ultra's expanded memory is the headline feature. Models that previously required tensor parallelism across 4 or 8 GPUs can, in many cases, now fit on 2, simplifying deployment and reducing inter-GPU communication overhead. This is particularly impactful for mixture-of-experts (MoE) architectures such as Mixtral and, reportedly, GPT-4, where expert routing benefits from having more model parameters accessible in local memory.
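A minimal sizing sketch shows where the GPU-count savings come from. It estimates how many GPUs are needed to hold a model's weights alone, then rounds up to a power of two, as tensor-parallel setups commonly do. The model sizes, the 80% usable-memory fraction (leaving headroom for KV cache and activations), and all function names are assumptions for illustration, not measured values.

```python
import math

HBM_GB = {"H100": 80, "B200": 192, "B300": 288}       # from the spec table
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB (1e9 params x bytes per param / 1e9)."""
    return params_billion * BYTES_PER_PARAM[precision]


def min_gpus(params_billion: float, precision: str, gpu: str,
             usable_fraction: float = 0.8) -> int:
    """GPUs needed for the weights alone; usable_fraction is an assumption."""
    usable_gb = HBM_GB[gpu] * usable_fraction
    return math.ceil(weights_gb(params_billion, precision) / usable_gb)


def tensor_parallel_degree(n: int) -> int:
    """Round a GPU count up to the next power of two."""
    return 1 << (n - 1).bit_length()


if __name__ == "__main__":
    # Hypothetical model sizes chosen for illustration only.
    for params, precision in [(70, "fp16"), (400, "fp8")]:
        for gpu in ("H100", "B300"):
            tp = tensor_parallel_degree(min_gpus(params, precision, gpu))
            print(f"{params}B @ {precision} on {gpu}: TP={tp}")
```

Under these assumptions, a hypothetical 400B-parameter model in FP8 drops from 8-way tensor parallelism on H100 to 2-way on B300, and a 70B model in FP16 drops from 4-way to a single GPU.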
For enterprise buyers evaluating cloud GPU instances, the key question is cost-efficiency. Early benchmarks suggest Blackwell Ultra delivers approximately 40-50% better performance per dollar compared to H100 instances for inference workloads. However, training workloads see more modest gains of 20-30%, since training is more compute-bound than memory-bound.
For startups building AI-native products, the availability of Blackwell Ultra on major cloud platforms lowers the barrier to serving large models in production. A startup can now serve a 70-billion parameter model with sub-100ms latency using a fraction of the GPU count that was required just 18 months ago.
The Broader AI Infrastructure Arms Race
Blackwell Ultra's shipment comes at a time when global AI infrastructure investment is accelerating at an unprecedented pace. Capital expenditure on AI data centers across the top 5 hyperscalers is projected to exceed $250 billion in 2025, according to estimates from Bank of America and Morgan Stanley.
NVIDIA's dominance in this market remains formidable, but competition is emerging on multiple fronts. AMD's MI350X, expected later in 2025, promises competitive performance with 288 GB of HBM3e, matching Blackwell Ultra's memory capacity. Intel's Falcon Shores GPU, by contrast, faced repeated delays, and Intel has since said it will not release the chip commercially.
Custom silicon from cloud providers adds another dimension. Google's TPU v6, Amazon's Trainium2, and Microsoft's Maia 100 chip all aim to reduce dependence on NVIDIA for specific workloads. Yet NVIDIA's CUDA software ecosystem — with over 4 million developers — creates a powerful moat that custom chips have struggled to breach.
Looking Ahead: Rubin Looms on the Horizon
Even as Blackwell Ultra begins its commercial ramp, NVIDIA is already telegraphing its next move. The Rubin architecture, expected to arrive in the second half of 2026, will feature next-generation HBM4 memory and a new GPU microarchitecture built on TSMC's 3nm process.
Jensen Huang has described NVIDIA's roadmap as a 'one-year rhythm,' with major new architectures arriving annually. This cadence puts enormous pressure on cloud providers to continuously refresh their fleets — and on competitors to keep pace.
For now, Blackwell Ultra represents the cutting edge of commercially available AI compute. Its shipment to hyperscalers signals that the next wave of AI capabilities — from more responsive chatbots to real-time video generation — is being built on infrastructure that is shipping today. The companies that secure early access to these chips will have a meaningful head start in the increasingly competitive AI services market.
The message from NVIDIA is clear: the AI infrastructure buildout is far from over, and the company intends to remain at its center.
📌 Source: GogoAI News (www.gogoai.xin)
🔗 Original: https://www.gogoai.xin/article/nvidia-blackwell-ultra-gpus-ship-to-cloud-giants
⚠️ Please credit GogoAI when republishing.