Two Paths to Gigawatt Machines
The Gigawatt Machine: NVIDIA, Google, and the Engineering of Scale (Part 1 of 12)
1. Introduction
When companies like OpenAI, Microsoft, Meta, and xAI announce gigawatt-scale data centers such as “Stargate,” “Fairwater,” “Prometheus,” and “Colossus,” they are describing a new class of computing: the AI Supercomputer.
These are not just large data centers; they are city-sized machines, purpose-built with hundreds of thousands of accelerators to train a single, massive AI model.
But what does an AI Supercomputer look like? There is no single answer. The industry has split into two competing philosophies on how to address the physics of scaling, driving two completely different physical anatomies.
Strategy A (NVIDIA): “Scale-Up the Node.” Increase the density of the chip and compress the computer into a dense monolith to make the wires shorter.
Strategy B (Google): “Scale-Out the Fabric.” Connect thousands of simpler chips into a massive, uniform optical mesh.
This 12-part series is a comprehensive guide to deconstructing these systems.
We will primarily examine the Google (TPU) and NVIDIA (GPU) ecosystem. This article explores the fundamental engineering trade-offs that define these two anatomies.
2. The Physics of the Problem: Density vs. Fabric
To train a trillion-parameter model, you must distribute the workload across thousands of chips. However, sending data between chips takes time (latency). To train faster, you must minimize this latency. Engineers have two levers to pull:
Lever 1: Extreme Density (The NVIDIA Strategy)
If you can pack more compute power into a smaller physical space, signals don’t have to travel as far.
The Tactic: NVIDIA pushed per-chip density to its physical limits. By moving to 4-bit precision (FP4), the Blackwell B200 GPU claims a massive 9x generational leap in performance.
The Consequence: A chip this powerful becomes a “gravity well” that demands massive amounts of data instantly. It requires a hierarchical, ultra-fast network to feed it.
Lever 2: Massive Fabric (The Google Strategy)
If you cannot compress the computer, you must build a faster network.
The Tactic: Historically, Google accepted lower per-chip density (using standard precision), choosing instead to spread the workload across a much larger physical footprint.
The Consequence: To make this “power sprawl” work, they built a massive, flat, optical network (ICI) that connects thousands of chips directly, making the distance between them virtually irrelevant.
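To see why distance matters at all, a back-of-envelope sketch helps. The distances and the 0.7 velocity factor below are illustrative assumptions, not measured figures, and real links add serialization and switching delay on top of pure propagation:

```python
# Back-of-envelope: signal propagation delay over a short copper run
# (inside a rack) vs. a longer optical run (across a data hall).
# Distances and the velocity factor are illustrative assumptions.

def propagation_delay_ns(distance_m: float, velocity_factor: float = 0.7) -> float:
    """Time for a signal to traverse distance_m at a fraction of light speed."""
    c = 3.0e8  # speed of light in vacuum, m/s
    return distance_m / (c * velocity_factor) * 1e9

in_rack = propagation_delay_ns(2)      # short copper backplane hop
cross_hall = propagation_delay_ns(50)  # fiber run across a hall

print(f"2 m copper run: {in_rack:.1f} ns")
print(f"50 m fiber run: {cross_hall:.1f} ns")
```

At chip-to-chip timescales, tens versus hundreds of nanoseconds per hop is the gap both strategies are engineered around: NVIDIA shrinks the distance, Google makes the long hop cheap and uniform.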
3. Anatomy A: The “Super-Node” (NVIDIA)
NVIDIA’s pursuit of extreme density created the Hierarchical Architecture. The defining characteristic is the “Scale-Up Node”—a system where the atomic unit of the data center is no longer a server, but an entire rack.
The Physical Node: The GB200 NVL72
This is the new building block. It is not a server; it is a 120 kW, liquid-cooled, 72-GPU rack.
The Density: By compressing 72 GPUs into a single cabinet, NVIDIA keeps them close enough to connect with passive copper cables. This copper backplane saves roughly 20 kW of power per rack compared to optical transceivers.
The Topology: The rack functions as a single, massive accelerator with a 31 TB unified memory pool.
The Network: Two-Tier Hierarchy
Because the node is so dense, the network must be hierarchical.
Tier 1 (Intra-Pod): The NVLink Fabric. A proprietary, copper-based, packet-switched network that connects up to 576 GPUs (8 racks) into a single “Pod” (NVLink Domain). Inside this Pod, bandwidth is massive (1.8 TB/s), allowing for efficient Tensor Parallelism (splitting a model across chips).
Tier 2 (Inter-Pod): The Scale-Out Fabric. To build a supercomputer (like “Stargate”), you connect hundreds of these Pods using a standard InfiniBand or Ethernet network.
The Look: A dense forest of “monoliths.” Extremely tall, heavy (3,700 lbs), liquid-cooled cabinets that require reinforced concrete floors and industrial-scale plumbing.
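The Tensor Parallelism that Tier 1 enables can be sketched in a few lines. This is a toy simulation in plain Python, not NVIDIA’s implementation: the rows of a weight matrix are sharded across pretend “chips,” and the final concatenation stands in for the all-gather that the NVLink fabric would actually carry:

```python
# Toy tensor parallelism: the weight matrix of y = W.x is row-sharded
# across simulated "chips"; each chip computes its slice of y, and an
# all-gather (here, a concatenation) rebuilds the full output.

def matvec(rows, x):
    """Dense matrix-vector product; rows is a list of weight rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]

def tensor_parallel_matvec(weights, x, n_chips):
    # Shard the rows of W into contiguous blocks, one per "chip".
    shard = (len(weights) + n_chips - 1) // n_chips
    shards = [weights[i * shard:(i + 1) * shard] for i in range(n_chips)]
    # Each "chip" computes its partial output independently.
    partials = [matvec(s, x) for s in shards]
    # All-gather: concatenate the slices into the full output vector.
    return [y for p in partials for y in p]

W = [[1, 0], [0, 1], [2, 3], [4, 5]]
x = [10, 1]
assert tensor_parallel_matvec(W, x, 2) == matvec(W, x)  # [10, 1, 23, 45]
```

The gather step runs once per layer, every step, which is why intra-pod bandwidth dominates the design.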
4. Anatomy B: The “Flat-Mesh” (Google)
Google’s and Amazon’s pursuit of fabric scale created the Flat-Fabric Architecture. The defining characteristic is the “Mesh”—a massive, continuous grid of accelerators.
The Physical Node: Abstracted
The physical node (a server tray with 4-8 chips) is small and architecturally irrelevant. The true unit of scale is the Pod.
The Network: Single-Tier Mesh
Instead of a hierarchy, the system is designed as one massive, uniform web.
Tier 1 (Intra-Pod): The ICI Fabric. Google uses Optical Circuit Switches (OCS) to connect 8,960 TPUs (in a v5p Pod) into a 3D Torus Mesh.
The Difference: In this mesh, every chip connects directly to its neighbors over optical fiber. There is no central packet switch; the OCS acts as an automated optical patch panel. The “Pod” is roughly 15x larger than NVIDIA’s, meaning massive workloads can run without ever hitting a slower Tier-2 network.
The Look: A sprawling “field.” Rows and rows of standard-height racks connected by a visible canopy of yellow optical fibers (the OCS fabric).
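The wrap-around wiring of a 3D torus is easy to sketch. The code below is a toy model (the 4x4x4 grid is an assumption chosen for brevity; a real v5p pod is far larger): every chip, even one on the “edge” of the grid, has exactly six neighbors, because coordinates wrap modulo the grid dimensions — the cross-connect pattern the OCS fabric provides:

```python
# Toy model of a 3D torus: each chip at (x, y, z) links to six neighbors,
# one in each direction per axis, with modulo wrap-around at the edges.
# The 4x4x4 grid size is an illustrative assumption.

def torus_neighbors(x, y, z, dims):
    """Return the six neighbor coordinates of a chip in a 3D torus."""
    dx, dy, dz = dims
    return [
        ((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
        (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
        (x, y, (z + 1) % dz), (x, y, (z - 1) % dz),
    ]

# A corner chip still has six links; wrap-around keeps topology uniform.
print(torus_neighbors(0, 0, 0, (4, 4, 4)))
```

Uniformity is the point: because every chip sees the same local neighborhood, the compiler can treat the whole pod as one regular grid.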
5. The “Brains” of the Anatomy: Software Stacks
The software must match the shape of the hardware. The divergence in network topology (Hierarchy vs. Mesh) forces a divergence in software strategy.
NVIDIA: Dynamic Runtime Orchestration (Library-Based)
NVIDIA’s network is packet-switched and dynamic. Data traffic is unpredictable and bursty.
The Software: CUDA / cuDNN.
The Strategy: Runtime Flexibility. NVIDIA uses a “toolbox” of pre-compiled libraries. When congestion happens, smart switches and software (NCCL) adapt in real-time, routing packets around traffic jams. This “eager execution” model offers maximum flexibility for researchers.
Google: Deterministic Static Scheduling (Compiler-Based)
Google’s network is Circuit-Switched and static. The OCS mirrors must be physically pointed to the right destination.
The Software: XLA (Compiler).
The Strategy: Compile-Time Scheduling. The XLA compiler analyzes the entire AI model before it runs. It pre-calculates the exact path of every data packet and orchestrates a perfect, collision-free flow. It doesn’t react to traffic; it prevents it. This offers maximum efficiency for known, massive workloads.
6. The Industrial Reality: Gigawatt Infrastructure
Regardless of the anatomy chosen, the sheer scale of these systems has forced a transition from “Data Center” to “Industrial Plant.”
Power: A 100,000-chip cluster consumes Gigawatts of power—roughly the output of a nuclear reactor powering a city. The primary engineering challenge shifts from IT administration to grid-scale energy logistics.
Cooling: Managing 120kW per rack (NVIDIA) or massive mesh density (Google) makes air cooling physically impossible. The facility becomes a massive hydraulic system, circulating millions of gallons of coolant to manage thermal loads.
7. Conclusion: The Asymmetric Shift
The story of the next generation is one of divergence: NVIDIA stays the course, while Google pivots.
NVIDIA has remained consistent: build the most powerful, dense node possible, and then arrange those nodes in a hierarchy.
The Evolution: They haven’t changed their philosophy; they’ve just scaled the physics. They went from an 8-GPU node (DGX) to a 72-GPU node (GB200 NVL72), creating a “Super-Node” that is 9x more powerful.2 They accept the complexity of a two-tier network (NVLink + InfiniBand) as the necessary cost of this extreme density.
Google, however, has altered its silicon strategy.
The Old Way: For years, Google relied on a “Fabric-First” approach—using massive meshes of moderately powerful chips (TPU v4/v5).
The Pivot: With the TPU v7 (”Ironwood”), Google effectively admitted that fabric scale alone is no longer sufficient. By driving a 10x leap in per-chip performance, they are also chasing density now, attempting to combine both strategies: NVIDIA-class per-chip density deployed on a Google-class flat optical mesh.
As we enter the Gigawatt era, the architectural battle lines are drawn.
NVIDIA bets that the hierarchical super-node (the 120kW Rack) is the ultimate building block.
Google bets that a dense flat mesh (9,000+ high-power chips) can eliminate the hierarchy entirely.
In the next article, “Article 2: Two Silicon Foundations for Scale,” we will zoom into the silicon die itself, exploring the engines that power these competing visions.

