The sinews of war: compute, chips and data centers

8.1 Why AI devours compute

8.2 GPUs, TPUs and specialized chips

NVIDIA's dominance does not rest on silicon alone: it also relies on CUDA, the software layer that almost the entire AI ecosystem uses to program NVIDIA chips, a competitive "moat" that is hard to cross. At CES in January 2026, NVIDIA detailed its new Vera Rubin generation (succeeding the Blackwell architecture): each Rubin GPU delivers around 50 petaFLOPS in FP4 precision, carries ultra-fast HBM4 memory, and is assembled into "NVL72" racks combining 72 GPUs and 36 Vera processors. The next generation, "Feynman," has already been announced.
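Two figures in this paragraph can be unpacked with a few lines of Python. "FP4" means each number is stored in only 4 bits; the sketch below assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) used by the Open Compute Project's microscaling formats, and deliberately leaves out the per-block scale factors that production formats such as NVIDIA's NVFP4 add on top. It then multiplies the approximate per-GPU figure by the 72 GPUs of an NVL72 rack. Both are peak, vendor-style numbers, not measured throughput.

```python
def decode_e2m1(code):
    """Decode a 4-bit float laid out as E2M1: 1 sign, 2 exponent (bias 1), 1 mantissa bit."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        magnitude = 0.5 * mantissa          # subnormal: 0 or 0.5
    else:
        magnitude = 2.0 ** (exponent - 1) * (1 + 0.5 * mantissa)
    return sign * magnitude

# Every non-negative value a single FP4 (E2M1) number can hold
FP4_POSITIVE = sorted({decode_e2m1(code) for code in range(8)})

# Peak FP4 throughput of one rack, from the approximate per-GPU figure in the text
GPUS_PER_RACK = 72
PFLOPS_FP4_PER_GPU = 50
rack_pflops = GPUS_PER_RACK * PFLOPS_FP4_PER_GPU   # in petaFLOPS
print(FP4_POSITIVE, rack_pflops / 1000, "exaFLOPS")
```

A bare 4-bit float can take only eight magnitudes, from 0 to 6, which is why such formats depend on scale factors to cover realistic value ranges; and 72 × 50 petaFLOPS gives roughly 3.6 exaFLOPS of peak FP4 compute per rack.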

In context

NVIDIA's competitors in inference

NVIDIA's near-monopoly (Section 8.3) does not prevent the emergence of challengers betting on a different architecture, optimized not for training but for inference (running an already-trained model, as fast and as cheaply as possible). Groq has designed a specialized chip (which it calls an LPU, for Language Processing Unit) banking on very low latency to generate text almost instantly. Cerebras takes the opposite tack from miniaturization with a giant chip the size of an entire silicon wafer (wafer-scale), as large as a dinner plate, to avoid the slowdowns of chip-to-chip communication. SambaNova offers a reconfigurable architecture sold turnkey to businesses. These players remain marginal against NVIDIA for training, but the explosion of inference (Section 8.1), driven by reasoning models and agents, opens a real niche for them, alongside the in-house chips of the cloud giants (Google's TPUs, Amazon's Trainium).
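Why does latency matter so much once reasoning models enter the picture? A rough sketch, assuming the total wait is simply the time to the first token plus the number of generated tokens divided by the decoding speed; the token count and speeds below are hypothetical round numbers, not measurements of any vendor's chip.

```python
def response_time(ttft_s, output_tokens, tokens_per_s):
    """Seconds until the last token arrives: time to first token + decoding time.

    Simplified model: ignores batching, network delays and variable decoding speed.
    """
    return ttft_s + output_tokens / tokens_per_s

# Hypothetical reasoning trace: 10,000 generated tokens, including hidden "thinking"
REASONING_TOKENS = 10_000
slow = response_time(0.5, REASONING_TOKENS, 50)    # 50 tokens/s
fast = response_time(0.5, REASONING_TOKENS, 500)   # 500 tokens/s
print(f"{slow:.1f} s vs {fast:.1f} s")
```

At 50 tokens per second the user waits more than three minutes; at 500, about twenty seconds. Because reasoning models and agents generate many tokens per request, decoding speed, the metric inference chips compete on, translates directly into waiting time.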

In context

Moore's law and the race to miniaturize

For half a century, the progress of chips followed Moore's law: the number of transistors per chip roughly doubled every two years, at constant cost, by etching them ever smaller. That is the meaning of "nanometers" (the etching fineness: 5 nm, 3 nm, 2 nm); the smaller the figure, the more transistors, and hence the more power, packed onto the same surface (today these node names are largely commercial labels rather than the literal size of any feature on the chip). But this pace is slowing: at the scale of a few atoms, physics (current leakage, heat) and economics (the astronomical cost of fabs) make each gain harder and more expensive. Hence a twofold shift. On one hand, the bet is no longer on fineness alone, but on specialized chips (GPUs, TPUs, accelerators) and on advanced packaging (stacking and linking several pieces of silicon, see the next section). On the other hand, the gains now come as much from software and architecture as from the transistor itself. Miniaturization is not dead, but it has ceased to be the sole engine of progress.
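The doubling rule can be written as a simple exponential, where $N_0$ is the transistor count at a starting date and $T$ the doubling period in years. The second line assumes, purely for illustration, that the period stretches from two to three years, to show what a slowdown costs over a decade.

```latex
N(t) = N_0 \cdot 2^{t/T}
\qquad
T = 2,\; t = 50:\quad \frac{N}{N_0} = 2^{25} \approx 3.4 \times 10^{7}

T = 2,\; t = 10:\quad 2^{5} = 32
\qquad\text{vs.}\qquad
T = 3,\; t = 10:\quad 2^{10/3} \approx 10
```

Fifty years of two-year doublings multiply the transistor count by some 33 million; stretching the period to three years would cut a decade's gain from about 32-fold to about 10-fold.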

In context

Neuromorphic chips and spiking networks

Beyond GPUs and TPUs, one line of research draws directly on the brain: neuromorphic computing. Rather than separating memory and computation (the classic so-called von Neumann architecture, whose constant shuttling of data between the two is a bottleneck), these chips bring them closer together, like biological neurons. They often rely on spiking neural networks: instead of computing continuously, neurons emit a brief signal (a "spike") only when their accumulated input crosses a threshold, as in the brain, which promises radically lower energy consumption. Prototypes exist (at major manufacturers and in laboratories), but the technology remains emerging: these chips are hard to program and to train with the usual methods (spikes are all-or-nothing events, which do not lend themselves directly to gradient-based training). Their potential value is immense for edge AI (sensors, connected objects, robots) running on batteries, where frugality matters more than raw power. Neuromorphic computing is one of the long-term hardware bets for breaking out of the all-GPU paradigm.
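The spiking principle can be illustrated with the textbook leaky integrate-and-fire neuron, a deliberately simplified model and not the circuit of any particular chip: its membrane potential integrates the input, leaks back toward rest, and emits a spike only when it crosses a threshold. All parameter values below are arbitrary.

```python
def simulate_lif(current, steps=1000, dt=1e-3, tau=0.02,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0, resistance=1.0):
    """Leaky integrate-and-fire neuron under a constant input current.

    Returns the list of time steps at which the neuron spikes.
    Units are arbitrary; dt and tau are in seconds (1 ms step, 20 ms leak).
    """
    v = v_rest
    spikes = []
    for step in range(steps):
        # Euler step: leak toward v_rest plus integration of the input current
        v += dt / tau * (-(v - v_rest) + resistance * current)
        if v >= v_thresh:        # threshold crossed: emit a spike...
            spikes.append(step)
            v = v_reset          # ...and reset the membrane potential
    return spikes

quiet = simulate_lif(0.5)   # steady state 0.5 stays below threshold: no spikes
busy = simulate_lif(3.0)    # strong input: regular spiking
print(len(quiet), len(busy))
```

A weak input never brings the potential up to the threshold, so the neuron stays silent and, on event-driven hardware, would consume almost nothing; a stronger input makes it fire, and the firing rate grows with the input.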

In context

Beyond classic silicon (optical, analog, reservoir)

Several hardware bets explore radically different principles. Optical computing (or photonics) replaces electrons with photons: light passes through components that perform certain operations (notably matrix multiplications, ubiquitous in neural networks) at the speed of light and with very little heat; startups are working on it, but integration remains difficult. Analog computing gives up the all-digital approach: instead of coding in 0s and 1s, it lets continuous physical quantities (a voltage, a current) directly represent the numbers and perform the computation "through physics," which can be very energy-frugal, at the cost of lower precision. Reservoir computing, finally, is an appealing trick: a large random recurrent network (the "reservoir") is frozen and only a thin readout layer at the output is trained, which makes learning very cheap; better still, the reservoir can be any physical system (optical, even hydraulic), which ties back to the two previous avenues. None of these paths threatens the GPU in the short term, but all of them remind us that AI's computation is not condemned to remain forever electronic and digital.
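Reservoir computing is the easiest of the three to try in software. The sketch below is a minimal echo state network (the classic software form of a reservoir), written in plain Python under illustrative assumptions: a 50-neuron random reservoir that is never trained, and a linear readout fitted in one shot by ridge regression to predict the next value of a sine wave. Sizes, scalings and the task are arbitrary choices.

```python
import math
import random

def make_reservoir(n, scale=0.9, input_scale=0.5, seed=0):
    """Random sparse recurrent weights and input weights, fixed once drawn."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) if rng.random() < 0.2 else 0.0 for _ in range(n)]
         for _ in range(n)]
    # Rescale so the largest absolute row sum equals `scale` (< 1). This bounds the
    # spectral radius, so the reservoir's memory of past inputs fades over time.
    norm = max(sum(abs(w) for w in row) for row in W)
    W = [[scale * w / norm for w in row] for row in W]
    w_in = [rng.uniform(-input_scale, input_scale) for _ in range(n)]
    return W, w_in

def run_reservoir(W, w_in, inputs):
    """Drive the frozen reservoir with an input sequence and collect its states."""
    n = len(w_in)
    x = [0.0] * n
    states = []
    for u in inputs:
        x = [math.tanh(sum(W[i][j] * x[j] for j in range(n)) + w_in[i] * u)
             for i in range(n)]
        states.append(x + [1.0])  # constant 1.0 acts as a bias for the readout
    return states

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression: solve (X^T X + ridge*I) w = X^T y by Gaussian elimination."""
    d = len(states[0])
    A = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back-substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

def predict(w, state):
    return sum(wi * si for wi, si in zip(w, state))

# Task: predict the next sample of a sine wave from the reservoir state.
signal = [math.sin(0.2 * t) for t in range(501)]
W, w_in = make_reservoir(50)
states = run_reservoir(W, w_in, signal[:-1])
targets = signal[1:]
washout, split = 100, 400  # drop the initial transient, then train / test
w = train_readout(states[washout:split], targets[washout:split])
test_mse = sum((predict(w, s) - y) ** 2
               for s, y in zip(states[split:], targets[split:])) / len(targets[split:])
print(f"test MSE: {test_mse:.2e}")
```

All the "learning" is the single linear solve in `train_readout`; the recurrent weights `W` stay exactly as they were drawn. In a physical reservoir, the role of `run_reservoir` would be played by the optical or mechanical system itself, with only its measured outputs fed to the readout.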

In context

Biological computing and the energy wall

The contrast is striking: the human brain achieves general intelligence on about 20 watts (Chapter 2), whereas training a single large model is measured in gigawatt-hours and a single data-center GPU draws on the order of a kilowatt or more (Chapter 10). This efficiency gap feeds a path even more radical than the previous ones: biological computing (or wetware), in which computation is performed not by silicon but by living neurons. Two players embody it. The Australian startup Cortical Labs first taught a culture of human neurons to play Pong (2022), reportedly learning faster than classic reinforcement-learning algorithms given the same real time, then in 2025 launched the CL1, billed as the first commercial biological computer: about 800,000 neurons cultured on a chip, kept alive for several months by a built-in life-support system, on about 30 watts. In early 2026, these neurons learned to play Doom (as clumsy beginners), and the company opened the first biological data centers. The Swiss firm FinalSpark, for its part, gives online access to brain organoids and claims an energy efficiency up to a million times greater than silicon (a company figure, not verified at scale). All of this differs from the neuromorphic computing seen above, which mimics the brain in silicon: here, the substrate is genuinely alive.
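To give the gigawatt-hour an intuitive scale, assume (hypothetically) a training run at the low end of that range, 1 GWh, and compare it with a brain running at 20 W around the clock for a year:

```latex
E_{\text{brain-year}} = 20\ \text{W} \times 8\,760\ \text{h} = 175.2\ \text{kWh}
\qquad
\frac{1\ \text{GWh}}{175.2\ \text{kWh}} = \frac{10^{6}\ \text{kWh}}{175.2\ \text{kWh}} \approx 5\,700
```

On these assumptions, a single 1 GWh training run consumes as much energy as a human brain does in about 5,700 years.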

How do you train living neurons? The process has nothing to do with training an AI in silicon: no backpropagation and no gradient descent (Chapter 2). The neurons are cultured on a grid of electrodes that serves as both their senses and their muscles, able to stimulate them and to read their activity. For Pong, the ball's position is translated into electrical stimulations; then, depending on whether the neurons return the right activity (the paddle intercepts the ball) or not, they receive back a signal that is either regular and predictable, or chaotic noise. According to the so-called free energy principle (formulated by neuroscientist Karl Friston), a neural network seeks to minimize the unpredictability of what it perceives: the cultures therefore reorganize spontaneously to escape the chaos, that is, to play better, and manage to do so within a few minutes. Nothing is programmed: an environment is shaped, and the tissue adapts to it on its own.
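The closed loop described above can be sketched from the experimenter's side. The code below models only the environment: it encodes the ball position as a stimulation site, decodes a paddle move from two groups of recorded spikes, and builds the feedback, predictable after a hit and random after a miss. The living culture itself is not simulated, and every number (8 electrodes, 100 ms windows, a 10 ms pulse period, 5 to 20 random pulses) is a hypothetical placeholder, not a parameter of Cortical Labs' published setup.

```python
import random

N_SENSORY = 8  # hypothetical number of sensory stimulation sites

def encode_ball(y, height):
    """Place coding: the ball's vertical position selects one sensory electrode."""
    return min(int(y / height * N_SENSORY), N_SENSORY - 1)

def decode_paddle(spikes_up, spikes_down):
    """Compare spike counts from two hypothetical 'motor' regions: +1 up, -1 down, 0 stay."""
    if spikes_up > spikes_down:
        return 1
    if spikes_down > spikes_up:
        return -1
    return 0

def feedback(hit, rng, duration_ms=100, period_ms=10):
    """Stimulation events (time_ms, electrode) sent after each rally.

    Hit: a regular pulse train on every electrode (identical every time).
    Miss: pulses at random times on random electrodes (never the same twice).
    """
    if hit:
        return [(t, e) for t in range(0, duration_ms, period_ms)
                for e in range(N_SENSORY)]
    n_pulses = rng.randint(5, 20)
    return sorted((rng.randrange(duration_ms), rng.randrange(N_SENSORY))
                  for _ in range(n_pulses))

rng = random.Random(0)
hit_events = feedback(True, rng)
miss_events = feedback(False, rng)
```

The asymmetry is the whole trick: after a hit, the same stimulation pattern comes back every time, so it can be anticipated; after a miss, it never repeats. Under the free energy principle, the culture can make its inputs more predictable by hitting the ball more often.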

Two major reservations follow. The first concerns maturity: these systems are very slow and tiny in capacity, and the cultures survive only a few months; several leading neuroscientists judge the idea of competing with silicon this way to be premature, even doomed to fail.

The ethical vertigo. The second reservation is deeper (Chapter 23). The horizon embraced by some is to replace, one day, the artificial neurons used to build AI (mere numbers in a matrix) with genuine biological neurons, far more energy-efficient. But this horizon opens an abyss: if these cultures grew larger, could they develop a form of consciousness, even of suffering? No one today knows how to define or measure the consciousness of such a system, and it is precisely this vagueness that is worrying (when the word "sentience" was used about Pong, dozens of researchers published a rebuttal). Three concrete questions follow. Consent first: these neurons come from the cells of human donors, who did not necessarily agree to become part of a thinking computer. Moral status next: if such a culture could feel anything at all, would we have the right to exploit it, and to switch it off? Cutting the power to a server is trivial; unplugging potentially sentient brain tissue would not be. Regulation, finally: in the absence of scientific consensus, researchers are already calling for safeguards inspired by the ethics committees of animal research. At this stage, then, biological computing is less a credible alternative to chips than a fascinating and uncomfortable field of research, to be followed with as much caution as curiosity.

8.3 The semiconductor value chain

Diagram 8.1. The AI chip value chain. NVIDIA designs the GPUs but does not manufacture them (the "fabless" model): it is the Taiwanese firm TSMC that etches them, using machines that only the Dutch company ASML knows how to produce. Each link is a mandatory chokepoint, and therefore a point of vulnerability.

ASML (Netherlands) holds a global monopoly on extreme ultraviolet (EUV) photolithography machines, the only ones capable of etching the finest circuits. Without ASML, no cutting-edge chips.
TSMC (Taiwan) manufactures about 90% of the most advanced chips in the world (etched at 3 nanometers and below) and holds two-thirds of the global foundry market. Its geographic concentration in Taiwan makes it a nerve center of the world economy.
NVIDIA (United States) designs the GPUs but does not manufacture them itself (the so-called "fabless" model): it entrusts their etching to TSMC.

In context

The very shape of the transistor (from FinFET to GAA)

Continuing to miniaturize (Section 8.2) has forced a rethinking of the shape of the transistor, the chip's elementary switch. For decades, transistors were "flat" (planar). Around 2011, the industry moved to the FinFET, where the current channel stands up like a small "fin" that the gate wraps on three sides, to better control current leakage at very small scales. At the most advanced nodes (3 and 2 nanometers), a new architecture takes over, the Gate-All-Around (GAA, sometimes called "nanosheet") transistor: this time the gate surrounds the channel on all four sides, offering even better control. These developments, invisible to the user, are what allow foundries like TSMC and Samsung to extend Moore's law as mere size reduction reaches its physical limits. The race is therefore no longer only about fineness, but about geometric ingenuity.

8.4 Mega-data centers

Worldwide, the power dedicated to data centers is expected to reach about 132 GW in 2026, and about 10 GW of new AI compute capacity (i.e., 13 to 15 million accelerators) is expected to be added in that year alone.
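Dividing the two 2026 estimates gives the average power each new accelerator implies. This is only a back-of-envelope check: how much cooling, networking and other facility overhead the 10 GW figure includes is not specified.

```latex
\frac{10\ \text{GW}}{15 \times 10^{6}} \approx 670\ \text{W}
\qquad\text{to}\qquad
\frac{10\ \text{GW}}{13 \times 10^{6}} \approx 770\ \text{W per accelerator}
```

That is roughly 0.7 kW per accelerator on average, the same order of magnitude as a single data-center GPU.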

In context

The consumer backlash

This rush has a very concrete downside for the consumer. Because AI accelerators are far more lucrative, NVIDIA and AMD are redirecting their production toward data centers, relegating video game graphics cards to the status of a secondary market: postponed launches (NVIDIA's RTX 50 SUPER series, AMD's RDNA 5 architecture pushed back), sharply rising prices (sometimes more than 70% above launch price for the high end) and degraded availability. The main cause is a global memory shortage: snapped up by AI servers (notably ultra-fast HBM memory), main memory (DRAM) saw its prices jump by about 90% in the first quarter of 2026 according to TrendForce, the steepest quarterly rise ever recorded. Analysts anticipate a 10 to 20% rise in consumer electronics prices in 2026 and a contraction of the PC market. The appetite of the "AI factories" thus ripples all the way down to the consumer's bill, gamers and computer buyers first (the case of Macs is detailed in Chapter 9).

In context

Data centers in space?

As AI factories run up against limits of land, water and, above all, electricity on Earth (Chapter 10), an idea long dismissed as far-fetched has become a serious topic: placing compute in orbit. The argument is twofold. First, energy: in a well-chosen orbit (sun-synchronous, sunlit almost without interruption), a solar panel produces several times more than on Earth, with no night and no weather. Second, cooling: the vacuum of space allows heat to be evacuated by radiation, without water. All of this without consuming land or weighing on terrestrial power grids. The first milestones are concrete: in November 2025, the startup Starcloud placed an NVIDIA H100 GPU in orbit, the first data-center-class processor in space, and ran a language model on it; around the same time, Google unveiled Project Suncatcher, which aims to deploy its TPU chips on constellations of solar-powered satellites connected by optical (laser) links, with two test satellites announced for early 2027. SpaceX and other players are also interested. But the obstacles remain considerable: launch cost (which must fall by an order of magnitude), heat dissipation (paradoxically difficult in a vacuum, since without air there is no convection and thermal radiation alone must carry the heat away), resistance to cosmic radiation, the impossibility of repairing the hardware, and the cluttering of orbit by debris. At this stage, then, it is a frontier bet, made of promising demonstrators rather than large-scale installations. Not to be confused with AI in the service of space, covered in Chapter 14.
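The paradox of cooling in a vacuum follows from the Stefan-Boltzmann law: with no air, a radiator can shed heat only as infrared radiation, at a rate of εσT⁴ per square meter. The sketch below estimates the radiator area needed for a given power under optimistic assumptions, labeled in the code: an emissivity of 0.9, a single radiating face at 300 K (about 27 °C), and no heat absorbed from the Sun or the Earth.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def radiator_area(power_w, temp_k, emissivity=0.9, sides=1):
    """Area in m^2 needed to reject power_w by thermal radiation alone.

    Assumptions (illustrative): uniform radiator temperature, `sides` radiating
    faces, and no absorbed sunlight or Earth infrared, so real needs are larger.
    """
    flux_w_per_m2 = sides * emissivity * SIGMA * temp_k ** 4
    return power_w / flux_w_per_m2

one_gpu = radiator_area(1_000, 300)        # a ~1 kW accelerator
one_gw = radiator_area(1e9, 300) / 1e6     # a 1 GW campus, converted to km^2
hotter = radiator_area(1_000, 350)         # same load, radiator at 350 K
print(f"{one_gpu:.2f} m^2 per kW, {one_gw:.2f} km^2 per GW, {hotter:.2f} m^2 at 350 K")
```

Under these assumptions, each kilowatt needs about 2.4 m² of radiator, so a gigawatt-class installation would need on the order of 2.4 km²; running the radiator hotter (350 K) shrinks it to about 1.3 m² per kilowatt, because emission grows with the fourth power of temperature.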

8.5 The geopolitics of chips

The 2025-2026 sequence illustrates an eventful game of chess. After the 2025 repeal of the regulatory framework inherited from the previous administration (creating a period of looser control during which hundreds of thousands of chips reportedly transited through third countries), the US administration tightened the rules in late May 2026: any sale of advanced accelerators (NVIDIA's Blackwell and Rubin lines, AMD's MI350X) to a foreign subsidiary of a Chinese company now requires a license. In parallel, the saga of the H200 chip (authorized, then blocked at times by Washington, at times by Beijing, which is pushing toward self-sufficiency) led NVIDIA to reallocate its capacity at TSMC toward the new Vera Rubin generation. More recently, in mid-2026, US pressure shifted higher up the chain, onto ASML itself, suspected by Washington of having let a cutting-edge machine reach China.

Key takeaways (Chapter 8)

AI is first of all a matter of compute: models are trained on tens of thousands of GPUs, and inference (each request) dominates the bill over time.
NVIDIA dominates thanks to its chips (the Vera Rubin generation in 2026) and above all its CUDA software; Google (TPU), AMD and the cloud giants are developing alternatives.
The value chain is ultra-concentrated: ASML (EUV machines, Netherlands), TSMC (manufacturing, Taiwan, ~90% of cutting-edge chips), NVIDIA (design, "fabless" model).
Mega-data centers are measured in gigawatts (Stargate 10 GW, Colossus, Hyperion); global capex exceeds 400 billion dollars a year.
One frontier path is to place data centers in orbit (near-continuous solar energy, cooling by radiation): first demonstrators in 2025 (an H100 GPU in orbit, Google's Suncatcher project), but major obstacles (launch cost, heat, radiation).
An even more radical path, biological computing (computing with living neurons, or "wetware"), draws on the brain's efficiency (about 20 W) but remains slow, tiny and laden with ethical questions.
The US-China "chip war," founded on a "chokepoint strategy," is fragmenting the world into two technological blocs.

Faced with this dependence on a few giants and their immense data centers, an alternative is gaining ground: running AI at home, with open models. That is the subject of Chapter 9.
