Part I · FOUNDATIONS: UNDERSTANDING AI BEFORE THE LLMS

Learning from data: machine learning & deep learning

Chapter 2 · 20 min read · Updated: June 2026

2.1 The paradigm shift: programming or learning

Machine learning (apprentissage automatique in French) inverts the logic. We no longer supply the rules: we supply examples (thousands of emails already labeled "spam" or "not spam"), and the machine works out for itself the rules that tell them apart. We no longer program what to do; we program how to learn.
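As a rough illustration of this reversal, the sketch below learns a crude spam filter from labeled examples instead of hand-written rules. The six messages, the word-counting scheme, and the function names `learn` and `predict` are all assumptions made up for this example; real spam filters are far more sophisticated.

```python
from collections import Counter

# Hypothetical labeled examples, invented for illustration.
EXAMPLES = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting moved to monday", "not spam"),
    ("lunch on monday at noon", "not spam"),
    ("notes from the meeting", "not spam"),
]

def learn(examples):
    """Count how often each word appears in each class; this table is the model."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(model, text):
    """Label a message by the class in which its words were seen more often."""
    words = text.split()
    spam_score = sum(model["spam"][w] for w in words)
    ham_score = sum(model["not spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"

model = learn(EXAMPLES)
print(predict(model, "claim your free money"))
print(predict(model, "monday meeting notes"))
```

Nobody wrote a rule saying that "free" is suspicious: the word counts were extracted from the examples, and changing the examples changes the behavior without touching the code.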

Diagram 2.1. The fundamental reversal. The machine no longer receives the rules: it learns them from examples. The product of this learning is called a model.

2.2 Three ways to learn

Machine learning comes in three broad families, which must be carefully distinguished because they recur everywhere in what follows.

Diagram 2.2. The three broad families of learning.

2.3 The artificial neuron and networks

Diagram 2.3. A "deep" neural network. Information flows from left to right, layer by layer. "Deep" simply means having many hidden layers. That is where the term deep learning comes from.
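The layer-by-layer flow can be sketched in a few lines. This is a minimal illustration, not a production network: the layer sizes, the random weights, and the helper names (`make_layer`, `layer`, `forward`) are assumptions chosen for the example, and ReLU is applied to every layer, including the last, purely for simplicity.

```python
import random

def relu(x):
    return max(0.0, x)

def layer(inputs, weights, biases):
    """One layer: each neuron computes a weighted sum of all inputs, adds a bias, applies ReLU."""
    return [relu(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    """Create a layer of n_out neurons with random weights and zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    """Pass the inputs through each layer in turn, left to right."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

rng = random.Random(0)
sizes = [3, 4, 4, 2]  # 3 inputs, two hidden layers of 4 neurons, 2 outputs
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
output = forward([0.5, -1.0, 2.0], layers)
print(len(output), output)
```

Making the network "deeper" here amounts to adding entries to `sizes`: the loop in `forward` stays the same.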

2.4 How a machine learns: cost and backpropagation

Through repetition, the error decreases, and the network becomes competent. The most telling image is that of a hike through fog to descend into a valley: you cannot see the bottom, but you feel the slope underfoot, and you take a step downward. By repeating, you eventually reach a low point. That "slope," in mathematics, is called the gradient, and the method is called gradient descent.
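The hike through fog translates almost directly into code. The sketch below assumes a deliberately simple one-dimensional "valley", the function (w − 3)², whose lowest point is at w = 3; the function, the starting point, and the step size are illustrative choices, not values from a real network.

```python
def loss(w):
    """The 'altitude' at position w; the valley floor is at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """The slope underfoot: the derivative of the loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly take a small step in the downhill direction."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_final = gradient_descent(w=-5.0)
print(w_final, loss(w_final))
```

The walker never sees the whole curve: each step uses only the local slope, yet the position ends up very close to 3. A real network does the same thing with millions or billions of weights instead of one.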

Diagram 2.4. The learning loop. Repeated billions of times over enormous data sets, it transforms a random network into a competent model.
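A scaled-down version of this loop fits in a dozen lines. The sketch assumes the smallest possible "network", a single neuron with one weight and one bias, and toy data generated from y = 2x + 1; with one neuron the gradients can be written by hand, whereas backpropagation is what computes them automatically across many layers.

```python
# Toy data following y = 2x + 1, invented for illustration.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

w, b = 0.0, 0.0        # arbitrary starting weights
learning_rate = 0.05

def mean_squared_error(w, b):
    """The cost: average squared gap between predictions and expected answers."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_error = mean_squared_error(w, b)

for epoch in range(500):
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y                 # 1. predict and compare
        grad_w += 2 * error * x / len(data)     # 2. share of the error due to w
        grad_b += 2 * error / len(data)         #    and due to b
    w -= learning_rate * grad_w                 # 3. small corrective step
    b -= learning_rate * grad_b

final_error = mean_squared_error(w, b)
print(w, b, initial_error, final_error)
```

After the loop, `w` and `b` sit very close to 2 and 1: the relationship hidden in the data has been recovered purely by repeated small corrections.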

2.5 2012: the big bang of deep learning

Why 2012 and not before? Because the three missing fuels (chapter 1) were finally brought together:

  • Data: ImageNet provided the gigantic set of labeled images that had been missing.
  • Compute: AlexNet was trained on NVIDIA GPUs. These chips, designed to compute the pixels of video games in parallel, turned out to be ideal for the massive multiplications of neural networks. This technical detail would have colossal geopolitical consequences: it would make NVIDIA one of the most valuable companies in the world (chapter 8).
  • Algorithms: refinements (the ReLU activation function, the dropout regularization technique) made it possible to train deeper networks without their going off the rails.
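The two refinements named in the last item are simple to state in code. The sketch below is an illustration under assumptions: it uses the common "inverted dropout" variant (survivors are rescaled during training so nothing needs adjusting at inference time), which is not necessarily the exact formulation used in AlexNet, and it assumes a drop rate strictly below 1.

```python
import random

def relu(values):
    """ReLU: keep positive values, replace negative ones with zero."""
    return [max(0.0, v) for v in values]

def dropout(values, rate, rng, training=True):
    """Inverted dropout: during training, zero each value with probability
    `rate` and rescale the survivors; at inference, pass values through."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
hidden = relu([-1.5, 0.2, 3.0, -0.1])
print(hidden)
print(dropout(hidden, rate=0.5, rng=rng))
print(dropout(hidden, rate=0.5, rng=rng, training=False))
```

Because a different random subset of neurons is silenced at each training step, the network cannot lean too heavily on any single one of them.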

2.6 Seeing and reading: CNNs and RNNs

2.7 Representing meaning: embeddings

The brilliant trick: these numbers are learned in such a way that words close in meaning occupy nearby positions in the space. "Cat" and "dog" end up neighbors; "king" and "banana" are far apart. Meaning becomes geometry.
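"Nearby" can be measured. A common choice is cosine similarity, which compares the directions of two vectors: close to 1 means nearly the same direction, close to 0 means unrelated. The three-dimensional vectors below are hand-picked for illustration, not learned; real embeddings are produced by training and have far more dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
embeddings = {
    "cat":    [0.9, 0.8, 0.1],
    "dog":    [0.8, 0.9, 0.2],
    "king":   [0.1, 0.2, 0.9],
    "banana": [0.7, -0.6, 0.0],
}

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["king"], embeddings["banana"]))
```

With these toy vectors, the first similarity is close to 1 and the second is close to 0, which is the numerical counterpart of "neighbors" and "far apart".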

Better still: the directions of the space capture relationships. The example that has become famous (from the word2vec model, 2013) is almost magical:

king − man + woman ≈ queen

In other words, the vector linking "man" to "king" is roughly the same as the one linking "woman" to "queen." Without ever being told, the machine picked up the abstract concepts of royalty and gender simply by observing how words are used across billions of sentences.
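The arithmetic itself is easy to reproduce on toy data. In the sketch below, the two-dimensional vectors are hand-built so that one axis loosely stands for royalty and the other for gender; this is an assumption made to keep the example readable, whereas in word2vec such directions emerge from training and are not labeled. The function name `analogy` is likewise hypothetical.

```python
import math

# Hypothetical hand-built vectors: [royalty, gender]. Not learned.
vectors = {
    "king":   [0.90, 0.95],
    "queen":  [1.00, -1.00],
    "man":    [0.05, 1.00],
    "woman":  [0.00, -0.90],
    "banana": [-0.80, 0.10],
}

def analogy(a, b, c):
    """Return the word closest to vector(a) - vector(b) + vector(c),
    excluding the three input words themselves."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

print(analogy("king", "man", "woman"))
```

The result is only approximately equal to the "queen" vector, which is why the formula is written with ≈: the answer is the nearest word, not an exact match.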

Diagram 2.5. A fragment of a knowledge graph. Knowledge here is explicit and verifiable: each fact is a named relation between two entities, readable by a machine as well as by a human.

This is the modern form of the symbolic representation of knowledge (chapter 1), and it is what structures many search engines behind the scenes (their answer panels). Its strength is precision and traceability (you know where each fact comes from); its weakness, that it must be built and maintained by hand. Hence the growing interest in neuro-symbolic approaches, which marry the flexibility of neural networks with the rigor of graphs: an LLM can query a knowledge graph to anchor its answers in verified facts (a structured variant of retrieval-augmented generation, chapter 6), and thereby reduce its hallucinations.
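A knowledge graph can be sketched as a list of (subject, relation, object) facts, each carrying a note of where it came from. The `Fact` structure, the relation names, the helper functions, and the source labels below are assumptions for illustration; the source labels in particular are placeholders, not real citations.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str
    source: str  # placeholder provenance label, not a real citation

GRAPH = [
    Fact("Marie Curie", "born_in", "Warsaw", "placeholder-source-1"),
    Fact("Warsaw", "located_in", "Poland", "placeholder-source-2"),
    Fact("Marie Curie", "field", "Physics", "placeholder-source-3"),
]

def query(graph, subject=None, relation=None, obj=None):
    """Return every fact matching the fields that were specified."""
    return [f for f in graph
            if (subject is None or f.subject == subject)
            and (relation is None or f.relation == relation)
            and (obj is None or f.obj == obj)]

def birth_country(graph, person):
    """Chain two relations: person -born_in-> city -located_in-> country.
    Returns the country and the sources of the facts used, or None."""
    for birth in query(graph, subject=person, relation="born_in"):
        for location in query(graph, subject=birth.obj, relation="located_in"):
            return location.obj, [birth.source, location.source]
    return None

print(birth_country(GRAPH, "Marie Curie"))
print(birth_country(GRAPH, "Unknown Person"))
```

The answer comes with the list of facts that justify it, and an unknown entity yields `None` instead of a guess. That is the traceability a graph offers and a neural network alone does not.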

2.8 The three ingredients of modern AI

Diagram 2.6. The fundamental triad. None of the three suffices on its own. It is their conjunction, from the 2010s onward, that made modern AI possible, and it is the race for these three resources that today structures the economics and geopolitics of the sector.

This triad illuminates the rest of the course:

  • The quest for data raises questions of intellectual property and privacy (chapters 21 and 25).
  • The quest for compute explains NVIDIA's valuation, the chip war, and the energy bill (chapters 8 and 10).
  • The quest for algorithms is the focus of the fierce competition between labs (chapter 7), and its next great leap, the Transformer, is the subject of the following chapter.

2.9 The brain and the machine: a fruitful and misleading analogy


Key takeaways (chapter 2)

  • Machine learning reverses classical programming: we no longer supply the rules, we supply examples, and the machine learns the rules. The result is called a model.
  • Three families: supervised learning (with an answer key), unsupervised (without an answer key), reinforcement (trial and error).
  • A neural network stacks artificial neurons in layers; "deep" means "having many layers" (deep learning).
  • Learning happens through gradient descent and backpropagation: we measure the error, then correct each weight by a small step to reduce it.
  • 2012 (AlexNet/ImageNet) marks the big bang of deep learning, made possible by the conjunction of data + GPUs + algorithms.
  • Embeddings transform meaning into geometry: this is the conceptual bridge to large language models.
  • Every modern AI rests on a triad: data, compute, algorithms.

We are now ready to cross the threshold. In chapter 3, we tell the story of the 2017 innovation that broke open the locks of language and gave birth to the era of large models: the Transformer.