Part I · FOUNDATIONS: UNDERSTANDING AI BEFORE THE LLMS
Learning from data: machine learning & deep learning
2.1 The paradigm shift: programming or learning
Machine learning (in French, apprentissage automatique) inverts the logic of classical programming. We no longer supply the rules: we supply examples (thousands of emails already labeled "spam" or "not spam"), and the machine itself discovers the rules that tell them apart. We no longer program what to do; we program how to learn.
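To make the inversion concrete, here is a minimal sketch in Python. Nothing in it states a rule such as "messages containing 'free' are spam": the program only counts words in labeled examples. The four example messages, the function names train and predict, and the word-counting method are all assumptions chosen for illustration; real spam filters use far more data and more careful statistics.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears in spam and in non-spam messages."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label a message by the class in which its words were seen more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    return "spam" if spam_score > ham_score else "not spam"

# Hypothetical labeled examples: the only "knowledge" we hand to the machine.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]

model = train(examples)
print(predict(model, "claim your free prize"))   # spam
print(predict(model, "team meeting on monday"))  # not spam
```

The word counts stored in model play the role of the learned "rules": change the examples and the behavior changes, without touching the code.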
2.2 Three ways to learn
Machine learning comes in three broad families, which must be carefully distinguished because they recur everywhere in what follows.
2.3 The artificial neuron and networks
2.4 How a machine learns: cost and backpropagation
Through repetition, the error decreases, and the network becomes competent. The most telling image is that of a hike through fog to descend into a valley: you cannot see the bottom, but you feel the slope underfoot, and you take a step downward. By repeating, you eventually reach a low point. That "slope," in mathematics, is called the gradient, and the method is called gradient descent.
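The hike through fog can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how a real network is trained: the "valley" is a hypothetical one-parameter cost function whose lowest point is at w = 3, and the step size (learning_rate) and number of steps are arbitrary choices.

```python
def cost(w):
    # Hypothetical valley: the error is lowest at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the cost: the slope felt underfoot at position w.
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly take a small step in the downhill direction."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_final = gradient_descent(w=10.0)
print(round(w_final, 4))  # 3.0
```

A real network does the same thing with millions or billions of weights at once; backpropagation is the procedure that computes the slope for each of them.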
2.5 2012: the big bang of deep learning
Why 2012 and not before? Because the three missing fuels (chapter 1) were finally brought together:
- Data: ImageNet provided the gigantic set of labeled images that had been missing.
- Compute: AlexNet was trained on GPUs from the company NVIDIA. These chips, designed to compute the pixels of video games in parallel, turned out to be ideal for the massive multiplications of neural networks. This technical detail would have colossal geopolitical consequences: it would make NVIDIA one of the most valuable companies in the world (chapter 8).
- Algorithms: refinements (the ReLU activation function, the dropout regularization technique) made it possible to train deeper networks without learning stalling or the network simply memorizing its examples.
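The two algorithmic refinements named in the last bullet are simple enough to sketch in Python. This is a minimal illustration, not AlexNet's actual code: the input values are arbitrary, and the dropout shown is the common "inverted" variant (surviving values are scaled up during training), which is an assumption on our part.

```python
import random

def relu(x):
    """ReLU: pass positive values through, replace negative ones with zero."""
    return max(0.0, x)

def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`,
    and scale the survivors so the expected total stays the same."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [relu(x) for x in [-2.0, -0.5, 0.0, 1.5, 3.0]]
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]

# During training, some activations are randomly switched off.
print(dropout(activations, rate=0.5, rng=rng))
# At prediction time, dropout is disabled and nothing changes.
print(dropout(activations, rate=0.5, rng=rng, training=False))
```

Randomly silencing neurons forces the network not to rely on any single one of them, which is why dropout counts as a regularization technique.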
2.6 Seeing and reading: CNNs and RNNs
2.7 Representing meaning: embeddings
The brilliant trick: these numbers are learned in such a way that words close in meaning occupy nearby positions in the space. "Cat" and "dog" end up neighbors; "king" and "banana" are far apart. Meaning becomes geometry.
Better still: the directions of the space capture relationships. The example that has become famous (from the word2vec model, 2013) is almost magical:
king − man + woman ≈ queen
In other words, the vector linking "man" to "king" is roughly the same as the one linking "woman" to "queen." The machine discovered, all on its own and without being told, the abstract concept of royalty and that of gender, simply by observing how words are used across billions of sentences.
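The arithmetic can be reproduced on a toy scale in Python. The two-dimensional vectors below are written by hand purely for illustration (one axis loosely standing for "royalty," the other for "gender"); they are not taken from word2vec, whose vectors have hundreds of dimensions and are learned from text. The function names cosine and analogy are likewise our own.

```python
import math

# Hand-made 2-D vectors, for illustration only.
# Axis 0 loosely encodes "royalty", axis 1 loosely encodes "gender".
vectors = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, -0.9],
    "man":    [0.1, 0.9],
    "woman":  [0.1, -0.9],
    "banana": [-0.8, 0.0],
}

def cosine(a, b):
    """Cosine similarity: 1 means same direction, -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman"))  # queen
```

In a real model nobody assigns a meaning to the axes: the directions emerge from training, and the analogy holds only approximately.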
The knowledge graph is the modern form of the symbolic representation of knowledge (chapter 1), and it is what structures many search engines behind the scenes (their answer panels). Its strength is precision and traceability (you know where each fact comes from); its weakness is that it must be built and maintained by hand. Hence the growing interest in neuro-symbolic approaches, which marry the flexibility of neural networks with the rigor of graphs: an LLM can query a knowledge graph to anchor its answers in verified facts (a structured variant of retrieval-augmented generation, chapter 6), and thereby reduce its hallucinations.
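A knowledge graph can be pictured as a list of facts of the form (subject, relation, object), each carrying its source. The Python sketch below is a deliberately tiny illustration: the two facts, the source labels (hypothetical placeholders), and the function names query and grounded_answer are all assumptions, and no actual LLM is involved. It only shows the principle of answering from verified facts or declining to answer.

```python
# Toy knowledge graph: (subject, relation, object, source) entries.
# The source labels are hypothetical placeholders, not real references.
FACTS = [
    ("Paris", "capital_of", "France", "hypothetical-source-A"),
    ("Berlin", "capital_of", "Germany", "hypothetical-source-B"),
]

def query(subject, relation):
    """Return every (object, source) pair matching the subject and relation."""
    return [(o, src) for s, r, o, src in FACTS if s == subject and r == relation]

def grounded_answer(subject, relation):
    """Answer only from the graph; admit ignorance rather than guess."""
    matches = query(subject, relation)
    if not matches:
        return "No verified fact found."
    obj, source = matches[0]
    return f"{subject} {relation} {obj} (source: {source})"

print(grounded_answer("Paris", "capital_of"))
print(grounded_answer("Atlantis", "capital_of"))  # No verified fact found.
```

The second call illustrates the benefit claimed above: when the graph holds no matching fact, the system says so instead of inventing one.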
2.8 The three ingredients of modern AI
This triad illuminates the rest of the course:
- The quest for data raises questions of intellectual property and privacy (chapters 21 and 25).
- The quest for compute explains NVIDIA's valuation, the chip war, and the energy bill (chapters 8 and 10).
- The quest for algorithms is the focus of the fierce competition between labs (chapter 7), and its next great leap, the Transformer, is the subject of the following chapter.
2.9 The brain and the machine: a fruitful and misleading analogy
Key takeaways (chapter 2)
- Machine learning reverses classical programming: we no longer supply the rules, we supply examples, and the machine learns the rules. The result is called a model.
- Three families: supervised learning (with an answer key), unsupervised (without an answer key), reinforcement (trial and error).
- A neural network stacks artificial neurons in layers; "deep" means "having many layers" (deep learning).
- Learning happens through gradient descent and backpropagation: we measure the error, then correct each weight by a small step to reduce it.
- 2012 (AlexNet/ImageNet) marks the big bang of deep learning, made possible by the conjunction of data + GPUs + algorithms.
- Embeddings transform meaning into geometry: this is the conceptual bridge to large language models.
- Every modern AI rests on a triad: data, compute, algorithms.
We are now ready to cross the threshold. In chapter 3, we tell the story of the 2017 innovation that removed the barriers holding back language processing and gave birth to the era of large models: the Transformer.