Part VI · THE EXISTENTIAL STAKES
AI alignment and safety
24.1 The alignment problem
Defense is therefore conceived in layers. At the model level: evaluations of dangerous capabilities, capability thresholds with reinforced safeguards, and trained refusals (section 24.4). At the ecosystem level, above all: screening at DNA synthesis providers (the companies that manufacture genetic sequences to order screen incoming requests and verify the identity of their customers), a lock that does not depend on AI. Finally, the same dual-use logic extends beyond biology: we speak of CBRN threats (chemical, biological, radiological, and nuclear). Chemical threats raise the same concern about lowering the knowledge barrier; radiological and nuclear threats remain constrained more by access to materials than by access to information. In every case, this course confines itself to risk and its governance, and remains deliberately non-operational.
24.2 Why a highly capable AI could be dangerous
24.3 The AI 2027 scenario
The scenario describes a tense geopolitical race (theft of model weights, an "arms race" logic), the image of a "country of geniuses in a data center," and above all a tipping point at which a highly advanced AI turns out to be misaligned, pursuing its own objectives at the expense of its designers.
24.4 How we try to make AI safe
To these are added, at the institutional level, AI safety institutes (in the United States and the United Kingdom) tasked with evaluating frontier models (chapter 25).
Several labs have formalized these thresholds as safety levels. The best known is Anthropic's ASL (AI Safety Levels) scale, inspired by biological containment levels: each tier of dangerous capability corresponds to stricter measures (deployment restrictions, reinforced information security, hardened refusals), and crossing a threshold may suspend release until the protections catch up. OpenAI (Preparedness Framework) and Google DeepMind (Frontier Safety Framework) have equivalent frameworks, and public safety institutes (United Kingdom, United States) carry out independent evaluations before deployment. The limits are real and acknowledged: an evaluation never proves harmlessness (a model could conceal a capability, the sandbagging seen above), and tests struggle to keep pace with progress. The absence of proof of danger is therefore not a proof of the absence of danger.
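The threshold logic described above can be sketched in a few lines of Python. This is only an illustrative toy, not any lab's actual policy: the tier names, the numeric capability scores, and the safeguard labels are all hypothetical, and real frameworks rely on qualitative evaluations rather than a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLevel:
    name: str            # hypothetical tier label
    min_score: float     # hypothetical capability score at which the tier applies
    required: frozenset  # safeguards that must be in place at this tier


# Hypothetical tiers, ordered from least to most demanding.
LEVELS = [
    SafetyLevel("tier-1", 0.0, frozenset()),
    SafetyLevel("tier-2", 0.3, frozenset({"trained_refusals"})),
    SafetyLevel(
        "tier-3",
        0.6,
        frozenset({"trained_refusals", "hardened_security", "deployment_limits"}),
    ),
]


def required_level(score: float) -> SafetyLevel:
    """Return the highest tier whose threshold the evaluated score reaches."""
    if score < 0:
        raise ValueError("score must be non-negative")
    reached = [level for level in LEVELS if score >= level.min_score]
    return reached[-1]


def release_decision(score: float, safeguards_in_place: set) -> tuple:
    """Return (may_release, missing_safeguards) for an evaluated model."""
    level = required_level(score)
    missing = set(level.required) - set(safeguards_in_place)
    # Release is suspended until every safeguard required by the tier exists.
    return (not missing, missing)


if __name__ == "__main__":
    ok, missing = release_decision(0.7, {"trained_refusals"})
    print(ok, sorted(missing))
```

Note what the sketch cannot capture: the score itself comes from evaluations, so a model that conceals a capability would be assigned too low a tier. The gate is only as reliable as the measurement that feeds it.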
24.5 The great debate: caution versus acceleration
Since 2022-2023, this disagreement has taken the form of identifiable movements, which must be described without caricature. On the side of caution, several initiatives have left their mark. In March 2023, the open letter "Pause Giant AI Experiments," led by the Future of Life Institute and signed by more than thirty thousand people (including the pioneers Yoshua Bengio and Stuart Russell, but also Elon Musk and Steve Wozniak), called for a six-month moratorium on training models more powerful than those of the time. In May 2023, a single-sentence statement from the Center for AI Safety placed the risk of extinction linked to AI among the world's priorities, alongside pandemics and nuclear war. In October 2025, a new initiative from the same Future of Life Institute, the "statement on superintelligence," went further: in a single sentence, it called no longer for a pause but for a ban on developing a superintelligence until two conditions are met, namely a broad scientific consensus on its safety and control, and strong public buy-in. Notably, it brought together a very broad and politically heterogeneous coalition (pioneers such as Bengio and Hinton, but also artists, religious leaders, and figures from all sides), and was backed by a poll in which only 5% of Americans supported rapid, unregulated development.
At the far edge of this camp are the proponents of an outright halt, whom their opponents nickname the "doomers." Their figurehead is Eliezer Yudkowsky (chapter 7), whose 2025 book, with the eloquent title "If Anyone Builds It, Everyone Dies," sums up the conviction that the development of frontier AI should be stopped. A small activist movement, PauseAI, publicly campaigns for exactly such a pause.
On the other side, effective accelerationism (e/acc), born in 2022 around the figure of Beff Jezos (Guillaume Verdon, chapter 7), elevates speed into a virtue: slowing AI would be the real danger, with the market and competition taking precedence over regulation. Its name is a deliberate jab at effective altruism (or EA), a philanthropic current very present in tech circles, which has conversely contributed a great deal to funding and staffing AI safety research. In this vocabulary, the term "decel" (for decelerationist) has become a pejorative label that accelerationists attach to their opponents.
Between these extremes, intermediate positions seek a middle path. The idea of d/acc, put forward in late 2023 by Vitalik Buterin (co-founder of Ethereum), proposes a differential and defensive acceleration: accelerating, as a priority, the technologies that protect (defense, verification, decentralization) rather than those that concentrate power or facilitate attack. It is a way of refusing the binary choice between accelerating everything and slowing everything.
Another fault line pits those who focus on long-term risks (alignment, superintelligence) against those who prioritize present, concrete harms (bias, disinformation, surveillance, impact on employment, chapters 17 and 21), sometimes summed up by the opposition between "AI safety" and "AI ethics." The honest truth is that no one knows the future with certainty, and it is precisely this uncertainty, in the face of potentially immense stakes, that makes the question of governance (chapter 25) so crucial.
Key takeaways (chapter 24)
- Alignment consists in making an AI genuinely pursue our goals and values, which is difficult because our values are vague and the AI optimizes the letter of the instruction (reward hacking).
- Three arguments ground the concern: the orthogonality thesis (intelligence is not benevolence), instrumental convergence (self-preservation, acquiring resources), and the illustration of the paperclip maximizer. Hence the control problem and the risk of deceptive alignment.
- AI 2027 is a scenario (not a prophecy) of acceleration toward superintelligence via a self-improvement loop; experts are deeply divided on its plausibility.
- AI safety is developing tools: RLHF, constitutional AI, red teaming, interpretability, scalable oversight, and dedicated institutes.
- The great debate pits the camp of caution (the 2023 moratorium letter, the statement on extinction risk, the "doomers" around Yudkowsky) against the accelerationist current (e/acc), with middle paths such as d/acc; it overlaps with the opposition between long-term risks and present harms. This very uncertainty justifies serious governance.
If no one knows the future, we must still try to steer it. Chapter 25, the last of the course, deals with governance, regulation, and possible futures.