Robotics and embodied AI

13.1 A brief history: from the industrial arm to the robot that learns

13.2 Embodied AI and Vision-Language-Action (VLA) models

Diagram 13.1. The principle of a VLA model. The robot perceives its environment (vision), receives an instruction in natural language, and the model translates both into motions. It is the equivalent, for the body, of what LLMs did for language.
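The principle in the diagram can be sketched as an interface: an observation (camera frame plus instruction) goes in, a motor command comes out, and the loop repeats for each new frame. The sketch below is purely illustrative. The names `Observation`, `Action`, `ToyPolicy` and `control_loop` are hypothetical, and the "policy" is a hand-written rule standing in for a learned model, not any vendor's actual VLA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    image: List[List[float]]  # toy grayscale camera frame, values in [0, 1]
    instruction: str          # natural-language command


@dataclass
class Action:
    joint_deltas: List[float]  # one commanded change per joint


class ToyPolicy:
    """Hypothetical stand-in for a VLA model: fixed rules, no learning."""

    def __init__(self, n_joints: int) -> None:
        self.n_joints = n_joints

    def act(self, obs: Observation) -> Action:
        # "Vision": reduce the frame to its mean brightness.
        n_pixels = sum(len(row) for row in obs.image)
        brightness = sum(sum(row) for row in obs.image) / max(1, n_pixels)
        # "Language": a single keyword decides the direction of motion.
        direction = -1.0 if "lower" in obs.instruction.lower() else 1.0
        # "Action": both inputs shape the motor command.
        step = 0.1 * direction * brightness
        return Action([step] * self.n_joints)


def control_loop(policy: ToyPolicy, frames, instruction: str) -> List[Action]:
    """Query the policy once per camera frame, as a real-time loop would."""
    return [policy.act(Observation(frame, instruction)) for frame in frames]
```

A real VLA model replaces the two hand-written rules with a single network trained on demonstrations; only the shape of the loop (perceive, read the instruction, emit a motion) carries over.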

In context

Moravec's paradox

An old observation sheds light on why robotics progresses more slowly than "desk" AI. Formulated in the 1980s by Hans Moravec, the paradox can be stated in a single sentence: for a machine, the tasks that humans judge hard (playing chess, calculating, reasoning) turn out to be relatively easy, while those they find trivial (grasping an object, walking on uneven ground, recognizing a cluttered scene) are extraordinarily difficult. The reason is evolutionary: our sensorimotor abilities rest on hundreds of millions of years of evolution, deeply hardwired and unconscious, whereas abstract reasoning is a recent and superficial acquisition. The direct consequence: a model can write a brilliant essay long before a robot can fold a towel properly. This is one of the keys to understanding why the "brain" (VLA models) has progressed faster than fine mastery of the body, and why physical data (see below) is so precious.

Under the hood

The robot's body (actuators, joints, hands and touch)

Behind the VLA "brain" lies a body whose physical constraints are decisive. The actuators are the robot's muscles: once mostly hydraulic (powerful but heavy, noisy and prone to leaks, as on the early Atlas robots), they are now most often electric (cleaner, more precise and quieter). A robot's degrees of freedom are the number of its independent joints: the more there are, the richer the movement, but the harder it is to coordinate. The crux of the matter remains the hand: a fine and versatile grip (grasping an egg as readily as a wrench) demands articulated fingers and, above all, a sense of touch, provided by tactile sensors that measure force and slippage but remain far below human sensitivity. Finally, to move about, a mobile robot must localize itself and map its environment at the same time (a technique known as SLAM, for simultaneous localization and mapping). These hardware challenges are a reminder that, in robotics, software intelligence is only worth as much as the body that executes it.
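To make the notion of degrees of freedom concrete, here is a minimal sketch of forward kinematics for a planar arm, assuming rigid links and revolute joints that all rotate in the same plane. The function name `forward_kinematics` is chosen for illustration. Each joint adds one angle to choose, which is why more joints mean richer motion but also more values to coordinate.

```python
import math
from typing import Sequence, Tuple


def forward_kinematics(
    link_lengths: Sequence[float], joint_angles: Sequence[float]
) -> Tuple[float, float]:
    """Return the (x, y) position of the tip of a planar arm.

    Each angle (in radians) is measured relative to the previous link,
    so the arm has one degree of freedom per joint.
    """
    if len(link_lengths) != len(joint_angles):
        raise ValueError("need exactly one angle per link")
    x = y = 0.0
    heading = 0.0  # absolute orientation of the current link
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


# Two joints: arm fully stretched along the x axis.
print(forward_kinematics([1.0, 1.0], [0.0, 0.0]))  # (2.0, 0.0)
```

Going from joint angles to a hand position is the easy direction. The hard part in practice is the inverse: finding angles that bring the hand to a target, a problem that admits many solutions once the arm has more joints than the task strictly requires.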

In context

The crux of the matter, physical data

Why does robotics progress more slowly than text models? Because there exists, for motions, no equivalent of the immense corpus of text and images on the web. Teaching a robot to grasp, fold or tidy demands sensorimotor data (what its cameras see, what its motors do) that exists almost nowhere and is expensive to produce. Hence three complementary sources, at the heart of every player's strategy: simulation (millions of virtual trials, sim-to-real, Chapter 5); teleoperation, where humans remotely pilot the robot to show it the motion and create demonstrations; and the real-world deployment of a fleet, each robot in service sending back data to improve the next ones (the "data flywheel" of section 13.3). Mastery of this data chain, even more than the mechanics, is what today separates the leaders from the followers.
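As an illustration of what "sensorimotor data" looks like once recorded, the sketch below pairs what the cameras see with what the motors do, and tags each episode with one of the three sources named above. The class and field names (`Source`, `Step`, `Episode`, `steps_per_source`) are assumptions made for this example, not a standard dataset format.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Source(Enum):
    SIMULATION = "simulation"        # virtual trials, later transferred sim-to-real
    TELEOPERATION = "teleoperation"  # a human pilots the robot to demonstrate
    FLEET = "fleet"                  # data sent back by robots in service


@dataclass
class Step:
    camera_frame: List[float]   # what the cameras see (toy flattened image)
    motor_command: List[float]  # what the motors do at the same instant


@dataclass
class Episode:
    source: Source
    task: str
    steps: List[Step] = field(default_factory=list)


def steps_per_source(episodes: List[Episode]) -> Dict[Source, int]:
    """Count recorded steps per data source, to see where the data comes from."""
    counts: Counter = Counter()
    for episode in episodes:
        counts[episode.source] += len(episode.steps)
    return dict(counts)
```

A breakdown of this kind makes the trade-off visible: simulated steps are cheap and plentiful, while teleoperated and fleet steps are scarce but come from the real world.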

13.3 The humanoid race (overview, June 2026)

In plain terms

Why want human-shaped robots? Because our world (door handles, stairs, tools, workstations) is designed for the human body: a humanoid robot can, in theory, slot into it without redesigning everything. Hence an intense global race pitting American and Chinese players against each other. Here is an indicative overview as of mid-2026 (the figures change very fast):

Robot (company, country)	Status mid-2026	Indicative price	Distinctive feature
Figure 03 (Figure, United States)	Around forty units deployed at BMW (Spartanburg)	rental model, on the order of $25/robot-hour	in-house "Helix" VLA; "BotQ" factory targeting a cadence of about one robot per hour; has left the partnership with OpenAI
Atlas (Boston Dynamics, United States)	Production version unveiled at CES in January 2026; deployment at Hyundai	around $140-150k	Now 100% electric; partnership with Google DeepMind; "Orbit" fleet software
Optimus Gen 3 (Tesla, United States)	Internal deployment; conversion of a factory in Fremont	target below $20-30k at scale	"Data flywheel" via Tesla's factories; in-house AI5 chip; external sales targeted around 2027
G1 / H2 (Unitree, China)	More than 5,500 units delivered in 2025; target of 10-20,000 in 2026	G1 around $16k; H2 around $70k	"Price-first" strategy; planned IPO in Shanghai
NEO (1X, United States/Norway)	More than 10,000 pre-orders (home robot)	around $20k or $499/month	Designed for the home; tendon-driven actuators; human teleoperation as backup
Apollo (Apptronik, United States)	Industrial pilots	n/a	Nearly $935M raised; logistics partnerships
Walker S2 (UBTECH, China)	On the order of 1,000 units delivered	n/a	Factory deployments
IRON (XPeng, China)	Pre-production	n/a	Backed by an automaker
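The two price points listed for NEO (around $20k outright or $499/month) lend themselves to a quick order-of-magnitude check: after how many months does the subscription cost as much as the purchase? The sketch below uses only those two indicative figures and deliberately ignores maintenance, financing, resale value and price changes. The function name `breakeven_months` is chosen for illustration.

```python
import math


def breakeven_months(purchase_price: float, monthly_fee: float) -> float:
    """Months after which cumulative subscription fees equal the purchase price."""
    if purchase_price < 0 or monthly_fee <= 0:
        raise ValueError("price must be non-negative and fee strictly positive")
    return purchase_price / monthly_fee


neo_months = breakeven_months(20_000, 499)
print(f"{neo_months:.1f} months")  # prints "40.1 months"
print(math.ceil(neo_months))       # prints 41: first month where renting costs more
```

On these figures, renting becomes more expensive than buying after a little over three years, which gives a rough sense of the service life the subscription price implicitly assumes.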

13.4 Beyond humanoids

In context

Care and companion robots

Beyond industrial humanoids, one category responds to the aging of populations and the shortage of caregivers: assistance and companion robots, in particular for the elderly. These include affective robots (such as the Paro seal, used for years to soothe patients with dementia), presence assistants (medication reminders, video calls, fall detection, conversation) and physical-assistance robots (getting up, moving about, carrying a load). Japan, faced with extreme aging, stands as a pioneer and a testbed. Generative AI gives these robots a far more natural conversation, capable of relieving loneliness. But the subject is sensitive. A robot must not replace human connection, at the risk of worsening isolation instead of relieving it. It also raises questions of dignity and consent (notably for vulnerable or disoriented people) and of privacy (cameras and microphones in the home). The stakes are therefore not only technical: it is a matter of deciding what place a society wants to grant machines in the care of its most fragile members.

In context

Other forms of robots (soft robotics and exoskeletons)

The popular imagination associates the robot with rigid metal, but two families depart from this. Soft robotics designs robots made of soft, deformable materials (silicone, polymers, inflatable structures), often inspired by living things (octopus, trunk, worm). Their appeal: handling fragile objects without damaging them, slipping into narrow spaces, and interacting with the human body without danger, where a rigid arm would injure; on the other hand, they are harder to control with precision. At the other extreme, the exoskeleton is not an autonomous robot but a structure worn by a human to augment their strength or endurance, or even to restore a function. It is found in three domains: medical rehabilitation (relearning to walk after an accident), industry and logistics (sparing the back of an operator carrying loads) and the military. These two avenues remind us that robotics does not boil down to humanoids: the right form depends on the task, and safe contact with humans becomes a design criterion as decisive as raw performance.

In context

Surgical robots, teleoperation and haptics

One family of robots has been in critical use for years: surgical robots. The best known, the da Vinci system, is not autonomous: it is teleoperated, with the surgeon remotely commanding arms of superhuman precision and stability (no tremor, motions scaled down and filtered). These devices illustrate two key notions. Teleoperation consists in piloting a robot remotely, with the human remaining the head and the robot the hands. It is also useful in hazardous environments (nuclear, bomb disposal, the seabed) and, as we have seen, for collecting demonstration data (section 13.3). Haptics, for its part, refers to force feedback: giving the operator back the sensation of touch (the resistance of a tissue, the contact of an object), something still largely missing from current teleoperated surgery and the subject of intense research. AI is gradually being added to this, not to replace the surgeon, but to assist them: stabilizing a motion, flagging a structure to avoid, or even automating repetitive subtasks under supervision. Here, in a very high-risk domain, we find again the centaur model (Chapter 17): the human at the controls, augmented by the machine.
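The phrase "motions scaled down and filtered" can be illustrated with a minimal sketch: a low-pass filter (here an exponential moving average) damps fast oscillations such as hand tremor, and a scale factor turns large hand motions into small tool motions. This is a generic signal-processing illustration on one axis, not the algorithm of the da Vinci system or of any actual surgical robot. The function name and the default values of `scale` and `alpha` are assumptions.

```python
from typing import List, Sequence


def scale_and_filter(
    hand_positions: Sequence[float], scale: float = 0.2, alpha: float = 0.3
) -> List[float]:
    """Map operator hand positions (one axis) to tool positions.

    alpha in (0, 1] sets the smoothing: smaller values filter more strongly.
    scale < 1 shrinks the motion (0.2 means 5 cm of hand gives 1 cm of tool).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    tool_positions: List[float] = []
    filtered = None
    for sample in hand_positions:
        if filtered is None:
            filtered = float(sample)  # start from the first measurement
        else:
            # Exponential moving average: move a fraction alpha toward the sample.
            filtered += alpha * (sample - filtered)
        tool_positions.append(scale * filtered)
    return tool_positions


# A steady hand is simply scaled; a rapid back-and-forth tremor is damped as well.
steady = scale_and_filter([10.0, 10.0, 10.0])
tremor = scale_and_filter([0.0, 1.0, -1.0, 1.0, -1.0])
```

The price of filtering is a slight lag between hand and tool: the stronger the smoothing, the later the tool follows a deliberate motion, so the setting is a compromise between stability and responsiveness.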

13.5 Stakes: safety, employment, acceptability

Key takeaways (Chapter 13)

Robotics long rested on programmed and rigid machines. The turning point of 2024-2026 comes from the arrival of large models inside the body of robots.
Vision-Language-Action (VLA) models (Figure's Helix, Physical Intelligence's "pi," Google's Gemini Robotics, NVIDIA's GR00T, Unitree's UnifoLM) translate perception and instruction into motions, often after training in simulation.
The humanoid race pits American players (Figure, Tesla, Boston Dynamics, 1X, Apptronik) against Chinese ones (Unitree, AgiBot, UBTECH, XPeng), along distinct strategies (capability, model, integration or price).
The stake: crossing the price tipping point (around $20-25k), which hinges on the data flywheel and on the supply chain (strong dependence on Chinese components).
"Physical AI" goes beyond humanoids: cobots, warehouse robots, quadrupeds, drones, autonomous vehicles.
The stakes around safety (certifying a robot that learns), employment, acceptability and regulation are major (Chapters 24, 21 and 25).

Thus ends Part IV. We have explored the convergences of AI with blockchain, quantum and robotics. It is time to leave the technologies behind and observe their concrete effects on the world: science and health, work and the economy, law and society. That is the subject of Part V.
