Glossary

Plain-language definitions of the terms you keep running into in modern robotics and embodied AI.

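Affordance: An action possibility that an object or environment offers a particular agent: a handle affords grasping, a button affords pressing. In robotics, it usually means what a robot can actually do with what is in front of it, as distinct from what it might want to do.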
Behavior cloning: Supervised imitation learning. Train a model to predict the action a human demonstrator took, given the same observation.
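As a minimal sketch of the idea, assume a one-dimensional observation and action and a linear policy (real systems train neural networks on images and proprioception). Behavior cloning then reduces to regression on recorded (observation, action) pairs. The `fit_linear_policy` and `policy` helpers and the demonstration data are hypothetical.

```python
# Minimal behavior-cloning sketch: fit a linear policy action = w * obs + b
# to recorded (observation, action) pairs by ordinary least squares.


def fit_linear_policy(demos):
    """Return (w, b) minimizing squared error between predicted and demonstrated actions."""
    n = len(demos)
    mean_obs = sum(o for o, _ in demos) / n
    mean_act = sum(a for _, a in demos) / n
    cov = sum((o - mean_obs) * (a - mean_act) for o, a in demos)
    var = sum((o - mean_obs) ** 2 for o, _ in demos)
    w = cov / var
    b = mean_act - w * mean_obs
    return w, b


def policy(obs, w, b):
    """The cloned policy: map an observation to an action."""
    return w * obs + b


# Hypothetical demonstrations from an operator who acts as action = -0.5 * obs + 0.1
demos = [(o, -0.5 * o + 0.1) for o in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = fit_linear_policy(demos)
print(policy(3.0, w, b))  # close to -1.4
```

Swapping the closed-form fit for gradient descent on a neural network gives the usual deep-learning version; the supervised structure (observation in, demonstrated action as the label) stays the same.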
Diffusion Policy: A behavior-cloning policy that predicts short action trajectories by iteratively denoising random noise, conditioned on the current observation; a widely used imitation-learning baseline in modern robotics.
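To make "iterative denoising" concrete, here is a toy DDPM-style sampler under strong simplifying assumptions: the action is a single scalar rather than a trajectory, and a hypothetical demonstrator's actions are Gaussian around `target_mean(obs)`. That makes the ideal noise predictor available in closed form, so `predict_noise` stands in for the trained network a real Diffusion Policy would use. The schedule constants are illustrative choices, not values from any published implementation.

```python
import math
import random

# Toy diffusion-style action sampling (DDPM reverse process, epsilon prediction).
# Assumptions: scalar action, Gaussian hypothetical demonstrations, closed-form denoiser.

T = 200
BETAS = [1e-4 + (0.1 - 1e-4) * t / (T - 1) for t in range(T)]  # linear noise schedule
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
running = 1.0
for a in ALPHAS:
    running *= a
    ALPHA_BARS.append(running)  # cumulative product; near zero at t = T - 1

DATA_STD = 0.1  # spread of the hypothetical demonstrated actions


def target_mean(obs):
    # Hypothetical demonstrator: mean action is -0.5 * observation
    return -0.5 * obs


def predict_noise(x_t, t, obs):
    """E[eps | x_t, obs] for Gaussian data; stands in for a trained denoising network."""
    ab = ALPHA_BARS[t]
    mu = target_mean(obs)
    var_xt = ab * DATA_STD ** 2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (x_t - math.sqrt(ab) * mu) / var_xt


def sample_action(obs, rng):
    x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t, obs)
        # DDPM mean update: remove the predicted noise, then rescale
        x = (x - BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t]) * eps) / math.sqrt(ALPHAS[t])
        if t > 0:
            x += math.sqrt(BETAS[t]) * rng.gauss(0.0, 1.0)  # re-inject noise, sigma_t^2 = beta_t
    return x


rng = random.Random(0)
samples = [sample_action(2.0, rng) for _ in range(400)]
mean_action = sum(samples) / len(samples)  # should land near target_mean(2.0) = -1.0
```

Each reverse step removes a little predicted noise; conditioning the denoiser on `obs` is what turns a generative model into a policy.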
Domain randomization: Randomize simulation parameters during training so the policy works across many possible realities — including the real one.
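A minimal sketch of per-episode domain randomization, assuming uniform sampling over hand-picked ranges. The parameter names, ranges, and the `sample_params` helper are hypothetical; in practice each draw would be applied to the simulator before its episode starts.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    friction: float
    object_mass_kg: float
    motor_gain: float
    camera_latency_s: float


# Hypothetical ranges; a real project would tune these to cover the real robot.
RANGES = {
    "friction": (0.5, 1.2),
    "object_mass_kg": (0.1, 0.5),
    "motor_gain": (0.8, 1.2),
    "camera_latency_s": (0.0, 0.05),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw each parameter uniformly from its range, once per training episode."""
    return PhysicsParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(42)
episodes = [sample_params(rng) for _ in range(1000)]  # one parameter set per episode
```

Because the policy never sees the same physics twice, it cannot overfit to one simulator setting; the hope is that the real world looks like just another sample.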
Embodied AI: AI that perceives, reasons about, and acts in the physical world through a body — usually a robot.
End-effector: The business end of a robot arm — the gripper, hand, suction cup, or specialized tool that interacts with the world.
Foundation model: A single large model pretrained on broad, large-scale data and then adapted (by fine-tuning or prompting) to many downstream tasks.
Inverse kinematics: Given a target pose for the end-effector, find joint angles that produce it.
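For a planar two-link arm, inverse kinematics has a closed-form solution from the law of cosines, which also shows that a target can have two solutions (elbow on either side) or none. The link lengths and helper names below are assumptions for illustration; arms with six or seven joints usually rely on numerical solvers instead.

```python
import math

# Closed-form IK for a hypothetical planar 2-link arm with link lengths L1, L2 (meters).
L1, L2 = 1.0, 0.8


def forward_kinematics(q1, q2):
    """End-effector (x, y) for joint angles q1, q2 in radians."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y, elbow_sign=1):
    """Joint angles (q1, q2) reaching (x, y), or None if the target is out of reach.

    elbow_sign = +1 or -1 selects which of the two elbow configurations to return.
    """
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)  # law of cosines
    if not -1.0 <= c2 <= 1.0:
        return None  # target lies outside the arm's workspace
    s2 = elbow_sign * math.sqrt(1.0 - c2 * c2)
    q2 = math.atan2(s2, c2)
    q1 = math.atan2(y, x) - math.atan2(L2 * s2, L1 + L2 * c2)
    return q1, q2


for sign in (1, -1):
    q1, q2 = inverse_kinematics(1.2, 0.5, elbow_sign=sign)
    print(sign, forward_kinematics(q1, q2))  # both land on (1.2, 0.5) up to rounding
```

Running forward kinematics on the IK result is a cheap sanity check worth keeping in real code too.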
Proprioception: A robot's sense of its own body — joint angles, velocities, motor currents, gripper state.
ROS 2: Robot Operating System 2. Despite the name, not an operating system but open-source middleware: a publish/subscribe message bus (DDS-based by default) plus tooling that many modern robots use to wire components together.
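The publish/subscribe pattern itself fits in a few lines. This is a toy, in-process illustration with a hypothetical `ToyBus` class, not the ROS 2 API: real ROS 2 nodes use client libraries such as rclpy or rclcpp, typed message definitions, and QoS settings, and exchange messages across processes and machines.

```python
from collections import defaultdict
from typing import Any, Callable

# Toy in-process publish/subscribe bus illustrating the pattern (NOT the ROS 2 API).


class ToyBus:
    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # The publisher does not know who, if anyone, is listening.
        for callback in self._subscribers[topic]:
            callback(message)


bus = ToyBus()
received = []
bus.subscribe("/joint_states", received.append)
bus.publish("/joint_states", {"position": [0.0, 1.57]})
bus.publish("/cmd_vel", {"linear_x": 0.2})  # no subscribers: the message goes nowhere
```

The decoupling is the point: a camera driver, a planner, and a controller only agree on topic names and message types, not on each other.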
Sim-to-real: The practice of training a robot policy in simulation and deploying it on a real robot.
SLAM: Simultaneous Localization and Mapping — building a map of an unknown environment while keeping track of your position in it.
Teleoperation: A human controlling a robot in real time, usually via VR headset, joystick, or leader-follower puppetry.
URDF: Unified Robot Description Format — the XML-based file that describes a robot's links, joints, kinematics, and visual / collision meshes.
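To show what such a file looks like, here is a minimal, hypothetical URDF for a two-link arm with one revolute joint, read with Python's standard-library XML parser. The robot, link, and joint names are made up; the element and attribute names (`link`, `joint`, `parent`, `child`, `origin`, `axis`, `limit`) follow the URDF format.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal URDF: two links connected by one revolute joint.
URDF = """<?xml version="1.0"?>
<robot name="toy_arm">
  <link name="base_link"/>
  <link name="upper_arm"/>
  <joint name="shoulder" type="revolute">
    <parent link="base_link"/>
    <child link="upper_arm"/>
    <origin xyz="0 0 0.1" rpy="0 0 0"/>
    <axis xyz="0 0 1"/>
    <limit lower="-1.57" upper="1.57" effort="10" velocity="1.0"/>
  </joint>
</robot>
"""

root = ET.fromstring(URDF)
links = [link.get("name") for link in root.findall("link")]

joints = {}
for joint in root.findall("joint"):
    limit = joint.find("limit")
    joints[joint.get("name")] = {
        "type": joint.get("type"),
        "parent": joint.find("parent").get("link"),
        "child": joint.find("child").get("link"),
        "axis": [float(v) for v in joint.find("axis").get("xyz").split()],
        "limits_rad": (float(limit.get("lower")), float(limit.get("upper"))),
    }
```

Joints point from parent to child link, so the file encodes a kinematic tree rooted at `base_link`; that tree is what kinematics and collision-checking tools consume.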
Vision-language-action (VLA) model: A robot policy that takes camera images and a natural-language instruction as input and outputs motor actions, typically built by fine-tuning a pretrained vision-language model.