Simulation-to-real, explained
Why simulation is the dominant training environment for modern robots, what makes a sim policy survive deployment, and where the gap still bites.
Training a robot in the real world is slow, expensive, and hard to parallelize. Training it in simulation is fast, cheap, and trivially parallel — but the trained policy has to survive the move to the real world. The set of techniques for making that move work is called sim-to-real, and it's the workhorse behind almost every quadruped locomotion paper and a growing fraction of manipulation work.
Why sim has won training
The numbers are stark. A modern GPU-based simulator (Isaac Sim, MuJoCo MJX, Genesis) can run thousands of robot instances in parallel, faster than real time. A reinforcement learning agent that would need years of real-world experience to converge can get there in hours of wall-clock time in sim. The cost per episode is roughly the cost of the electricity.
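To make the "years into hours" claim concrete, here is a back-of-the-envelope sketch. The function name and the figures (4096 environments, each running at 2x real time) are illustrative assumptions, not benchmarks of any particular simulator.

```python
def wall_clock_hours(experience_years, num_envs, realtime_factor):
    """Wall-clock hours needed to gather `experience_years` of simulated experience."""
    hours_of_experience = experience_years * 365 * 24
    # Every environment contributes `realtime_factor` simulated hours per wall-clock hour.
    return hours_of_experience / (num_envs * realtime_factor)


# Assumed numbers for illustration: 2 years of experience, 4096 parallel envs at 2x real time.
print(wall_clock_hours(experience_years=2, num_envs=4096, realtime_factor=2.0))
```

Under those assumptions, two years of robot experience takes a little over two hours of wall-clock time.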
In the real world, the same training would be impractical. You'd need thousands of robots, safety supervision, instrumented test rigs, and time, and you'd have to absorb the mechanical wear and tear. Even at the largest labs, real-world data collection is measured in tens of thousands of episodes, not millions.
So almost every reinforcement-learning result on a robot today started in simulation. The interesting question is what bridges the gap to the real robot.
The gap
A policy trained in sim sees a particular distribution of physics, geometry, sensor noise, and timing. The real robot lives in a different distribution. The mismatch is what people call the "sim-to-real gap." Common offenders:
- Friction. Sim friction is a few coefficients; real friction depends on humidity, surface wear, and dust.
- Actuator dynamics. A real motor has backlash, finite torque, latency, and saturation. Sim motors usually behave perfectly.
- Sensor noise. Real cameras drop frames, suffer rolling shutter, and pick up motion blur. Real IMUs drift.
- Contact. Multi-body contact is hard to simulate precisely; small errors compound during a manipulation.
- Latency. The real perception → control loop has milliseconds of variable delay that the sim usually ignores.
Some of these are getting better quickly: Isaac Sim now models actuator latency natively, and MuJoCo's MJX models multi-contact well. Others, especially contact and deformable-object manipulation, are still hard.
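When the simulator doesn't model latency or sensor noise for you, both can be injected in the control loop around it. The sketch below is a minimal illustration, not any simulator's API: `DelayedNoisyLoop` and its parameters are hypothetical names, the delay is a fixed number of control steps, and the noise is plain Gaussian.

```python
from collections import deque

import numpy as np


class DelayedNoisyLoop:
    """Hypothetical helper: delays actions by a fixed number of control steps
    and adds Gaussian noise to observations."""

    def __init__(self, delay_steps, obs_noise_std, action_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.obs_noise_std = obs_noise_std
        # Pre-fill with zero actions so the first `delay_steps` outputs are no-ops.
        self.buffer = deque(np.zeros(action_dim) for _ in range(delay_steps))

    def delay_action(self, action):
        """Queue the new action and return the one issued `delay_steps` steps ago."""
        self.buffer.append(np.asarray(action, dtype=float))
        return self.buffer.popleft()

    def corrupt_observation(self, obs):
        """Return the observation with zero-mean Gaussian noise added."""
        obs = np.asarray(obs, dtype=float)
        return obs + self.rng.normal(0.0, self.obs_noise_std, size=obs.shape)


loop = DelayedNoisyLoop(delay_steps=2, obs_noise_std=0.01, action_dim=1)
for step in range(4):
    applied = loop.delay_action([float(step + 1)])
    print(step, applied, loop.corrupt_observation([0.5]))
```

A fixed delay is the simplest case. The variable delay described above could be approximated by resampling `delay_steps` per episode.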
How to close the gap
There are four main techniques, often combined.
Domain randomization
The dominant approach for locomotion. At training time, randomize everything you can — friction, mass, motor strength, observation noise, terrain — over wide ranges. The policy that survives has learned to be robust to all of those variations, including the real-world values. ANYmal, Unitree's quadrupeds, and most humanoid locomotion policies use this.
The catch: you randomize what you can simulate. You can't randomize over phenomena your simulator doesn't model at all.
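In code, domain randomization usually comes down to drawing a fresh set of physics parameters at every episode reset. The sketch below shows only that sampling step. The parameter names and ranges are illustrative assumptions, not values tuned for any real robot, and the simulator reset is left as a comment because it depends on the simulator in use.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    """Hypothetical bundle of per-episode simulator settings."""
    friction: float
    payload_mass_kg: float
    motor_strength_scale: float
    obs_noise_std: float


# Illustrative (low, high) ranges; real ranges depend on the robot and task.
RANGES = {
    "friction": (0.3, 1.2),
    "payload_mass_kg": (0.0, 3.0),
    "motor_strength_scale": (0.8, 1.2),
    "obs_noise_std": (0.0, 0.05),
}


def sample_params(rng):
    """Draw each parameter uniformly from its range."""
    return PhysicsParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)
for episode in range(3):
    params = sample_params(rng)
    # A real training loop would reset the simulator with `params` here.
    print(episode, params)
```

Anything missing from `RANGES` never varies during training, which is the catch described above.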
System identification
Measure the real robot's parameters carefully (motor constants, friction, masses), then make sim match. The opposite of domain randomization in spirit — instead of training over a wide distribution and hoping reality is in it, you point sim at reality.
In practice, full system identification is rarely sufficient by itself but is often the first thing you do before adding randomization on top.
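As a minimal sketch of what identification can look like for a single joint, assume the joint follows torque = inertia * acceleration + viscous * velocity + coulomb * sign(velocity) and fit the three coefficients by least squares. The model form, the function name, and the synthetic "log" below are all assumptions for illustration; on a real robot the arrays would come from logged measurements.

```python
import numpy as np


def fit_joint_params(velocity, acceleration, torque):
    """Least-squares fit of torque = inertia*accel + viscous*vel + coulomb*sign(vel)."""
    features = np.column_stack([acceleration, velocity, np.sign(velocity)])
    params, *_ = np.linalg.lstsq(features, torque, rcond=None)
    inertia, viscous, coulomb = params
    return inertia, viscous, coulomb


# Synthetic stand-in for logged data, generated from known "true" parameters.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 2000)
velocity = np.sin(2.0 * t) + 0.5 * np.sin(5.0 * t)
acceleration = 2.0 * np.cos(2.0 * t) + 2.5 * np.cos(5.0 * t)  # derivative of velocity
true_params = (0.05, 0.2, 0.1)  # inertia, viscous friction, Coulomb friction
torque = (
    true_params[0] * acceleration
    + true_params[1] * velocity
    + true_params[2] * np.sign(velocity)
    + rng.normal(0.0, 0.01, size=t.shape)  # measurement noise
)

print(fit_joint_params(velocity, acceleration, torque))
```

The fitted values can then be written into the simulator's joint model, or used as the center of a randomization range.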
Real-world fine-tuning
Train in sim, then fine-tune on a small amount of real-world data. This shows up as sim-to-real-to-sim loops, or as bootstrapping a real-world data collection from a sim-trained starting point. Most production deployments use some version of this.
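The shape of the idea fits in a toy example. The sketch below swaps the RL policy for a linear model so that it stays self-contained: it fits the model on plentiful "sim" data, then takes a few gradient steps on 50 "real" samples whose underlying dynamics differ slightly. Every number and name here is an illustrative assumption.

```python
import numpy as np


def mse(w, x, y):
    """Mean squared prediction error of the linear model `w`."""
    return float(np.mean((x @ w - y) ** 2))


def fine_tune(w, x, y, lr=0.05, steps=200):
    """Plain gradient descent on mean squared error, starting from the given weights."""
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w


rng = np.random.default_rng(0)
w_sim_true = np.array([1.0, -0.5])   # dynamics the simulator assumes
w_real_true = np.array([1.2, -0.3])  # slightly different "real" dynamics

# Plenty of cheap sim data: fit the starting point in closed form.
x_sim = rng.normal(size=(10000, 2))
y_sim = x_sim @ w_sim_true
w_pretrained, *_ = np.linalg.lstsq(x_sim, y_sim, rcond=None)

# A small, noisy real-world dataset for fine-tuning.
x_real = rng.normal(size=(50, 2))
y_real = x_real @ w_real_true + rng.normal(0.0, 0.01, size=50)
w_tuned = fine_tune(w_pretrained, x_real, y_real)

print("real-data error before:", mse(w_pretrained, x_real, y_real))
print("real-data error after: ", mse(w_tuned, x_real, y_real))
```

The sim-trained weights give the fine-tuning step a starting point that is already close, which is why a small real dataset can be enough.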
Real2Sim2Real
A more recent twist: scan the real environment (3D Gaussian Splatting, NeRFs, photogrammetry), build a sim that matches it, train in that sim, and deploy. Useful when the environment is fixed (a warehouse, a factory cell) and you can afford a one-time scan.
What works in 2026
A rough taxonomy of what tends to transfer well today:
- Legged locomotion. Sim-to-real is essentially solved for quadrupeds on standard terrain. Humanoid locomotion is harder but mostly works with enough randomization and compute.
- Reaching and pushing. Coarse manipulation transfers if you randomize enough.
- Grasping rigid objects. Mostly works with the right action space and observation pipeline.
- In-hand manipulation, deformable objects, contact-rich assembly. Still hard. Sim physics for these tasks isn't accurate enough yet, and randomization stops being a substitute for accuracy.
What to use
- Isaac Sim / Isaac Lab is the default for modern RL — GPU-parallel, good physics, integrates with NVIDIA's training stack.
- MuJoCo (and MJX) is the standard for research that needs fast, accurate physics for arms and legs. MJX runs on GPU.
- Genesis is the newest entrant, marketed for generative-AI workflows and fast iteration.
- Gazebo (its newer generation was formerly branded Ignition) is what ROS-native projects use. Good for system integration testing, but slower for training.
The Robot Brain Index tools tab tracks each of these with what's actively maintained, what license they're under, and what they're best at.
Pitfalls
- Overfitting the simulator. A policy that's perfect in sim and useless on the real robot is the classic failure mode. Always test on real hardware as early as you can.
- Reward hacking. RL agents exploit simulator quirks (clipping through walls, glitchy contacts) that give them reward without doing the task. Watch your training videos.
- Too much randomization. If you randomize too widely, the policy becomes overly conservative — it walks slowly because it's hedging against motors that might be weak. Tune.
- The "last 10%" trap. A 90% transfer success rate looks great but isn't deployable. The last 10% is where the long tail of sim-to-real gaps shows up.
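A quick calculation shows why 90% isn't deployable. Assuming attempts are independent (a simplification, since real failures tend to cluster), the chance of getting through a run of consecutive attempts without a single failure shrinks quickly. The function name and the choice of 50 attempts are illustrative.

```python
def chained_success(per_attempt_success, attempts):
    """Probability that `attempts` independent attempts all succeed."""
    return per_attempt_success ** attempts


# Compare per-attempt success rates over 50 consecutive attempts.
for rate in (0.90, 0.99, 0.999):
    print(f"{rate:.3f} per attempt -> {chained_success(rate, 50):.3f} over 50 attempts")
```

At 90% per attempt, a 50-attempt run without failure happens roughly 0.5% of the time. At 99.9% it happens about 95% of the time.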
Where to look next
- What is embodied AI? — the broader context.
- Best robotics simulation tools — a head-to-head on the major simulators.
- The Robot Brain Index simulation tag — all entries tagged with simulation tooling.