Success rate drops by half on surfaces with friction below 0.3. Objects under 4 cm fail in 83%+ of trials. RoboGate runs 50,000+ Isaac Sim experiments across 4 robots (Franka Panda, UR5e, UR3e, UR10e) to uncover these risks before deployment.
GitHub Dataset
50,000+ experiments · 4 robots · MIT License
HuggingFace Dataset
robogate-failure-dictionary · Robotics
50,000+ Isaac Sim experiments · Franka Panda + UR5e + UR3e + UR10e · NVIDIA RTX
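A quick way to check the headline numbers against an export of the dataset is sketched below. The CSV path and the column names (friction, object_size_cm, success) are assumptions for illustration; the published schema may differ.

```python
import pandas as pd

# Assumed columns: "friction", "object_size_cm", "success" (0/1). The real
# robogate-failure-dictionary schema may use different names.
df = pd.read_csv("robogate_experiments.csv")  # local export of the dataset

# Success rate on low-friction surfaces vs. the rest (0.3 threshold from the text above)
low = df[df["friction"] < 0.3]["success"].mean()
rest = df[df["friction"] >= 0.3]["success"].mean()
print(f"success below 0.3 friction: {low:.1%} vs. at/above 0.3: {rest:.1%}")

# Failure rate for small objects (under 4 cm)
small = df[df["object_size_cm"] < 4]
print(f"failure rate for objects under 4 cm: {1 - small['success'].mean():.1%}")
```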
$ robogate test --policy v2.3 --baseline v2.2
[1/5] Verifying artifacts... ✓
[2/5] Loading config... ✓
[3/5] Running 68 scenarios...
████████████████████████ 68/68
[4/5] Computing metrics... ✓
[5/5] Generating reports... ✓ PASS · Confidence: 92/100
Success Rate: 95.4% ▲ +2.2%
Collisions: 0
Cycle Time: 4.1s
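The same gate can run in CI. A minimal sketch, wrapping the `robogate test` invocation shown above and assuming (not documented here) that a FAIL verdict exits non-zero:

```python
import subprocess
import sys

# Invoke the CLI exactly as in the demo above.
result = subprocess.run(
    ["robogate", "test", "--policy", "v2.3", "--baseline", "v2.2"],
    capture_output=True, text=True,
)
print(result.stdout)

# Assumption: a FAIL verdict returns a non-zero exit code.
if result.returncode != 0:
    sys.exit("Validation gate failed: blocking deployment")
print("Validation gate passed: policy can be promoted")
```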
Making Physical AI Practical Requires a Validation Layer
The best VLA models score near-perfect on academic benchmarks. The same models score 0% on industrial Isaac Sim scenarios.
GR00T N1.6, NVIDIA's own robot foundation model, achieves 97.65% on LIBERO. The same model, fine-tuned on the same LIBERO-Spatial dataset, scores 0% on RoboGate's 68 industrial scenarios.
LIBERO (MuJoCo)
97.65%
RoboGate (Isaac Sim)
0%
This 97.65-percentage-point gap is not a bug. It is the reason deployment validation must happen in the target environment, before the model ever touches a physical robot.
RoboGate is that validation layer for Physical AI.
Source: LIBERO 97.65% = NVIDIA official benchmark. Isaac Sim 0/68 = RoboGate direct evaluation.
NVIDIA Isaac Sim + RTX · Real Physics Simulation · 4 Robots (Franka Panda, UR5e, UR3e, UR10e)
Experiments · Risk Model AUC · Friction Threshold · friction × mass interaction
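One way to read the friction × mass finding: fit a failure-risk model with an explicit interaction term and score it by AUC. The sketch below illustrates that analysis with scikit-learn and assumed columns (friction, mass_kg, success); it is not RoboGate's actual risk model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("robogate_experiments.csv")            # assumed export, as above
df["friction_x_mass"] = df["friction"] * df["mass_kg"]  # interaction term

X = df[["friction", "mass_kg", "friction_x_mass"]]
y = 1 - df["success"].astype(int)                        # 1 = failure

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"failure-risk model AUC: {auc:.2f}")
```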
NVIDIA built the simulator. RoboGate built deployment validation on top of it.
You only test under normal conditions. Dark environments, tiny objects, surfaces with 0.1 friction: your robot encounters these for the first time in production.
Edge case failure rate: 83%+
Senior engineers run tests manually. Criteria vary by person. Results are rarely documented.
RoboGate: 68 scenarios, fully automated
No alerts when success rate drops by 5%. Problems surface only after production stops.
Drift detection: under 5 min
Automatically test new AI policies across 68 scenarios in Isaac Sim.
nominal 20 · edge 15 · adversarial 10 · DR 23
good_policy 68/68 PASS · confidence 76
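A hypothetical suite definition that mirrors this breakdown (the names and format are ours; RoboGate's actual scenario config is not shown here):

```python
# Hypothetical suite definition mirroring the published 68-scenario breakdown.
SCENARIO_SUITE = {
    "nominal": 20,                # standard lighting, friction, object sizes
    "edge": 15,                   # dark scenes, tiny objects, low-friction surfaces
    "adversarial": 10,            # deliberately hostile perturbations
    "domain_randomization": 23,   # randomized physics and visuals ("DR")
}
assert sum(SCENARIO_SUITE.values()) == 68
```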
Data-driven PASS/FAIL decisions with Confidence Score 0-100.
FAIL report + failure evidence auto-generated when thresholds are breached
bad_policy 0/68 FAIL · confidence 25
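How a 0-100 Confidence Score might combine the suite metrics is sketched below. The weights are illustrative assumptions, not RoboGate's formula; the real score clearly weighs more than raw success (a 68/68 run above scores 76, not 100).

```python
def confidence_score(success_rate: float, collisions: int, cycle_time_delta_s: float) -> int:
    """Illustrative 0-100 score; the weights are assumptions, not RoboGate's formula."""
    score = 100 * success_rate                 # scenario success drives the score
    score -= 10 * collisions                   # every collision costs points
    score -= 5 * max(cycle_time_delta_s, 0.0)  # penalize cycle-time regressions only
    return max(0, min(100, round(score)))

print(confidence_score(success_rate=0.954, collisions=0, cycle_time_delta_s=0.0))  # 95
```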
Automatic performance drift detection + alerts after deployment.
5% drop warning · 10% drop critical · rollback recommendation
Slack · Telegram · Teams
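A minimal sketch of that drift logic, using the 5% and 10% thresholds above; the alert function is a placeholder to wire to your Slack / Telegram / Teams webhook, not a RoboGate API.

```python
def check_drift(baseline_success: float, live_success: float) -> str:
    """Classify post-deployment drift with the 5% warning / 10% critical thresholds."""
    drop = baseline_success - live_success
    if drop >= 0.10:
        return "critical"   # recommend rollback
    if drop >= 0.05:
        return "warning"
    return "ok"

def alert(level: str, message: str) -> None:
    # Placeholder: post to a Slack / Telegram / Teams webhook in production.
    print(f"[{level.upper()}] {message}")

status = check_drift(baseline_success=0.954, live_success=0.88)
if status != "ok":
    alert(status, "success rate dropped below the validated baseline")
```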
Franka Panda performing Pick & Place tasks in Isaac Sim
good_policy · 68/68 PASS · confidence 76 · 100% success
bad_policy · 0/68 FAIL · confidence 25 · 6 collisions
Send us your policy file and we'll run a free Isaac Sim validation test.
50,000+ Isaac Sim experiments completed · MIT License · NVIDIA Isaac Sim 5.1