
Pre-flight validation

Two checks ship under the reflex validate command: dataset and export. Both are designed to fail cleanly, with exit codes a CI pipeline can act on.

reflex validate dataset /path/to/lerobot_data --embodiment franka --strict

Runs eight falsifiable checks against your LeRobot v3.0 corpus. Each check flags a known training-time failure mode (mismatched action dimensions, NaN in state vectors, malformed BDDL, missing camera streams, and so on), and each failure carries a stable error slug and a remediation hint.

--strict upgrades warnings to failures. Use this in CI.
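
To make the warning/failure split concrete, here is a minimal sketch of how checks of this kind and a strict mode could fit together. It is not reflex's implementation: the `Finding` type, the slugs, the `min_frames` rule, and the 0/1 exit codes are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Finding:
    slug: str      # illustrative slug, not one of reflex's real error slugs
    severity: str  # "warning" or "failure"
    hint: str      # remediation hint shown to the user


def check_episode(states, actions, expected_action_dim, min_frames=2):
    """Run three illustrative checks on one episode of per-frame vectors."""
    findings = []

    # Failure: a NaN anywhere in a state vector.
    for i, state in enumerate(states):
        if any(math.isnan(x) for x in state):
            findings.append(Finding(
                "state-nan", "failure",
                f"frame {i}: re-record or drop frames with NaN state"))
            break

    # Failure: an action vector whose length differs from the embodiment's.
    bad = [i for i, a in enumerate(actions) if len(a) != expected_action_dim]
    if bad:
        findings.append(Finding(
            "action-dim-mismatch", "failure",
            f"frames {bad}: expected {expected_action_dim} action values"))

    # Warning (hypothetical rule): very short episodes are suspicious, not fatal.
    if len(states) < min_frames:
        findings.append(Finding(
            "episode-too-short", "warning",
            f"episode has {len(states)} frame(s); expected at least {min_frames}"))

    return findings


def exit_code(findings, strict=False):
    """Return 1 if any finding should fail the run, else 0."""
    failing = {"failure", "warning"} if strict else {"failure"}
    return 1 if any(f.severity in failing for f in findings) else 0
```

With this shape, a one-frame episode that is otherwise clean exits 0 by default and 1 under strict mode, which is the behavior you want a CI gate to enforce.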

reflex validate export ./p0 --model lerobot/pi0_base --threshold 1e-4

Checks round-trip parity between the ONNX export and the PyTorch model, within the tolerance set by --threshold (1e-4 in the command above). Sample passing output:

Per-fixture results
fixture_idx  max_abs_diff  mean_abs_diff  passed
0            3.21e-06      8.40e-07       PASS
1            2.98e-06      7.92e-07       PASS
...
Summary
max_abs_diff_across_all  3.21e-06
passed                   PASS
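
The columns in that report can be reproduced with a few lines of arithmetic. The sketch below assumes each fixture's output has been flattened to a list of floats and that a fixture passes when its largest absolute difference is at or below the threshold. The function name and the returned dictionary layout are illustrative, not reflex's API.

```python
def parity_report(reference_outputs, candidate_outputs, threshold=1e-4):
    """Compare per-fixture outputs (e.g. PyTorch vs ONNX) element by element."""
    if not reference_outputs or len(reference_outputs) != len(candidate_outputs):
        raise ValueError("need the same non-zero number of fixtures on both sides")

    rows = []
    for idx, (ref, cand) in enumerate(zip(reference_outputs, candidate_outputs)):
        if not ref or len(ref) != len(cand):
            raise ValueError(f"fixture {idx}: output lengths differ or are empty")
        diffs = [abs(r - c) for r, c in zip(ref, cand)]
        max_diff = max(diffs)
        rows.append({
            "fixture_idx": idx,
            "max_abs_diff": max_diff,
            "mean_abs_diff": sum(diffs) / len(diffs),
            # Assumption: "exceeded threshold" means strictly greater than.
            "passed": max_diff <= threshold,
        })

    return {
        "rows": rows,
        "max_abs_diff_across_all": max(r["max_abs_diff"] for r in rows),
        "passed": all(r["passed"] for r in rows),
    }
```

The overall verdict is the conjunction of the per-fixture verdicts, so a single drifting fixture is enough to fail the run.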

Code  Meaning
0     Pass: every fixture matched within threshold
1     Fail: at least one fixture exceeded threshold
2     Error: missing ONNX, bad config, or other setup issue (NOT a parity failure)

Distinguishing 1 from 2 lets CI pipelines differentiate “the model regressed” from “the test runner is broken.”
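
A CI wrapper can act on that distinction directly. The following sketch maps the three documented exit codes to labels and runs an arbitrary command. The label strings and function names are assumptions, and any other exit code is reported as unknown rather than guessed at.

```python
import subprocess
import sys

# The three exit codes documented above; the labels are illustrative.
OUTCOMES = {0: "pass", 1: "parity-failure", 2: "setup-error"}


def classify(returncode):
    """Translate an exit code into a label a pipeline can branch on."""
    return OUTCOMES.get(returncode, "unknown")


def run_check(cmd):
    """Run a command without raising on non-zero exit and classify the result."""
    completed = subprocess.run(cmd, check=False)
    return classify(completed.returncode)


if __name__ == "__main__":
    # Pass the command to run as arguments, for example:
    #   reflex validate export ./p0 --model lerobot/pi0_base --threshold 1e-4
    outcome = run_check(sys.argv[1:])
    print(outcome)
    sys.exit(0 if outcome == "pass" else 1)
```

A pipeline could, for instance, page the model owner on parity-failure and the infrastructure owner on setup-error.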

reflex validate --init-ci

Scaffolds a .github/workflows/reflex-validate.yml workflow that runs both checks on every PR, with annotated failure messages. Pass --output-json for machine-readable output.

Failure modes these checks catch:

  • ONNX export drifted from PyTorch: the most common cause of “it worked in training and now my robot does something subtly wrong”
  • Wrong action dimensions: the embodiment config doesn’t match the model’s action_dim
  • Tokenizer mismatch: the VLM tokenizer config didn’t make it into the export
  • Image shape mismatch: the embodiment expected 224×224 but the export was traced at 256×256
  • Float64 state inputs: Python defaults to float64 but ONNX-GPU expects float32; silent truncation drops fps to ~0.3
  • Missing camera streams: the embodiment declares wrist + front but the export was traced single-camera
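
Several of the mismatches listed above come down to comparing what the embodiment declares with what the export was traced with. The sketch below shows that comparison over two plain dictionaries. The keys (`action_dim`, `image_shape`, `cameras`, `state_dtype`) are hypothetical and do not reflect reflex's actual config schema.

```python
def find_mismatches(embodiment, export):
    """Return human-readable descriptions of embodiment/export disagreements."""
    problems = []

    if embodiment["action_dim"] != export["action_dim"]:
        problems.append(
            f"action_dim: embodiment {embodiment['action_dim']}, "
            f"export {export['action_dim']}")

    if tuple(embodiment["image_shape"]) != tuple(export["image_shape"]):
        problems.append(
            f"image_shape: embodiment {tuple(embodiment['image_shape'])}, "
            f"export {tuple(export['image_shape'])}")

    # Cameras the embodiment declares but the export was not traced with.
    missing = sorted(set(embodiment["cameras"]) - set(export["cameras"]))
    if missing:
        problems.append(f"cameras missing from export: {missing}")

    if embodiment["state_dtype"] != export["state_dtype"]:
        problems.append(
            f"state_dtype: embodiment {embodiment['state_dtype']}, "
            f"export {export['state_dtype']}")

    return problems
```

Returning every problem at once, rather than stopping at the first, lets a single CI run report all the fixes needed.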

All eight dataset checks and the export parity check have at least one pass test and one fail test in the regression suite.