Pre-flight validation
Two checks ship as reflex validate subcommands: dataset and export. Both are designed to fail cleanly with exit codes a CI pipeline can act on.
Validate the dataset
    reflex validate dataset /path/to/lerobot_data --embodiment franka --strict

Runs eight falsifiable checks against your LeRobot v3.0 corpus. Each flags a known training-time failure mode (mismatched action dimensions, NaN in state vectors, malformed BDDL, missing camera streams, etc.). Each failure carries a stable error slug and a remediation hint.
--strict upgrades warnings to failures. Use this in CI.
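To make the slug, hint, and --strict behavior concrete, here is a minimal sketch of how two such checks might be structured. It is not reflex's implementation: the `Finding` type, the slugs `action-dim-mismatch` and `nan-in-state`, and the choice of which check is a warning are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    slug: str      # hypothetical stable identifier, not a real reflex slug
    severity: str  # "warning" or "failure" (assignment here is illustrative)
    hint: str      # remediation hint shown to the user


def check_episode(states, actions, expected_action_dim):
    """Run two illustrative checks over one episode's frames."""
    findings = []
    for i, action in enumerate(actions):
        if len(action) != expected_action_dim:
            findings.append(Finding(
                slug="action-dim-mismatch",
                severity="failure",
                hint=f"frame {i}: got {len(action)} dims, expected {expected_action_dim}",
            ))
            break  # one report per episode is enough for this sketch
    if any(math.isnan(x) for state in states for x in state):
        findings.append(Finding(
            slug="nan-in-state",
            severity="warning",
            hint="drop or re-record frames whose state vector contains NaN",
        ))
    return findings


def should_fail(findings, strict=False):
    """Failures always fail the run; strict also upgrades warnings to failures."""
    return any(
        f.severity == "failure" or (strict and f.severity == "warning")
        for f in findings
    )
```

With this shape, the same findings list can pass a local run and fail a CI run, depending only on whether strict mode is on.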
Validate the export
    reflex validate export ./p0 --model lerobot/pi0_base --threshold 1e-4

Checks round-trip ONNX vs PyTorch parity at a machine-precision threshold. Sample passing output:
    Per-fixture results
    fixture_idx  max_abs_diff  mean_abs_diff  passed
    0            3.21e-06      8.40e-07       PASS
    1            2.98e-06      7.92e-07       PASS
    ...

    Summary
    max_abs_diff_across_all  3.21e-06
    passed                   PASS

Exit codes
| Code | Meaning |
|---|---|
| 0 | Pass — every fixture matched within threshold |
| 1 | Fail — at least one fixture exceeded threshold |
| 2 | Error — missing ONNX, bad config, or other setup issue (NOT a parity failure) |
Distinguishing 1 from 2 lets CI pipelines differentiate “the model regressed” from “the test runner is broken.”
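A CI step could act on this distinction with a small wrapper like the one below. This is a sketch, not part of reflex: the function names and the labels `regression` and `setup-error` are illustrative, and the command line reuses only the flags shown in the export example.

```python
import subprocess


def classify(returncode):
    """Map a documented exit code to a label a CI job can branch on."""
    if returncode == 0:
        return "pass"
    if returncode == 1:
        return "regression"   # at least one fixture exceeded the threshold
    if returncode == 2:
        return "setup-error"  # missing ONNX, bad config, or similar
    return "unknown"          # anything outside the documented table


def run_export_check(export_dir, model, threshold="1e-4"):
    """Invoke the export check and classify its exit code."""
    cmd = ["reflex", "validate", "export", export_dir,
           "--model", model, "--threshold", threshold]
    # check=False: a nonzero exit is an expected outcome, not an exception.
    result = subprocess.run(cmd, check=False)
    return classify(result.returncode)
```

A pipeline might block the merge on `regression` but page whoever owns the runner on `setup-error`, since the latter says nothing about the model.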
CI integration
    reflex validate --init-ci

Scaffolds .github/workflows/reflex-validate.yml, which runs both checks on every PR with annotated failure messages. Pass --output-json for machine-readable output.
What validation catches
- ONNX export drifted from PyTorch: the most common cause of “it worked in training and now my robot does something subtly wrong”
- Wrong action dimensions: embodiment config doesn’t match the model’s action_dim
- Tokenizer mismatch: VLM tokenizer config didn’t make it into the export
- Image shape mismatch: embodiment expected 224×224 but the export was traced at 256×256
- Float64 state inputs: Python defaults to float64 but ONNX-GPU expects float32; silent truncation drops fps to ~0.3
- Missing camera streams: embodiment declares wrist + front but the export was traced single-camera
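Several of these mismatches come down to comparing what the embodiment declares against what the export was traced with. The sketch below illustrates that comparison only; the `Spec` type and its field names are hypothetical and do not reflect reflex's actual config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    action_dim: int
    image_hw: tuple     # (height, width)
    cameras: frozenset  # camera stream names
    state_dtype: str    # e.g. "float32"


def compare(embodiment, export):
    """Return human-readable mismatches between the two specs."""
    problems = []
    if embodiment.action_dim != export.action_dim:
        problems.append(
            f"action_dim: embodiment {embodiment.action_dim}, export {export.action_dim}")
    if embodiment.image_hw != export.image_hw:
        problems.append(
            f"image shape: embodiment {embodiment.image_hw}, export {export.image_hw}")
    missing = embodiment.cameras - export.cameras
    if missing:
        problems.append(f"cameras missing from export: {sorted(missing)}")
    if embodiment.state_dtype != export.state_dtype:
        problems.append(
            f"state dtype: embodiment {embodiment.state_dtype}, export {export.state_dtype}")
    return problems


# Example: a 224x224 two-camera float64 embodiment against a
# 256x256 single-camera float32 export reports three mismatches.
embodiment = Spec(7, (224, 224), frozenset({"wrist", "front"}), "float64")
export = Spec(7, (256, 256), frozenset({"front"}), "float32")
for problem in compare(embodiment, export):
    print(problem)
```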
All eight dataset checks and the export parity check have at least one pass test and one fail test in the regression suite.
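As an illustration of what one pass test and one fail test for the parity check might look like, here is a self-contained sketch. The `parity` helper is hypothetical and works on plain lists; it assumes a fixture passes when its maximum absolute difference is at or below the threshold, which matches the columns in the sample output but is not confirmed as reflex's exact rule.

```python
def parity(reference, candidate, threshold=1e-4):
    """Compare two equal-length output vectors element-wise."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and the same length")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_abs = max(diffs)
    return {
        "max_abs_diff": max_abs,
        "mean_abs_diff": sum(diffs) / len(diffs),
        "passed": max_abs <= threshold,  # assumed pass rule
    }


def test_parity_passes_within_threshold():
    # Drift of about 3e-6 is well under the 1e-4 threshold.
    result = parity([0.5, -1.25], [0.5 + 3e-6, -1.25])
    assert result["passed"]


def test_parity_fails_on_drift():
    # Drift of about 1e-3 exceeds the 1e-4 threshold.
    result = parity([0.5, -1.25], [0.5 + 1e-3, -1.25])
    assert not result["passed"]
```

Keeping a deliberately failing fixture alongside the passing one guards against a check that silently stops detecting drift.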