Pre-flight validation

Two checks ship under the tether validate command: one for datasets and one for exports. Both are designed to fail cleanly with exit codes a CI pipeline can act on.

Validate the dataset

tether validate dataset /path/to/lerobot_data --embodiment franka --strict

Runs eight falsifiable checks against your LeRobot v3.0 corpus. Each check flags a known training-time failure mode (mismatched action dimensions, NaN in state vectors, malformed BDDL, missing camera streams, and so on), and each failure carries a stable error slug and a remediation hint.

--strict upgrades warnings to failures. Use this in CI.
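
As a rough illustration of how warnings and failures might interact under --strict, the sketch below runs two toy checks and derives an exit status. It is not tether's implementation: the Finding type, the slugs, the choice to treat NaN as a warning, and the 0/1 return value are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Finding:
    slug: str      # hypothetical slug, not one of tether's real identifiers
    severity: str  # "warning" or "failure"
    hint: str      # short remediation hint


def check_episode(actions, states, expected_action_dim):
    """Run two illustrative checks over one episode's frames."""
    findings = []
    # Every action vector should match the embodiment's action dimension.
    if any(len(a) != expected_action_dim for a in actions):
        findings.append(Finding("action-dim-mismatch", "failure",
                                f"expected action_dim={expected_action_dim}"))
    # Treating NaN in a state vector as a warning is an assumption of this sketch.
    if any(math.isnan(x) for s in states for x in s):
        findings.append(Finding("nan-in-state", "warning",
                                "drop or re-record the affected frames"))
    return findings


def exit_code(findings, strict=False):
    """Return 1 if any finding counts as a failure, else 0."""
    for f in findings:
        if f.severity == "failure" or (strict and f.severity == "warning"):
            return 1
    return 0


findings = check_episode(
    actions=[[0.0] * 7, [0.1] * 7],
    states=[[0.0, float("nan")], [0.2, 0.3]],
    expected_action_dim=7,
)
print(exit_code(findings))               # 0: the warning is tolerated
print(exit_code(findings, strict=True))  # 1: the warning is promoted to a failure
```

The same findings yield different exit statuses depending only on the strict flag, which is why the flag matters in CI.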

Validate the export

tether validate export ./p0 --model lerobot/pi0_base --threshold 1e-4

Checks round-trip ONNX vs PyTorch parity against the tolerance passed as --threshold (1e-4 in this example). Sample passing output:

Per-fixture results
fixture_idx  max_abs_diff  mean_abs_diff  passed
0            3.21e-06      8.40e-07       PASS
1            2.98e-06      7.92e-07       PASS
...
Summary
max_abs_diff_across_all  3.21e-06
passed                   PASS
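
The columns in that report can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming each fixture's output is a flat list of floats and that a fixture passes when its maximum absolute difference does not exceed the threshold. The function name and data are hypothetical, and the real check may compare tensors differently.

```python
def parity_report(reference_outputs, candidate_outputs, threshold=1e-4):
    """Compare per-fixture outputs element by element and summarise the gaps."""
    rows = []
    for idx, (ref, cand) in enumerate(zip(reference_outputs, candidate_outputs)):
        diffs = [abs(r - c) for r, c in zip(ref, cand)]
        max_diff = max(diffs)
        rows.append({
            "fixture_idx": idx,
            "max_abs_diff": max_diff,
            "mean_abs_diff": sum(diffs) / len(diffs),
            "passed": max_diff <= threshold,
        })
    return {
        "rows": rows,
        "max_abs_diff_across_all": max(r["max_abs_diff"] for r in rows),
        "passed": all(r["passed"] for r in rows),
    }


# Made-up numbers: fixture 0 is within tolerance, fixture 1 is not.
pytorch_out = [[0.10, 0.20, 0.30], [1.0, 2.0, 3.0]]
onnx_out = [[0.100001, 0.200002, 0.30], [1.0, 2.0, 3.001]]

report = parity_report(pytorch_out, onnx_out)
for row in report["rows"]:
    print(row["fixture_idx"], "PASS" if row["passed"] else "FAIL")
print("overall", "PASS" if report["passed"] else "FAIL")
```

A single fixture over the threshold is enough to fail the whole run, which matches the "at least one fixture exceeded threshold" wording used for exit code 1.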

Exit codes

Code	Meaning
0	Pass — every fixture matched within threshold
1	Fail — at least one fixture exceeded threshold
2	Error — missing ONNX, bad config, or other setup issue (NOT a parity failure)

Distinguishing 1 from 2 lets CI pipelines differentiate “the model regressed” from “the test runner is broken.”
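
A CI wrapper might branch on those codes roughly as follows. This is a sketch only: the outcome labels and helper names are hypothetical, and a stand-in Python process replaces the real tether validate export command so the example runs without tether installed.

```python
import subprocess
import sys

# Labels are invented for this sketch; the code meanings come from the table above.
OUTCOMES = {0: "pass", 1: "parity-failure", 2: "setup-error"}


def classify(returncode):
    """Map a process exit code to a coarse outcome label."""
    return OUTCOMES.get(returncode, "unknown")


def run_and_classify(cmd):
    """Run a command and classify its exit code without raising on non-zero."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return classify(completed.returncode)


# Stand-in for the real command: a child process that exits with code 2.
stand_in = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_and_classify(stand_in))  # setup-error
```

A pipeline could then page the model owner on "parity-failure" and the infrastructure owner on "setup-error".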

CI integration

tether validate --init-ci

Scaffolds a workflow at .github/workflows/reflex-validate.yml that runs both checks on every PR, with annotated failure messages. Pass --output-json for machine-readable output.

What validation catches

ONNX export drifted from PyTorch — the most common cause of “it worked in training and now my robot does something subtly wrong”
Wrong action dimensions — embodiment config doesn’t match the model’s action_dim
Tokenizer mismatch — VLM tokenizer config didn’t make it into the export
Image shape mismatch — embodiment expected 224×224 but the export was traced at 256×256
Float64 state inputs — Python defaults to float64 but ONNX-GPU expects float32; silent truncation drops fps to ~0.3
Missing camera streams — embodiment declares wrist + front but the export was traced single-camera
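
Several of the mismatches above amount to comparing what the embodiment declares with what the export was traced with. The sketch below shows that comparison on plain dictionaries. The key names (action_dim, image_shape, cameras) and the problem labels are assumptions for illustration and do not reflect tether's actual config schema.

```python
def compare_export_to_embodiment(embodiment, export_meta):
    """Return labels for each property where the export disagrees with the embodiment."""
    problems = []
    if embodiment["action_dim"] != export_meta["action_dim"]:
        problems.append("action_dim")
    if tuple(embodiment["image_shape"]) != tuple(export_meta["image_shape"]):
        problems.append("image_shape")
    # Cameras the embodiment declares but the export was not traced with.
    missing = sorted(set(embodiment["cameras"]) - set(export_meta["cameras"]))
    if missing:
        problems.append("cameras:" + ",".join(missing))
    return problems


# Hypothetical configs: matching action_dim, wrong image size, one camera missing.
embodiment = {"action_dim": 7, "image_shape": (224, 224), "cameras": ["wrist", "front"]}
export_meta = {"action_dim": 7, "image_shape": (256, 256), "cameras": ["front"]}

print(compare_export_to_embodiment(embodiment, export_meta))
# ['image_shape', 'cameras:wrist']
```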

All eight dataset checks and the export parity check have at least one pass test and one fail test in the regression suite.