# reflex doctor

`reflex doctor --model ./my-export/` runs 10 falsifiable checks. Each maps to a known LeRobot GitHub issue or a systemic VLA deploy failure mode. Every check has at least one pass test and one fail test in `tests/test_doctor_diagnostics.py`, per the falsifiability gate.
## The 10 checks

| ID | Name | What it tests |
|---|---|---|
| `check_model_load` | Model load | Export dir exists + contains ONNX + fits in available RAM (×1.4 load overhead, 20% headroom) |
| `check_onnx_provider` | ONNX provider | onnxruntime importable + CPU EP present (always required) + GPU EP noted |
| `check_vlm_tokenization` | VLM tokenization | Tokenizer config loads + 5 probe prompts produce in-range token IDs |
| `check_image_dims` | Image dim mismatch | `embodiment.cameras[*].resolution` appears in ONNX image input shape |
| `check_action_denorm` | Action denormalization | `embodiment.normalization.mean_action` / `std_action` length == `action_dim`, no NaN/Inf, std > 0 |
| `check_gripper` | Gripper config | `gripper.component_idx` < `action_dim`, `close_threshold` ∈ [0, 1], `inverted` flag sanity |
| `check_state_proprio` | State/proprio dtype | ONNX state input is float32 (not float64 — silent truncation drops fps to ~0.3) |
| `check_gpu_memory` | GPU memory | nvidia-smi reports ≥ 90% headroom over estimated model footprint (×1.6 file size for KV + activations) |
| `check_rtc_chunks` | RTC chunk boundary | `chunk_size` ≥ `frequency_hz` × `rtc_execution_horizon` (one horizon's worth of actions) |
| `check_hardware_compat` | Hardware compat | CUDA driver ≥ 12.x + ORT GPU EP present when CUDA detected |
Each check links back to a load-bearing LeRobot GitHub issue. The full table with issue links is in `docs/doctor_check_list.md` in the source repo.
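As a worked illustration of the `check_model_load` RAM heuristic in the table above, a minimal sketch (the function name `estimated_ram_ok` is invented for illustration; only the ×1.4 overhead and 20% headroom figures come from the check description):

```python
def estimated_ram_ok(onnx_bytes: int, available_ram_bytes: int) -> bool:
    """Mirror the check_model_load heuristic described above: a loaded
    ONNX model is assumed to need ~1.4x its file size, and the check
    wants 20% headroom on top of that estimate before it passes."""
    estimated_footprint = onnx_bytes * 1.4  # load-time overhead
    required = estimated_footprint * 1.2    # 20% headroom
    return available_ram_bytes >= required
```

Under these multipliers, a 4 GB export needs roughly 6.7 GB of free RAM to pass.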
## CheckResult contract

Every check returns a `CheckResult` (see `src/reflex/diagnostics/__init__.py`):
| Field | Type | Notes |
|---|---|---|
| `check_id` | str | Stable ID (e.g. `check_model_load`); used by `--skip` |
| `name` | str | Human-readable name |
| `status` | enum | `pass` / `fail` / `warn` / `skip` |
| `expected` | str | What the check wanted to see |
| `actual` | str | What it actually saw |
| `remediation` | str | Required when `status="fail"`; empty otherwise |
| `duration_ms` | float | Wall-clock time for the check |
| `github_issue` | str or None | URL to the load-bearing LeRobot issue |
Falsifiability gate: `CheckResult.__post_init__` raises `ValueError` if `status="fail"` and `remediation` is empty. Enforced at construction time, so a check with no fix-it suggestion can never ship.
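The contract and gate above can be sketched as a dataclass. This is a minimal sketch, not the real definition in `src/reflex/diagnostics/__init__.py`; the `Status` enum values and field defaults are assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    PASS = "pass"
    FAIL = "fail"
    WARN = "warn"
    SKIP = "skip"


@dataclass
class CheckResult:
    check_id: str
    name: str
    status: Status
    expected: str
    actual: str
    remediation: str = ""
    duration_ms: float = 0.0
    github_issue: Optional[str] = None

    def __post_init__(self) -> None:
        # Falsifiability gate: a failing check must carry a fix-it
        # suggestion, enforced at construction time.
        if self.status == Status.FAIL and not self.remediation.strip():
            raise ValueError(
                f"{self.check_id}: status='fail' requires a non-empty remediation"
            )
```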
## Status semantics

- `pass` — verified the expected condition. No action.
- `fail` — verified a known-broken condition. Doctor exits 1. Caller should follow `remediation`.
- `warn` — non-blocking concern (e.g. CPU-only on a system that should have GPU). Doctor exits 0 but the warning is surfaced.
- `skip` — couldn't run because a precondition wasn't met (e.g. embodiment is `custom`, so embodiment-dependent checks have nothing to compare against).
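The exit-code rule above fits in a few lines. A sketch (`summarize` is a hypothetical helper, not the CLI's actual internals):

```python
from collections import Counter
from typing import List, Tuple


def summarize(statuses: List[str]) -> Tuple[str, int]:
    """Fold per-check statuses into a summary line plus process exit code.
    Only 'fail' flips the exit code to 1; 'warn' and 'skip' are surfaced
    in the line but never block, per the semantics above."""
    counts = Counter(statuses)
    line = " / ".join(
        f"{counts[s]} {s}" for s in ("pass", "warn", "fail", "skip") if counts[s]
    )
    return line, 1 if counts["fail"] else 0
```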
## Example output

### Text format (default)

```
Reflex Doctor v0.7.0
Checking: ./my-export/

✓ Model load (12 ms)
✓ ONNX provider (8 ms) — TensorRT, CUDA, CPU EPs available
✓ VLM tokenization (43 ms)
✓ Image dim mismatch (3 ms) — 224×224 matches export
✓ Action denormalization (2 ms)
✓ Gripper config (2 ms)
✗ State/proprio dtype (1 ms)
    Expected: state input is float32
    Actual:   state input is float64
    Fix: Cast state to np.float32 before sending to /act. Float64 silently
    truncates to float32, dropping fps to ~0.3 in production.
    See https://github.com/huggingface/lerobot/issues/2458
✓ GPU memory (15 ms) — 18 GB available, 4 GB needed
⚠ RTC chunk boundary (1 ms)
    Expected: chunk_size ≥ frequency_hz × rtc_execution_horizon (15 ≥ 30 × 0.5)
    Actual: chunk_size = 15, frequency_hz = 30, rtc_execution_horizon = 0.5
    (15 = 15 — passes by a hair; recommend chunk_size = 25 for headroom)
✓ Hardware compat (84 ms)

Summary: 8 pass / 1 warn / 1 fail
Exit: 1 (fail)
```

### JSON format (CI-friendly)

```sh
reflex doctor --model ./my-export/ --format json | jq .
```

```json
{
  "schema_version": 1,
  "reflex_version": "0.7.0",
  "model_path": "./my-export/",
  "embodiment": "franka",
  "checks": [
    {
      "check_id": "check_model_load",
      "name": "Model load",
      "status": "pass",
      "expected": "...",
      "actual": "...",
      "remediation": "",
      "duration_ms": 12.4,
      "github_issue": "https://github.com/huggingface/lerobot/issues/386"
    }
    /* ... */
  ],
  "summary": {"pass": 8, "warn": 1, "fail": 1, "skip": 0}
}
```

Schema v1 is locked — additive fields don't bump the version; breaking changes do.
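In CI, the JSON output can be gated without scraping the text report. A sketch, assuming the doctor's JSON arrives on stdin; `gate` and the script name are invented for illustration, while the field names come from the schema above:

```python
import json
import sys  # used by the piped invocation shown in the comment below


def gate(report: dict) -> int:
    """CI exit status: nonzero on any failed check.
    Warns are echoed so they stay visible, but never block."""
    for check in report["checks"]:
        if check["status"] == "fail":
            print(f"FAIL {check['check_id']}: {check['remediation']}")
        elif check["status"] == "warn":
            print(f"warn {check['check_id']}: {check['actual']}")
    return 1 if report["summary"]["fail"] else 0


# Usage (hypothetical gate.py):
#   reflex doctor --model ./my-export/ --format json | python gate.py
# with a body of:
#   sys.exit(gate(json.load(sys.stdin)))
```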
## Adding check #11

- Create `src/reflex/diagnostics/check_<name>.py` with a `_run(model_path, embodiment_name, **kwargs) -> CheckResult` function
- At the bottom: `register(Check(check_id=..., name=..., severity=..., github_issue=..., run_fn=_run))`
- Import the new module in `_ensure_registry_loaded()` in `src/reflex/diagnostics/__init__.py`
- Add at least 1 pass test + 1 fail test to `tests/test_doctor_diagnostics.py`
- Update the canonical doc table

The registry is auto-loaded; no other wiring needed.
## Skipping checks

```sh
# Skip specific checks (CSV, by check_id)
reflex doctor --model ./my-export/ --skip check_gpu_memory,check_hardware_compat

# Run only environment checks (no model needed)
reflex doctor
```

Skipped checks return `status=skip` with a reason. Don't silently drop them — operators want to see what wasn't verified.