# auto-calibrate
`reflex serve --auto-calibrate` picks the right pre-shipped configuration for your hardware + embodiment, then passively learns `latency_compensation_ms` from real `/act` traffic. A one-flag first-deploy DX win.
Per ADR 2026-04-25-auto-calibration-architecture — selection, not tuning. Strict partial-order resolver, passive actuator-latency observation (no boot-time probe).
## Quick start
```sh
# Calibrate + serve. First run: ~5-7s for the measurement pass.
# Subsequent runs hit the cache and start instantly.
reflex serve ./my-export/ --embodiment franka --auto-calibrate
```
```sh
# Override cache location (e.g., to ship a frozen cache inside a container)
reflex serve ./my-export/ --auto-calibrate \
  --calibration-cache /opt/reflex/calibration.json
```
```sh
# Force re-measurement on next start (after hardware swap or driver upgrade)
reflex serve ./my-export/ --auto-calibrate --calibrate-force
```

When `--auto-calibrate` is unset, behavior is unchanged from baseline. Phase 1 is opt-in; Phase 1.5 will flip to default-on.
## What gets selected
Five parameters, resolved in a strict partial order — each downstream choice is narrowed by the one above it:
| Parameter | Choices | Selected by |
|---|---|---|
| `variant` | `fp16`, `int8`, `fp8` | Hardware + which `.onnx` files exist. `fp8` only on sm_89+ (Hopper / Ada Lovelace / Blackwell). |
| `provider` | TensorRT-EP, CUDA-EP, CPU-EP | Variant + `--max-batch`. TRT-EP requires `fp16` + batch=1. |
| `nfe` (denoise steps) | 1, 2, 4, 8, 10 | Largest NFE such that `nfe × measured_expert_step_ms ≤ chunk_period × 0.7`. Falls back to NFE=1 (forces the SnapFlow path) when no candidate fits. |
| `chunk_size` | embodiment default | Default unless even NFE=1 doesn’t fit; then halved. |
| `latency_compensation_ms` | embodiment cold-start (franka=40, so100=60, ur5=40) | Cold-start value at startup; warm-updated from real `/act` p95 after 30 s of traffic. |
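The NFE selection rule in the table can be expressed as a small pure function. This is a minimal sketch, assuming the candidate list above; `pick_nfe` is an illustrative name, not the actual resolver API:

```python
# Candidate denoise-step counts, tried largest-first.
NFE_CANDIDATES = (10, 8, 4, 2, 1)

def pick_nfe(expert_step_ms: float, chunk_period_ms: float,
             budget_fraction: float = 0.7) -> int:
    """Largest NFE whose total denoise time fits the chunk-period budget.

    Falls back to NFE=1 (the SnapFlow path) when no candidate fits.
    """
    budget_ms = chunk_period_ms * budget_fraction
    for nfe in NFE_CANDIDATES:
        if nfe * expert_step_ms <= budget_ms:
            return nfe
    return 1  # nothing fits: force the single-step SnapFlow path
```

For example, with a measured expert step of 38.2 ms and a 200 ms chunk period, the budget is 140 ms, so NFE=4 (152.8 ms) is rejected and NFE=2 (76.4 ms) is selected.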
The resolver runs in a single pass at startup. The warm-up tracker continuously refines `latency_compensation_ms` and writes it back to the cache when stable.
## The cache
Lives at `~/.reflex/calibration.json` by default. JSON schema v1:
```json
{
  "schema_version": 1,
  "reflex_version": "0.7.0",
  "calibration_date": "2026-04-25T16:30:00.000000Z",
  "hardware_fingerprint": {
    "gpu_uuid": "GPU-abc123...",
    "gpu_name": "NVIDIA A10G",
    "driver_version_major": 535,
    "driver_version_minor": 129,
    "cuda_version_major": 12,
    "cuda_version_minor": 2,
    "cpu_count": 8,
    "ram_gb": 32
  },
  "entries": {
    "franka::pi05_decomposed_libero_v3": {
      "chunk_size": 50,
      "nfe": 4,
      "latency_compensation_ms": 42.5,
      "provider": "TensorrtExecutionProvider",
      "variant": "fp16",
      "measurement_quality": {
        "warmup_iters": 10,
        "measurement_iters": 100,
        "median_ms": 38.2,
        "p99_ms": 47.1,
        "n_outliers_dropped": 10,
        "quality_score": 0.94
      }
    }
  }
}
```

Entries are keyed by `{embodiment}::{model_hash}`, so multi-embodiment / multi-model deployments coexist in one cache file.
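Reading the cache is a plain JSON load plus a key lookup. A minimal sketch, assuming schema v1 as shown above; `load_entry` is a hypothetical helper, not part of the CLI:

```python
import json
from pathlib import Path

def load_entry(embodiment: str, model_hash: str,
               cache_path: Path = Path.home() / ".reflex" / "calibration.json"):
    """Return the cached calibration entry for one embodiment+model, or None."""
    if not cache_path.exists():
        return None
    cache = json.loads(cache_path.read_text())
    if cache.get("schema_version") != 1:
        return None  # unknown schema: treat as missing and re-measure
    return cache.get("entries", {}).get(f"{embodiment}::{model_hash}")
```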
## Inspecting
```sh
# Pretty-print
reflex doctor --show-calibration
```
```sh
# Machine-readable (for CI / scripts)
reflex doctor --show-calibration --format json | jq .
```
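In CI you can gate on the measurement quality recorded in the cache. A sketch against the schema shown in the cache example; the 0.9 threshold and the `quality_issues` name are illustrative:

```python
def quality_issues(cache: dict, min_score: float = 0.9) -> list:
    """List cache entries whose recorded quality_score is below threshold."""
    issues = []
    for key, entry in cache.get("entries", {}).items():
        score = entry.get("measurement_quality", {}).get("quality_score", 0.0)
        if score < min_score:
            issues.append(f"{key}: quality_score={score} < {min_score}")
    return issues
```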
## Staleness

The cache is stale when ANY of these is true:
- Hardware fingerprint mismatches the current host (GPU swap, driver major-version bump, reflex_version change)
- Cache age > 30 days
- `--calibrate-force` is set
Driver patch-version bumps are ignored.
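Taken together, the staleness rules amount to a single predicate. A sketch under the rules above; `is_stale` and the exact fingerprint comparison are illustrative, and only the driver major version is compared (patch bumps are ignored):

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)

def is_stale(cache: dict, host_fingerprint: dict, reflex_version: str,
             now: datetime, force: bool = False) -> bool:
    """True when the cache must be re-measured on the next start."""
    if force:  # --calibrate-force
        return True
    if cache.get("reflex_version") != reflex_version:
        return True
    cached_fp = cache.get("hardware_fingerprint", {})
    # Compare fingerprint fields; driver minor/patch bumps are ignored,
    # only a driver major-version change counts as a mismatch.
    for key in ("gpu_uuid", "gpu_name", "driver_version_major",
                "cuda_version_major", "cuda_version_minor",
                "cpu_count", "ram_gb"):
        if cached_fp.get(key) != host_fingerprint.get(key):
            return True
    calibrated = datetime.fromisoformat(
        cache["calibration_date"].replace("Z", "+00:00"))
    return now - calibrated > MAX_AGE
```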
## Override ladder
Explicit CLI flags > calibration cache > `embodiment_config` default.
If you pass `--chunk-size 40` AND `--auto-calibrate`, the explicit 40 wins. A yellow stderr warning makes this visible at startup:
```
[auto-calibrate] ignoring cached chunk_size=50 because --chunk-size=40
```

## What’s the warm-up?
Per the ADR, no active probe runs at boot — that would risk an unintended first move on a robot in an unsafe pose. Instead:
- Server starts with the embodiment cold-start default for `latency_compensation_ms`
- As real `/act` traffic flows, the warm-up tracker records each request’s wall-clock latency in a rolling 100-sample window
- Once 30+ samples accumulate AND the rolling p95 has been stable within 5 ms for 3 consecutive checks, the tracker writes the new value to the cache and persists it atomically
- Subsequent server restarts pick up the warmed-up value
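The warm-up loop above can be sketched as a small tracker class. The 100-sample window, 30-sample minimum, 5 ms stability band, and 3-check streak come from the list above; the class and method names do not — this is a sketch, not the shipped implementation:

```python
from collections import deque

class WarmupTracker:
    """Rolling p95 over recent /act latencies; signals when it is stable."""

    def __init__(self, window=100, min_samples=30, band_ms=5.0, streak=3):
        self.samples = deque(maxlen=window)
        self.min_samples = min_samples
        self.band_ms = band_ms
        self.streak = streak
        self._last_p95 = None
        self._stable_checks = 0

    def p95(self) -> float:
        ordered = sorted(self.samples)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def record(self, latency_ms: float):
        """Record one request; return the stable p95 once it settles, else None."""
        self.samples.append(latency_ms)
        if len(self.samples) < self.min_samples:
            return None
        p95 = self.p95()
        if self._last_p95 is not None and abs(p95 - self._last_p95) <= self.band_ms:
            self._stable_checks += 1
        else:
            self._stable_checks = 0
        self._last_p95 = p95
        if self._stable_checks >= self.streak:
            return p95  # caller persists this to the cache atomically
        return None
```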
Customer experience: works correctly from the first request (cold-start default is a reasonable bound), gets steadily more accurate over the first ~30 seconds.
## Hardware-tier expectations
Approximate expectations on Franka:
| Hardware | variant | provider | NFE | chunk_size | latency_comp_ms (warmed) |
|---|---|---|---|---|---|
| Cloud A100-80GB | fp16 | TRT-EP | 10 | 50 | 25-40 |
| Cloud A10G-24GB | fp16 | TRT-EP | 4-8 | 50 | 35-50 |
| Jetson AGX Orin (64GB) | fp16 | TRT-EP | 4 | 50 | 50-70 |
| Jetson Orin Nano (8GB) | fp16 | CUDA-EP | 1 (SnapFlow path) | 50 | 80-120 |
| H100 (Hopper) | fp8 | TRT-EP | 10 | 50 | 20-30 |
Illustrative. Your numbers depend on the export, the model, and customer-specific traffic shape.
## Troubleshooting
`auto-calibrate: cache stale (fingerprint mismatch ...)` — expected after a hardware swap or driver upgrade. Re-measurement runs on the next start.
`No NFE fits budget — falling back to NFE=1` — your hardware × embodiment × model combination has no legal NFE that fits the chunk-period budget. Switch to a SnapFlow-distilled student (inherently single-step), or lower the embodiment’s `control.frequency_hz`.
`reflex doctor --show-calibration` says “No calibration cache found” — you haven’t run `reflex serve --auto-calibrate` on this host yet, or the cache lives at a different path.