auto-calibrate

reflex serve --auto-calibrate picks the right pre-shipped configuration for your hardware + embodiment, then passively learns latency_compensation_ms from real /act traffic. One-flag first-deploy DX win.

Per ADR 2026-04-25-auto-calibration-architecture: selection, not tuning. A strict partial-order resolver plus passive actuator-latency observation (no boot-time probe).

# Calibrate + serve. First run: ~5-7s for the measurement pass.
# Subsequent runs hit the cache and start instantly.
reflex serve ./my-export/ --embodiment franka --auto-calibrate

# Override cache location (e.g., to ship a frozen cache inside a container)
reflex serve ./my-export/ --auto-calibrate \
  --calibration-cache /opt/reflex/calibration.json

# Force re-measurement on next start (after hardware swap or driver upgrade)
reflex serve ./my-export/ --auto-calibrate --calibrate-force

When --auto-calibrate is unset, behavior is unchanged from the baseline. Phase 1 is opt-in; Phase 1.5 will flip it to default-on.

Five parameters are resolved in a strict partial order, where each downstream choice is narrowed by the choices above it:

| Parameter | Choices | Selected by |
| --- | --- | --- |
| variant | fp16, int8, fp8 | Hardware + which .onnx files exist. fp8 only on sm_89+ (Ada Lovelace / Hopper / Blackwell). |
| provider | TensorRT-EP, CUDA-EP, CPU-EP | Variant + --max-batch. TRT-EP requires fp16 + batch=1. |
| nfe (denoise steps) | 1, 2, 4, 8, 10 | Largest NFE such that nfe × measured_expert_step_ms ≤ chunk_period × 0.7. Falls back to NFE=1 (forces the SnapFlow path) when no candidate fits. |
| chunk_size | embodiment default | Default unless even NFE=1 doesn't fit; then halves. |
| latency_compensation_ms | embodiment cold-start (franka=40, so100=60, ur5=40) | Cold-start at startup; warm-updated from real /act p95 after 30 s of traffic. |
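The NFE rule in the table can be sketched as follows. This is a hypothetical helper, not the real reflex internals; `select_nfe` and its parameter names are illustrative, and the 0.7 budget factor and candidate list come from the table above.

```python
# Pick the largest NFE whose total denoise time fits 70% of the chunk period
# (sketch of the selection rule; not the actual reflex resolver code).
NFE_CANDIDATES = [10, 8, 4, 2, 1]

def select_nfe(measured_expert_step_ms: float,
               control_frequency_hz: float,
               chunk_size: int) -> int:
    """Largest NFE such that nfe * step_ms <= chunk_period_ms * 0.7."""
    chunk_period_ms = chunk_size * 1000.0 / control_frequency_hz
    budget_ms = chunk_period_ms * 0.7
    for nfe in NFE_CANDIDATES:
        if nfe * measured_expert_step_ms <= budget_ms:
            return nfe
    return 1  # no candidate fits -> forces the SnapFlow path

# Illustrative numbers: 90 ms/step at 50 Hz with chunk_size=50
# gives a 700 ms budget, so NFE=4 is the largest that fits.
print(select_nfe(90.0, 50.0, 50))  # -> 4
```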

The resolver runs single-pass at startup. The warm-up tracker continuously refines latency_compensation_ms and writes back to the cache when stable.

The cache lives at ~/.reflex/calibration.json by default. JSON schema v1:

{
  "schema_version": 1,
  "reflex_version": "0.7.0",
  "calibration_date": "2026-04-25T16:30:00.000000Z",
  "hardware_fingerprint": {
    "gpu_uuid": "GPU-abc123...",
    "gpu_name": "NVIDIA A10G",
    "driver_version_major": 535,
    "driver_version_minor": 129,
    "cuda_version_major": 12,
    "cuda_version_minor": 2,
    "cpu_count": 8,
    "ram_gb": 32
  },
  "entries": {
    "franka::pi05_decomposed_libero_v3": {
      "chunk_size": 50,
      "nfe": 4,
      "latency_compensation_ms": 42.5,
      "provider": "TensorrtExecutionProvider",
      "variant": "fp16",
      "measurement_quality": {
        "warmup_iters": 10,
        "measurement_iters": 100,
        "median_ms": 38.2,
        "p99_ms": 47.1,
        "n_outliers_dropped": 10,
        "quality_score": 0.94
      }
    }
  }
}

Entries are keyed by {embodiment}::{model_hash}, so multi-embodiment / multi-model deployments coexist in one cache file.
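A minimal sketch of that lookup, assuming a `load_entry` helper (the function name and signature are illustrative, not part of the reflex API):

```python
# Look up one calibration entry by its {embodiment}::{model_hash} key.
# Returns None when this embodiment/model pair has not been calibrated yet.
import json
from pathlib import Path

def load_entry(cache_path: Path, embodiment: str, model_hash: str):
    cache = json.loads(cache_path.read_text())
    return cache["entries"].get(f"{embodiment}::{model_hash}")
```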

# Pretty-print
reflex doctor --show-calibration

# Machine-readable (for CI / scripts)
reflex doctor --show-calibration --format json | jq .

The cache is stale when ANY of these is true:

- Hardware fingerprint mismatches the current host (GPU swap, driver major-version bump, reflex_version change)
- Cache age > 30 days
- --calibrate-force is set

Driver patch-version bumps are ignored.
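The three staleness conditions can be sketched as one predicate. This is a hypothetical helper over the schema shown earlier, not the real reflex code; the 30-day limit and the force flag follow the list above, and patch-version bumps are ignored simply because the fingerprint only stores the driver's major and minor components.

```python
# A cache is stale if the fingerprint or reflex_version mismatches,
# it is older than 30 days, or --calibrate-force was passed.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)

def is_stale(cache: dict, current_fingerprint: dict,
             current_version: str, force: bool) -> bool:
    if force:
        return True
    if cache["reflex_version"] != current_version:
        return True
    if cache["hardware_fingerprint"] != current_fingerprint:
        return True
    calibrated_at = datetime.fromisoformat(
        cache["calibration_date"].replace("Z", "+00:00"))
    return datetime.now(timezone.utc) - calibrated_at > MAX_AGE
```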

Precedence: explicit CLI flags > calibration cache > embodiment_config default.

If you pass --chunk-size 40 AND --auto-calibrate, the explicit 40 wins. A yellow stderr warning makes this visible at startup:

[auto-calibrate] ignoring cached chunk_size=50 because --chunk-size=40
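The precedence chain for a single parameter can be sketched like this (a hypothetical `resolve_chunk_size` helper; only the warning text mirrors the actual startup message shown above):

```python
# Explicit CLI flag > calibration cache > embodiment_config default.
from typing import Optional

def resolve_chunk_size(cli_value: Optional[int],
                       cached_value: Optional[int],
                       embodiment_default: int) -> int:
    if cli_value is not None:
        if cached_value is not None and cached_value != cli_value:
            print(f"[auto-calibrate] ignoring cached chunk_size={cached_value} "
                  f"because --chunk-size={cli_value}")
        return cli_value
    if cached_value is not None:
        return cached_value
    return embodiment_default

print(resolve_chunk_size(40, 50, 50))  # explicit flag wins -> 40
```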

Per the ADR, there is no active probe at boot: that would risk an unintended first move on a robot in an unsafe pose. Instead:

  1. Server starts with the embodiment cold-start default for latency_compensation_ms
  2. As real /act traffic flows, the warmup tracker records each request’s wall-clock latency in a rolling 100-sample window
  3. Once 30+ samples accumulate AND the rolling p95 has been stable within 5 ms for 3 consecutive checks, the tracker writes the new value to the cache and persists it atomically
  4. Subsequent server restarts pick up the warmed-up value
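The steps above can be sketched as a small tracker class. This is a hypothetical illustration of the described behavior (not the reflex implementation); the 100-sample window, 30-sample minimum, 5 ms tolerance, and 3-check stability requirement come directly from the text.

```python
# Passive warm-up tracker: record /act wall-clock latencies, and once the
# rolling p95 is stable, adopt it as the new latency_compensation_ms.
from collections import deque
import statistics

class WarmupTracker:
    def __init__(self, cold_start_ms: float):
        self.value_ms = cold_start_ms      # embodiment cold-start default
        self.window = deque(maxlen=100)    # rolling 100-sample window
        self.last_p95 = None
        self.stable_checks = 0

    def record(self, latency_ms: float) -> None:
        self.window.append(latency_ms)

    def check(self) -> bool:
        """True when the rolling p95 is stable enough to persist."""
        if len(self.window) < 30:
            return False
        # 19 cut points divide the data into 20 intervals; index 18 is p95.
        p95 = statistics.quantiles(self.window, n=20)[18]
        if self.last_p95 is not None and abs(p95 - self.last_p95) <= 5.0:
            self.stable_checks += 1
        else:
            self.stable_checks = 0
        self.last_p95 = p95
        if self.stable_checks >= 3:
            self.value_ms = p95  # would be written back to the cache here
            return True
        return False
```

Until `check()` returns True, the server keeps serving with the cold-start default, so the first request is already covered by a reasonable bound.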

Customer experience: works correctly from the first request (cold-start default is a reasonable bound), gets steadily more accurate over the first ~30 seconds.

Approximate expectations on Franka:

| Hardware | variant | provider | NFE | chunk_size | latency_comp_ms (warmed) |
| --- | --- | --- | --- | --- | --- |
| Cloud A100-80GB | fp16 | TRT-EP | 10 | 50 | 25-40 |
| Cloud A10G-24GB | fp16 | TRT-EP | 4-8 | 50 | 35-50 |
| Jetson AGX Orin (64GB) | fp16 | TRT-EP | 4 | 50 | 50-70 |
| Jetson Orin Nano (8GB) | fp16 | CUDA-EP | 1 (SnapFlow path) | 50 | 80-120 |
| H100 (Hopper) | fp8 | TRT-EP | 10 | 50 | 20-30 |

Illustrative. Your numbers depend on the export, the model, and customer-specific traffic shape.

auto-calibrate: cache stale (fingerprint mismatch ...) — expected after hardware swap or driver upgrade. Re-measurement runs on the next start.

No NFE fits budget — falling back to NFE=1 — your hardware × embodiment × model has no legal NFE that fits the chunk-period budget. Switch to a SnapFlow-distilled student (single-step inherently), or lower the embodiment’s control.frequency_hz.

reflex doctor --show-calibration says "No calibration cache found" — you haven't run reflex serve --auto-calibrate on this host yet, or the cache lives at a different path.