serve — runtime + safety
reflex serve <export_dir> is the production entry point. It reads the export, picks the right ONNX provider, applies the embodiment config, and listens on :8000 for /act, /health, /config, and /metrics.
Minimal

    reflex serve ./my-export/

Returns unscaled raw actions on /act. Fine for smoke tests.
With every wedge

    reflex serve ./my-export/ \
      --embodiment franka \
      --safety-config ./robot_limits.json \
      --adaptive-steps \
      --deadline-ms 33 \
      --cloud-fallback http://cloud:8000 \
      --inject-latency-ms 0 \
      --record /tmp/traces \
      --max-consecutive-crashes 5 \
      --cuda-graphs \
      --auto-calibrate \
      --slo p99=150ms \
      --robot-id warehouse-01

| Flag | Wedge | Doc |
|---|---|---|
| --embodiment | per-robot action ranges + ActionGuard clamping | embodiments |
| --safety-config | URDF-derived joint limits + EU AI Act audit log | guard |
| --adaptive-steps | stop denoise loop early on velocity convergence | this page |
| --deadline-ms | return last-known-good action if over budget | this page |
| --cloud-fallback | edge-first with cloud backup | this page |
| --inject-latency-ms | synthetic delay (matches A2C2 paper methodology) | a2c2 |
| --record | JSONL request/response capture | record & replay |
| --max-consecutive-crashes | circuit breaker (503 + Retry-After: 60 on trip) | this page |
| --cuda-graphs | capture and replay ORT sessions | cuda graphs |
| --auto-calibrate | hardware-tier auto-config | auto-calibrate |
| --slo | rolling p99 enforcement | SLO |
| --robot-id | per-robot Prometheus identity | fleet |
Every response surfaces telemetry from each enabled wedge. Example:

    {
      "actions": [[...], [...], ...],
      "latency_ms": 11.9,
      "inference_mode": "onnx_trt_fp16",
      "guard_clamped": false,
      "guard_violations": [],
      "adaptive_enabled": true,
      "adaptive_steps_used": 7,
      "injected_latency_ms": 0,
      "robot_id": "warehouse-01"
    }

Endpoints

| Path | Method | Returns |
|---|---|---|
| /act | POST | Action chunk + telemetry |
| /health | GET | {status, model_loaded, inference_mode, robot_id} |
| /config | GET | Saved reflex_config.json |
| /metrics | GET | Prometheus exposition format |
/health returns HTTP 503 during the cold-start warmup window (10-70 seconds depending on model), so load balancers that probe /health skip the server until warmup finishes.
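A deployment script can wait on the same signal before sending traffic. The following is a minimal sketch and not part of reflex: `health_status` and `wait_until_ready` are hypothetical helper names, the 120-second timeout is an arbitrary value above the 70-second upper bound, and the sketch assumes /health answers 200 once warmup is over.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def health_status(base_url: str, timeout_s: float = 2.0) -> Optional[int]:
    """Return the HTTP status of GET /health, or None if nothing is listening."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses; 503 during warmup lands here.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not up yet.
        return None


def wait_until_ready(
    probe: Callable[[], Optional[int]],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns 200 (assumed "ready") or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if probe() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass the probe as a closure, for example `wait_until_ready(lambda: health_status("http://localhost:8000"))`, where the host and port are placeholders for wherever the server listens.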
Adaptive denoising (--adaptive-steps)
For flow-matching models (SmolVLA, pi0, pi0.5), the denoise loop normally runs 10 fixed Euler steps. --adaptive-steps measures velocity-field convergence between consecutive steps and stops early when velocity stabilizes. Per-step telemetry exposes how many steps were actually taken.
Typical savings: 30-40% fewer steps on cache-hit calls and none on cache-miss calls. Composes with --cuda-graphs (the captured graph still benefits when the loop exits early).
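To make the early-exit idea concrete, here is a toy sketch of an Euler loop that stops once consecutive velocities agree. It is illustrative only and does not reproduce the reflex implementation: the function name `adaptive_euler`, the L2-norm convergence test, the `tol` threshold, and the choice to cover the remaining time with a single final step are all assumptions.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def adaptive_euler(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    max_steps: int = 10,
    tol: float = 1e-3,
) -> Tuple[Vector, int]:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1.

    Returns the final state and the number of velocity evaluations used.
    """
    dt = 1.0 / max_steps
    x = list(x0)
    prev_v = None
    for step in range(max_steps):
        t = step * dt
        v = velocity(x, t)
        if prev_v is not None:
            # Assumed convergence test: L2 distance between consecutive velocities.
            delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, prev_v)))
            if delta < tol:
                # Velocity has stabilized: take one step over the remaining time.
                remaining = 1.0 - t
                x = [xi + remaining * vi for xi, vi in zip(x, v)]
                return x, step + 1
        # Ordinary fixed-size Euler step.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        prev_v = v
    return x, max_steps
```

With a constant velocity field the loop exits after two evaluations. A field that keeps changing between steps uses all ten, which is the case where early exit yields no savings.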
Deadline guard (--deadline-ms)
A hard wall on per-/act latency. If inference doesn’t return within the budget, the server returns the last-known-good action chunk and emits a Prometheus counter. Use when your downstream control loop has a strict tick budget and a stale action is better than a missed tick.
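The fallback behavior can be sketched with a worker thread and a timed wait. This is a simplified illustration under stated assumptions, not the server's code: `DeadlineGuard` is a hypothetical class, the `deadline_misses` attribute stands in for the Prometheus counter, and raising when no earlier result exists is a choice made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Tuple


class DeadlineGuard:
    """Return the last good result when inference overruns its budget."""

    def __init__(self, infer: Callable[[Any], Any], deadline_ms: float) -> None:
        self._infer = infer
        self._deadline_s = deadline_ms / 1000.0
        # One worker: a late inference delays the next one instead of piling up.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._last_good: Any = None
        self.deadline_misses = 0  # stand-in for the Prometheus counter

    def act(self, observation: Any) -> Tuple[Any, bool]:
        """Return (actions, stale); stale is True when the deadline was missed."""
        future = self._pool.submit(self._infer, observation)
        try:
            actions = future.result(timeout=self._deadline_s)
        except FutureTimeout:
            self.deadline_misses += 1
            if self._last_good is None:
                # Assumption: with nothing to fall back to, surface the timeout.
                raise
            # The late result is discarded when it eventually arrives.
            return self._last_good, True
        self._last_good = actions
        return actions, False
```

The thread is not cancelled on a miss, so a real implementation would also need to decide what happens to the overrunning inference.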

    reflex serve ./my-export/ --deadline-ms 33  # 30 Hz tick budget

Cloud fallback (--cloud-fallback)
Edge-first deployment with a cloud backup. If the local inference fails (OOM, crashed engine, timeout), the request transparently retries against the cloud URL. Useful for “my Jetson is the primary but I want a cloud safety net.”

    reflex serve ./my-export/ --cloud-fallback https://cloud.example.com

The cloud endpoint must speak the same /act API as the local server (run reflex serve there too).
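The retry path amounts to "try local, and on any failure forward the same request to the cloud". Below is a minimal sketch of that pattern, assuming JSON request and response bodies. The helper names `post_act` and `act_with_fallback` are hypothetical, and the payload is treated as an opaque dict because the /act request schema is not described on this page.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Tuple

Payload = Dict[str, Any]


def post_act(base_url: str, payload: Payload, timeout_s: float = 1.0) -> Payload:
    """POST a JSON payload to <base_url>/act and decode the JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/act",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def act_with_fallback(
    local_infer: Callable[[Payload], Payload],
    cloud_infer: Callable[[Payload], Payload],
    payload: Payload,
) -> Tuple[Payload, str]:
    """Try local inference first; on any exception, retry against the cloud."""
    try:
        return local_infer(payload), "local"
    except Exception:
        # OOM, crashed engine, timeout, ...: forward the same payload unchanged.
        return cloud_infer(payload), "cloud"
```

In use, `cloud_infer` could be `lambda p: post_act("https://cloud.example.com", p)`. Keeping the two callables separate makes the fallback easy to exercise in tests without a network.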
Circuit breaker (--max-consecutive-crashes)
If the model raises N consecutive exceptions, the server enters a degraded state, returns HTTP 503 with Retry-After: 60, and reports degraded on /health. Reset by a successful /act call after the cooldown.
Default 5. Set lower (2) for a paranoid early-warning posture, higher (10) for a permissive one.
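The trip-and-reset logic can be sketched as a small state machine. This is an illustration under assumptions rather than the server's implementation: `CircuitBreaker` is a hypothetical class, the 60-second cooldown is inferred from the Retry-After value, and the 500 status returned for failures below the threshold is a placeholder.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Any]  # (status, headers, body)


class CircuitBreaker:
    def __init__(
        self,
        max_consecutive_crashes: int = 5,
        cooldown_s: float = 60.0,  # assumed to match Retry-After: 60
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_consecutive_crashes = max_consecutive_crashes
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._crashes = 0
        self._tripped_at: Optional[float] = None

    @property
    def degraded(self) -> bool:
        """True from the moment the breaker trips until a later success."""
        return self._tripped_at is not None

    def call(self, infer: Callable[[Any], Any], payload: Any) -> Response:
        unavailable = (503, {"Retry-After": str(int(self.cooldown_s))}, None)
        tripped_at = self._tripped_at
        if tripped_at is not None and self._clock() - tripped_at < self.cooldown_s:
            # Still cooling down: reject without touching the model.
            return unavailable
        try:
            result = infer(payload)
        except Exception:
            self._crashes += 1
            if self._crashes >= self.max_consecutive_crashes:
                self._tripped_at = self._clock()
                return unavailable
            return (500, {}, None)  # placeholder status for a single failure
        # Any success clears the streak and the degraded state.
        self._crashes = 0
        self._tripped_at = None
        return (200, {}, result)
```

Injecting `clock` keeps the cooldown testable without waiting a real minute.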