# Getting started
This guide assumes a Linux box with an NVIDIA GPU. CPU-only deployments work too — every example below applies; just install with `[serve,onnx]` instead of `[serve,gpu]`.
## 1. Pick a model and deploy it

The fastest path: `reflex go` does everything in one command.
```shell
# Smallest, fastest to download — good for first try
reflex go --model smolvla-base

# pi0 — Physical Intelligence's flagship 3.5B model
reflex go --model pi0-base

# pi0.5 — newer pi0 variant with AdaRMSNorm time conditioning
reflex go --model pi05-base

# GR00T N1.6 — NVIDIA's humanoid model (3.29B)
reflex go --model groot-n16
```

Under the hood, `reflex go`:
- Probes hardware (`reflex doctor`) and picks per-platform tuning (FP16 on Orin, FP8 on Thor, etc.)
- Downloads the checkpoint from HuggingFace (cached after first run)
- Runs the model-specific exporter to ONNX
- Validates the ONNX numerically against the PyTorch reference (cos = +1.000000)
- If TensorRT is installed, builds and caches an engine
- Starts an HTTP server on `:8000`
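The artifact-selection step can be pictured with a short sketch. This is an illustration, not Reflex's actual code: the cache layout (`~/.cache/reflex/exports/<model>/` with `expert_stack.onnx` / `expert_stack.trt`) matches what `reflex go` produces, but the function name and return values are hypothetical.

```python
from pathlib import Path

def pick_artifact(model: str,
                  cache_root: Path = Path.home() / ".cache/reflex/exports"):
    """Illustrative: prefer a cached TensorRT engine, fall back to ONNX."""
    export_dir = cache_root / model
    engine = export_dir / "expert_stack.trt"
    onnx = export_dir / "expert_stack.onnx"
    if engine.exists():
        return engine, "trt"       # reuse the cached engine
    if onnx.exists():
        return onnx, "onnx"        # no engine, serve the ONNX graph
    return None, "export_needed"   # first run: download + export
```

On a first run nothing is cached, so the exporter runs; later runs reuse whatever the cache already holds.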
After the first run, your export cache looks like:
```
~/.cache/reflex/exports/smolvla-base/
├── expert_stack.onnx        # the graph (1.25 MB)
├── expert_stack.onnx.data   # the weights (~1.3 GB for pi0)
├── reflex_config.json       # model meta — used by serve
└── expert_stack.trt         # TRT engine (only if trtexec was available)
```

## 2. Send the server an action request

`reflex go` left a server running on http://localhost:8000 with three endpoints:
- `POST /act` — send `{instruction, state, image?}`, get back a 50-step action chunk
- `GET /health` — returns `{status, model_loaded, inference_mode}`
- `GET /config` — returns the saved `reflex_config`
Test it:
```shell
curl -X POST http://localhost:8000/act \
  -H 'content-type: application/json' \
  -d '{"instruction":"pick up the red cup","state":[0.1,0.2,0.3,0.4,0.5,0.6]}'
```

You’ll get back:
```json
{
  "actions": [[...], [...], ...],
  "num_actions": 50,
  "action_dim": 32,
  "latency_ms": 47.3,
  "hz": 21.1,
  "denoising_steps": 10,
  "inference_mode": "onnx_trt_fp16",
  "guard_clamped": false
}
```

## 3. Add per-robot normalization
`reflex go` without an embodiment returns raw, unscaled actions — fine for a smoke test. For real deployment, add `--embodiment`:
```shell
reflex go --model pi05-base --embodiment franka
```

Reflex ships embodiment configs for `franka`, `so100`, and `ur5`. Custom robots use `--custom-embodiment-config <path>`.
The embodiment maps action ranges to robot joint limits and engages ActionGuard to clamp unsafe outputs. The response now includes `guard_clamped` and `guard_violations` fields.
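The clamping idea is easy to picture. The sketch below is not Reflex's ActionGuard implementation — the function and the limit values in the usage note are made up for illustration — but it shows the kind of transform an embodiment config drives: clamp each action dimension to the robot's joint limits and report what was touched.

```python
def clamp_actions(actions, limits):
    """Clamp each action vector to per-joint (low, high) limits.

    Returns (clamped, guard_clamped, guard_violations), mirroring the
    response fields described above. `limits` would come from the
    embodiment config in a real deployment.
    """
    violations = []
    clamped = []
    for t, action in enumerate(actions):
        row = []
        for j, (value, (low, high)) in enumerate(zip(action, limits)):
            if value < low or value > high:
                violations.append({"step": t, "joint": j, "value": value})
            row.append(min(max(value, low), high))
        clamped.append(row)
    return clamped, bool(violations), violations
```

For example, with limits `[(-1.0, 1.0), (-0.5, 0.5)]`, the chunk `[[0.2, 0.9], [1.5, -0.1]]` comes back as `[[0.2, 0.5], [1.0, -0.1]]` with two violations recorded.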
## 4. Talk to it instead

For non-experts, `reflex chat` is a natural-language wrapper around the entire CLI:
```
$ reflex chat
you › what version am I running and what hardware can I deploy to?

  → show_version({})  → reflex --version  → "reflex 0.7.0"
  → list_targets({})  → reflex targets    → [orin-nano, orin, orin-64, thor, desktop]

You're running reflex 0.7.0. Supported targets:
  - orin-nano — Jetson Orin Nano: 8 GB, fp16
  - orin      — Jetson AGX Orin 32GB: 32 GB, fp16
  - orin-64   — Jetson AGX Orin 64GB: 64 GB, fp16
  - thor      — Jetson Thor: 128 GB, fp8
  - desktop   — Desktop GPU (RTX 4090 / A100 / H100): 24 GB, fp16
```

Chat understands 17 commands and runs them as subprocesses on your behalf. It is powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com — the free tier is 100 calls/day per machine, no signup, no API key.
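The tool calls in that transcript boil down to a name-to-argv table plus subprocess execution, roughly like this sketch. The two table entries are taken from the transcript above; the dispatch code itself is illustrative, not Reflex's.

```python
import subprocess

# Tool names the model emits, mapped to the CLI invocations they stand for.
# These two appear in the transcript above; the real table has 17 commands.
TOOLS = {
    "show_version": ["reflex", "--version"],
    "list_targets": ["reflex", "targets"],
}

def run_tool(name: str, tools: dict[str, list[str]] = TOOLS) -> str:
    """Run one tool call as a subprocess and hand its stdout back to the model."""
    argv = tools.get(name)
    if argv is None:
        raise KeyError(f"unknown tool: {name}")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping a fixed table (rather than letting the model compose arbitrary shell) is what makes this safe to run on your behalf: only whitelisted invocations ever reach `subprocess`.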
## Next steps

- Deploy a model in one command — the `reflex go` flow in depth
- Supported models — full parity table and per-model notes
- Supported hardware — Jetson + desktop GPU compatibility matrix