# Getting started
This guide assumes a Linux box with an NVIDIA GPU. CPU-only deployments work too — every example below applies; just install with `[serve,onnx]` instead of `[serve,gpu]`.
## 1. Pick a model and deploy it

The fastest path: `reflex go` does everything in one command.
```shell
# Smallest, fastest to download — good for first try
reflex go --model smolvla-base

# pi0 — Physical Intelligence's flagship 3.5B model
reflex go --model pi0-base

# pi0.5 — newer pi0 variant with AdaRMSNorm time conditioning
reflex go --model pi05-base

# GR00T N1.6 — NVIDIA's humanoid model (3.29B)
reflex go --model groot-n16
```

Under the hood, `reflex go`:
- Probes hardware (`reflex doctor`) and picks per-platform tuning (FP16 on Orin, FP8 on Thor, etc.)
- Downloads the checkpoint from HuggingFace (cached after first run)
- Runs the model-specific exporter to ONNX
- Validates the ONNX numerically against the PyTorch reference (cos = +1.000000)
- If TensorRT is installed, builds and caches an engine
- Starts an HTTP server on `:8000`
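The artifact-selection step can be pictured with a short sketch. This is an illustration, not Reflex's actual code: the cache layout (`~/.cache/reflex/exports/<model>/` with `expert_stack.onnx` / `expert_stack.trt`) matches what `reflex go` produces, but the function name and return values are hypothetical.

```python
from pathlib import Path

def pick_artifact(model: str,
                  cache_root: Path = Path.home() / ".cache/reflex/exports"):
    """Illustrative: prefer a cached TensorRT engine, fall back to ONNX."""
    export_dir = cache_root / model
    engine = export_dir / "expert_stack.trt"
    onnx = export_dir / "expert_stack.onnx"
    if engine.exists():
        return engine, "trt"       # reuse the cached engine
    if onnx.exists():
        return onnx, "onnx"        # no engine, serve the ONNX graph
    return None, "export_needed"   # first run: download + export
```

On a first run nothing is cached, so the exporter runs; later runs reuse whatever the cache already holds.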
After the first run, your export cache looks like:
```
~/.cache/reflex/exports/smolvla-base/
├── expert_stack.onnx        # the graph (1.25 MB)
├── expert_stack.onnx.data   # the weights (~1.3 GB for pi0)
├── reflex_config.json       # model meta — used by serve
└── expert_stack.trt         # TRT engine (only if trtexec was available)
```

## 2. Send the server an action request

`reflex go` left a server running on http://localhost:8000 with three endpoints:
- `POST /act` — send `{instruction, state, image?}`, get back a 50-step action chunk
- `GET /health` — returns `{status, model_loaded, inference_mode}`
- `GET /config` — returns the saved `reflex_config`
Test it:
```shell
curl -X POST http://localhost:8000/act \
  -H 'content-type: application/json' \
  -d '{"instruction":"pick up the red cup","state":[0.1,0.2,0.3,0.4,0.5,0.6]}'
```

You’ll get back:
```json
{
  "actions": [[...], [...], ...],
  "num_actions": 50,
  "action_dim": 32,
  "latency_ms": 47.3,
  "hz": 21.1,
  "denoising_steps": 10,
  "inference_mode": "onnx_trt_fp16",
  "guard_clamped": false
}
```

## 3. Add per-robot normalization
`reflex go` without an embodiment returns raw, unscaled actions — fine for a smoke test. For real deployment, add `--embodiment`:
```shell
reflex go --model pi05-base --embodiment franka
```

Reflex ships embodiment configs for `franka`, `so100`, and `ur5`. Custom robots use `--custom-embodiment-config <path>`.
The embodiment maps action ranges to robot joint limits and engages ActionGuard to clamp unsafe outputs. The response now includes `guard_clamped` and `guard_violations` fields.
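The clamping idea is easy to picture. The sketch below is not Reflex's ActionGuard implementation — the function and the limit values in the usage note are made up for illustration — but it shows the kind of transform an embodiment config drives: clamp each action dimension to the robot's joint limits and report what was touched.

```python
def clamp_actions(actions, limits):
    """Clamp each action vector to per-joint (low, high) limits.

    Returns (clamped, guard_clamped, guard_violations), mirroring the
    response fields described above. `limits` would come from the
    embodiment config in a real deployment.
    """
    violations = []
    clamped = []
    for t, action in enumerate(actions):
        row = []
        for j, (value, (low, high)) in enumerate(zip(action, limits)):
            if value < low or value > high:
                violations.append({"step": t, "joint": j, "value": value})
            row.append(min(max(value, low), high))
        clamped.append(row)
    return clamped, bool(violations), violations
```

For example, with limits `[(-1.0, 1.0), (-0.5, 0.5)]`, the chunk `[[0.2, 0.9], [1.5, -0.1]]` comes back as `[[0.2, 0.5], [1.0, -0.1]]` with two violations recorded.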
## 4. Talk to it instead

For non-experts, `reflex chat` is a natural-language wrapper around the entire CLI:
```
$ reflex chat
you › what version am I running and what hardware can I deploy to?

  → show_version({})  → reflex --version  → "reflex 0.7.0"
  → list_targets({})  → reflex targets    → [orin-nano, orin, orin-64, thor, desktop]

You're running reflex 0.7.0. Supported targets:
  - orin-nano — Jetson Orin Nano: 8 GB, fp16
  - orin      — Jetson AGX Orin 32GB: 32 GB, fp16
  - orin-64   — Jetson AGX Orin 64GB: 64 GB, fp16
  - thor      — Jetson Thor: 128 GB, fp8
  - desktop   — Desktop GPU (RTX 4090 / A100 / H100): 24 GB, fp16
```

Chat understands 17 commands and runs them as subprocesses on your behalf. It is powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com — the free tier is 100 calls/day per machine, no signup, no API key.
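The tool calls in that transcript boil down to a name-to-argv table plus subprocess execution, roughly like this sketch. The two table entries are taken from the transcript above; the dispatch code itself is illustrative, not Reflex's.

```python
import subprocess

# Tool names the model emits, mapped to the CLI invocations they stand for.
# These two appear in the transcript above; the real table has 17 commands.
TOOLS = {
    "show_version": ["reflex", "--version"],
    "list_targets": ["reflex", "targets"],
}

def run_tool(name: str, tools: dict[str, list[str]] = TOOLS) -> str:
    """Run one tool call as a subprocess and hand its stdout back to the model."""
    argv = tools.get(name)
    if argv is None:
        raise KeyError(f"unknown tool: {name}")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping a fixed table (rather than letting the model compose arbitrary shell) is what makes this safe to run on your behalf: only whitelisted invocations ever reach `subprocess`.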
## Next steps

- Deploy a model in one command — the `reflex go` flow in depth
- Supported models — full parity table and per-model notes
- Supported hardware — Jetson + desktop GPU compatibility matrix