Getting started

This guide assumes a Linux box with an NVIDIA GPU. CPU-only deployments work too; every example below still applies if you install with [serve,onnx] instead of [serve,gpu].

The fastest path: reflex go does everything in one command.

# Smallest, fastest to download — good for first try
reflex go --model smolvla-base
# pi0 — Physical Intelligence's flagship 3.5B model
reflex go --model pi0-base
# pi0.5 — newer pi0 variant with AdaRMSNorm time conditioning
reflex go --model pi05-base
# GR00T N1.6 — NVIDIA's humanoid model (3.29B)
reflex go --model groot-n16

Under the hood, reflex go:

  1. Probes hardware (reflex doctor) and picks per-platform tuning (FP16 on Orin, FP8 on Thor, etc.)
  2. Downloads the checkpoint from HuggingFace (cached after first run)
  3. Runs the model-specific exporter to ONNX
  4. Validates the ONNX numerically against the PyTorch reference (cos = +1.000000)
  5. If TensorRT is installed, builds and caches an engine
  6. Starts an HTTP server on :8000
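
Step 4's parity check is the standard cosine-similarity comparison between flattened ONNX and PyTorch outputs; a score of +1.000000 means the export is numerically faithful. A minimal sketch of what that comparison computes (illustrative only, not reflex's internals):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened output tensors."""
    a, b = a.ravel(), b.ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A faithful export agrees with the PyTorch reference to within
# floating-point noise, so the score prints as +1.000000.
pytorch_out = np.random.default_rng(0).standard_normal((50, 32)).astype(np.float32)
onnx_out = pytorch_out + np.float32(1e-7)  # tiny numerical drift from the export

print(f"cos = {cosine_similarity(pytorch_out, onnx_out):+.6f}")  # cos = +1.000000
```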

After the first run, your export cache looks like:

~/.cache/reflex/exports/smolvla-base/
├── expert_stack.onnx # the graph (1.25 MB)
├── expert_stack.onnx.data # the weights (~1.3 GB for pi0)
├── reflex_config.json # model meta — used by serve
└── expert_stack.trt # TRT engine (only if trtexec was available)
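
The comment above notes that serve consumes reflex_config.json; you can inspect it yourself with a few lines of Python (path shown for smolvla-base; the exact fields vary by model):

```python
import json
from pathlib import Path

config_path = Path.home() / ".cache/reflex/exports/smolvla-base/reflex_config.json"
if config_path.exists():
    config = json.loads(config_path.read_text())
    print(json.dumps(config, indent=2))
else:
    print("no export cache yet; run reflex go first")
```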

reflex go leaves a server running on http://localhost:8000 with three endpoints:

  • POST /act — send {instruction, state, image?}, get back a 50-step action chunk
  • GET /health — returns {status, model_loaded, inference_mode}
  • GET /config — returns the saved reflex_config

Test it:

curl -X POST http://localhost:8000/act \
-H 'content-type: application/json' \
-d '{"instruction":"pick up the red cup","state":[0.1,0.2,0.3,0.4,0.5,0.6]}'
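
The same call from Python, using only the standard library. The endpoint and payload shape come straight from the docs above; build_act_payload and act are illustrative helper names, and error handling is omitted:

```python
import json
import urllib.request

def build_act_payload(instruction, state, image=None):
    """Encode an /act request body; image is optional per the endpoint spec."""
    body = {"instruction": instruction, "state": state}
    if image is not None:
        body["image"] = image
    return json.dumps(body).encode()

def act(payload, host="http://localhost:8000"):
    """POST the payload to /act and decode the action-chunk response."""
    req = urllib.request.Request(
        f"{host}/act",
        data=payload,
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# chunk = act(build_act_payload("pick up the red cup",
#                               [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]))
# print(chunk["num_actions"], chunk["latency_ms"])
```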

You’ll get back:

{
  "actions": [[...], [...], ...],
  "num_actions": 50,
  "action_dim": 32,
  "latency_ms": 47.3,
  "hz": 21.1,
  "denoising_steps": 10,
  "inference_mode": "onnx_trt_fp16",
  "guard_clamped": false
}
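
The latency_ms and hz fields appear to be reciprocals: 1000 / 47.3 ≈ 21.1, i.e. how many /act calls per second that latency would sustain. A quick sanity check on the sample values:

```python
sample = {"latency_ms": 47.3, "hz": 21.1}

derived_hz = 1000.0 / sample["latency_ms"]   # 1000 / 47.3 = 21.14...
assert round(derived_hz, 1) == sample["hz"]  # matches the reported hz

print(f"{derived_hz:.1f} Hz")  # 21.1 Hz
```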

reflex go without an embodiment returns raw, unscaled actions — fine for a smoke test. For real deployment, add --embodiment:

reflex go --model pi05-base --embodiment franka

Reflex ships embodiment configs for franka, so100, and ur5. Custom robots use --custom-embodiment-config <path>.

The embodiment config maps the model's action ranges to the robot's joint limits and enables ActionGuard, which clamps unsafe outputs. The response then includes guard_clamped and guard_violations fields.
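
Conceptually, the clamping ActionGuard performs is a per-joint clip of the action chunk against the embodiment's limits. A toy sketch (the limit values and guard function here are illustrative, not reflex's actual implementation):

```python
import numpy as np

# Hypothetical limits for a 6-DoF arm: one (low, high) pair per joint.
JOINT_LIMITS = np.array([[-2.9, 2.9], [-1.8, 1.8], [-2.9, 2.9],
                         [-3.1, 0.0], [-2.9, 2.9], [-3.8, 3.8]])

def guard(actions):
    """Clip a (num_actions, action_dim) chunk to the joint limits.

    Returns the clipped chunk plus a flag mirroring the response's
    guard_clamped field: True if any value had to be clamped.
    """
    lo, hi = JOINT_LIMITS[:, 0], JOINT_LIMITS[:, 1]
    clipped = np.clip(actions, lo, hi)
    return clipped, bool((clipped != actions).any())

raw = np.array([[0.5, 1.5, 0.0, -4.0, 1.0, 0.0]])  # joint 4 exceeds its limit
safe, was_clamped = guard(raw)
print(was_clamped)  # True
print(safe[0, 3])   # -3.1
```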

For non-experts, reflex chat is a natural-language wrapper around the entire CLI:

reflex chat
you › what version am I running and what hardware can I deploy to?
→ show_version({}) → reflex --version → "reflex 0.7.0"
→ list_targets({}) → reflex targets → [orin-nano, orin, orin-64, thor, desktop]
You're running reflex 0.7.0. Supported targets:
- orin-nano — Jetson Orin Nano: 8 GB, fp16
- orin — Jetson AGX Orin 32GB: 32 GB, fp16
- orin-64 — Jetson AGX Orin 64GB: 64 GB, fp16
- thor — Jetson Thor: 128 GB, fp8
- desktop — Desktop GPU (RTX 4090 / A100 / H100): 24 GB, fp16

Chat understands 17 commands and runs them as subprocesses on your behalf. It is powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com; the free tier allows 100 calls per day per machine, with no signup and no API key.