Troubleshooting

If you hit something not on this page, run reflex doctor first — its 10 falsifiable checks against your environment and export catch most issues. Then check GitHub issues or ask in Discord.

pip install reflex-vla[serve,gpu] succeeds but reflex go fails with TensorRT errors

The [serve,gpu] extra pulls tensorrt>=10 and the NVIDIA CUDA libraries, but on some hosts the runtime can’t find them. Verify with reflex doctor:

Terminal window
reflex doctor

Look for these specific checks:

  • TensorRT runtime (libnvinfer.so.10) — loadable
  • CUDA cuBLAS (libcublas.so.12) — loadable
  • CUDA cuDNN (libcudnn.so.9) — loadable
  • ORT-TRT EP active

If any are red ✗, the hint says exactly which pip install to run. The most common causes are installing [serve,gpu-min] (which omits TensorRT) or an old release that didn't pull tensorrt automatically.
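
To spot-check these libraries by hand, try loading them directly with ctypes (note: pip-installed CUDA wheels live under site-packages, so ldconfig may not list them):

Terminal window
python -c "import ctypes; ctypes.CDLL('libnvinfer.so.10'); print('libnvinfer.so.10 loadable')"
python -c "import ctypes; ctypes.CDLL('libcublas.so.12'); print('libcublas.so.12 loadable')"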

Easiest fix on a fresh GPU box: use NVIDIA’s container image as your environment.

Terminal window
docker run --gpus all -it nvcr.io/nvidia/tensorrt:24.10-py3
apt-get install -y clang # for lerobot → evdev
pip install 'reflex-vla[serve,gpu,monolithic]'

--device cuda errors with “CUDAExecutionProvider not available”

The pip wheel for ONNX Runtime is the CPU-only version. Either:

Terminal window
# Get the GPU build
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.20.*

Or just install with the right extra: pip install 'reflex-vla[serve,gpu]'. The [gpu] extra excludes the CPU wheel and includes the GPU one.
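
To confirm which build is actually installed, list the execution providers (onnxruntime.get_available_providers is part of the public onnxruntime API):

Terminal window
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
# GPU wheel: includes CUDAExecutionProvider (and TensorrtExecutionProvider)
# CPU-only wheel: just CPUExecutionProvider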

CPU-only path:

Terminal window
pip install 'reflex-vla[serve,onnx,monolithic]'

The [gpu] extra requires NVIDIA hardware; it’ll fail on macOS regardless.

lerobot's evdev dependency requires a C compiler:

Terminal window
apt-get install -y clang
# or the full build toolchain:
apt-get install -y build-essential

reflex export <model> fails partway through

Run reflex doctor first to verify the environment. Then match the error class against this table:

| Error message | Cause | Fix |
| --- | --- | --- |
| Where Cast missing | The post-export Where Cast insertion didn't run | v0.5+ does this automatically; on older versions, re-run with --patch-where-cast |
| InsufficientVRAMError (decomposed parallel) | Not enough VRAM for parallel export of two ONNX sessions | Drop --export-mode parallel, or use a higher-VRAM host |
| max_abs_diff > 1e-4 | PyTorch and ONNX disagree beyond the threshold | File a GitHub issue — this is a real bug, not user error |
| transformers.Cache type incompatible | Wrong transformers version | Pin: pip install 'transformers==5.3.0' |
| OOM during torch.export trace | RAM, not VRAM — the trace itself is heavy | Run on a host with ≥ 32 GB RAM, or use --device cpu --mode monolithic (slower trace, less RAM) |
If validation output ends like this:

max_abs_diff_across_all 3.21e-04
passed FAIL

the exported ONNX disagrees with PyTorch at the third decimal place. Don't deploy this artifact to a robot; it will silently misbehave.

Most common causes:

  1. Float64 vs float32 inputs. Cast all inputs to float32 before serving; otherwise ONNX Runtime silently truncates float64 to float32 and fps drops to ~0.3.
  2. Custom transformers version. Reflex pins transformers==5.3.0 for a reason: earlier versions produced incorrect ONNX exports for pi0/pi0.5.
  3. Comparing against the wrong PyTorch reference. reflex validate export ./p0 --model lerobot/pi0_base validates against lerobot/pi0_base specifically. Use --reference <hf_id> to compare against a different model, as in the sketch below.
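
For cause 3, a minimal sketch of validating against the checkpoint you actually exported (the model id here is a placeholder for your own):

Terminal window
reflex validate export ./p0 --reference your-org/your-pi0-finetune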

Server starts but /act returns 503 forever

The server is in degraded state. Check /health:

Terminal window
curl http://localhost:8000/health

If status is degraded, the circuit breaker tripped after N consecutive /act failures. Look at the server logs for the underlying error. After fixing it, restart the server (or wait out the cooldown if --max-consecutive-crashes is set to N > 5).

If status is warming, the cold-start hasn’t finished. First serve takes 10-70s. /health returns 503 until ready (load balancers correctly skip the server during this window).
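
Both states are visible in the health body. A quick status probe, plus a readiness gate for scripts (assumes the default port and jq installed):

Terminal window
curl -s http://localhost:8000/health | jq .status
# block until ready; -f makes curl treat the 503 as a failure
until curl -sf http://localhost:8000/health >/dev/null; do sleep 2; done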

If the server responds but is slower than expected, these are the causes in order of likelihood:

  1. TRT EP isn't actually active. Run reflex doctor and check for “ORT-TRT EP active” (see the one-liner after this list). If it's not, you're falling back to ORT-CUDA, which is 5.55× slower.
  2. The export is monolithic on a model where decomposed would help. For pi0.5, re-export with --mode decomposed.
  3. You're on a hardware tier that doesn't capture CUDA graphs. A10G vlm_prefix falls back to eager. Expected; expert_denoise still captures.
  4. CUDA context contention. Another process is using the same GPU. Move that process, or give Reflex a dedicated GPU.
  5. Cold start. The first /act per session is 100-400ms slower than subsequent ones (capture + warmup costs). Use --prewarm (the default) so this happens at startup.
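
For cause 1, grepping doctor's human-readable output for the check by name is enough:

Terminal window
reflex doctor | grep -i 'ORT-TRT'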

/act returns “queue_full” 503 errors under load

The cost-budget batching scheduler is dropping requests because the runtime queue hit capacity (default 1000 pending).

{
  "error": "queue_full",
  "message": "policy runtime queue at capacity",
  "policy_id": "prod",
  "max_queue": 1000
}

Two fixes:

Terminal window
# Allow more pending requests (memory cost; usually fine on edge GPUs)
reflex serve ./my-export/ --max-queue-size 5000
# Or tune the batching scheduler for higher throughput
reflex serve ./my-export/ --max-batch-cost-ms 250

For sustained traffic above what one GPU can handle, scale horizontally — one Reflex process per robot, not one process serving N robots. See fleet telemetry.
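
On the client side, curl's built-in retry treats 503 as transient, which works as a crude backoff while you re-tune (a sketch: obs.json stands in for your observation payload, and the endpoint shape is assumed to be a JSON POST):

Terminal window
curl --retry 5 --retry-delay 1 \
  -X POST -H 'Content-Type: application/json' \
  -d @obs.json http://localhost:8000/act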

Server crashes on first /act with Blackwell GPU (RTX 5090)

Known limitation. ORT’s bundled cuBLAS / cuDNN don’t ship sm_100 kernels yet. Workarounds:

  1. reflex chat works without GPU — natural-language CLI mode, no inference
  2. reflex doctor and reflex models list both work
  3. /act itself needs a non-Blackwell GPU temporarily — Modal A10G/A100, RTX 4090, etc.

Tracking ORT upstream for Blackwell support; v0.7+ may add a TensorRT-LLM runtime path.

--cuda-graphs enabled but I see no speedup

You’re not on the decomposed dispatch path. The flag applies to Pi05DecomposedInference dispatch (used in Modal scripts today; production wire-up pending the decomposed-dispatch fix).

In that case, the startup log reads:

--cuda-graphs was set but this backend does not consume the flag.

Fix: re-export with --mode decomposed, or wait for the legacy backend wire-up.
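
The re-export is the same command as in the export section, with the decomposed mode flag:

Terminal window
reflex export <model> --mode decomposed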

embodiment.normalization.mean_action length mismatch

The mean_action / std_action arrays don’t match action_space.dim. Fix the embodiment JSON:

{
  "action_space": {"dim": 7, "ranges": [...]},
  "normalization": {
    "mean_action": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  // length 7
    "std_action": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]    // length 7
  }
}
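
A quick consistency check before re-serving (assumes jq is installed and embodiment.json is the path to your file):

Terminal window
# prints true when both arrays match action_space.dim
jq '.action_space.dim as $d
    | (.normalization.mean_action | length) == $d
      and (.normalization.std_action | length) == $d' embodiment.json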

If the error is gripper.component_idx >= action_space.dim, pick a valid 0-based index for whichever action component controls the gripper.

If two cameras share the same name, rename one of them.

reflex eval fails with osmesa-compile-hang

Cold containers take 60-180s for the first-scene compile, which can exceed the default preflight timeout. Raise it:

Terminal window
reflex eval ./my-export/ --preflight-timeout 600

If reproducible, file an issue with the stderr from the smoke test.

reflex eval says “Estimate exceeds $50 guardrail”

The guardrail is conservative. If you really want a 1000-episode run, just hit Enter — it’ll proceed.

All episodes returned adapter_error (exit 5)

Either:

  • Modal subprocess crashed mid-run — check report.json for the per-episode error_message (see the jq one-liner after this list). The first ~500 chars of stderr surface there.
  • You’re on --runtime local and the local runner is broken — use --runtime modal for now.
  • The Modal run_libero_* script’s stdout format changed — the parser pins on ====== <suite> (ONNX monolithic) ====== headers. File a bug if the script was bumped.
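
To surface those per-episode errors quickly, something like this works, assuming report.json holds an episodes array with error_message fields (the exact schema may differ; adjust the paths):

Terminal window
jq '.episodes[] | select(.error_message != null) | .error_message' report.json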

If none of these match, collect diagnostics first:

Terminal window
reflex doctor --format json | jq .

Capture the full doctor output, the relevant log excerpt, and your platform info, and either open a GitHub issue or ask in Discord.

Don’t suffer alone — most common errors have known fixes; the diagnosis is usually faster with a second pair of eyes.