# Troubleshooting
If you hit something not on this page, run `reflex doctor` first — its 10 falsifiable checks against your environment and export catch most issues. Then check GitHub issues or ask in Discord.
## Install

### `pip install reflex-vla[serve,gpu]` succeeds but `reflex go` fails with TensorRT errors

The `[serve,gpu]` extra pulls in `tensorrt>=10` and the NVIDIA CUDA libraries, but on some hosts the runtime can't find them. Verify with `reflex doctor`:
```bash
reflex doctor
```

Look for these specific checks:

- TensorRT runtime (`libnvinfer.so.10`) — loadable
- CUDA cuBLAS (`libcublas.so.12`) — loadable
- CUDA cuDNN (`libcudnn.so.9`) — loadable
- ORT-TRT EP active
If any are red ✗, the hint tells you exactly which `pip install` to run. The most common causes are installing with `[serve,gpu-min]` (which skips TensorRT) or an old release that didn't pull `tensorrt` automatically.
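You can reproduce the library checks outside the CLI by probing the same shared libraries from Python. A minimal sketch, assuming `onnxruntime` is installed; the library names come from the doctor checks above:

```python
import ctypes

import onnxruntime as ort

# The same shared libraries reflex doctor probes. An OSError here means the
# corresponding pip package is missing or not on the loader path.
for lib in ("libnvinfer.so.10", "libcublas.so.12", "libcudnn.so.9"):
    try:
        ctypes.CDLL(lib)
        print(f"{lib}: loadable")
    except OSError as exc:
        print(f"{lib}: NOT loadable ({exc})")

# The TensorRT EP only shows up if onnxruntime-gpu found a working TensorRT.
print("available providers:", ort.get_available_providers())
```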
Easiest fix on a fresh GPU box: use NVIDIA’s container image as your environment.
```bash
docker run --gpus all -it nvcr.io/nvidia/tensorrt:24.10-py3
apt-get install -y clang   # for lerobot → evdev
pip install 'reflex-vla[serve,gpu,monolithic]'
```

### `--device cuda` errors with "CUDAExecutionProvider not available"
The pip wheel for ONNX Runtime is the CPU-only version. Either:
```bash
# Get the GPU build
pip uninstall onnxruntime
pip install onnxruntime-gpu==1.20.*
```

Or just install with the right extra: `pip install 'reflex-vla[serve,gpu]'`. The `[gpu]` extra excludes the CPU wheel and includes the GPU one.
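To confirm the swap worked, check which execution providers ONNX Runtime actually binds. A minimal sketch; the model path is a placeholder for your own exported artifact:

```python
import onnxruntime as ort

# With onnxruntime-gpu installed, CUDAExecutionProvider should be available.
providers = ort.get_available_providers()
assert "CUDAExecutionProvider" in providers, "still on the CPU-only onnxruntime wheel"

# Load an exported ONNX file and see which provider the session actually bound.
sess = ort.InferenceSession(
    "my-export/model.onnx",  # placeholder path: point this at your artifact
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # CUDAExecutionProvider should be listed first
```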
### Install fails on Apple Silicon Mac
Use the CPU-only path:

```bash
pip install 'reflex-vla[serve,onnx,monolithic]'
```

The `[gpu]` extra requires NVIDIA hardware; it'll fail on macOS regardless.
### clang errors on Linux
The lerobot → evdev dependency requires a compiler:

```bash
apt-get install -y clang
# or on Debian/Ubuntu:
apt-get install -y build-essential
```

## Export
Section titled “Export”reflex export <model> fails partway through
Run `reflex doctor` first to verify the environment. Then look at the error class:
| Error message | Cause | Fix |
|---|---|---|
| `Where Cast missing` | The post-export Where Cast wasn't inserted | v0.5+ does this automatically. On an older version, re-run with `--patch-where-cast`. |
| `InsufficientVRAMError` (decomposed parallel) | Not enough VRAM for parallel export of two ONNX sessions | Drop `--export-mode parallel`, or use a higher-VRAM host |
| `max_abs_diff > 1e-4` | PyTorch and ONNX disagree at the threshold | File a GitHub issue — this is a real bug, not user error |
| `transformers.Cache` type incompatible | Wrong transformers version | Pin: `pip install 'transformers==5.3.0'` |
| OOM during `torch.export` trace | RAM, not VRAM — the trace itself is heavy | Run on a host with ≥ 32 GB RAM, or use `--device cpu --mode monolithic` (slower trace, less RAM) |
### Export succeeds but parity check fails
```
max_abs_diff_across_all   3.21e-04
passed                    FAIL
```

This means the exported ONNX disagrees with PyTorch by more than the 1e-4 parity tolerance. Don't deploy this artifact to a robot. It will silently misbehave.
Most common causes:
- Float64 vs float32 in inputs. Cast all inputs to `float32` before serving (see the sketch after this list). ONNX Runtime silently truncates float64 to float32, dropping fps to ~0.3.
- Custom transformers version. Reflex pins `transformers==5.3.0` for a reason — earlier versions produced incorrect ONNX exports for pi0/pi0.5.
- You're comparing against the wrong PyTorch reference. `reflex validate export ./p0 --model lerobot/pi0_base` validates against `lerobot/pi0_base` specifically. Use `--reference <hf_id>` to compare against a different model.
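For the float64 case, the client-side fix is a blanket cast before anything reaches the session. A minimal sketch; the keys and shapes in the example dict are illustrative, not the real observation schema:

```python
import numpy as np

def to_float32(inputs: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Cast any float64 arrays to float32 before handing them to ONNX Runtime."""
    return {
        name: arr.astype(np.float32) if arr.dtype == np.float64 else arr
        for name, arr in inputs.items()
    }

# np.random.rand returns float64 by default, which is exactly the trap.
obs = {
    "state": np.random.rand(1, 7),
    "image": np.zeros((1, 3, 224, 224), dtype=np.float32),
}
obs = to_float32(obs)
assert not any(a.dtype == np.float64 for a in obs.values())
```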
### Server starts but `/act` returns 503 forever
The server is in a degraded state. Check `/health`:
```bash
curl http://localhost:8000/health
```

If `status` is `degraded`, the circuit breaker tripped after N consecutive `/act` failures. Look at the server logs to see the underlying error. After fixing it, restart the server (or wait for the cooldown if `--max-consecutive-crashes` is set to N > 5).
If `status` is `warming`, the cold start hasn't finished. The first serve takes 10-70 s. `/health` returns 503 until ready (load balancers correctly skip the server during this window).
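If you script against the server, poll `/health` rather than hammering `/act`. A minimal sketch using `requests`; the `status` field matches the behaviour described above, but adjust if your version reports health differently:

```python
import time

import requests

def wait_until_ready(base_url: str = "http://localhost:8000", timeout_s: float = 120.0) -> None:
    """Poll /health until the server leaves its warming/degraded state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            resp = requests.get(f"{base_url}/health", timeout=5)
            if resp.ok:
                print("ready:", resp.json())
                return
            print(f"not ready yet (HTTP {resp.status_code}): {resp.text[:120]}")
        except requests.ConnectionError:
            print("server not accepting connections yet")
        time.sleep(2)
    raise TimeoutError("server did not become healthy before the timeout")

wait_until_ready()
```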
### Latency is much higher than expected
In order of likelihood:
- TRT EP isn't actually active. Run `reflex doctor` and check for "ORT-TRT EP active." If it isn't, you're falling back to ORT-CUDA, which is 5.55× slower.
- The export is monolithic on a model where decomposed would help. For pi0.5, re-export with `--mode decomposed`.
- You're on a hardware tier that doesn't capture CUDA graphs. On A10G, `vlm_prefix` falls back to eager. This is expected; `expert_denoise` still captures.
- CUDA context contention. Another process is using the same GPU. Either move that process or use a GPU that isn't shared.
- Cold start. The first `/act` per session is 100-400 ms slower than subsequent ones (capture + warmup costs); the sketch after this list shows a quick way to measure it. Use `--prewarm` (the default) so this happens at startup.
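To see the cold-start effect and your steady-state latency directly, time a handful of requests. A minimal sketch; the request body is a placeholder, so substitute your embodiment's actual observation schema:

```python
import statistics
import time

import requests

ACT_URL = "http://localhost:8000/act"
payload = {"observation": {}}  # placeholder body: fill in your real observation schema

latencies_ms = []
for _ in range(10):
    t0 = time.perf_counter()
    resp = requests.post(ACT_URL, json=payload, timeout=30)
    resp.raise_for_status()
    latencies_ms.append((time.perf_counter() - t0) * 1000)

# The first call pays the capture/warmup cost; the rest should be steady state.
print(f"first request:  {latencies_ms[0]:.1f} ms")
print(f"median of rest: {statistics.median(latencies_ms[1:]):.1f} ms")
```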
### `/act` returns "queue_full" 503 errors under load
The cost-budget batching scheduler is dropping requests because the runtime queue hit capacity (default 1000 pending).
{ "error": "queue_full", "message": "policy runtime queue at capacity", "policy_id": "prod", "max_queue": 1000}Two fixes:
```bash
# Allow more pending requests (memory cost; usually fine on edge GPUs)
reflex serve ./my-export/ --max-queue-size 5000

# Or tune the batching scheduler for higher throughput
reflex serve ./my-export/ --max-batch-cost-ms 250
```

For sustained traffic above what one GPU can handle, scale horizontally — one Reflex process per robot, not one process serving N robots. See fleet telemetry.
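On the client side, transient queue_full responses are usually recoverable with a bounded retry and exponential backoff. A minimal sketch; the `queue_full` marker comes from the error body above, while the URL and payload are yours to fill in:

```python
import time

import requests

def act_with_backoff(url: str, payload: dict, max_retries: int = 5) -> dict:
    """Retry /act while the runtime queue is full, backing off between attempts."""
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload, timeout=30)
        if resp.status_code == 503 and "queue_full" in resp.text:
            time.sleep(0.1 * 2 ** attempt)  # 0.1 s, 0.2 s, 0.4 s, ...
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("runtime queue stayed full after all retries")
```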
### Server crashes on first `/act` with Blackwell GPU (RTX 5090)
Known limitation. ORT's bundled cuBLAS / cuDNN don't ship sm_100 kernels yet. Workarounds:
- `reflex chat` works without a GPU — natural-language CLI mode, no inference
- `reflex doctor` and `reflex models list` all work
- `/act` itself needs a non-Blackwell GPU temporarily — Modal A10G/A100, RTX 4090, etc.
Tracking ORT upstream for Blackwell support; v0.7+ may add a TensorRT-LLM runtime path.
### `--cuda-graphs` enabled but I see no speedup
You're not on the decomposed dispatch path. The flag applies to `Pi05DecomposedInference` dispatch (used in Modal scripts today; production wire-up pending the decomposed-dispatch fix).
In that case, the startup log reads:
```
--cuda-graphs was set but this backend does not consume the flag.
```

Fix: re-export with `--mode decomposed`, or wait for the legacy backend wire-up.
## Embodiments
Section titled “Embodiments”embodiment.normalization.mean_action length mismatch
The `mean_action` / `std_action` array lengths don't match `action_space.dim`. Fix the embodiment JSON:
{ "action_space": {"dim": 7, "ranges": [...]}, "normalization": { "mean_action": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], // length 7 "std_action": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0] // length 7 }}gripper-idx-out-of-range
Section titled “gripper-idx-out-of-range”gripper.component_idx >= action_space.dim. Pick the right index (0-based) for whichever action component controls the gripper.
### duplicate-camera-name
Two cameras with the same name. Rename one.
### `reflex eval` fails with osmesa-compile-hang
Cold containers take 60-180 s for the first-scene compile. Bump the timeout:

```bash
reflex eval ./my-export/ --preflight-timeout 600
```

If it's reproducible, file an issue with the stderr from the smoke test.
### `reflex eval` says "Estimate exceeds $50 guardrail"
The guardrail is conservative. If you really want a 1000-episode run, just hit Enter — it'll proceed.
### All episodes returned adapter_error (exit 5)
Section titled “All episodes returned adapter_error (exit 5)”Either:
- The Modal subprocess crashed mid-run — check `report.json` for the per-episode `error_message`. The first ~500 chars of stderr surface there (the sketch after this list summarizes them).
- You're on `--runtime local` and the local runner is broken — use `--runtime modal` for now.
- The Modal `run_libero_*` script's stdout format changed — the parser pins on `====== <suite> (ONNX monolithic) ======` headers. File a bug if the script was bumped.
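To see which failure dominates without reading the raw report by hand, summarize it with a short script. A minimal sketch; `error_message` is mentioned above, but the top-level `episodes` key is an assumption, so adjust it to your report.json layout:

```python
import json
from collections import Counter

with open("report.json") as f:
    report = json.load(f)

# "episodes" is assumed to be the per-episode list; rename if your report differs.
errors = Counter()
for ep in report.get("episodes", []):
    if ep.get("error_message"):
        errors[ep["error_message"][:80]] += 1

for msg, count in errors.most_common(5):
    print(f"{count:4d}x  {msg}")
```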
## When all else fails
```bash
reflex doctor --format json | jq .
```

Capture the full doctor output, the relevant log excerpt, and your platform info, and either:
- File a GitHub issue
- Drop into the Discord `#deployments` channel
- Email hello@fastcrest.com for Pro / Enterprise customers
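If you want a single file to attach, the bundle can be scripted. A minimal sketch; only `reflex doctor --format json` comes from above, and the output filename is just a suggestion:

```python
import json
import platform
import subprocess

# Collect the details a maintainer will ask for first.
doctor = subprocess.run(
    ["reflex", "doctor", "--format", "json"],
    capture_output=True, text=True,
)
bundle = {
    "platform": platform.platform(),
    "python": platform.python_version(),
    "doctor": json.loads(doctor.stdout) if doctor.stdout.strip() else doctor.stderr,
}
with open("reflex-bug-report.json", "w") as f:
    json.dump(bundle, f, indent=2)
print("wrote reflex-bug-report.json")
```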
Don’t suffer alone — most common errors have known fixes; the diagnosis is usually faster with a second pair of eyes.