# Supported hardware
## Hardware targets

| Target | Hardware | Memory | Precision |
|---|---|---|---|
| orin-nano | Jetson Orin Nano | 8 GB | fp16 |
| orin | Jetson Orin | 32 GB | fp16 |
| orin-64 | Jetson Orin 64 | 64 GB | fp16 |
| thor | Jetson Thor | 128 GB | fp8 |
| desktop | RTX / A100 | 40 GB | fp16 |
`reflex inspect targets` lists the current profiles.
## Supported GPU architectures

| Architecture | Compute | Status | Notes |
|---|---|---|---|
| Ampere (RTX 30-series, A10G, A100) | sm_8.0–8.6 | Supported | Tested on Modal A10G + A100, RTX 4090 |
| Ada Lovelace (RTX 40-series, L4) | sm_8.9 | Supported | |
| Hopper (H100, H200) | sm_9.0 | Supported | |
| Jetson Orin (Orin Nano / NX / AGX) | sm_8.7 | Supported | JetPack 5.x or 6.x |
| Jetson Thor | sm_10.x | Untested | Same Blackwell silicon as desktop, but ORT-bundled CUDA EP needs Blackwell support |
| Blackwell desktop (RTX 5090, RTX PRO 6000, B200, GB200) | sm_10.0 | Not yet supported | ORT’s bundled cuBLAS/cuDNN don’t ship sm_100 kernels |
| Older NVIDIA (Turing RTX 20, GTX 16) | sm_7.5 | Best-effort | Should work but not in CI matrix |
| Pre-Tensor-Core (Maxwell Jetson Nano 4 GB, GTX 9-series) | sm_5.x | Not supported | NVIDIA EOL’d this hardware at JetPack 4.6 (Python 3.6) |
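The support matrix above can be checked programmatically from a device's compute capability. A minimal sketch whose thresholds mirror the table; `support_status` is illustrative, not a reflex API (capabilities the table doesn't list, such as Volta sm_7.0, fall through to the nearest lower tier here).

```python
def support_status(major: int, minor: int) -> str:
    """Map a CUDA compute capability to the support tier in the table above."""
    cc = major + minor / 10
    if cc >= 10.0:
        return "not yet supported"  # Blackwell: ORT bundle lacks sm_100 kernels
    if cc >= 8.0:
        return "supported"          # Ampere, Ada Lovelace, Hopper, Jetson Orin
    if cc >= 7.5:
        return "best-effort"        # Turing: should work, not in CI matrix
    return "not supported"          # pre-Tensor-Core hardware
```

On a live machine the `(major, minor)` pair can come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.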
## Performance

`reflex-vla[serve,gpu]` (v0.7+) uses ONNX Runtime's TensorRT execution provider out of the box. Measured on Modal A10G (Ampere, sm_8.6) on 2026-04-29 against SmolVLA monolithic (5 warmup + 20 measured forward passes, batch=1):
| Provider | Mean latency | p95 |
|---|---|---|
| CUDAExecutionProvider (ORT-CUDA fallback) | 108.11 ms | 108.68 ms |
| TensorrtExecutionProvider (default in v0.7+) | 19.49 ms | 19.71 ms |
That is a 5.55× speedup. The win comes from TensorRT's FP16 kernels and engine fusion. Older releases silently fell back to ORT-CUDA on most installs because `libnvinfer.so.10` and the CUDA libraries weren't on `LD_LIBRARY_PATH`; v0.7's `[serve,gpu]` extras pull in `tensorrt>=10`, and reflex patches `LD_LIBRARY_PATH` automatically at import.
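The measurement protocol (warmup passes to let TensorRT finish engine build and autotuning, then timed passes reporting mean and p95) can be sketched with any callable. `run_model` below is a stand-in for a real forward pass, not reflex's benchmark harness.

```python
import statistics
import time

def bench(run_model, warmup: int = 5, iters: int = 20) -> tuple[float, float]:
    """Return (mean, p95) latency in milliseconds over `iters` timed calls."""
    for _ in range(warmup):  # untimed: engine build / autotuning happens here
        run_model()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_model()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return statistics.mean(samples), p95
```

Skipping the warmup would fold one-time TensorRT engine construction into the first sample and badly skew the mean.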
## Verify the win

Run `reflex doctor` and look for a green ✓ on:

- TensorRT runtime (`libnvinfer.so.10`) — loadable
- CUDA cuBLAS (`libcublas.so.12`) — loadable
- CUDA cuDNN (`libcudnn.so.9`) — loadable
- ORT-TRT EP active — session created with the TRT EP in its active providers
If any are red ✗, the remediation hint says exactly which `pip install` to run. The most common cause is using `[serve,gpu-min]` or an older release that didn't pull in `tensorrt` automatically.
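The loadability checks above can be approximated with `ctypes`, which attempts a `dlopen` against the current library search path. A sketch of the idea, not reflex's actual implementation; on a machine without the GPU stack all three probes print ✗.

```python
import ctypes

def loadable(libname: str) -> bool:
    """True if the shared library can be dlopen'd from the current environment."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# Probe the same libraries `reflex doctor` reports on.
for lib in ("libnvinfer.so.10", "libcublas.so.12", "libcudnn.so.9"):
    mark = "✓" if loadable(lib) else "✗"
    print(f"{mark} {lib}")
```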