Supported models

| Model | HF ID | Params | Export status |
| --- | --- | --- | --- |
| SmolVLA | lerobot/smolvla_base | 450M | ONNX validated (max_diff=3.3e-06) |
| pi0 | lerobot/pi0_base | 3.5B | ONNX validated (max_diff=6.0e-08) |
| pi0.5 | lerobot/pi05_base | 3.62B | ONNX validated (max_diff=2.38e-07) |
| GR00T N1.6 | nvidia/GR00T-N1.6-3B | 3.29B | ONNX validated (max_diff=8.34e-07, live VLM conditioning) |
| OpenVLA | openvla/openvla-7b | 7.5B | `optimum-cli export onnx` + `reflex.postprocess.openvla.decode_actions` |

`reflex models list` browses the curated registry; `reflex models info <id>` shows benchmarks; `reflex models pull <id>` downloads.

OpenVLA is a vanilla Llama-2-7B VLM — there’s no custom action expert to reconstruct, so we defer to the standard HuggingFace export path and ship only the bin-to-continuous postprocess helper.
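The bin-to-continuous step can be sketched as follows. This is an illustrative stand-in, not the actual `reflex.postprocess.openvla.decode_actions` signature: OpenVLA discretizes each action dimension into uniform bins and the decoder maps predicted bin indices back to bin centers (the `[-1, 1]` bounds here are assumptions; the real helper also applies per-dimension de-normalization statistics).

```python
import numpy as np

def decode_bins(bin_indices, n_bins=256, low=-1.0, high=1.0):
    """Map discrete action-bin indices back to continuous values at bin
    centers. Hypothetical sketch of the inverse of OpenVLA's uniform
    action discretization; bounds and bin count are illustrative."""
    edges = np.linspace(low, high, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[np.asarray(bin_indices)]

# With 2 bins over [-1, 1], indices 0 and 1 decode to the bin centers:
decode_bins([0, 1], n_bins=2)  # → array([-0.5,  0.5])
```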

ONNX artifacts for all four production models, measured against PyTorch on shared seeded inputs:

| Artifact | Reference | First-action max_abs | Verdict |
| --- | --- | --- | --- |
| SmolVLA ONNX, num_steps=10 (production default) | sample_actions(num_steps=10) | 5.96e-07 | machine precision |
| pi0 ONNX, num_steps=10 (production default) | sample_actions(num_steps=10) | 2.09e-07 | machine precision |
| pi0.5 ONNX, num_steps=10 (production default) | sample_actions(num_steps=10) | 2.38e-07 | machine precision |
| GR00T N1.6 ONNX, single-step DiT (DDPM, loop external) | GR00TFullStack.forward | 8.34e-07 | machine precision |
| GR00T N1.6 end-to-end 4-step denoise loop | Python loop over PyTorch ref | 4.77e-07 | machine precision |
| GR00T N1.6 Eagle VLM ONNX (SigLIP + Qwen3 + mlp1, 1.87B) | EagleExportStack PyTorch | 4.25e-04 | machine precision |
| GR00T N1.6 DiT with real VLM KV (5-input expert_stack_with_vlm.onnx) | GR00TFullStack(state, vlm_kv) | 1.78e-05 | machine precision |
| GR00T N1.6 end-to-end two-ONNX chain (Eagle → DiT) | same chain in PyTorch | 1.90e-05 | parity + image-driven sensitivity verified |
| SmolVLA ONNX, num_steps=1 | sample_actions(num_steps=1) | 1.55e-06 | machine precision |
| pi0 ONNX, num_steps=1 | sample_actions(num_steps=1) | 1.43e-06 | machine precision |

Flow-matching VLAs (SmolVLA, pi0, pi0.5) canonically integrate the velocity field with 10 Euler steps — the ONNX bakes in the unrolled loop. GR00T is DDPM-style diffusion with 4 canonical steps — the ONNX exports one velocity step, and reflex serve wraps it in the loop. All four match canonical PyTorch to machine precision.
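The unrolled integration baked into the flow-matching exports follows the standard fixed-step Euler scheme. A minimal NumPy sketch, where `velocity_fn` stands in for the exported velocity network and the toy linear field is purely illustrative:

```python
import numpy as np

def euler_integrate(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps --
    the same loop that the flow-matching ONNX graphs bake in unrolled."""
    x, dt = x0.copy(), 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy velocity field v(x, t) = target - x drives x toward target;
# 10 steps from zero closes all but a 0.9**10 fraction of the gap.
target = np.array([0.5, -0.2, 0.1])
out = euler_integrate(lambda x, t: target - x, np.zeros(3), num_steps=10)
```

GR00T's DDPM-style 4-step loop has the same shape, except each step calls the single-step DiT ONNX and `reflex serve` drives the loop externally instead of unrolling it into the graph.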

Memory fit on disk (monolithic ONNX, FP32)

| Model | Disk size |
| --- | --- |
| SmolVLA | 1.6 GB |
| pi0 | 12.5 GB |
| pi0.5 | 13.0 GB |
| GR00T | 4.4 GB |

SmolVLA fits comfortably on Orin Nano 8 GB. pi0 realistically needs Orin 16 GB+ or a desktop NVIDIA GPU — the 12.5 GB monolithic ONNX cannot load on the 8 GB Orin Nano even in FP16 (~6 GB weights plus activations + OS). FP16 engine rebuild + Orin Nano fit work is tracked for a future release.
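The back-of-envelope behind that fit claim can be sketched directly from the table. The 2 GB OS + activation reserve below is an illustrative assumption, not a measurement:

```python
# FP32 disk sizes from the table above (GB).
fp32_disk_gb = {"SmolVLA": 1.6, "pi0": 12.5, "pi0.5": 13.0, "GR00T": 4.4}

def fits_orin_nano(model, ram_gb=8.0, reserve_gb=2.0):
    """Back-of-envelope fit check: halve the FP32 disk size for an FP16
    weight estimate, then compare against device RAM minus an assumed
    OS + activation reserve (reserve_gb is illustrative, not measured)."""
    fp16_weights_gb = fp32_disk_gb[model] / 2
    return fp16_weights_gb <= ram_gb - reserve_gb

fits_orin_nano("SmolVLA")  # True: 0.8 GB of FP16 weights clears the budget
fits_orin_nano("pi0")      # False: ~6.25 GB of weights leaves no headroom
```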