Glossary
A2C2
Asynchronous Action Chunk Correction. A small residual MLP that nudges policy actions when latency is high and success is low. Source paper: arXiv:2509.23224. Bolts onto a trained VLA without retraining the base model.
In Reflex: see a2c2.
ADR
Architectural Decision Record. A short markdown document capturing a decision, its alternatives, and the reasoning. Reflex’s ADRs live in the project’s research vault (reflex_context/01_decisions/) — when you see “per ADR 2026-04-25-...” in a docs page, that’s the canonical decision document.
Action chunk
The output of one /act call: a sequence of N actions (typically 50) the robot can execute over the next ~1 second of control time. Diffusion-based VLAs produce action chunks rather than single-step actions because predicting a longer horizon reduces compounding error.
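As a toy sketch (the field names below are illustrative, not Reflex’s actual /act schema), the chunk length divided by the control frequency gives the execution horizon:

```python
# Illustrative /act response shape -- field names are hypothetical.
chunk = {
    "chunk_size": 50,
    "actions": [[0.0] * 7 for _ in range(50)],  # N=50 actions, 7 dims each
}

control_hz = 50                               # illustrative control rate
horizon_s = chunk["chunk_size"] / control_hz  # 50 / 50 = 1.0 s per chunk
```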
ActionGuard
The Reflex safety layer that clamps every action chunk to per-axis limits before returning from /act. Reads from the embodiment config; can be augmented with URDF-derived limits via --safety-config. See guard.
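A minimal sketch of the per-axis clamp (the limit format is illustrative; Reflex’s embodiment schema may differ):

```python
def clamp_chunk(chunk, limits):
    """Clamp each action in a chunk to per-axis (lo, hi) limits."""
    return [
        [min(max(a, lo), hi) for a, (lo, hi) in zip(action, limits)]
        for action in chunk
    ]

# Hypothetical limits: 6 joints in radians plus a gripper-width axis.
limits = [(-2.8, 2.8)] * 6 + [(0.0, 0.08)]
safe = clamp_chunk([[3.1, 0.0, -3.0, 1.0, 0.0, 0.0, 0.1]], limits)
# safe[0] == [2.8, 0.0, -2.8, 1.0, 0.0, 0.0, 0.08]
```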
Auto-calibration
The Reflex feature that selects the right (variant × provider × NFE × chunk_size) configuration for your hardware + embodiment, then passively learns latency_compensation_ms from real /act traffic. Selection, not tuning. See auto-calibrate.
Chunk size
Number of actions in one /act response. Embodiment-specific — Franka defaults to 50, SO-100 to 30, UR5 to 50.
CUDA graphs
NVIDIA feature that captures a sequence of GPU kernel launches into a single replayable graph, eliminating per-launch overhead. Reflex captures the two ONNX sessions (vlm_prefix and expert_denoise) on supported tiers. See cuda graphs.
Decomposed export
A non-default export mode that splits a VLA into vlm_prefix.onnx (vision-language backbone) and expert_denoise.onnx (action expert + one denoise step). The prefix is computed once per cache miss; the expert runs N times per /act. Yields a 9× speedup over monolithic export on pi0.5. See decomposed pi0.5.
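Schematically, one /act call in the decomposed layout runs the heavy prefix once and the light expert N times. The stub functions below are stand-ins for the two ONNX sessions; names and shapes are illustrative:

```python
calls = {"prefix": 0, "expert": 0}

def run_prefix():
    """Stand-in for vlm_prefix.onnx: the heavy vision-language pass."""
    calls["prefix"] += 1
    return "past_kv"  # placeholder for the cached key/value tensors

def run_expert(past_kv, x, step):
    """Stand-in for expert_denoise.onnx: one cheap denoise step."""
    calls["expert"] += 1
    return x

def act(nfe=10):
    past_kv = run_prefix()                # once per /act (on cache miss)
    x = [0.0] * 7                         # noise placeholder
    for step in range(nfe):
        x = run_expert(past_kv, x, step)  # N times per /act
    return x

act()
# calls == {"prefix": 1, "expert": 10}
```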
Diffusion / flow-matching policy
A VLA architecture that produces actions by integrating a learned velocity field from random noise to a clean action over N steps. Flow-matching (pi0, pi0.5, SmolVLA) uses Euler integration; DDPM (GR00T) uses a different schedule. The N steps are the NFE — number of function evaluations.
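The Euler loop itself is tiny. A toy sketch (the stand-in velocity field below ignores the images and past_kv that the real action expert conditions on):

```python
import random

def denoise_euler(velocity_field, action_dim=7, nfe=10):
    """Integrate from noise at t=0 toward a clean action at t=1."""
    x = [random.gauss(0.0, 1.0) for _ in range(action_dim)]  # start from noise
    dt = 1.0 / nfe
    for step in range(nfe):
        t = step * dt
        v = velocity_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x

# Toy field that pulls every coordinate toward 0.5.
action = denoise_euler(lambda x, t: [0.5 - xi for xi in x])
```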
Embodiment
A specific robot configuration: action dimensions, joint limits, gripper config, control frequency, camera setup. Embodiment configs are JSON files; built-ins ship for franka, so100, ur5. See embodiments.
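For illustration only, a config in the spirit of the built-ins might look like this. The field names and values are hypothetical, not the actual Reflex JSON schema (shown as a Python dict):

```python
# Hypothetical embodiment config -- illustrative fields, not Reflex's schema.
franka_like = {
    "name": "franka",
    "action_dim": 7,
    "control_hz": 50,
    "chunk_size": 50,                   # the documented Franka default
    "joint_limits": [[-2.9, 2.9]] * 7,  # per-axis [lo, hi]
    "gripper": {"type": "parallel", "max_width_m": 0.08},
    "cameras": ["wrist", "front"],
}

# A config should carry one limit pair per action dimension.
assert len(franka_like["joint_limits"]) == franka_like["action_dim"]
```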
Episode
One contiguous robot task. Within an episode, the camera image and instruction stay roughly stable, so the KV cache from one /act can be reused on the next. Episodes are identified by episode_id in the request body.
Episode cache
The Reflex feature that keeps the past_kv cache from one /act call warm and reuses it on subsequent calls in the same episode. Within-call decomposed cache delivers the 9× speedup; cross-call cache is more limited because PaliGemma’s past_kv encodes vision (which changes per frame).
Expert (action expert)
The smaller, action-producing half of a decomposed VLA. Takes the cached past_kv from the VLM prefix plus the current denoise timestep t, and returns the velocity for one Euler step.
flow-matching
A class of generative model that produces samples by integrating a learned velocity field. Used by SmolVLA, pi0, and pi0.5. Compared to DDPM, flow-matching trains faster and runs equally fast at inference (both produce a chunk in N forward passes). See also: NFE.
GR00T
NVIDIA’s humanoid VLA, released 2025. 3.29B parameters, DDPM-style diffusion (4 canonical steps), Eagle 2.5 VL backbone (SigLIP + Qwen3 + mlp1, 1.87B params). Designed for the Jetson Thor.
KV cache (past_kv)
Transformer attention layers cache key/value tensors from previous tokens to avoid recomputing them. Reflex’s decomposed export externalizes this cache so the VLM prefix’s past_kv can be reused across the action expert’s denoise steps.
LIBERO
A standard benchmark for vision-language-action models — four task families (spatial, object, goal, 10) on a simulated Franka arm. Reflex uses it as the canonical eval suite (see reflex eval).
LeRobot
HuggingFace’s training framework for robot policies. Reflex is the deployment counterpart — reflex export <hf_id> consumes models trained or hosted via LeRobot.
MCP
Model Context Protocol. A protocol for AI agents to discover and call tools. Reflex exposes its inference engine as an MCP server so Claude Desktop, Cursor, and custom agents can call a robot policy as a tool. See MCP server.
Monolithic export
The default ONNX export mode — VLM backbone + action expert + N-step denoise loop in one graph. Simpler but slower than decomposed; used for SmolVLA (small enough that it doesn’t matter) and for cases where the decomposed path doesn’t apply.
NFE
Number of Function Evaluations. How many forward passes through the action expert per /act call. Flow-matching VLAs canonically use NFE=10; SnapFlow distillation collapses this to NFE=1 with no accuracy loss.
ONNX
Open Neural Network Exchange. A portable model format that decouples training (PyTorch/JAX) from inference (any ONNX-compatible runtime). Reflex exports to ONNX and runs via ONNX Runtime with the TensorRT execution provider on NVIDIA hardware.
openpi
Physical Intelligence’s open-source repo for pi0/pi0.5. Contains training code, weights, and reference inference. Reflex consumes their weights and adds the deployment toolchain.
Parity (cos = +1.0)
Numerical equivalence between PyTorch and ONNX outputs. Reflex enforces machine-precision parity (max_abs_diff between 1e-7 and 1e-6) on every export, refusing to ship if the check fails. See verified parity.
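The check itself amounts to one reduction and one threshold. A pure-Python sketch (the real check runs over full output tensors; the values below are made up):

```python
def max_abs_diff(a, b):
    """Largest element-wise |a - b| over two flattened outputs."""
    return max(abs(x - y) for x, y in zip(a, b))

torch_out = [0.12345678, -0.5, 1.0]   # made-up PyTorch outputs
onnx_out  = [0.12345681, -0.5, 1.0]   # made-up ONNX outputs

diff = max_abs_diff(torch_out, onnx_out)
assert diff < 1e-6, f"parity failed: max_abs_diff={diff}"  # refuse to ship
```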
pi0 / pi0.5 / π0.7
Physical Intelligence’s flagship VLA family. pi0 is 3.5B parameters with a PaliGemma backbone; pi0.5 (3.62B) adds AdaRMSNorm time conditioning; π0.7 (April 2026) shifted to a closed-weights cloud API. Reflex supports pi0 and pi0.5 at machine-precision parity.
Policy
A trained robot brain. In Reflex’s vocabulary, “policy” and “model” are interchangeable in most contexts, with “policy” emphasizing the robot-control role.
Policy versioning
Loading two policy slots side-by-side and routing /act traffic deterministically per-episode. Substrate for canary rollouts and self-distill rollback. See policy versioning.
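Deterministic per-episode routing can be as simple as hashing the episode_id. A sketch (the `route` helper and the 10% canary split are hypothetical, not Reflex’s actual scheme):

```python
import hashlib

def route(episode_id: str, canary_share: float = 0.10) -> str:
    """Map an episode to policy slot 'a' or 'b', stably across /act calls."""
    h = int(hashlib.sha256(episode_id.encode()).hexdigest(), 16)
    return "b" if (h % 10_000) < canary_share * 10_000 else "a"

# Every call within one episode lands on the same slot.
assert route("ep-0042") == route("ep-0042")
```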
Reflex chat
A natural-language wrapper around the entire reflex CLI. Powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com. 100 free calls/day. See reflex chat.
reflex doctor
10 falsifiable diagnostic checks against your environment + export. Each check maps to a known failure mode. See reflex doctor.
reflex eval
LIBERO benchmark wrapper — one command, ~30 minutes, success rate + per-task numbers + cost transparency. See reflex eval.
RTC
Real-Time Chunking. A technique for streaming the next action chunk’s denoise loop while the robot is still executing the previous chunk. Source: arXiv:2506.07339. In Reflex, --no-rtc disables it; disabling is required in 2-policy mode because RTC carry-over is per-policy.
SmolVLA
HuggingFace’s small VLA — 450M params. The smallest of the four major open VLAs Reflex supports. Fits on Jetson Orin Nano (8 GB).
SnapFlow
A 1-NFE distillation technique. Source: arXiv:2604.05656. Trains a single-step student from a 10-step flow-matching teacher; the student matches or beats the teacher on LIBERO. The first public open-source reproduction shipped in Reflex as reflex train distill. See distill.
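To see what 1-NFE distillation means, here is a toy case with a linear velocity field where the single-step student has a closed form. Real SnapFlow trains the student on teacher trajectories; everything below is illustrative:

```python
def teacher_sample(x0, nfe=10):
    """10-step Euler teacher for the toy field v(x, t) = 0.5 - x."""
    x, dt = x0, 1.0 / nfe
    for _ in range(nfe):
        x = x + dt * (0.5 - x)
    return x

# For this linear field, one rescaled velocity step reproduces the whole
# 10-step trajectory exactly -- that rescaling plays the student's role.
scale = 1.0 - (1.0 - 0.1) ** 10

def student_sample(x0):
    return x0 + scale * (0.5 - x0)  # a single forward pass: NFE=1

assert abs(student_sample(1.7) - teacher_sample(1.7)) < 1e-12
```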
TensorRT
NVIDIA’s inference optimization library. Reflex uses it via ONNX Runtime’s TensorRT execution provider, which gives a 5.55× speedup over the CUDA execution provider on flow-matching VLAs.
VLA
Vision-Language-Action model. A neural net that takes images + a natural-language instruction + proprioceptive state, and outputs actions. The four major open VLAs Reflex supports: pi0, pi0.5, SmolVLA, NVIDIA GR00T.
Wedge
Reflex’s term for a composable runtime feature that plugs into reflex serve via a flag. Examples: --cuda-graphs, --auto-calibrate, --policy-a/--policy-b, --a2c2-checkpoint. Each wedge is independent; combinations are well-defined.