Glossary

A2C2

Asynchronous Action Chunk Correction. A small residual MLP that nudges policy actions when latency is high and success is low. Source paper: arXiv:2509.23224. Bolts onto a trained VLA without retraining the base model.

In Reflex: see a2c2.
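
As a rough illustration of what "residual" means here, the sketch below adds a small learned correction on top of a base action. The feature layout (base action plus latency in seconds), the layer sizes, and every function name are assumptions made for the example; they are not taken from the paper or from Reflex.

```python
import math

def residual_correction(features, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer; returns one correction value per action dimension."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w_out, b_out)]

def corrected_action(base_action, latency_s, params):
    """Residual form: output = base action + learned correction."""
    features = list(base_action) + [latency_s]  # hypothetical feature layout
    delta = residual_correction(features, *params)
    return [a + d for a, d in zip(base_action, delta)]

base = [0.2, -0.4]
w_hidden = [[0.1, 0.0, 0.5], [0.0, 0.1, -0.5]]
b_hidden = [0.0, 0.0]
zero_out = [[0.0, 0.0], [0.0, 0.0]]

# With a zeroed output layer the correction vanishes and the base policy is untouched.
unchanged = corrected_action(base, 0.08, (w_hidden, b_hidden, zero_out, [0.0, 0.0]))

# A non-zero output bias nudges each action dimension by a small amount.
nudged = corrected_action(base, 0.08, (w_hidden, b_hidden, zero_out, [0.01, -0.02]))
```

Because the correction is added to the base action rather than replacing it, the base model's weights never need to change.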

ADR

Architectural Decision Record. A short markdown document capturing a decision, its alternatives, and the reasoning. Reflex’s ADRs live in the project’s research vault (reflex_context/01_decisions/) — when you see “per ADR 2026-04-25-...” in a docs page, that’s the canonical decision document.

Action chunk

The output of one /act call: a sequence of N actions (typically 50) the robot can execute over the next ~1 second of control time. Diffusion-based VLAs produce action chunks rather than single-step actions because predicting a longer horizon reduces compounding error.

ActionGuard

The Reflex safety layer that clamps every action chunk to per-axis limits before returning from /act. Reads from the embodiment config; can be augmented with URDF-derived limits via --safety-config. See guard.
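
Per-axis clamping can be sketched in a few lines. This is an illustration of the idea only, not ActionGuard's implementation; the function name and the limit values are hypothetical.

```python
def clamp_chunk(chunk, lower, upper):
    """Clamp every action in a chunk to per-axis [lower, upper] limits."""
    if len(lower) != len(upper):
        raise ValueError("lower and upper limits must have the same length")
    clamped = []
    for action in chunk:
        if len(action) != len(lower):
            raise ValueError("action dimension does not match the limits")
        clamped.append([min(max(v, lo), hi) for v, lo, hi in zip(action, lower, upper)])
    return clamped

# Hypothetical 3-axis limits and a 2-action chunk.
lower = [-1.0, -1.0, -3.0]
upper = [1.0, 1.0, 3.0]
chunk = [[0.5, -2.0, 3.5], [1.2, 0.0, -4.0]]

safe = clamp_chunk(chunk, lower, upper)
# safe == [[0.5, -1.0, 3.0], [1.0, 0.0, -3.0]]
```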

Auto-calibration

The Reflex feature that selects the right (variant × provider × NFE × chunk_size) configuration for your hardware + embodiment, then passively learns latency_compensation_ms from real /act traffic. Selection, not tuning. See auto-calibrate.
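
The two halves of this entry (pick a configuration from a fixed grid, then passively track latency) might look like the sketch below. The selection criterion (lowest measured latency), the exponential moving average, the `fp16` variant name, and the latency numbers are all assumptions for illustration; the source does not say how Reflex scores candidates or smooths latency.

```python
from itertools import product

def candidate_grid(variants, providers, nfes, chunk_sizes):
    """Enumerate every (variant, provider, NFE, chunk_size) combination."""
    return [
        {"variant": v, "provider": p, "nfe": n, "chunk_size": c}
        for v, p, n, c in product(variants, providers, nfes, chunk_sizes)
    ]

def select_config(candidates, measure_latency_ms):
    """Selection, not tuning: pick an existing candidate, change nothing inside it."""
    return min(candidates, key=measure_latency_ms)

def update_latency_compensation(current_ms, observed_ms, alpha=0.1):
    """Exponential moving average over observed /act latencies (assumed smoothing)."""
    return (1.0 - alpha) * current_ms + alpha * observed_ms

def fake_latency_ms(cfg):
    """Hypothetical latency model standing in for a real on-device benchmark."""
    per_step = {"tensorrt": 4.0, "cuda": 20.0}[cfg["provider"]]
    return per_step * cfg["nfe"]

grid = candidate_grid(["fp16"], ["tensorrt", "cuda"], [1, 10], [50])
best = select_config(grid, fake_latency_ms)

compensation_ms = 0.0
for observed in [40.0, 42.0, 38.0]:
    compensation_ms = update_latency_compensation(compensation_ms, observed, alpha=0.5)
```

A real selector would also have to weigh accuracy and memory, since latency alone always favors the smallest NFE.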

Chunk size

Number of actions in one /act response. Embodiment-specific — Franka defaults to 50, SO-100 to 30, UR5 to 50.

CUDA graphs

NVIDIA feature that captures a sequence of GPU kernel launches into a single replayable graph, eliminating per-launch overhead. Reflex captures the two ONNX sessions (vlm_prefix and expert_denoise) on supported tiers. See cuda graphs.

Decomposed export

A non-default export mode that splits a VLA into vlm_prefix.onnx (vision-language backbone) and expert_denoise.onnx (action expert + one denoise step). The prefix is computed once per cache miss; the expert runs N times per /act. Yields a 9× speedup over the monolithic export on pi0.5. See decomposed pi0.5.
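
The call pattern described above (one prefix pass, N expert passes) can be sketched with stub functions. `run_prefix` and `run_expert` are hypothetical stand-ins for the two ONNX sessions, and the Euler update follows the glossary's description of the expert rather than any confirmed Reflex code.

```python
def act(observation, noise, nfe, run_prefix, run_expert):
    """Run the expensive prefix once, then the cheap expert `nfe` times."""
    past_kv = run_prefix(observation)
    x = list(noise)
    dt = 1.0 / nfe
    for step in range(nfe):
        velocity = run_expert(past_kv, x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, velocity)]
    return x

calls = {"prefix": 0, "expert": 0}

def stub_prefix(observation):
    """Stand-in for vlm_prefix.onnx; returns a placeholder cache object."""
    calls["prefix"] += 1
    return {"past_kv": len(observation)}

def stub_expert(past_kv, x, t):
    """Stand-in for expert_denoise.onnx; returns a zero velocity field."""
    calls["expert"] += 1
    return [0.0 for _ in x]

chunk = act("image+instruction", [0.1, -0.2], nfe=10,
            run_prefix=stub_prefix, run_expert=stub_expert)
# calls == {"prefix": 1, "expert": 10}
```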

Diffusion / flow-matching policy

A VLA architecture that produces actions by integrating a learned velocity field from random noise to a clean action over N steps. Flow-matching (pi0, pi0.5, SmolVLA) uses Euler integration; DDPM (GR00T) uses a different schedule. The N steps are the NFE — number of function evaluations.
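
A minimal sketch of Euler integration over a velocity field, assuming time runs from t=0 (noise) to t=1 (clean action); real models may use the opposite convention. The straight-line velocity field below is a toy stand-in for a learned network, chosen because Euler integration recovers its target exactly.

```python
def euler_integrate(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 in n_steps Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)  # one function evaluation; n_steps of these is the NFE
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

target = [0.3, -0.7]  # hypothetical clean action

def straight_line_velocity(x, t):
    """Toy field that points from the current sample straight at the target."""
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]

noise = [1.5, 0.2]
action = euler_integrate(straight_line_velocity, noise, n_steps=10)
```

With a learned field the path is not perfectly straight, which is why fewer steps usually cost some accuracy unless the model is distilled for it.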

Embodiment

A specific robot configuration: action dimensions, joint limits, gripper config, control frequency, camera setup. Embodiment configs are JSON files; built-ins ship for franka, so100, ur5. See embodiments.

Episode

One contiguous robot task. Within an episode, the camera image and instruction stay roughly stable, so the KV cache from one /act can be reused on the next. Episodes are identified by episode_id in the request body.

Episode cache

The Reflex feature that keeps the past_kv cache from one /act call warm and reuses it on subsequent calls in the same episode. Within-call decomposed cache delivers the 9× speedup; cross-call cache is more limited because PaliGemma’s past_kv encodes vision (which changes per frame).
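
A toy version of a cache keyed by episode_id is sketched below. The class and method names are hypothetical, and the sketch deliberately ignores the limitation noted above: it never invalidates an entry when the camera frame changes.

```python
class EpisodeCache:
    """Keeps one cached prefix result per episode_id (illustrative only)."""

    def __init__(self):
        self._past_kv = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, episode_id, compute_prefix):
        """Return the cached value, computing it only on a miss."""
        if episode_id in self._past_kv:
            self.hits += 1
        else:
            self.misses += 1
            self._past_kv[episode_id] = compute_prefix()
        return self._past_kv[episode_id]

    def end_episode(self, episode_id):
        """Drop the entry once the episode is over."""
        self._past_kv.pop(episode_id, None)

cache = EpisodeCache()
for episode_id in ["ep-1", "ep-1", "ep-2", "ep-1"]:
    cache.get_or_compute(episode_id, lambda: object())
# cache.misses == 2, cache.hits == 2
```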

Expert (action expert)

The smaller, action-producing half of a decomposed VLA. Takes the cached past_kv from the VLM prefix plus the current denoise timestep t, returns the velocity field for one Euler step.

Flow-matching

A class of generative model that produces samples by integrating a learned velocity field. Used by SmolVLA, pi0, and pi0.5. Compared to DDPM, flow-matching trains faster and runs equally fast at inference (both produce a chunk in N forward passes). See also: NFE.

GR00T

NVIDIA’s humanoid VLA, released 2025. 3.29B parameters, DDPM-style diffusion (4 canonical steps), Eagle 2.5 VL backbone (SigLIP + Qwen3 + mlp1, 1.87B params). Designed for the Jetson Thor.

KV cache (past_kv)

Transformer attention layers cache key/value tensors from previous tokens to avoid recomputing them. Reflex’s decomposed export externalizes this cache so the VLM prefix’s past_kv can be reused across the action expert’s denoise steps.
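
The reuse pattern can be illustrated with single-head attention in plain Python: keys and values for the prefix tokens are projected once, then several queries (standing in for denoise steps) attend over the same cached tensors. The projection and the token values are made up for the example, and a real action expert would also attend to its own tokens.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query over cached keys and values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

project_calls = 0

def project_kv(token):
    """Hypothetical key/value projection; counts how often it is invoked."""
    global project_calls
    project_calls += 1
    return [2.0 * x for x in token], [x + 1.0 for x in token]

# The prefix is projected once; this list plays the role of past_kv.
prefix_tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
past_kv = [project_kv(token) for token in prefix_tokens]
keys = [k for k, _ in past_kv]
values = [v for _, v in past_kv]

# Four queries, standing in for four denoise steps, reuse the same cache.
queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 1.0]]
outputs = [attend(q, keys, values) for q in queries]
# project_calls == 3 regardless of how many queries run
```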

LIBERO

A standard benchmark for vision-language-action models — four task families (spatial, object, goal, 10) on a simulated Franka arm. Reflex uses it as the canonical eval suite (see reflex eval).

LeRobot

HuggingFace’s training framework for robot policies. Reflex is the deployment counterpart — reflex export <hf_id> consumes models trained or hosted via LeRobot.

MCP

Model Context Protocol. A protocol for AI agents to discover and call tools. Reflex exposes its inference engine as an MCP server so Claude Desktop, Cursor, and custom agents can call a robot policy as a tool. See MCP server.

Monolithic export

The default ONNX export mode — VLM backbone + action expert + N-step denoise loop in one graph. Simpler but slower than decomposed; used for SmolVLA (small enough that it doesn’t matter) and for cases where the decomposed path doesn’t apply.

NFE

Number of Function Evaluations. How many forward passes through the action expert per /act call. Flow-matching VLAs canonically use NFE=10; SnapFlow distillation collapses this to NFE=1 with no accuracy loss on LIBERO.

ONNX

Open Neural Network Exchange. A portable model format that decouples training (PyTorch/JAX) from inference (any ONNX-compatible runtime). Reflex exports to ONNX and runs via ONNX Runtime with the TensorRT execution provider on NVIDIA hardware.

openpi

Physical Intelligence’s open-source repo for pi0/pi0.5. Contains training code, weights, and reference inference. Reflex consumes their weights and adds the deployment toolchain.

Parity (cos = +1.0)

Numerical equivalence between PyTorch and ONNX outputs. Reflex enforces machine-precision parity (max_abs_diff between 1e-7 and 1e-6) on every export and refuses to ship an export that fails the check. See verified parity.
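
The two quantities in this entry, max_abs_diff and cosine similarity, can be computed as in the sketch below. The 1e-6 tolerance is an assumed value taken from the upper end of the range quoted above, and the function names and sample outputs are hypothetical.

```python
import math

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat outputs."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two flat outputs; +1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def check_parity(reference, candidate, tolerance=1e-6):
    """Raise if the candidate drifts past the tolerance; otherwise return both metrics."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    diff = max_abs_diff(reference, candidate)
    if diff > tolerance:
        raise ValueError(f"parity check failed: max_abs_diff={diff:.3e}")
    return diff, cosine_similarity(reference, candidate)

# Hypothetical outputs from the PyTorch model and its ONNX export.
pytorch_out = [0.25, -0.5, 1.0]
onnx_out = [0.2500002, -0.5, 0.9999997]

diff, cos = check_parity(pytorch_out, onnx_out)
```

Cosine similarity alone cannot detect a uniform scale error, which is one reason to check the absolute difference as well.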

pi0 / pi0.5 / π0.7

Physical Intelligence’s flagship VLA family. pi0 is 3.5B parameters with a PaliGemma backbone; pi0.5 (3.62B) adds AdaRMSNorm time conditioning; π0.7 (April 2026) moved to closed weights served through a cloud API. Reflex supports pi0 and pi0.5 at machine-precision parity.

Policy

A trained robot brain. In Reflex’s vocabulary, “policy” and “model” are interchangeable in most contexts, with “policy” emphasizing the robot-control role.

Policy versioning

Loading two policy slots side-by-side and routing /act traffic deterministically per-episode. Substrate for canary rollouts and self-distill rollback. See policy versioning.
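
Deterministic per-episode routing can be sketched with a stable hash of the episode_id. The slot names, the `canary_fraction` parameter, and the bucketing scheme are assumptions for illustration, not Reflex's actual router.

```python
import hashlib

BUCKETS = 10_000

def route(episode_id, canary_fraction):
    """Map an episode to slot "a" or "b"; the same id always lands in the same slot."""
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return "b" if bucket < canary_fraction * BUCKETS else "a"

# Every /act call within one episode sees the same policy.
first = route("ep-42", 0.1)
second = route("ep-42", 0.1)

# Sending all or no traffic to the canary behaves as expected at the extremes.
all_a = all(route(f"ep-{i}", 0.0) == "a" for i in range(100))
all_b = all(route(f"ep-{i}", 1.0) == "b" for i in range(100))
```

A cryptographic digest is used rather than Python's built-in hash() because string hashing is salted per process, which would break determinism across restarts.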

Reflex chat

A natural-language wrapper around the entire reflex CLI. Powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com. Includes 100 free calls per day. See reflex chat.

reflex doctor

Runs 10 falsifiable diagnostic checks against your environment and export. Each check maps to a known failure mode. See reflex doctor.

reflex eval

LIBERO benchmark wrapper — one command, ~30 minutes, success rate + per-task numbers + cost transparency. See reflex eval.

RTC

Real-Time Chunking. A technique for streaming the next action chunk’s denoise loop while the robot is still executing the previous chunk. Source: arXiv:2506.07339. In Reflex, --no-rtc disables this; the flag is required in 2-policy mode because RTC carry-over is per-policy.

SmolVLA

HuggingFace’s small VLA — 450M params. The smallest of the four major open VLAs Reflex supports. Fits on Jetson Orin Nano (8 GB).

SnapFlow

A 1-NFE distillation technique. Source: arXiv:2604.05656. Trains a single-step student from a 10-step flow-matching teacher; the student matches or beats the teacher on LIBERO. The first public open-source reproduction shipped in Reflex as reflex train distill. See distill.

TensorRT

NVIDIA’s inference optimization library. Reflex uses it via ONNX Runtime’s TensorRT execution provider, which gives a 5.55× speedup over the CUDA execution provider on flow-matching VLAs.

VLA

Vision-Language-Action model. A neural net that takes images + a natural-language instruction + proprioceptive state, and outputs actions. The four major open VLAs Reflex supports: pi0, pi0.5, SmolVLA, NVIDIA GR00T.

Wedge

Reflex’s term for a composable runtime feature that plugs into reflex serve via a flag. Examples: --cuda-graphs, --auto-calibrate, --policy-a/--policy-b, --a2c2-checkpoint. Each wedge is independent; combinations are well-defined.