Glossary

Asynchronous Action Chunk Correction (A2C2). A small residual MLP that corrects the base policy's actions in the regime where high inference latency drags task success down. Source paper: arXiv:2509.23224. Bolts onto a trained VLA without retraining the base model.

In Tether: see a2c2.
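
A minimal sketch of the residual idea. It assumes the correction head sees only the base action plus the measured latency (the paper's actual inputs and architecture may differ). `ResidualCorrector` and `correct_action` are hypothetical names, not Tether's a2c2 API, and the random weights stand in for a trained checkpoint.

```python
import math
import random
from typing import List, Sequence


class ResidualCorrector:
    """Hypothetical two-layer MLP that outputs a small, bounded per-axis correction."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int,
                 scale: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random weights stand in for a trained A2C2-style checkpoint.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(out_dim)]
        self.scale = scale  # caps each correction so the base policy still dominates

    def __call__(self, features: Sequence[float]) -> List[float]:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in self.w1]  # ReLU
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return [self.scale * math.tanh(o) for o in out]  # each entry lies in [-scale, scale]


def correct_action(base_action: Sequence[float], latency_ms: float,
                   corrector: ResidualCorrector) -> List[float]:
    """Add the residual correction to the base action; the base model itself is untouched."""
    features = list(base_action) + [latency_ms / 1000.0]  # assumed feature layout
    delta = corrector(features)
    return [a + d for a, d in zip(base_action, delta)]


corrector = ResidualCorrector(in_dim=8, hidden=16, out_dim=7)
corrected = correct_action([0.0] * 7, latency_ms=120.0, corrector=corrector)
```

Bounding the output with `tanh` keeps the residual a nudge rather than a replacement, which is what makes it safe to bolt onto a frozen base policy.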

Architectural Decision Record. A short markdown document capturing a decision, its alternatives, and the reasoning. Public-facing decisions surface in the CHANGELOG and these docs; internal-only ADRs (research vault) aren’t published.

The output of one /act call: a sequence of N actions (typically 50) the robot can execute over the next ~1 second of control time. Diffusion-based VLAs produce action chunks rather than single-step actions because predicting a longer horizon reduces compounding error.

The Tether safety layer that clamps every action chunk to per-axis limits before returning from /act. Reads from the embodiment config; can be augmented with URDF-derived limits via --safety-config. See guard.
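
A minimal sketch of per-axis clamping. It assumes limits arrive as one (low, high) pair per action dimension. `clamp_chunk` is a hypothetical helper, not Tether's guard, which reads its limits from the embodiment config and, optionally, --safety-config.

```python
from typing import List, Sequence, Tuple


def clamp_chunk(chunk: Sequence[Sequence[float]],
                limits: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Clamp every action in a chunk to per-axis (low, high) limits."""
    clamped = []
    for step, action in enumerate(chunk):
        # zip() would silently truncate, so reject dimension mismatches explicitly.
        if len(action) != len(limits):
            raise ValueError(f"action {step} has {len(action)} dims, expected {len(limits)}")
        clamped.append([min(max(a, lo), hi) for a, (lo, hi) in zip(action, limits)])
    return clamped


limits = [(-1.0, 1.0), (0.0, 0.08)]  # e.g. one joint plus a gripper width in metres
safe = clamp_chunk([[1.5, 0.1], [-2.0, -0.01], [0.3, 0.04]], limits)
```

Rejecting a mismatch instead of zipping avoids clamping only a prefix of the action vector and passing the rest through unchecked.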

The Tether feature that selects the right (variant × provider × NFE × chunk_size) configuration for your hardware + embodiment, then passively learns latency_compensation_ms from real /act traffic. Selection, not tuning. See auto-calibrate.
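
A rough sketch of the two halves under stated assumptions. Selection is assumed to pick the lowest-latency candidate within a latency budget, and latency_compensation_ms is assumed to be an exponential moving average of observed /act latency. Both rules, the `Candidate` fields, and the class names are hypothetical; this glossary does not specify Tether's actual criteria or estimator.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    variant: str        # e.g. "decomposed" or "monolithic"
    provider: str       # e.g. "tensorrt" or "cuda"
    nfe: int
    chunk_size: int
    measured_latency_ms: float


def select(candidates: Iterable[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Pick one of the enumerated configs rather than tuning parameters."""
    feasible = [c for c in candidates if c.measured_latency_ms <= budget_ms]
    return min(feasible, key=lambda c: c.measured_latency_ms, default=None)


class LatencyCompensation:
    """Passively tracks latency_compensation_ms as an EMA of observed /act latencies."""

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value_ms: Optional[float] = None

    def observe(self, latency_ms: float) -> float:
        if self.value_ms is None:
            self.value_ms = latency_ms  # the first sample seeds the estimate
        else:
            self.value_ms = (1.0 - self.alpha) * self.value_ms + self.alpha * latency_ms
        return self.value_ms


compensation = LatencyCompensation(alpha=0.5)
for sample_ms in (100.0, 200.0):
    compensation.observe(sample_ms)
```

The latencies fed in here are illustrative inputs, not measurements.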

Number of actions in one /act response. Embodiment-specific — Franka defaults to 50, SO-100 to 30, UR5 to 50.
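
As a quick worked example, the wall-clock span of one chunk is chunk_size / control_hz. The 50 Hz and 30 Hz rates below are assumptions chosen to match the ~1 second horizon in the action-chunk entry, not documented embodiment values. The dict is illustrative, not Tether's config loader.

```python
# Default chunk sizes from this glossary; keys mirror the built-in embodiment names.
DEFAULT_CHUNK_SIZE = {"franka": 50, "so100": 30, "ur5": 50}


def chunk_duration_s(chunk_size: int, control_hz: float) -> float:
    """Seconds of robot motion covered by one /act response."""
    if control_hz <= 0:
        raise ValueError("control_hz must be positive")
    return chunk_size / control_hz


# Assumed control rates, for illustration only.
franka_span = chunk_duration_s(DEFAULT_CHUNK_SIZE["franka"], control_hz=50.0)
so100_span = chunk_duration_s(DEFAULT_CHUNK_SIZE["so100"], control_hz=30.0)
```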

NVIDIA feature that captures a sequence of GPU kernel launches into a single replayable graph, eliminating per-launch overhead. Tether captures the two ONNX sessions (vlm_prefix and expert_denoise) on supported tiers. See cuda graphs.

A non-default export mode that splits a VLA into vlm_prefix.onnx (vision-language backbone) and expert_denoise.onnx (action expert plus one denoise step). The prefix is computed once per cache miss; the expert runs NFE times per /act call. On pi0.5 this yields a 9× speedup over the monolithic export. See decomposed pi0.5.
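
A sketch of the call pattern, with plain Python callables standing in for the two ONNX sessions. The prefix runs once, then the expert runs once per denoise step. The function names and argument order are assumptions, not the real session input names. The Euler update assumes noise at t = 0 and the clean chunk at t = 1.

```python
from typing import Any, Callable, List, Sequence

PrefixFn = Callable[[Any, str], Any]                         # (images, instruction) -> past_kv
ExpertFn = Callable[[Any, List[float], float], List[float]]  # (past_kv, x_t, t) -> velocity


def decomposed_act(prefix: PrefixFn, expert: ExpertFn, images: Any, instruction: str,
                   noise: Sequence[float], nfe: int) -> List[float]:
    past_kv = prefix(images, instruction)  # vlm_prefix: once per cache miss
    x = list(noise)
    dt = 1.0 / nfe
    for step in range(nfe):                # expert_denoise: NFE times per /act
        v = expert(past_kv, x, step * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Usage with stub sessions that count their invocations.
calls = {"prefix": 0, "expert": 0}


def stub_prefix(images: Any, instruction: str) -> str:
    calls["prefix"] += 1
    return "past_kv"


def stub_expert(past_kv: Any, x: List[float], t: float) -> List[float]:
    calls["expert"] += 1
    return [0.0] * len(x)  # zero velocity: the sample stays where it started


actions = decomposed_act(stub_prefix, stub_expert, None, "pick up the cup", [0.5, -0.5], nfe=10)
```

Counting calls like this is a cheap way to confirm that the prefix is not re-run inside the denoise loop.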

A VLA architecture that produces actions by iteratively denoising random noise into a clean action chunk over N steps. Flow-matching models (pi0, pi0.5, SmolVLA) integrate a learned velocity field with Euler steps; DDPM (GR00T) uses a different schedule. The N steps are the NFE (number of function evaluations).
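
A minimal sketch of flow-matching sampling with fixed-step Euler integration. It assumes noise at t = 0 and the clean action at t = 1; some implementations run time the other way, from t = 1 down to 0. `euler_sample` is a hypothetical helper. The demo velocity field is an analytic oracle that points straight at a fixed target, standing in for the learned action expert. The DDPM schedule is not covered.

```python
import random
from typing import Callable, List

Velocity = Callable[[List[float], float], List[float]]  # (x_t, t) -> dx/dt


def euler_sample(velocity: Velocity, dim: int, nfe: int, seed: int = 0) -> List[float]:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 in `nfe` Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # t = 0: pure Gaussian noise
    dt = 1.0 / nfe
    for step in range(nfe):                          # one function evaluation per step
        t = step * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Oracle field for a straight noise-to-target path: v = (target - x) / (1 - t).
target = [0.2, -0.4, 0.9]


def oracle(x: List[float], t: float) -> List[float]:
    return [(a - xi) / (1.0 - t) for a, xi in zip(target, x)]


sample = euler_sample(oracle, dim=3, nfe=10)
```

With this perfectly straight field, Euler lands on the target for any NFE, including NFE=1. More generally, the straighter the velocity field's paths, the fewer steps it needs to land accurately.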

A specific robot configuration: action dimensions, joint limits, gripper config, control frequency, camera setup. Embodiment configs are JSON files; built-ins ship for franka, so100, ur5. See embodiments.
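
A sketch of loading an embodiment config, assuming a flat JSON layout. Every field name here (`action_dim`, `control_hz`, `joint_limits`, `cameras`) is hypothetical and not Tether's actual schema. The gripper block is omitted for brevity.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Embodiment:
    name: str
    action_dim: int
    control_hz: float
    joint_limits: List[Tuple[float, float]]  # one (low, high) pair per action dimension
    cameras: List[str]


def load_embodiment(text: str) -> Embodiment:
    """Parse and sanity-check a hypothetical embodiment JSON document."""
    raw = json.loads(text)
    limits = [(float(lo), float(hi)) for lo, hi in raw["joint_limits"]]
    if len(limits) != raw["action_dim"]:
        raise ValueError("joint_limits must have one entry per action dimension")
    if any(lo > hi for lo, hi in limits):
        raise ValueError("each joint limit must satisfy low <= high")
    return Embodiment(
        name=raw["name"],
        action_dim=int(raw["action_dim"]),
        control_hz=float(raw["control_hz"]),
        joint_limits=limits,
        cameras=list(raw["cameras"]),
    )


EXAMPLE = """
{
  "name": "my_arm",
  "action_dim": 3,
  "control_hz": 30,
  "joint_limits": [[-3.14, 3.14], [-1.57, 1.57], [0.0, 0.08]],
  "cameras": ["wrist", "front"]
}
"""

arm = load_embodiment(EXAMPLE)
```

Validating limits at load time means a malformed config fails at startup rather than inside the safety layer.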

One contiguous robot task. Within an episode, the camera image and instruction stay roughly stable, so the KV cache from one /act can be reused on the next. Episodes are identified by episode_id in the request body.

The Tether feature that keeps the past_kv cache from one /act call warm and reuses it on subsequent calls in the same episode. The 9× speedup comes from reuse within a single call (the decomposed export reuses past_kv across denoise steps). Reuse across calls is more limited because PaliGemma's past_kv encodes the vision tokens, which change every frame.
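
A sketch of per-episode cache reuse. It assumes an entry is reused only when a fingerprint of the prefix inputs (image bytes plus instruction) also matches. That condition is an assumption made to show why cross-call reuse is limited: when the camera frame changes, the fingerprint changes and the call is a miss. `EpisodeCache` and `fingerprint` are hypothetical names.

```python
import hashlib
from collections import OrderedDict
from typing import Any, Callable, Tuple


def fingerprint(image_bytes: bytes, instruction: str) -> str:
    """Stable digest of the inputs that determine the VLM prefix."""
    h = hashlib.sha256()
    h.update(len(image_bytes).to_bytes(8, "big"))  # length prefix keeps the encoding unambiguous
    h.update(image_bytes)
    h.update(instruction.encode("utf-8"))
    return h.hexdigest()


class EpisodeCache:
    """Keeps one warm past_kv per episode_id, evicting least-recently-used episodes."""

    def __init__(self, max_episodes: int = 8) -> None:
        self.max_episodes = max_episodes
        self._entries: "OrderedDict[str, Tuple[str, Any]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, episode_id: str, key: str, compute: Callable[[], Any]) -> Any:
        entry = self._entries.get(episode_id)
        if entry is not None and entry[0] == key:
            self.hits += 1
            self._entries.move_to_end(episode_id)
            return entry[1]
        self.misses += 1
        value = compute()  # e.g. run the vlm_prefix session
        self._entries[episode_id] = (key, value)
        self._entries.move_to_end(episode_id)
        if len(self._entries) > self.max_episodes:
            self._entries.popitem(last=False)  # drop the least-recently-used episode
        return value


cache = EpisodeCache(max_episodes=2)
k0 = fingerprint(b"frame-0", "pick up the cup")
k1 = fingerprint(b"frame-1", "pick up the cup")
first = cache.get("ep-1", k0, lambda: "kv-0")
same_frame = cache.get("ep-1", k0, lambda: "recomputed")
new_frame = cache.get("ep-1", k1, lambda: "kv-1")
```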

The smaller, action-producing half of a decomposed VLA. Takes the cached past_kv from the VLM prefix, the current noisy action chunk, and the denoise timestep t, and returns the velocity field for one Euler step.

A class of generative model that produces samples by integrating a learned velocity field. Used by SmolVLA, pi0, and pi0.5. Compared to DDPM, flow-matching trains faster and runs equally fast at inference (both produce a chunk in N forward passes). See also: NFE.

NVIDIA’s humanoid VLA, released 2025. 3.29B parameters, DDPM-style diffusion (4 canonical steps), Eagle 2.5 VL backbone (SigLIP + Qwen3 + mlp1, 1.87B params). Designed for the Jetson Thor.

Transformer attention layers cache key/value tensors from previous tokens to avoid recomputing them. Tether’s decomposed export externalizes this cache so the VLM prefix’s past_kv can be reused across the action expert’s denoise steps.

A standard benchmark for vision-language-action models: four task suites (LIBERO-Spatial, LIBERO-Object, LIBERO-Goal, and LIBERO-10) on a simulated Franka arm. Tether uses it as the canonical eval suite (see tether eval).

Hugging Face's training framework for robot policies. Tether is the deployment counterpart: tether export <hf_id> consumes models trained or hosted via LeRobot.

Model Context Protocol. A protocol for AI agents to discover and call tools. Tether exposes its inference engine as an MCP server so Claude Desktop, Cursor, and custom agents can call a robot policy as a tool. See MCP server.
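
MCP messages are JSON-RPC 2.0. A client invokes a tool with a `tools/call` request that carries the tool's name and arguments. The sketch below builds such a request. The tool name `act` and its argument names are hypothetical placeholders; listing the server's tools (`tools/list`) shows what Tether actually exposes. Transport (stdio or HTTP) is out of scope here.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)  # request ids must not be reused within a session


def tools_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


request = tools_call("act", {"instruction": "pick up the red block", "episode_id": "ep-1"})
```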

The default ONNX export mode — VLM backbone + action expert + N-step denoise loop in one graph. Simpler but slower than decomposed; used for SmolVLA (small enough that it doesn’t matter) and for cases where the decomposed path doesn’t apply.

Number of Function Evaluations. How many forward passes through the action expert per /act call. Flow-matching VLAs canonically use NFE=10; SnapFlow distillation collapses this to NFE=1 with no accuracy loss on LIBERO.

Open Neural Network Exchange. A portable model format that decouples training (PyTorch/JAX) from inference (any ONNX-compatible runtime). Tether exports to ONNX and runs via ONNX Runtime with the TensorRT execution provider on NVIDIA hardware.

Physical Intelligence’s open-source repo for pi0/pi0.5. Contains training code, weights, and reference inference. Tether consumes their weights and adds the deployment toolchain.

Numerical equivalence between PyTorch and ONNX outputs. Tether enforces machine-precision parity (max_abs_diff in the 1e-7 to 1e-6 range) on every export and refuses to ship an export that fails the check. See verified parity.
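
A minimal sketch of the gate. It assumes both runtimes' outputs have been flattened to lists of floats for the same inputs, and that the ship threshold is 1e-6 (the upper end of the range above; the real threshold may differ). Non-finite differences count as an automatic failure, since a NaN can slip past a plain `max()` comparison.

```python
import math
from typing import Sequence


def max_abs_diff(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest elementwise |reference - candidate|; inf if any difference is NaN or inf."""
    if len(reference) != len(candidate):
        raise ValueError(f"shape mismatch: {len(reference)} vs {len(candidate)}")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    if not all(math.isfinite(d) for d in diffs):
        return math.inf
    return max(diffs, default=0.0)


def check_parity(reference: Sequence[float], candidate: Sequence[float],
                 threshold: float = 1e-6) -> float:
    """Return max_abs_diff, or raise so the export is refused."""
    diff = max_abs_diff(reference, candidate)
    if diff > threshold:
        raise RuntimeError(f"parity failed: max_abs_diff={diff:.3e} > {threshold:.1e}")
    return diff
```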

Physical Intelligence's flagship VLA family. pi0 (3.5B parameters) uses a PaliGemma backbone; pi0.5 (3.62B) adds AdaRMSNorm time conditioning; pi0.7 (April 2026) moved to a closed-weights cloud API. Tether supports pi0 and pi0.5 at machine-precision parity.

A trained robot brain. In Tether’s vocabulary, “policy” and “model” are interchangeable in most contexts, with “policy” emphasizing the robot-control role.

Loading two policy slots side-by-side and routing /act traffic deterministically per-episode. Substrate for canary rollouts and self-distill rollback. See policy versioning.
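
A sketch of deterministic per-episode routing. It assumes traffic is split by hashing episode_id into a bucket in [0, 1). The slot names and `route` are hypothetical. The sketch uses a SHA-256 digest instead of Python's built-in `hash()`, because `hash()` is salted per process for strings and would reassign episodes after a restart.

```python
import hashlib


def route(episode_id: str, b_fraction: float) -> str:
    """Return "a" or "b"; the same episode_id always lands in the same slot."""
    if not 0.0 <= b_fraction <= 1.0:
        raise ValueError("b_fraction must be in [0, 1]")
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    # Keep 53 bits so the bucket is exactly representable as a float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
    return "b" if bucket < b_fraction else "a"
```

Because assignment depends only on episode_id, an episode never switches policies mid-task. Raising b_fraction for a canary only moves episodes from a to b, and setting it to 0 rolls everything back to a.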

A natural-language wrapper around the entire tether CLI, powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com. Free, limited to 100 calls/day. See tether chat.

10 registered diagnostic checks against your environment + export, plus inline system-probe rows (Blackwell ORT version, JetPack version, cuDNN-vs-driver skew, ORT-TRT EP empirical session test, multi-GPU mixed-architecture). Each check maps to a known failure mode. See tether doctor.

LIBERO benchmark wrapper — one command, ~30 minutes, success rate + per-task numbers + cost transparency. See tether eval.

Real-Time Chunking. A technique for running the next action chunk's denoise loop while the robot is still executing the previous chunk. Source: arXiv:2506.07339. In Tether, --no-rtc disables it; disabling RTC is required in 2-policy mode because RTC carry-over state is per-policy.
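
A simplified timing sketch, assuming inference latency and control rate are known. The next /act must start early enough that its chunk is ready when the current one runs out, and the first few actions of the new chunk cover timesteps the robot executes while inference is running. RTC itself goes further by conditioning the new chunk's denoising on those already-committed actions so the chunks join smoothly; that part is not shown. Both helper names are hypothetical.

```python
import math


def inference_delay_steps(latency_ms: float, control_hz: float) -> int:
    """Control steps that elapse while one /act call is in flight (rounded up)."""
    return math.ceil(latency_ms * control_hz / 1000.0)


def trigger_index(chunk_size: int, latency_ms: float, control_hz: float) -> int:
    """Index in the current chunk at which to start the next /act call."""
    delay = inference_delay_steps(latency_ms, control_hz)
    if delay > chunk_size:
        raise ValueError("inference is slower than one chunk; the robot would stall")
    return chunk_size - delay
```

With an illustrative 80 ms latency at an assumed 50 Hz, the next call starts at action 46 of 50, and the new chunk's first 4 actions overlap steps that are already executing.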

Hugging Face's small VLA (450M params) and the smallest of the four major open flow-matching VLAs Tether supports. Fits on Jetson Orin Nano (8 GB).

A 1-NFE distillation technique. Source: arXiv:2604.05656. Trains a single-step student from a 10-step flow-matching teacher; the student matches or beats the teacher on LIBERO. Tether ships the first public open-source reproduction as tether train distill. See distill.

NVIDIA’s inference optimization library. Tether uses it via ONNX Runtime’s TensorRT execution provider, which gives a 5.55× speedup over the CUDA execution provider on flow-matching VLAs.

Vision-Language-Action model. A neural net that takes images, a natural-language instruction, and proprioceptive state, and outputs actions. Tether supports five major open VLAs: four from the diffusion/flow-matching family (pi0, pi0.5, SmolVLA, NVIDIA GR00T N1.6), plus OpenVLA-7B (autoregressive, Llama-2 based; added in v0.9.6).

Tether’s term for a composable runtime feature that plugs into tether serve via a flag. Examples: --cuda-graphs, --auto-calibrate, --policy-a/--policy-b, --a2c2-checkpoint. Each wedge is independent; combinations are well-defined.