Changelog

Shipped 2026-06-03.

Headline: Renamed from Reflex to Tether. One brand, one name. Zero functional changes.

  • pip install fastcrest-tether is the new install line. The PyPI package was renamed from reflex-vla to fastcrest-tether (the bare name tether is reserved on PyPI). The old reflex-vla package is frozen at 0.11.x and is not a meta-package — v0.12.0+ ships only under fastcrest-tether.
  • tether is the new CLI entry point. A reflex shim is preserved that calls through to tether with a one-line deprecation nag; the shim will be removed in v0.14.0.
  • GitHub repo moved from FastCrest/reflex-vla to FastCrest/tether. The old repo URL 301-redirects.
  • Storage paths updated. ~/.cache/reflex/ → ~/.cache/tether/, ~/.reflex/pro.license → ~/.tether/pro.license. Legacy paths are still read for back-compat in v0.12; writes only go to the new paths.
  • Python client class renamed. ReflexClient → TetherClient; the old name is re-exported as a deprecation alias.
  • Prometheus metric names UNCHANGED (reflex_act_latency_seconds, reflex_guard_clamped_total, etc.). These are a wire-level contract that customer dashboards reference by literal string. A future major release will dual-emit tether_* aliases for one minor-version window before retiring reflex_*.
  • JSON cert / config field UNCHANGED (reflex_version in parity.cert.json, tether_config.json). These are wire-format identifiers, and customer files already exist with these names.
  • Docs site title is now “Tether”. “(formerly Reflex)” parentheticals appear throughout the landing page and docs for ~60 days to help SEO and customer comprehension, and will be removed in a follow-up.
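
The storage-path rule in the list above (legacy paths are still read, writes go only to the new paths) can be sketched as follows. This is a minimal illustration and not Tether's actual code: the helper names `resolve_read_path`, `resolve_write_path`, and `demo` are hypothetical, and only the directory names come from the notes above.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_read_path(new: Path, legacy: Path) -> Optional[Path]:
    """Prefer the new location; fall back to the legacy one if only it exists."""
    if new.exists():
        return new
    if legacy.exists():
        return legacy
    return None


def resolve_write_path(new: Path) -> Path:
    """Writes always target the new location; parent directories are created."""
    new.parent.mkdir(parents=True, exist_ok=True)
    return new


def demo() -> Tuple[str, str]:
    """Run the rule inside a throwaway directory instead of the real home dir."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        legacy = home / ".reflex" / "pro.license"
        new = home / ".tether" / "pro.license"
        legacy.parent.mkdir(parents=True)
        legacy.write_text("legacy")

        # Only the legacy file exists, so the read falls back to it.
        before = resolve_read_path(new, legacy).read_text()

        # The write goes to the new path; the legacy file is left untouched.
        resolve_write_path(new).write_text("new")
        after = resolve_read_path(new, legacy).read_text()
        return before, after
```

Once a file exists at the new path, it takes precedence on every later read, so the legacy file is never consulted again.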

Sibling release: FastCrest Cloud (the hosted optimize-and-verify SaaS) absorbs the marketing surface that was previously called “Reflex Cloud” — same product, same dashboard, just the brand naming aligned to the one-brand-one-name plan.

Shipped 2026-05-29.

Headline: Hardening patch for the v0.11 lift program, and the formal N=100/task L3 LIBERO parity gate clears the opt-in --fast-kernels Triton runtime.

  • L3 parity gate — kill-trigger 3 clear. Pi0.5 LIBERO-10 tasks 0-2, N=100/task per runtime (600 episodes total): Triton --fast-kernels 91.3% (274/300) vs native ORT 85.3% (256/300). native − triton = −6.0pp — fast-kernels runs ahead of native, the opposite direction from the kill-trigger, so the flag stays on. Modal A100-40GB, ~157 min, ~$33.75.
  • reflex connect works on a clean install. requests is now a core dependency — on pip install reflex-vla with no extras, reflex connect status no longer raises ModuleNotFoundError (it had been an undeclared import that only resolved as a transitive dep locally).
  • Monolithic serve/bench path hardened, with external-data ONNX. Dedicated ORT provider-options and tokenizer-loading modules are extracted from the request hot path; ONNX models with external weight data (.onnx + .onnx_data, required once a single graph exceeds the 2 GB protobuf limit) now load in both serve and the weight-fusion export pass.
  • Cleaner streams + version lockstep. Integration-command errors route to stderr (clean stdout for --json consumers and pipelines); __version__ and pyproject.toml are re-locked in step after a drift in v0.11.1.
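
The parity-gate figures above can be re-derived from the raw episode counts. The snippet below only reproduces the arithmetic. The threshold that would actually fire the kill-trigger is not stated in these notes, so none is encoded here, and the function name is illustrative.

```python
def success_rate_pct(successes: int, episodes: int) -> float:
    """Success rate as a percentage of episodes."""
    return 100.0 * successes / episodes


# Counts taken from the release note: 300 episodes per runtime.
triton = success_rate_pct(274, 300)
native = success_rate_pct(256, 300)

# Percentage-point difference, in the same order the note uses.
delta_pp = native - triton

print(f"triton={triton:.1f}% native={native:.1f}% native-triton={delta_pp:+.1f}pp")
# prints: triton=91.3% native=85.3% native-triton=-6.0pp
```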

Shipped 2026-05-27. The PyPI release that folds in the whole 0.11 line — 0.10.0 and 0.11.0 were tagged in development but never published to PyPI.

Headline: The FluxVLA lift program ships — 8 patterns lifted from FluxVLA (LimX Dynamics, Apache-2.0), two new VLA families (taking the BaseVLA spine from 5 → 7), and a reflex connect integration framework. Every lift is opt-in or additive; reflex serve ./export/ behaves exactly as it did in v0.10.0.

  • Triton fast kernels (reflex serve --fast-kernels, opt-in, Pi0.5 only). The entire pipeline runs through vendored Triton kernels captured in a single CUDA Graph, replacing the per-op ORT session path. Measured on A100-40GB: predict_action 127.4 ms → 51.0 ms = 2.5× vs PyTorch, ~12× vs the standard ORT serve path. Hardware-gated (sm ≥ 8.0) with a silent ORT fallback on unsupported hardware, and bounded by a kill-trigger ADR.
  • Inference-only weights (reflex serve --inference-only-weights). Weights load from safetensors directly into a flat bf16 CUDA tensor dict; no nn.Module is built at request time. 67% peak RSS reduction on Pi0 (39.2 GB → 12.9 GB), validated on lerobot/pi0_base.
  • ZMQ transport (reflex serve --transport zmq). A REQ/REP transport alongside HTTP, with a thin GPU-free camera-side client and JPEG-on-wire serialization (~20× bandwidth reduction for 3-camera setups). Ships in the [serve] extra — both transports out of the box.
  • Model coverage 5 → 7 families. DreamZero WAM (6th) and MolmoAct2 (7th) join the BaseVLA spine; FluxVLA’s pi0.5 LIBERO-10 checkpoint (published 97.85% LIBERO-10 average, Apache-2.0) is added to the curated registry as pi05-libero10-fluxvla.
  • reflex connect — an integration framework (list / up / down / status) that installs and lifecycle-manages external tools while each tool owns its own code. First integration is RTSM (Real-Time Spatial Memory, Apache-2.0), exposing 6 MCP query tools.
  • Weight-fusion ONNX export pass (technique lifted from dexmal/realtime-vla, MIT — algebraically equivalent, zero accuracy cost) and Aloha + UR3 ROS2 starter kits in contrib/ros2/.
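
The hardware gate on --fast-kernels (sm ≥ 8.0, otherwise fall back to ORT) amounts to a comparison on the GPU's compute capability. A minimal sketch, assuming the capability is supplied as a `(major, minor)` tuple; `select_runtime` and its return strings are hypothetical names, not the project's API.

```python
from typing import Tuple

# Gate from the release note: compute capability sm >= 8.0.
MIN_CAPABILITY: Tuple[int, int] = (8, 0)


def select_runtime(fast_kernels_requested: bool, capability: Tuple[int, int]) -> str:
    """Return "triton" only if the flag is set and the GPU meets the gate.

    `capability` is a (major, minor) pair. With PyTorch installed it could be
    read from torch.cuda.get_device_capability(), which returns that shape.
    """
    if fast_kernels_requested and capability >= MIN_CAPABILITY:
        return "triton"
    # Unsupported hardware, or flag not set: use the standard ORT path.
    return "ort"
```

Tuple comparison is lexicographic, so `(8, 6)` (A10G) and `(8, 0)` (A100) pass the gate while `(7, 5)` (T4) does not.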

Shipped 2026-05-22 (folded into the 0.11.1 PyPI release).

Headline: BaseVLA spine refactor — a 12-day arc that replaces five bespoke ~600-1000 LOC exporter pipelines with one component-slot composition. Every VLA family is now a thin ~100 LOC subclass of BaseVLA that declares which of 6 component slots it uses (vision_backbone, llm_backbone, vlm_backbone, projector, vla_head, text_encoder). Adding a backbone is now a composition file plus a registry entry.

  • Validated against lerobot’s reference on real checkpoints (max diff): pi0 (1.13e-6), pi0.5 (2.74e-6), SmolVLA (0.0), GR00T N1.6 (0.0). SmolVLA and GR00T N1.6 are bit-identical; pi0 and pi0.5 agree to within ~3e-6.
  • 6 silent ONNX export bugs found and fixed along the way; 1197 LOC of duplicated exporter orchestration deleted at sunset.
  • New docs/adding_a_vla.md cookbook documents the ~100 LOC pattern.
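
The component-slot idea can be illustrated with a toy version. Only the six slot names come from the notes above. The `BaseVLA` shown here, the `register` decorator, and `ToyVLA` are a hypothetical sketch of the pattern and do not reproduce the real class or say which slots any real family uses.

```python
from typing import Any, Callable, Dict, FrozenSet, Type

# The six slot names listed in the release note.
ALL_SLOTS: FrozenSet[str] = frozenset({
    "vision_backbone", "llm_backbone", "vlm_backbone",
    "projector", "vla_head", "text_encoder",
})

# Hypothetical registry: family name -> class.
REGISTRY: Dict[str, Type["BaseVLA"]] = {}


class BaseVLA:
    """Toy base class: each subclass declares the slots it fills."""

    slots: FrozenSet[str] = frozenset()

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        unknown = set(cls.slots) - ALL_SLOTS
        if unknown:
            raise TypeError(f"{cls.__name__} declares unknown slots: {sorted(unknown)}")

    def __init__(self, **components: Any) -> None:
        provided = set(components)
        if provided != set(self.slots):
            raise ValueError(
                f"expected slots {sorted(self.slots)}, got {sorted(provided)}"
            )
        self.components = dict(components)


def register(name: str) -> Callable[[Type[BaseVLA]], Type[BaseVLA]]:
    """Class decorator that adds a family to the registry."""
    def decorator(cls: Type[BaseVLA]) -> Type[BaseVLA]:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy-vla")
class ToyVLA(BaseVLA):
    # An invented family that uses two of the six slots.
    slots = frozenset({"vlm_backbone", "vla_head"})
```

In this sketch a typo in a slot name fails at class-definition time, and a missing component fails at construction time, before any export work starts.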

Shipped 2026-05-10. Latest in the v0.9 lineage that closed Phase 1 on 2026-05-07.

Headline: OpenVLA-7B + GR00T N1.6 join the curated model registry. Both exporters had shipped earlier (gr00t_exporter.py validated to max_diff = 8.34e-07 vs PyTorch; openvla_exporter.py via the optimum-cli onnx path) but the reflex models list / reflex chat / reflex doctor surface had only pi0 / pi0.5 / SmolVLA entries. v0.9.6 wires both in and adds a contract test (test_registry_completeness.py) that fails CI if a new exporter ever ships without a matching registry entry.
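
A contract test of the kind described above reduces to a set difference between the exporters that exist and the registry entries. The sketch below is an assumption about its shape: how the real test_registry_completeness.py discovers exporters is not described here, and the key names are illustrative.

```python
from typing import Iterable, List


def missing_registry_entries(exporters: Iterable[str], registry: Iterable[str]) -> List[str]:
    """Exporters that have no matching registry entry, sorted for stable output."""
    return sorted(set(exporters) - set(registry))


# Illustrative keys only; the real registry identifiers may differ.
EXPORTERS = {"pi0", "pi0.5", "smolvla", "gr00t", "openvla"}
REGISTRY_BEFORE = {"pi0", "pi0.5", "smolvla"}
REGISTRY_AFTER = REGISTRY_BEFORE | {"gr00t", "openvla"}


def test_registry_completeness() -> None:
    """Fails if any exporter ships without a registry entry."""
    missing = missing_registry_entries(EXPORTERS, REGISTRY_AFTER)
    assert not missing, f"exporters without registry entries: {missing}"
```

Run against `REGISTRY_BEFORE`, the helper reports the two families that v0.9.6 wired in.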

Other v0.9.x highlights:

  • v0.9.5 — CLI cut pass. reflex --help shrinks 22% (18 → 14 visible top-level verbs); reflex inspect --help shrinks 60% (5 → 2). No commands deleted — internal / one-shot / SO-ARM-specific verbs are now hidden=True and still callable directly.
  • v0.9.0 — FlashVLA action-similarity fast-path (--action-similarity-threshold + --max-similar-skips). When the expert produces an action chunk L2-similar to the previously-emitted one, the next predict_action_chunk() reuses the cached chunk. Production-validated on Modal A100: 9 skips / 20 calls = 45% skip rate, 20/20 bit-exact actions, 1.24× wall-clock speedup. Capped at 3 consecutive cached returns to bound drift on slow-changing scenes.
  • v0.9.0 — reflex traces query + reflex traces summary. Filter and aggregate the JSONL traces written by reflex serve --record <dir> (by task, model, day; rich table / JSON / CSV output). Built on existing JSONL storage; parquet + DuckDB index migration deferred to v2.
  • v0.9.0 — uncertainty scoring + 4-quadrant classifier in reflex.curate.quality. N-pass inference variance for flow-matching VLAs; classifies episodes into informative_edge_case / edge_case_to_correct / redundant_known_good / model_blind_spot.
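
The action-similarity fast-path described above can be sketched as a wrapper around the policy call. This is one plausible reading of the note and not the shipped implementation: the class name, the plain-list chunk type, and the choice to keep reusing the cached chunk until the cap is reached are all assumptions. The two constructor parameters mirror the --action-similarity-threshold and --max-similar-skips flags.

```python
import math
from typing import Any, Callable, List, Optional

Chunk = List[float]


def l2_distance(a: Chunk, b: Chunk) -> float:
    """Euclidean distance between two flattened action chunks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class SimilaritySkipper:
    """Reuse the cached chunk when the last two computed chunks were L2-close."""

    def __init__(self, predict: Callable[[Any], Chunk],
                 threshold: float, max_similar_skips: int = 3) -> None:
        self._predict = predict
        self.threshold = threshold
        self.max_similar_skips = max_similar_skips
        self._last: Optional[Chunk] = None  # most recently computed chunk
        self._similar = False               # was it close to the one before?
        self._consecutive = 0               # cached returns in a row
        self.skips = 0

    def predict_action_chunk(self, obs: Any) -> Chunk:
        # Fast path: reuse the cache, but never more than the cap in a row.
        if self._similar and self._consecutive < self.max_similar_skips:
            self._consecutive += 1
            self.skips += 1
            return self._last
        # Slow path: run the model and compare against the previous chunk.
        chunk = self._predict(obs)
        self._similar = (
            self._last is not None
            and l2_distance(chunk, self._last) <= self.threshold
        )
        self._consecutive = 0
        self._last = chunk
        return chunk
```

The cap forces a fresh inference after every run of cached returns, which is what bounds drift when the scene changes slowly.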

Phase 1 closed 2026-05-07 with 16/16 features shipped or explicitly killed. a2c2-correction marked phase_1_shipped; Phase 2 ON > OFF positive delta filed as research-revisit successors (FASTER, TAS, Legato).

Shipped 2026-05-02.

Headline: First OSS per-step expert ONNX export. New per_step_expert=True flag on export_pi05_decomposed produces an expert_denoise.onnx that takes (x_t, t, past_kv) and returns v_t (single Euler step velocity), instead of the default baked-loop ONNX. Exposes the diffusion denoise loop to Python so RTC, A2C2-style correction, per-step caching, and per-step distillation all compose against it.

  • Bit-exact parity vs baked-loop ONNX on A100-80GB (cos = 1.0000000000, max_abs = 0.000e+00). Two real bugs surfaced + fixed mid-flight: _denoise_phase freeze flag now applied to PI05Pytorch.denoise_step (was pi0-only); torch.onnx.export optimize=False skips the constant-folding pass that evaluated fp64 sin/cos in fp32 (3e-5 max_abs error in the baked time embedding).
  • ORT IOBinding pins past_kvs to device once per chunk via OrtValue.ortvalue_from_numpy. Drops per-step Euler loop overhead from +36% to +13% median.
  • cudnn_conv_algo_search=HEURISTIC pinned in the per-step path. The default EXHAUSTIVE setting picks different cuDNN convolution algorithms per session despite functionally-identical compute — baked sometimes lands fast (~45ms), per-step always slow (~91ms). Pinning HEURISTIC closes the gap.
  • CUDA EP loadchain hardened — eager-dlopen libcurand / libcufft / libcusparse / libnvJitLink. Independently load-bearing for any consumer on a fresh image where torch’s transitive curand path doesn’t happen to be loadable.
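
With a per-step expert, the denoise loop moves into Python. The sketch below shows that loop under stated assumptions: `step` stands in for a call into expert_denoise.onnx with the (x_t, t, past_kv) → v_t signature from the headline, time is assumed to run from t = 1 (noise) to t = 0, and `euler_denoise` and `constant_velocity` are illustrative names. No ONNX Runtime call is made here.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# (x_t, t, past_kv) -> v_t, mirroring the per-step expert's signature.
StepFn = Callable[[Vector, float, object], Vector]


def euler_denoise(step: StepFn, noise: Sequence[float],
                  past_kv: object, num_steps: int = 10) -> Vector:
    """Integrate dx/dt = v(x, t) with fixed-size Euler steps from t=1 to t=0."""
    dt = -1.0 / num_steps
    x_t = list(noise)
    for i in range(num_steps):
        t = 1.0 + i * dt                 # current time, computed without drift
        v_t = step(x_t, t, past_kv)      # one velocity evaluation per step
        # Hooks such as correction or caching could act on v_t or x_t here.
        x_t = [x + dt * v for x, v in zip(x_t, v_t)]
    return x_t


def constant_velocity(x_t: Vector, t: float, past_kv: object) -> Vector:
    """Stand-in for the expert: a velocity of 1.0 in every dimension."""
    return [1.0] * len(x_t)
```

With the stand-in velocity, ten steps of size −0.1 move every element down by 1.0 up to rounding, which gives a quick sanity check on the step count and sign convention.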

Shipped 2026-04-29.

Headline: ORT-TensorRT EP becomes first-class. pip install reflex-vla[serve,gpu] now installs everything needed for the 5.55× TensorRT speedup out of the box.

  • 5.55× speedup on Modal A10G — measured on SmolVLA monolithic, 5 warmup + 20 measured forward passes, batch=1. ORT-CUDA fallback: 108.11 ms → ORT-TRT EP: 19.49 ms.
  • [serve,gpu] extras now pull tensorrt>=10.0,<11 + the NVIDIA CUDA libraries automatically.
  • reflex patches LD_LIBRARY_PATH at import time so the TRT runtime libraries are found without manual env-var fiddling.
  • reflex doctor gains 4 TRT EP load-chain checks (libnvinfer.so.10, libcublas.so.12, libcudnn.so.9, ORT-TRT EP active).
  • Chat tool routing covered by structural pin tests (later consolidated into tests/test_chat_regression.py and tests/test_chat_tools_executable.py in v0.9).
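
A load-chain check like the ones added to reflex doctor can be approximated with the standard library. This sketch is an assumption about the approach and not the tool's implementation. The three library names come from the list above, and the helper names are hypothetical.

```python
import ctypes
from typing import Dict, Iterable

# Shared libraries named in the release note.
TRT_CHAIN = ["libnvinfer.so.10", "libcublas.so.12", "libcudnn.so.9"]


def can_dlopen(name: str) -> bool:
    """True if the dynamic loader can resolve and load `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the library or one of its dependencies is not loadable.
        return False


def check_load_chain(names: Iterable[str]) -> Dict[str, bool]:
    """Map each library name to whether it loaded."""
    return {name: can_dlopen(name) for name in names}

# The fourth check (ORT-TRT EP active) could inspect
# onnxruntime.get_available_providers() for "TensorrtExecutionProvider";
# it is omitted here to keep the sketch free of third-party imports.
```

On a machine without the NVIDIA libraries every entry comes back False, which is the situation the [serve,gpu] extras are meant to prevent.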

Validation: 9-iteration Modal A10G spike, ~$5. CUDA graphs work shipped in v0.7.1 — 1.30× per-chunk on A100 N=200 (vlm_prefix 1.07× + expert_denoise 1.47×, p99 -30%, jitter 4.1× tighter); A10G expert-only 1.18× mean (p99 -35%, jitter 14× tighter).
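
Latency figures of this kind come from a warmup-then-measure loop. The harness below is a generic sketch and not the project's benchmark code: the defaults follow the "5 warmup + 20 measured" protocol quoted in this release, p99 uses the nearest-rank method, and "jitter" is taken to mean the population standard deviation, which is an assumption.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 5, measured: int = 20) -> Dict[str, float]:
    """Time `fn` after discarding warmup calls; report mean, p99, and jitter in ms."""
    for _ in range(warmup):
        fn()  # warmup passes are executed but not timed

    samples_ms = []
    for _ in range(measured):
        start = time.perf_counter()
        fn()
        # For GPU work, fn must block until the device has finished,
        # otherwise only the kernel launch is timed.
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Nearest-rank percentile: smallest sample with at least 99% at or below it.
    p99_index = min(len(samples_ms) - 1, math.ceil(0.99 * len(samples_ms)) - 1)
    return {
        "n": float(measured),
        "mean_ms": statistics.fmean(samples_ms),
        "p99_ms": samples_ms[p99_index],
        "jitter_ms": statistics.pstdev(samples_ms),
    }
```

With only 20 samples the nearest-rank p99 is simply the slowest pass, so larger runs such as the N=200 measurement above give a more meaningful tail figure.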

Shipped 2026-04-28.

  • Decomposed pi0.5 ships in Reflex via reflex export --mode decomposed. Splits the model into vlm_prefix.onnx + expert_denoise.onnx with KV cache reuse.
  • 9× speedup verified on Jetson AGX Orin: monolithic 900 ms/chunk → decomposed 100 ms/chunk.
  • --export-mode {auto,parallel,sequential} flag — auto-detects via VRAM probe; refuses with InsufficientVRAMError rather than silently falling back.
  • Hardware matrix CI: Mac CPU / Orin Nano / Orin AGX / RTX / T4 / A10G / A100.
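
The --export-mode selection can be sketched as below. This is one plausible reading of the note: it assumes parallel export needs more free VRAM than sequential export, that auto picks the most demanding mode that fits, and that an explicitly requested mode is refused when it does not fit. The function name and the memory figures passed in are hypothetical. Only the flag values and the InsufficientVRAMError name come from the notes.

```python
class InsufficientVRAMError(RuntimeError):
    """Raised instead of silently switching to a different export mode."""


def select_export_mode(mode: str, free_vram_gb: float,
                       parallel_needs_gb: float, sequential_needs_gb: float) -> str:
    """Resolve 'auto' | 'parallel' | 'sequential' against the probed free VRAM.

    With PyTorch installed, free memory could be probed via
    torch.cuda.mem_get_info(), which returns (free_bytes, total_bytes).
    """
    needs = {"parallel": parallel_needs_gb, "sequential": sequential_needs_gb}

    if mode == "auto":
        # Try the more demanding mode first (assumed ordering).
        for candidate in ("parallel", "sequential"):
            if free_vram_gb >= needs[candidate]:
                return candidate
        raise InsufficientVRAMError(
            f"{free_vram_gb:.1f} GB free; sequential export needs "
            f"{sequential_needs_gb:.1f} GB"
        )

    if mode not in needs:
        raise ValueError(f"unknown export mode: {mode!r}")
    if free_vram_gb < needs[mode]:
        # Explicit request that does not fit: refuse, do not fall back.
        raise InsufficientVRAMError(
            f"{mode} export needs {needs[mode]:.1f} GB, {free_vram_gb:.1f} GB free"
        )
    return mode
```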

Shipped 2026-04-23 to 2026-04-28.

  • First public PyPI release at v0.5.0 (2026-04-28). License changed from Apache 2.0 to BSL 1.1 between dev releases (HashiCorp / MongoDB pattern).
  • A2C2 correction head ships behind --a2c2-checkpoint. Auto-skip semantics validated on N=10 LIBERO at --inject-latency-ms 100: 8/10 (80%), matching baseline exactly.
  • Auto-calibration ships behind --auto-calibrate. Hardware fingerprinting + selection across (variant × provider × NFE × chunk_size). Cache at ~/.reflex/calibration.json.
  • Policy versioning substrate ships: 2-slot router, sticky-per-episode SHA-256 hash, per-policy circuit breaker, additive record-replay schema.
  • Eval-as-a-service ships at reflex eval — LIBERO suite + cost preview + machine-readable JSON envelope (schema v1 locked).
  • Self-distilling serve Pro-tier substrate: 9-gate methodology + 24h post-swap monitor + auto-rollback. Phase 1 dev-license harness.
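
Sticky per-episode routing with SHA-256 can be sketched as a deterministic hash bucket: the same episode always lands in the same slot. The slot labels, the `slot_b_fraction` parameter, and the use of the first 8 digest bytes are assumptions made for illustration. The notes only state that the router has two slots and hashes with SHA-256.

```python
import hashlib


def route_episode(episode_id: str, slot_b_fraction: float) -> str:
    """Deterministically assign an episode to slot "a" or "b".

    The first 8 bytes of the SHA-256 digest form a 64-bit bucket, which is
    compared against an integer threshold so that 0.0 and 1.0 behave exactly.
    """
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # value in 0 .. 2**64 - 1
    threshold = int(slot_b_fraction * 2**64)     # share of traffic for slot b
    return "b" if bucket < threshold else "a"
```

Because the assignment depends only on the episode ID, every request within an episode reaches the same policy without any shared routing state.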

Shipped April 2026.

  • SnapFlow distillation ships as reflex train distill (first public open-source reproduction). 1-step student beats 10-step teacher: 64% vs 56% on libero_object N=50.
  • ONNX export reliability hardening across pi0, pi0.5, SmolVLA, GR00T. Three load-bearing patches under torch.export: F.pad causal mask, frozen DynamicLayer.update, manual past_kv.get_seq_length() for mask assembly.
  • Phoenix wired as OTel backend; record-replay schema v1 locked.
  • 24× CLI speedup via PEP 562 lazy-import (reflex --version: 2.4 s → 0.10 s).
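
PEP 562 lets a module define `__getattr__`, so heavy submodules are imported on first attribute access and not at startup. The sketch below builds a throwaway module object to show the mechanism in one file. In a real package the same function would live at the top level of `__init__.py`. The names `make_lazy_package` and `demo_pkg` are illustrative and unrelated to the project's layout.

```python
import importlib
import types
from typing import Dict


def make_lazy_package(name: str, lazy_map: Dict[str, str]) -> types.ModuleType:
    """Create a module whose listed attributes are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr: str):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        target = lazy_map.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(target)
        setattr(pkg, attr, module)  # cache so later lookups skip __getattr__
        return module

    pkg.__getattr__ = __getattr__
    return pkg


# Attribute name -> real module to import lazily (stdlib used as a stand-in).
demo_pkg = make_lazy_package("demo_pkg", {"stats": "statistics"})
```

A command such as `--version` never touches the lazy attributes, so it pays none of their import cost.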

Shipped late March 2026.

  • First “talk to your robot fleet” reflex chat agent. 17 chat tools wrapping the entire reflex CLI surface.
  • ROS2 reflex ros2-serve transport (now legacy alias for serve --transport ros2).
  • Adaptive denoising telemetry on flow-matching VLAs.

Internal-only milestones, March 2026. First end-to-end pipeline: HuggingFace lerobot/smolvla_base → ONNX export → FastAPI /act server. Foundation for everything since.


For the full per-commit history, see GitHub releases.