Changelog
v0.12.0 (rename)
Shipped 2026-06-03.
Headline: Renamed from Reflex to Tether. One brand, one name. Zero functional changes.
- pip install fastcrest-tether is the new install line. The PyPI package was renamed from reflex-vla to fastcrest-tether (the bare name tether is reserved on PyPI). The old reflex-vla package is frozen at 0.11.x and is not a meta-package: v0.12.0+ ships only under fastcrest-tether.
- tether is the new CLI entry point. A reflex shim is preserved that calls through to tether with a one-line deprecation nag; the shim is removed in v0.14.0.
- GitHub repo moved from FastCrest/reflex-vla to FastCrest/tether. The old repo URL 301-redirects.
- Storage paths updated. ~/.cache/reflex/ → ~/.cache/tether/, ~/.reflex/pro.license → ~/.tether/pro.license. Legacy paths are still read for back-compat in v0.12; writes only go to the new paths.
- Python client class renamed. ReflexClient → TetherClient; old name re-exported as a deprecation alias.
- Prometheus metric names UNCHANGED (reflex_act_latency_seconds, reflex_guard_clamped_total, etc.). These are a wire-level contract that customer dashboards reference by literal string. A future major release will dual-emit tether_* aliases for one minor-version window before retiring reflex_*.
- JSON cert / config field UNCHANGED (reflex_version in parity.cert.json, tether_config.json). These are wire-format identifiers; customer files already exist with these names.
- Docs site title is now “Tether”; (formerly Reflex) parentheticals are scattered through landing + docs for ~60 days to help SEO and customer comprehension, then removed in a follow-up.
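The storage-path and client-rename items above both describe a "read old, write new" compatibility pattern. The sketch below shows one way such a pattern could look; the helper names `resolve_read_path` and `resolve_write_path`, the stand-in `TetherClient` constructor, and the subclass-based alias are illustrative assumptions, not the package's actual code.

```python
import warnings
from pathlib import Path


def resolve_read_path(new_path, legacy_path):
    """Prefer the new location; fall back to the legacy one if only it exists."""
    new_path, legacy_path = Path(new_path), Path(legacy_path)
    if new_path.exists():
        return new_path
    if legacy_path.exists():
        return legacy_path
    # Nothing on disk yet: point at the new location so it gets created there.
    return new_path


def resolve_write_path(new_path, legacy_path):
    """Writes always target the new location, never the legacy one."""
    return Path(new_path)


class TetherClient:
    """Stand-in for the renamed client; the real constructor is not shown in the changelog."""

    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs


class ReflexClient(TetherClient):
    """Hypothetical deprecation alias: identical behaviour plus a warning."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "ReflexClient is deprecated; use TetherClient instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

For the license file, a caller would pass `Path.home() / ".tether" / "pro.license"` as the new path and `Path.home() / ".reflex" / "pro.license"` as the legacy one.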
Sibling release: FastCrest Cloud (the hosted optimize-and-verify SaaS) absorbs the marketing surface that was previously called “Reflex Cloud” — same product, same dashboard, just the brand naming aligned to the one-brand-one-name plan.
v0.11.2
Shipped 2026-05-29.
Headline: Hardening patch for the v0.11 lift program, and the formal N=100/task L3 LIBERO parity gate clears the opt-in --fast-kernels Triton runtime.
- L3 parity gate: kill-trigger 3 clear. Pi0.5 LIBERO-10 tasks 0-2, N=100/task per runtime (600 episodes total): Triton --fast-kernels 91.3% (274/300) vs native ORT 85.3% (256/300). native − triton = −6.0pp: fast-kernels runs ahead of native, the opposite direction from the kill-trigger, so the flag stays on. Modal A100-40GB, ~157 min, ~$33.75.
- reflex connect works on a clean install. requests is now a core dependency: on pip install reflex-vla with no extras, reflex connect status no longer raises ModuleNotFoundError (it had been an undeclared import that only resolved as a transitive dep locally).
- Monolithic serve/bench path hardened, with external-data ONNX. Dedicated ORT provider-options and tokenizer-loading modules are extracted from the request hot path; ONNX models with external weight data (.onnx + .onnx_data, required once a single graph exceeds the 2 GB protobuf limit) now load in both serve and the weight-fusion export pass.
- Cleaner streams + version lockstep. Integration-command errors route to stderr (clean stdout for --json consumers and pipelines); __version__ and pyproject.toml are re-locked in step after a drift in v0.11.1.
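The parity-gate figures in the first bullet can be re-derived from the raw episode counts. This is only an arithmetic check of the numbers quoted above; the kill-trigger threshold itself is not stated in this changelog, so the sketch reports the sign of the delta and nothing more.

```python
def success_rate(successes, episodes):
    """Success rate as a percentage."""
    return 100.0 * successes / episodes


tasks, episodes_per_task, runtimes = 3, 100, 2
total_episodes = tasks * episodes_per_task * runtimes

triton = success_rate(274, 300)   # --fast-kernels runtime
native = success_rate(256, 300)   # native ORT runtime
delta_pp = native - triton        # negative means fast-kernels is ahead

print(f"episodes={total_episodes} triton={triton:.1f}% "
      f"native={native:.1f}% delta={delta_pp:+.1f}pp")
```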
v0.11.1
Shipped 2026-05-27. The PyPI release that folds in the whole 0.11 line; 0.10.0 and 0.11.0 were tagged in development but never published to PyPI.
Headline: The FluxVLA lift program ships — 8 patterns lifted from FluxVLA (LimX Dynamics, Apache-2.0), two new VLA families (taking the BaseVLA spine from 5 → 7), and a reflex connect integration framework. Every lift is opt-in or additive; reflex serve ./export/ behaves exactly as it did in v0.10.0.
- Triton fast kernels (reflex serve --fast-kernels, opt-in, Pi0.5 only). The entire pipeline runs through vendored Triton kernels captured in a single CUDA Graph, replacing the per-op ORT session path. Measured on A100-40GB: predict_action 127.4 ms → 51.0 ms = 2.5× vs PyTorch, ~12× vs the standard ORT serve path. Hardware-gated (sm ≥ 8.0) with a silent ORT fallback on unsupported hardware, and bounded by a kill-trigger ADR.
- Inference-only weights (reflex serve --inference-only-weights). Weights load from safetensors directly into a flat bf16 CUDA tensor dict; no nn.Module is built at request time. 67% peak RSS reduction on Pi0 (39.2 GB → 12.9 GB), validated on lerobot/pi0_base.
- ZMQ transport (reflex serve --transport zmq). A REQ/REP transport alongside HTTP, with a thin GPU-free camera-side client and JPEG-on-wire serialization (~20× bandwidth reduction for 3-camera setups). Ships in the [serve] extra, so both transports work out of the box.
- Model coverage 5 → 7 families. DreamZero WAM (6th) and MolmoAct2 (7th) join the BaseVLA spine; FluxVLA’s pi0.5 LIBERO-10 checkpoint (published 97.85% LIBERO-10 average, Apache-2.0) is added to the curated registry as pi05-libero10-fluxvla.
- reflex connect: an integration framework (list / up / down / status) that installs and lifecycle-manages external tools while each tool owns its own code. First integration is RTSM (Real-Time Spatial Memory, Apache-2.0), exposing 6 MCP query tools.
- Weight-fusion ONNX export pass (technique lifted from dexmal/realtime-vla, MIT; algebraically equivalent, zero accuracy cost) and Aloha + UR3 ROS2 starter kits in contrib/ros2/.
v0.10.0
Shipped 2026-05-22 (folded into the 0.11.1 PyPI release).
Headline: BaseVLA spine refactor — a 12-day arc that replaces five bespoke ~600-1000 LOC exporter pipelines with one component-slot composition. Every VLA family is now a thin ~100 LOC subclass of BaseVLA that declares which of 6 component slots it uses (vision_backbone, llm_backbone, vlm_backbone, projector, vla_head, text_encoder). Adding a backbone is now a composition file plus a registry entry.
- Validated against lerobot’s reference on real checkpoints (max abs diff): pi0 1.13e-6, pi0.5 2.74e-6, SmolVLA 0.0, GR00T N1.6 0.0. SmolVLA and GR00T N1.6 are bit-identical; pi0 and pi0.5 agree to within 3e-6.
- 6 silent ONNX export bugs found and fixed along the way; 1197 LOC of duplicated exporter orchestration deleted at sunset.
- New docs/adding_a_vla.md cookbook documents the ~100 LOC pattern.
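To make the "composition file plus a registry entry" idea concrete, here is a minimal sketch of slot-based composition. Only the six slot names come from the headline above; the `uses` attribute, the `register` decorator, the `REGISTRY` dict, and `ToyVLA` are hypothetical names chosen for illustration and are unlikely to match the real BaseVLA interface.

```python
# The six component slots named in the v0.10.0 headline.
SLOTS = (
    "vision_backbone", "llm_backbone", "vlm_backbone",
    "projector", "vla_head", "text_encoder",
)

REGISTRY = {}  # hypothetical name -> class registry


class BaseVLA:
    # Subclasses override this with the subset of SLOTS they use.
    uses = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        unknown = set(cls.uses) - set(SLOTS)
        if unknown:
            raise TypeError(f"{cls.__name__} declares unknown slots: {sorted(unknown)}")

    def __init__(self, **components):
        missing = set(self.uses) - set(components)
        extra = set(components) - set(self.uses)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        self.components = components


def register(name):
    """Class decorator that adds a family to the registry under `name`."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy-vla")
class ToyVLA(BaseVLA):
    # A family only lists the slots it needs; everything else is inherited.
    uses = ("vlm_backbone", "projector", "vla_head")
```

Validating slot names in `__init_subclass__` means a typo in a new family fails at import time rather than at export time.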
v0.9.6
Shipped 2026-05-10. Latest in the v0.9 lineage that closed Phase 1 on 2026-05-07.
Headline: OpenVLA-7B + GR00T N1.6 join the curated model registry. Both exporters had shipped earlier (gr00t_exporter.py validated to max_diff = 8.34e-07 vs PyTorch; openvla_exporter.py via the optimum-cli onnx path) but the reflex models list / reflex chat / reflex doctor surface had only pi0 / pi0.5 / SmolVLA entries. v0.9.6 wires both in and adds a contract test (test_registry_completeness.py) that fails CI if a new exporter ever ships without a matching registry entry.
Other v0.9.x highlights:
- v0.9.5: CLI cut pass. reflex --help shrinks 22% (18 → 14 visible top-level verbs); reflex inspect --help shrinks 60% (5 → 2). No commands deleted: internal / one-shot / SO-ARM-specific verbs are now hidden=True and still callable directly.
- v0.9.0: FlashVLA action-similarity fast-path (--action-similarity-threshold + --max-similar-skips). When the expert produces an action chunk L2-similar to the previously-emitted one, the next predict_action_chunk() reuses the cached chunk. Production-validated on Modal A100: 9 skips / 20 calls = 45% skip rate, 20/20 bit-exact actions, 1.24× wall-clock speedup. Capped at 3 consecutive cached returns to bound drift on slow-changing scenes.
- v0.9.0: reflex traces query + reflex traces summary. Filter and aggregate the JSONL traces written by reflex serve --record <dir> (by task, model, day; rich table / JSON / CSV output). Built on existing JSONL storage; parquet + DuckDB index migration deferred to v2.
- v0.9.0: uncertainty scoring + 4-quadrant classifier in reflex.curate.quality. N-pass inference variance for flow-matching VLAs; classifies episodes into informative_edge_case / edge_case_to_correct / redundant_known_good / model_blind_spot.
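The action-similarity fast path described above can be sketched as a small wrapper around the expert call. This is a simplified reading of the bullet, not the shipped implementation: the class name, the `run_expert` callable, and the exact point at which similarity is checked are assumptions, and chunks are modelled as flat lists of floats.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length action chunks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ActionSimilarityFastPath:
    def __init__(self, run_expert, threshold, max_similar_skips=3):
        self.run_expert = run_expert            # expensive inference call
        self.threshold = threshold              # L2 similarity threshold
        self.max_similar_skips = max_similar_skips
        self.cached = None                      # last freshly computed chunk
        self.reuse_next = False                 # set when two fresh chunks matched
        self.consecutive_skips = 0

    def predict_action_chunk(self, obs):
        # Reuse the cached chunk, but never more than the cap in a row.
        if self.reuse_next and self.consecutive_skips < self.max_similar_skips:
            self.consecutive_skips += 1
            return self.cached
        chunk = self.run_expert(obs)
        self.reuse_next = (
            self.cached is not None
            and l2_distance(chunk, self.cached) <= self.threshold
        )
        self.consecutive_skips = 0
        self.cached = chunk
        return chunk
```

The cap forces a fresh inference after a bounded number of cached returns, which is what limits drift when the scene changes slowly.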
Phase 1 closed 2026-05-07 with 16/16 features shipped or explicitly killed. a2c2-correction marked phase_1_shipped; Phase 2 ON > OFF positive delta filed as research-revisit successors (FASTER, TAS, Legato).
v0.8.0
Shipped 2026-05-02.
Headline: First OSS per-step expert ONNX export. New per_step_expert=True flag on export_pi05_decomposed produces an expert_denoise.onnx that takes (x_t, t, past_kv) and returns v_t (single Euler step velocity), instead of the default baked-loop ONNX. Exposes the diffusion denoise loop to Python so RTC, A2C2-style correction, per-step caching, and per-step distillation all compose against it.
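Once the expert returns a single-step velocity, the denoise loop becomes ordinary Python. A minimal sketch of such a loop follows, under stated assumptions: integration runs from t = 1 (noise) to t = 0, `velocity_fn` stands in for a call into expert_denoise.onnx, and the step count of 10 is illustrative. The toy velocity function exists only so the loop can be run without a model.

```python
def euler_denoise(velocity_fn, x, past_kv, num_steps=10):
    """Integrate dx/dt = v(x, t, past_kv) with fixed-step Euler from t=1 to t=0."""
    dt = -1.0 / num_steps  # assumption: time runs from 1 (noise) down to 0
    t = 1.0
    for _ in range(num_steps):
        v = velocity_fn(x, t, past_kv)           # one expert call per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
        # Per-step hooks (correction, caching, distillation targets) would go here.
    return x


def make_toy_velocity(noise, target):
    """Constant velocity of a straight-line path from `target` (t=0) to `noise` (t=1)."""
    v = [n - g for n, g in zip(noise, target)]
    return lambda x, t, past_kv: v


noise = [0.5, -1.0, 2.0]
target = [1.0, 1.0, 1.0]
actions = euler_denoise(make_toy_velocity(noise, target), noise, past_kv=None)
```

With the toy velocity the loop lands on `target` up to floating-point error, which makes it a convenient smoke test for the loop itself.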
- Bit-exact parity vs baked-loop ONNX on A100-80GB (cos = 1.0000000000, max_abs = 0.000e+00). Two real bugs surfaced + fixed mid-flight: the _denoise_phase freeze flag is now applied to PI05Pytorch.denoise_step (was pi0-only); torch.onnx.export optimize=False skips the constant-folding pass that evaluated fp64 sin/cos in fp32 (3e-5 max_abs error in the baked time embedding).
- ORT IOBinding pins past_kvs to device once per chunk via OrtValue.ortvalue_from_numpy. Drops per-step Euler loop overhead from +36% to +13% median.
- cudnn_conv_algo_search=HEURISTIC pinned in the per-step path. The default EXHAUSTIVE setting picks different cuDNN convolution algorithms per session despite functionally-identical compute: baked sometimes lands fast (~45ms), per-step always slow (~91ms). Pinning HEURISTIC closes the gap.
- CUDA EP loadchain hardened: eager-dlopen libcurand / libcufft / libcusparse / libnvJitLink. Independently load-bearing for any consumer on a fresh image where torch’s transitive curand path doesn’t happen to be loadable.
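For readers who build their own ONNX Runtime sessions, the cuDNN pin from the list above corresponds to a CUDA execution-provider option. The sketch below only builds the provider list; the helper name and the `device_id` default are illustrative, and the commented session call assumes a local expert_denoise.onnx and an onnxruntime GPU build.

```python
def cuda_providers(device_id=0):
    """Provider list that pins cuDNN conv algorithm search to HEURISTIC."""
    return [
        (
            "CUDAExecutionProvider",
            {
                "device_id": device_id,
                # ORT accepts EXHAUSTIVE, HEURISTIC, or DEFAULT for this option.
                "cudnn_conv_algo_search": "HEURISTIC",
            },
        ),
        "CPUExecutionProvider",  # fallback when CUDA is unavailable
    ]


providers = cuda_providers()

# With onnxruntime-gpu installed, the list is passed straight to the session:
#   import onnxruntime as ort
#   session = ort.InferenceSession("expert_denoise.onnx", providers=providers)
```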
Shipped 2026-04-29.
Headline: ORT-TensorRT EP becomes first-class. pip install reflex-vla[serve,gpu] now installs everything needed for the 5.55× TensorRT speedup out of the box.
- 5.55× speedup on Modal A10G: measured on SmolVLA monolithic, 5 warmup + 20 measured forward passes, batch=1. ORT-CUDA fallback: 108.11 ms → ORT-TRT EP: 19.49 ms.
- [serve,gpu] extras now pull tensorrt>=10.0,<11 + the NVIDIA CUDA libraries automatically. reflex patches LD_LIBRARY_PATH at import time so the TRT runtime libraries are found without manual env-var fiddling.
- reflex doctor gains 4 TRT EP load-chain checks (libnvinfer.so.10, libcublas.so.12, libcudnn.so.9, ORT-TRT EP active).
- Chat tool routing covered by structural pin tests (later consolidated into tests/test_chat_regression.py and tests/test_chat_tools_executable.py in v0.9).
Validation: 9-iteration Modal A10G spike, ~$5. CUDA graphs work shipped in v0.7.1 — 1.30× per-chunk on A100 N=200 (vlm_prefix 1.07× + expert_denoise 1.47×, p99 -30%, jitter 4.1× tighter); A10G expert-only 1.18× mean (p99 -35%, jitter 14× tighter).
Shipped 2026-04-28.
- Decomposed pi0.5 ships in Reflex via reflex export --mode decomposed. Splits the model into vlm_prefix.onnx + expert_denoise.onnx with KV cache reuse.
- 9× speedup verified on Jetson AGX Orin: monolithic 900 ms/chunk → decomposed 100 ms/chunk.
- --export-mode {auto,parallel,sequential} flag: auto-detects via VRAM probe; refuses with InsufficientVRAMError rather than silently falling back.
- Hardware matrix CI: Mac CPU / Orin Nano / Orin AGX / RTX / T4 / A10G / A100.
v0.5.5 → v0.5.0
Shipped 2026-04-23 to 2026-04-28.
- First public PyPI release at v0.5.0 (2026-04-28). License changed from Apache 2.0 to BSL 1.1 between dev releases (HashiCorp / MongoDB pattern).
- A2C2 correction head ships behind --a2c2-checkpoint. Auto-skip semantics validated on N=10 LIBERO at --inject-latency-ms 100: 8/10 (80%), matching baseline exactly.
- Auto-calibration ships behind --auto-calibrate. Hardware fingerprinting + selection across (variant × provider × NFE × chunk_size). Cache at ~/.reflex/calibration.json.
- Policy versioning substrate ships: 2-slot router, sticky-per-episode SHA-256 hash, per-policy circuit breaker, additive record-replay schema.
- Eval-as-a-service ships at reflex eval: LIBERO suite + cost preview + machine-readable JSON envelope (schema v1 locked).
- Self-distilling serve Pro-tier substrate: 9-gate methodology + 24h post-swap monitor + auto-rollback. Phase 1 dev-license harness.
v0.4 → v0.3.1
Shipped April 2026.
- SnapFlow distillation ships as reflex train distill (first public open-source reproduction). 1-step student beats 10-step teacher: 64% vs 56% on libero_object N=50.
- ONNX export reliability hardening across pi0, pi0.5, SmolVLA, GR00T. Three load-bearing patches under torch.export: F.pad causal mask, frozen DynamicLayer.update, manual past_kv.get_seq_length() for mask assembly.
- Phoenix wired as OTel backend; record-replay schema v1 locked.
- 24× CLI speedup via PEP 562 lazy-import (reflex --version: 2.4 s → 0.10 s).
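PEP 562 lets a module define `__getattr__`, so heavy submodules are imported only when first accessed. The sketch below demonstrates the mechanism on a module object built at runtime so it runs as a single script; in a real package the same `__getattr__` would sit in `__init__.py`. The helper name and the lazy map are illustrative, not the project's actual layout.

```python
import importlib
import types


def make_lazy_package(name, lazy_map):
    """Build a module whose attributes in `lazy_map` are imported on first access."""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        if attr in lazy_map:
            module = importlib.import_module(lazy_map[attr])
            setattr(pkg, attr, module)  # cache so later lookups skip this hook
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    pkg.__getattr__ = __getattr__
    return pkg


# Map the public attribute name to the module that backs it.
demo = make_lazy_package("demo", {"decimal_mod": "decimal"})
loaded_before = "decimal_mod" in vars(demo)   # nothing imported yet
value = demo.decimal_mod.Decimal("1.5")       # first access triggers the import
loaded_after = "decimal_mod" in vars(demo)    # now cached on the module
```

A command such as `--version` that touches none of the lazy attributes never pays for their imports, which is where a startup saving of this kind comes from.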
Shipped late March 2026.
- First “talk to your robot fleet” reflex chat agent. 17 chat tools wrapping the entire reflex CLI surface.
- ROS2 reflex ros2-serve transport (now legacy alias for serve --transport ros2).
- Adaptive denoising telemetry on flow-matching VLAs.
Internal-only milestones, March 2026. First end-to-end pipeline: HuggingFace lerobot/smolvla_base → ONNX export → FastAPI /act server. Foundation for everything since.
For the full per-commit history, see GitHub releases.