Changelog

v0.7 (current)

Shipped 2026-04-29.

Headline: ORT-TensorRT EP becomes first-class. pip install reflex-vla[serve,gpu] now installs everything needed for the 5.55× TensorRT speedup out of the box.

5.55× speedup on Modal A10G — measured on SmolVLA monolithic, 5 warmup + 20 measured forward passes, batch=1. ORT-CUDA fallback: 108.11 ms → ORT-TRT EP: 19.49 ms.
[serve,gpu] extras now pull tensorrt>=10.0,<11 + the NVIDIA CUDA libraries automatically.
reflex patches LD_LIBRARY_PATH at import time so the TRT runtime libraries are found without manual env-var fiddling.
reflex doctor gains 4 TRT EP load-chain checks (libnvinfer.so.10, libcublas.so.12, libcudnn.so.9, ORT-TRT EP active).
47/47 chat tool tests passing.

Validation: reflex_context/03_experiments/2026-04-29-v07-runtime-spike.md (9 iterations, ~$5 Modal).

v0.6

Shipped 2026-04-28.

Decomposed pi0.5 ships in Reflex via reflex export --mode decomposed. Splits the model into vlm_prefix.onnx + expert_denoise.onnx with KV cache reuse.
9× speedup verified on Jetson AGX Orin: monolithic 900 ms/chunk → decomposed 100 ms/chunk.
--export-mode {auto,parallel,sequential} flag — auto-detects via VRAM probe; refuses with InsufficientVRAMError rather than silently falling back.
Hardware matrix CI: Mac CPU / Orin Nano / Orin AGX / RTX / T4 / A10G / A100.

v0.5.5 → v0.5.0

Shipped 2026-04-23 to 2026-04-28.

First public PyPI release at v0.5.0 (2026-04-28). License changed from Apache 2.0 to BSL 1.1 between dev releases (HashiCorp / MongoDB pattern).
A2C2 correction head ships behind --a2c2-checkpoint. Auto-skip semantics validated on N=10 LIBERO at --inject-latency-ms 100: 8/10 (80%), matching baseline exactly.
Auto-calibration ships behind --auto-calibrate. Hardware fingerprinting + selection across (variant × provider × NFE × chunk_size). Cache at ~/.reflex/calibration.json.
Policy versioning substrate ships: 2-slot router, sticky-per-episode SHA-256 hash, per-policy circuit breaker, additive record-replay schema.
Eval-as-a-service ships at reflex eval — LIBERO suite + cost preview + machine-readable JSON envelope (schema v1 locked).
Self-distilling serve Pro-tier substrate: 9-gate methodology + 24h post-swap monitor + auto-rollback. Phase 1 dev-license harness.

v0.4 → v0.3.1

Shipped April 2026.

SnapFlow distillation ships as reflex train distill (first public open-source reproduction). 1-step student beats 10-step teacher: 64% vs 56% on libero_object N=50.
ONNX export reliability hardening across pi0, pi0.5, SmolVLA, GR00T. Three load-bearing patches under torch.export: F.pad causal mask, frozen DynamicLayer.update, manual past_kv.get_seq_length() for mask assembly.
Phoenix wired as OTel backend; record-replay schema v1 locked.
24× CLI speedup via PEP 562 lazy-import (reflex --version: 2.4 s → 0.10 s).

v0.2

Shipped late March 2026.

First “talk to your robot fleet” reflex chat agent. 17 chat tools wrapping the entire reflex CLI surface.
ROS2 reflex ros2-serve transport (now legacy alias for serve --transport ros2).
Adaptive denoising telemetry on flow-matching VLAs.

v0.1

Internal-only milestones, March 2026. First end-to-end pipeline: HuggingFace lerobot/smolvla_base → ONNX export → FastAPI /act server. Foundation for everything since.

For the full per-commit history, see GitHub releases.