Changelog

Shipped 2026-04-29.

Headline: ORT-TensorRT EP becomes first-class. pip install reflex-vla[serve,gpu] now installs everything needed for the 5.55× TensorRT speedup out of the box.

  • 5.55× speedup on Modal A10G — measured on SmolVLA monolithic, 5 warmup + 20 measured forward passes, batch=1. ORT-CUDA fallback: 108.11 ms → ORT-TRT EP: 19.49 ms.
  • [serve,gpu] extras now pull tensorrt>=10.0,<11 + the NVIDIA CUDA libraries automatically.
  • reflex patches LD_LIBRARY_PATH at import time so the TRT runtime libraries are found without manual env-var fiddling.
  • reflex doctor gains 4 TRT EP load-chain checks (libnvinfer.so.10, libcublas.so.12, libcudnn.so.9, ORT-TRT EP active).
  • 47/47 chat tool tests passing.
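The import-time patch works roughly like this; a minimal sketch, assuming the CUDA/TRT runtime libraries land in per-wheel lib/ directories under site-packages (collect_nvidia_lib_dirs and patch_ld_library_path are illustrative names, not the actual reflex internals):

```python
import os
import site
from pathlib import Path

def collect_nvidia_lib_dirs():
    """Find lib/ dirs shipped inside NVIDIA pip wheels (illustrative layout)."""
    dirs = []
    for sp in site.getsitepackages():
        nvidia_root = Path(sp) / "nvidia"
        if nvidia_root.is_dir():
            dirs.extend(str(p) for p in sorted(nvidia_root.glob("*/lib")) if p.is_dir())
    return dirs

def patch_ld_library_path(extra_dirs, env=os.environ):
    """Prepend extra_dirs to env's LD_LIBRARY_PATH, skipping duplicates."""
    existing = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    merged = [d for d in extra_dirs if d not in existing] + existing
    env["LD_LIBRARY_PATH"] = ":".join(merged)
    return env["LD_LIBRARY_PATH"]
```

Run once at package import, this keeps libnvinfer.so.10 and friends resolvable without the user exporting anything by hand.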

Validation: reflex_context/03_experiments/2026-04-29-v07-runtime-spike.md (9 iterations, ~$5 Modal).

Shipped 2026-04-28.

  • Decomposed pi0.5 ships in Reflex via reflex export --mode decomposed. Splits the model into vlm_prefix.onnx + expert_denoise.onnx with KV cache reuse.
  • 9× speedup verified on Jetson AGX Orin: monolithic 900 ms/chunk → decomposed 100 ms/chunk.
  • --export-mode {auto,parallel,sequential} flag — auto-detects via VRAM probe; refuses with InsufficientVRAMError rather than silently falling back.
  • Hardware matrix CI: Mac CPU / Orin Nano / Orin AGX / RTX / T4 / A10G / A100.
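The auto-detect/refuse behavior can be sketched as below; the GiB thresholds are illustrative assumptions, and resolve_export_mode plus the probed free-VRAM value are hypothetical stand-ins for the real probe:

```python
class InsufficientVRAMError(RuntimeError):
    """Raised instead of silently downgrading to a slower mode."""

# Illustrative footprints (GiB), not measured numbers.
PARALLEL_MIN_GIB = 8.0     # both sub-models resident at once
SEQUENTIAL_MIN_GIB = 4.0   # one sub-model resident at a time

def resolve_export_mode(mode: str, free_vram_gib: float) -> str:
    """Resolve --export-mode {auto,parallel,sequential} against a VRAM probe."""
    if mode == "auto":
        if free_vram_gib >= PARALLEL_MIN_GIB:
            return "parallel"
        if free_vram_gib >= SEQUENTIAL_MIN_GIB:
            return "sequential"
        raise InsufficientVRAMError(
            f"need >= {SEQUENTIAL_MIN_GIB} GiB, probed {free_vram_gib:.1f} GiB")
    required = PARALLEL_MIN_GIB if mode == "parallel" else SEQUENTIAL_MIN_GIB
    if free_vram_gib < required:
        raise InsufficientVRAMError(
            f"{mode} needs >= {required} GiB, probed {free_vram_gib:.1f} GiB")
    return mode
```

The point of the explicit error is the non-fallback contract: an under-provisioned explicit mode fails loudly rather than quietly running slower.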

Shipped 2026-04-23 to 2026-04-28.

  • First public PyPI release at v0.5.0 (2026-04-28). License changed from Apache 2.0 to BSL 1.1 between dev releases (HashiCorp / MongoDB pattern).
  • A2C2 correction head ships behind --a2c2-checkpoint. Auto-skip semantics validated on N=10 LIBERO at --inject-latency-ms 100: 8/10 (80%), matching baseline exactly.
  • Auto-calibration ships behind --auto-calibrate. Hardware fingerprinting + selection across (variant × provider × NFE × chunk_size). Cache at ~/.reflex/calibration.json.
  • Policy versioning substrate ships: 2-slot router, sticky-per-episode SHA-256 hash, per-policy circuit breaker, additive record-replay schema.
  • Eval-as-a-service ships at reflex eval — LIBERO suite + cost preview + machine-readable JSON envelope (schema v1 locked).
  • Self-distilling serve Pro-tier substrate: 9-gate methodology + 24h post-swap monitor + auto-rollback. Phase 1 dev-license harness.
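The sticky routing piece of the versioning substrate fits in a few lines, assuming the episode id is the hash key (route_episode is a hypothetical name, not the actual reflex internals):

```python
import hashlib

def route_episode(episode_id: str, n_slots: int = 2) -> int:
    """Map an episode id to a policy slot via SHA-256.

    Every step of an episode hashes to the same slot, and the mapping
    is deterministic across process restarts, unlike Python's salted
    built-in hash().
    """
    digest = hashlib.sha256(episode_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_slots
```

SHA-256 over the built-in hash() is what makes "sticky" survive a server restart mid-episode: the slot is a pure function of the episode id.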

Shipped April 2026.

  • SnapFlow distillation ships as reflex train distill (first public open-source reproduction). 1-step student beats 10-step teacher: 64% vs 56% on libero_object N=50.
  • ONNX export reliability hardening across pi0, pi0.5, SmolVLA, GR00T. Three load-bearing patches under torch.export: F.pad causal mask, frozen DynamicLayer.update, manual past_kv.get_seq_length() for mask assembly.
  • Phoenix wired as OTel backend; record-replay schema v1 locked.
  • 24× CLI speedup via PEP 562 lazy-import (reflex --version: 2.4 s → 0.10 s).

Shipped late March 2026.

  • First “talk to your robot fleet” reflex chat agent. 17 chat tools wrapping the entire reflex CLI surface.
  • ROS2 transport via reflex ros2-serve (now a legacy alias for serve --transport ros2).
  • Adaptive denoising telemetry on flow-matching VLAs.

Internal-only milestones, March 2026. First end-to-end pipeline: HuggingFace lerobot/smolvla_base → ONNX export → FastAPI /act server. Foundation for everything since.


For the full per-commit history, see GitHub releases.