Changelog
v0.7 (current)
Shipped 2026-04-29.
Headline: ORT-TensorRT EP becomes first-class. `pip install reflex-vla[serve,gpu]` now installs everything needed for the 5.55× TensorRT speedup out of the box.
- 5.55× speedup on Modal A10G — measured on SmolVLA monolithic, 5 warmup + 20 measured forward passes, batch=1. ORT-CUDA fallback: 108.11 ms → ORT-TRT EP: 19.49 ms.
- `[serve,gpu]` extras now pull `tensorrt>=10.0,<11` + the NVIDIA CUDA libraries automatically.
- `reflex` patches `LD_LIBRARY_PATH` at import time so the TRT runtime libraries are found without manual env-var fiddling.
- `reflex doctor` gains 4 TRT EP load-chain checks (`libnvinfer.so.10`, `libcublas.so.12`, `libcudnn.so.9`, ORT-TRT EP active).
- 47/47 chat tool tests passing.
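The fourth check (ORT-TRT EP active) needs a live onnxruntime session, but the three library checks above can be sketched as plain dlopen probes. This is a minimal sketch, assuming a helper name and return shape that are not the shipped `reflex doctor` API:

```python
import ctypes

# The three shared libraries the ORT TensorRT EP load chain needs.
TRT_LIBS = ("libnvinfer.so.10", "libcublas.so.12", "libcudnn.so.9")

def check_trt_load_chain(libs=TRT_LIBS):
    """Return {library name: True if it can be dlopen'd, else False}."""
    results = {}
    for name in libs:
        try:
            # CDLL searches LD_LIBRARY_PATH plus the default loader paths,
            # so this exercises the same resolution the EP relies on.
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results
```

On a box without the GPU wheels installed, all three probes simply report `False`, which is exactly the signal a doctor-style command wants to surface.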
Validation: `reflex_context/03_experiments/2026-04-29-v07-runtime-spike.md` (9 iterations, ~$5 Modal).
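The warmup/measured protocol behind the latency numbers above can be sketched as a small harness. The function name and structure here are illustrative, not the actual benchmark code:

```python
import time

def bench(run_forward, warmup=5, measured=20):
    """Average wall-clock latency in ms/pass after discarding warmup runs."""
    for _ in range(warmup):
        run_forward()  # warm caches, JIT, and EP engine builds
    start = time.perf_counter()
    for _ in range(measured):
        run_forward()
    return (time.perf_counter() - start) / measured * 1000.0
```

Discarding warmup passes matters especially for the TRT EP, where the first calls can include engine build time.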
v0.6
Shipped 2026-04-28.
- Decomposed pi0.5 ships in Reflex via `reflex export --mode decomposed`. Splits the model into `vlm_prefix.onnx` + `expert_denoise.onnx` with KV cache reuse.
- 9× speedup verified on Jetson AGX Orin: monolithic 900 ms/chunk → decomposed 100 ms/chunk.
- `--export-mode {auto,parallel,sequential}` flag — auto-detects via VRAM probe; refuses with `InsufficientVRAMError` rather than silently falling back.
- Hardware matrix CI: Mac CPU / Orin Nano / Orin AGX / RTX / T4 / A10G / A100.
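The refuse-rather-than-fallback semantics of the auto mode can be sketched as below. The thresholds and function name are assumptions; only the `InsufficientVRAMError` behavior comes from the release notes:

```python
class InsufficientVRAMError(RuntimeError):
    pass

# Assumed thresholds for illustration, not the shipped values.
PARALLEL_MIN_GB = 16.0
SEQUENTIAL_MIN_GB = 8.0

def resolve_export_mode(mode: str, free_vram_gb: float) -> str:
    """Resolve --export-mode: pass explicit modes through, probe for auto."""
    if mode != "auto":
        return mode
    if free_vram_gb >= PARALLEL_MIN_GB:
        return "parallel"
    if free_vram_gb >= SEQUENTIAL_MIN_GB:
        return "sequential"
    # Refuse loudly instead of silently degrading the export.
    raise InsufficientVRAMError(
        f"{free_vram_gb:.1f} GB free; need >= {SEQUENTIAL_MIN_GB} GB"
    )
```

Raising instead of falling back keeps a failed probe visible in CI across the hardware matrix rather than masking it as a slow export.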
v0.5.5 → v0.5.0
Shipped 2026-04-23 to 2026-04-28.
- First public PyPI release at v0.5.0 (2026-04-28). License changed from Apache 2.0 to BSL 1.1 between dev releases (HashiCorp / MongoDB pattern).
- A2C2 correction head ships behind `--a2c2-checkpoint`. Auto-skip semantics validated on N=10 LIBERO at `--inject-latency-ms 100`: 8/10 (80%), matching baseline exactly.
- Auto-calibration ships behind `--auto-calibrate`. Hardware fingerprinting + selection across (variant × provider × NFE × chunk_size). Cache at `~/.reflex/calibration.json`.
- Policy versioning substrate ships: 2-slot router, sticky-per-episode SHA-256 hash, per-policy circuit breaker, additive record-replay schema.
- Eval-as-a-service ships at `reflex eval` — LIBERO suite + cost preview + machine-readable JSON envelope (schema v1 locked).
- Self-distilling serve Pro-tier substrate: 9-gate methodology + 24h post-swap monitor + auto-rollback. Phase 1 dev-license harness.
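The sticky-per-episode routing can be sketched with a SHA-256 bucket: hashing the episode id makes the slot choice deterministic for the whole episode, while a rollout fraction steers what share of new episodes hits the candidate slot. Names, slot labels, and the fraction parameter are illustrative assumptions:

```python
import hashlib

def route_episode(episode_id: str, rollout_fraction: float = 0.1) -> str:
    """Deterministically pick a slot for an episode in a 2-slot router."""
    digest = hashlib.sha256(episode_id.encode()).digest()
    # Map the first 8 hash bytes to a bucket in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < rollout_fraction else "stable"
```

Because the bucket depends only on the episode id, every call within an episode lands on the same policy, which is what keeps mid-episode behavior consistent during a rollout.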
v0.4 → v0.3.1
Shipped April 2026.
- SnapFlow distillation ships as `reflex train distill` (first public open-source reproduction). 1-step student beats 10-step teacher: 64% vs 56% on `libero_object` N=50.
- ONNX export reliability hardening across pi0, pi0.5, SmolVLA, GR00T. Three load-bearing patches under `torch.export`: `F.pad` causal mask, frozen `DynamicLayer.update`, manual `past_kv.get_seq_length()` for mask assembly.
- Phoenix wired as OTel backend; record-replay schema v1 locked.
- 24× CLI speedup via PEP 562 lazy imports (`reflex --version`: 2.4 s → 0.10 s).
Shipped late March 2026.
- First “talk to your robot fleet” `reflex chat` agent. 17 chat tools wrapping the entire reflex CLI surface.
- ROS2 `reflex ros2-serve` transport (now a legacy alias for `serve --transport ros2`).
- Adaptive denoising telemetry on flow-matching VLAs.
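One way chat tools can wrap a CLI surface is a declarative registry mapping each tool to the argv it expands to. Everything below (the class, registry, and example tool) is an illustrative assumption, not the shipped tool schema:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ChatTool:
    name: str
    description: str
    argv_template: List[str]

    def argv(self, **kwargs) -> List[str]:
        """Fill placeholders like {model} into a runnable argv list."""
        return [part.format(**kwargs) for part in self.argv_template]

REGISTRY: Dict[str, ChatTool] = {}

def register(tool: ChatTool) -> ChatTool:
    REGISTRY[tool.name] = tool
    return tool

# Hypothetical example tool wrapping one CLI subcommand.
register(ChatTool(
    name="export",
    description="Export a policy to ONNX",
    argv_template=["reflex", "export", "--model", "{model}"],
))
```

Keeping tools as data rather than code makes it cheap to cover a whole CLI surface with one registry.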
Internal-only milestones, March 2026. First end-to-end pipeline: HuggingFace `lerobot/smolvla_base` → ONNX export → FastAPI `/act` server. Foundation for everything since.
For the full per-commit history, see GitHub releases.