
Roadmap

Shipped on PyPI, source-available under BSL 1.1. Working end-to-end on the four major open VLAs at machine-precision parity. The wedges currently shipped:

  • export — monolithic + decomposed ONNX, validated cos = +1.0
  • serve — FastAPI runtime, ORT-TensorRT EP default, composable wedges
  • distill — SnapFlow 1-step distillation (first public reproduction)
  • guard — ActionGuard with embodiment + safety-config layers
  • a2c2 — chunk correction head with auto-skip
  • cuda-graphs — ORT-native capture, tier-aware fallback
  • batching — cost-weighted scheduler with backpressure
  • slo — rolling p99 enforcement with 503 / log-only / degrade modes (sketched after this list)
  • fleet — per-robot Prometheus identity
  • otel — GenAI-semconv tracing
  • mcp — Model Context Protocol server
  • record-replay — JSONL trace + reflex replay
  • auto-calibrate — passive selection across hardware tiers
  • policy-versioning — 2-slot routing for A/B + rollback
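
As a concrete picture of one wedge's mechanism, here is a minimal sketch of the slo wedge's rolling-p99 enforcement. All names are hypothetical, not the Reflex internals; only the three response modes (503 / log-only / degrade) come from the list above.

```python
# Minimal sketch, not the Reflex implementation: a rolling-window p99
# tracker with the three enforcement modes named in the wedge list.
import collections
import statistics
from enum import Enum


class SloMode(Enum):
    REJECT = "503"         # shed load: answer HTTP 503
    LOG_ONLY = "log-only"  # record the breach, serve the request anyway
    DEGRADE = "degrade"    # serve, but signal the runtime to cut cost


class RollingP99:
    def __init__(self, budget_ms: float, mode: SloMode, window: int = 1000):
        self.budget_ms = budget_ms
        self.mode = mode
        self.samples: collections.deque[float] = collections.deque(maxlen=window)

    def observe(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p99(self) -> float:
        if len(self.samples) < 100:
            return 0.0  # too few samples to trust a tail estimate
        return statistics.quantiles(self.samples, n=100)[98]

    def check(self) -> SloMode | None:
        """Return the enforcement action if the rolling p99 is over budget."""
        return self.mode if self.p99() > self.budget_ms else None
```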

Phase 1 KPIs targeted by end-of-quarter: 500+ GitHub stars, 3 paying Pro subscribers, NVIDIA Inception membership.

The target ship date is rolling and the order approximate; we ship each item as it lands.

| Item | Why | Status |
| --- | --- | --- |
| First-class ROS2 transport (reflex serve --transport ros2) | Folds the legacy ros2-serve alias into the main serve surface | Designed |
| Bearer-token auth (auth-bearer wedge) | /act currently has no auth — fine for local use, but a blocker for regulated-industry deployments (hypothetical sketch after this table) | Designed |
| Shadow inference (--shadow-policy) | Run a candidate policy alongside production without affecting traffic | Designed |
| Adaptive denoise validation in production | The --adaptive-steps flag exists; Phase 1 ships telemetry but not real early exit; v0.8 wires the cutoff | In progress |
| OpenVLA full coverage | Currently a passing reference; should be a first-class supported model | Designed |
| Blackwell support | RTX 5090 and B200 segfault at startup today (ORT-bundled cuBLAS/cuDNN lack sm_100 kernels). Tracking ORT upstream and investigating a TensorRT-LLM path | Tracking upstream |
| Per-policy calibration in 2-policy mode | Auto-calibrate composes with policy-versioning's per-policy state | Designed |
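
The auth-bearer wedge is still in the Designed column, so the following is a hypothetical sketch of the shape it could take on the FastAPI runtime: a bearer check as a dependency in front of the existing /act route. Everything here except the /act path is an assumption, including the env-var name.

```python
# Hypothetical sketch of an auth-bearer wedge; not the shipped API.
import os
import secrets

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()
EXPECTED = os.environ["REFLEX_BEARER_TOKEN"]  # hypothetical env var


def require_token(
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> None:
    # Constant-time comparison to avoid a timing side channel.
    if not secrets.compare_digest(creds.credentials, EXPECTED):
        raise HTTPException(status_code=401, detail="invalid bearer token")


@app.post("/act", dependencies=[Depends(require_token)])
def act(observation: dict) -> dict:
    ...  # the existing inference path, unchanged
```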

Hardening and DX polish on top of the wedge surface. Scoped from real user feedback once we have ~10 deployed teams.

  • Live VLM-prefix observations feeding A2C2 (the paper assumes the head sees real per-image context)
  • CLI flags for A2C2 thresholds (--a2c2-latency-threshold-ms, --a2c2-success-threshold; sketched after this list)
  • Auto-calibrate default-on (currently opt-in)
  • reflex eval macOS local fallback
  • reflex eval SimplerEnv suite
  • Customer-data fine-tune composition with auto-calibrate
  • Email + Slack adapters for Pro tier weekly reports
  • Versioning system in docs (multi-version side-by-side)
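
For the A2C2 threshold flags above, a hypothetical sketch of how auto-skip could consume them. The flag names come from the list; the decision criteria are assumptions, not the wedge's actual logic.

```python
# Hypothetical auto-skip decision, parameterized by the two planned flags.
def should_skip_correction(
    predicted_success: float,        # head's confidence in the base chunk
    correction_latency_ms: float,    # measured cost of running the head
    success_threshold: float = 0.95,     # --a2c2-success-threshold (planned)
    latency_threshold_ms: float = 10.0,  # --a2c2-latency-threshold-ms (planned)
) -> bool:
    # Skip when the base chunk already looks good enough, or when the
    # correction head would cost more latency than the budget allows.
    return (
        predicted_success >= success_threshold
        or correction_latency_ms > latency_threshold_ms
    )
```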

reflex serve --pro gates the continuous-learning loop:

  • 4-stage loop: collect → distill → eval → swap
  • 9-gate methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit; sketched after this list)
  • Atomic warm-swap via the policy-versioning router (≤ 60s SLA)
  • 24-hour post-swap monitoring with auto-rollback
  • Customer-specific HF Hub artifact storage
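
A minimal sketch of how the 9-gate check could compose. The gate model and names are hypothetical; the one rule taken from the roadmap is the override policy: SAFETY gates are never overridable, PERFORMANCE gates are, with an audit record.

```python
# Sketch only: hypothetical gate model for the pre-swap check.
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    kind: str      # "SAFETY" (3 gates) or "PERFORMANCE" (6 gates)
    passed: bool


def may_swap(gates: list[Gate], overrides: set[str], audit: list[str]) -> bool:
    """Decide whether the collect -> distill -> eval -> swap loop proceeds."""
    for gate in gates:
        if gate.passed:
            continue
        if gate.kind == "SAFETY":
            return False  # non-overridable, no exceptions
        if gate.name in overrides:
            audit.append(f"override: {gate.name}")  # allowed, but audited
            continue
        return False
    return True
```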

Pricing: $99/mo. Gross margin: 37-57% at a typical $10-15/wk Modal burn.
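
The band is plain arithmetic; a quick check, with billing weeks per month as the unstated assumption that moves the endpoints a few points:

```python
# Gross margin at $99/mo against $10-15/wk Modal burn.
price = 99.0
for weekly_burn in (10.0, 15.0):      # $/wk, from the line above
    for weeks in (4.0, 4.33):         # billing weeks per month (assumed)
        margin = (price - weekly_burn * weeks) / price
        print(f"${weekly_burn:.0f}/wk x {weeks} wk/mo -> {margin:.0%}")
# Prints roughly 34-60%, bracketing the quoted 37-57% band.
```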

See pricing for the full breakdown.

The longer-term plan, of which the v0.7 software is Phase 1:

| Phase | Window | Deliverable | KPI target |
| --- | --- | --- | --- |
| 1 — Software | 0-6 mo | Open-source CLI on PyPI. Seven wedges shipped. | 500 stars, 3 Pro subs, NVIDIA Inception |
| 2 — Hardware bundles | 6-12 mo | Pre-flashed Jetson kits via Seeed reComputer, Trossen, Connect Tech / ADLINK partnerships. Rev-share per unit. | 3 SKUs, 100+ units/mo, $20K MRR |
| 3 — Compute Pack | 12-24 mo | Branded Jetson appliance, $2-5K/unit. Sold direct to lab integrators and small robotics teams. | 500 units, $3M ARR |
| 4 — Custom silicon | 24-48 mo | VLA-specific inference ASIC. Flow-matching denoising loops have fixed shapes — perfect for an ASIC. | $15-30M Series A |
| 5 — Taiwan datacenter | 48+ mo | Own-silicon datacenter co-located with TSMC and Asian humanoid OEMs. Sell VLA inference as a utility. | $100M+ Series B |

The thread: the robotics inference market needs different silicon and tooling than the LLM inference market, and most of the value will accrue to whoever owns the deployment layer (Phase 1) early enough to compound through to silicon (Phase 4) and operating cost advantage (Phase 5).

What Reflex is explicitly not building:

  • Generic inference server. Triton, Ray Serve, and vLLM win on this dimension. Reflex is VLA-only.
  • Token-level autoregressive scheduling. vLLM’s pattern. Wrong tool for fixed-shape action chunks.
  • Kubernetes operators. Run on K8s, don’t be K8s.
  • Multi-language i18n in docs. No signal that the audience needs it.
  • Browser-based playground. Long-term vision, but not on the 12-month plan.

Three methods that overlap with or threaten the wedges Reflex differentiates on landed in the research vault recently. They are named here for transparency, and so the roadmap can react if any of them produces a stronger result than what Reflex ships:

| Method | Threatens | Status |
| --- | --- | --- |
| Mean-Flow VLA — single-step action via a mean-flow vector field; claims 8.7× faster than SmolVLA and 83.9× vs Diffusion Policy | SnapFlow's "first public 1-NFE pi0.5" headline | Reproducing in Q3; if competitive, ship as a reflex train distill --method mean-flow backend |
| FASTER — horizon-aware schedule for flow VLAs; per-action-index NFE budgeting (1 step on the leading action, more on the tail); 10× compression (schematic after this table) | The A2C2 stack's "leading action gets corrected" approach | Architectural pivot under research; the A2C2 Phase 2 plan tracks this |
| AsyncVLA — per-token non-uniform denoising plus confidence-rater self-correction; unified sync/async modes | — (complementary) | Reflex's per-step expert export contract is the substrate AsyncVLA-class methods build on; we benefit from this work landing |
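
For context on the FASTER row, a schematic (illustrative numbers, not the paper's) of what per-action-index NFE budgeting means: spend one denoise step on the leading action and progressively more toward the tail of the chunk.

```python
# Schematic of a horizon-aware NFE schedule in the FASTER style.
def nfe_schedule(chunk_len: int, min_steps: int = 1, max_steps: int = 10) -> list[int]:
    """Per-action-index denoise budget: cheap up front, thorough on the tail."""
    if chunk_len == 1:
        return [min_steps]
    return [
        round(min_steps + (max_steps - min_steps) * i / (chunk_len - 1))
        for i in range(chunk_len)
    ]

# e.g. nfe_schedule(8) -> [1, 2, 4, 5, 6, 7, 9, 10]
```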

The thread: Reflex’s moat is the toolchain and the verified parity, not loyalty to any specific distillation method. As better methods land, they become backends.