Roadmap
Now (v0.7)
Shipped on PyPI, source-available under BSL 1.1. Working end-to-end on the four major open VLAs at machine-precision parity. The wedges currently shipped:
- export — monolithic + decomposed ONNX, validated cos = +1.0
- serve — FastAPI runtime, ORT-TensorRT EP default, composable wedges
- distill — SnapFlow 1-step distillation (first public reproduction)
- guard — ActionGuard with embodiment + safety-config layers
- a2c2 — chunk correction head with auto-skip
- cuda-graphs — ORT-native capture, tier-aware fallback
- batching — cost-weighted scheduler with backpressure
- slo — rolling p99 enforcement with 503 / log-only / degrade modes
- fleet — per-robot Prometheus identity
- otel — GenAI-semconv tracing
- mcp — Model Context Protocol server
- record-replay — JSONL trace + reflex replay
- auto-calibrate — passive selection across hardware tiers
- policy-versioning — 2-slot routing for A/B + rollback
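To make the slo wedge's behavior concrete, here is a toy sketch of rolling-p99 enforcement with the three listed modes. This is an illustrative model only, not the shipped implementation; the class and method names are hypothetical:

```python
from collections import deque

class SloEnforcer:
    """Toy rolling-p99 gate (hypothetical; not the shipped `slo` wedge).

    Keeps a sliding window of recent latencies and, when the p99 exceeds
    the budget, reacts according to the configured mode:
      - "503":      reject the request (shed load)
      - "log-only": record the breach but serve anyway
      - "degrade":  serve via a cheaper path (e.g. fewer denoise steps)
    """
    def __init__(self, p99_budget_ms, mode="503", window=1000):
        assert mode in ("503", "log-only", "degrade")
        self.budget = p99_budget_ms
        self.mode = mode
        self.window = deque(maxlen=window)
        self.breaches = 0

    def record(self, latency_ms):
        self.window.append(latency_ms)

    def p99(self):
        if not self.window:
            return 0.0
        ordered = sorted(self.window)
        idx = min(len(ordered) - 1, int(0.99 * len(ordered)))
        return ordered[idx]

    def admit(self):
        """Decide how to handle the next request."""
        if self.p99() <= self.budget:
            return "serve"
        self.breaches += 1
        return {"503": "reject", "log-only": "serve", "degrade": "degrade"}[self.mode]

enforcer = SloEnforcer(p99_budget_ms=50.0, mode="degrade", window=100)
for ms in [10.0] * 90 + [120.0] * 10:   # tail latencies push p99 past budget
    enforcer.record(ms)
print(enforcer.p99())    # 120.0
print(enforcer.admit())  # degrade
```

The real wedge additionally has to make this decision per request under concurrency; the sketch only shows the windowed-percentile-plus-mode shape of the policy.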
Phase 1 KPIs targeted by end-of-quarter: 500+ GitHub stars, 3 paying Pro subscribers, NVIDIA Inception membership.
v0.8 (next 60 days)
Target ship date is rolling. The order is approximate; we ship each item as it lands.
| Item | Why | Status |
|---|---|---|
| First-class ROS2 transport (`reflex serve --transport ros2`) | Folds the legacy `ros2-serve` alias into the main serve surface | Designed |
| Bearer-token auth (`auth-bearer` wedge) | `/act` currently has no auth — fine for local use, but a blocker for regulated-industry deployments | Designed |
| Shadow inference (`--shadow-policy`) | Run a candidate policy alongside production without affecting traffic | Designed |
| Adaptive denoise validation in production | The `--adaptive-steps` flag exists; Phase 1 ships telemetry but not real early-exit; v0.8 wires the cutoff | In progress |
| OpenVLA full coverage | Currently a passing reference; should be a first-class supported model | Designed |
| Blackwell support | RTX 5090 + B200 segfault at startup today (ORT-bundled cuBLAS/cuDNN missing sm_100 kernels). Tracking ORT upstream + investigating a TensorRT-LLM path | Tracking upstream |
| Per-policy calibration in 2-policy mode | Auto-calibrate composes with policy-versioning's per-policy state | Designed |
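The shadow-inference row above has a simple invariant worth spelling out: the candidate runs on the same observation, but only the production action is ever returned. A minimal sketch of that contract, with hypothetical names (this is not the planned `--shadow-policy` code):

```python
def act_with_shadow(obs, production, shadow, log):
    """Hypothetical sketch of shadow-inference semantics: the candidate
    policy sees real traffic, but its output is only logged, never served,
    and its failures are non-fatal."""
    prod_action = production(obs)
    try:
        shadow_action = shadow(obs)
        # Record max per-dimension divergence for offline comparison.
        div = max(abs(a - b) for a, b in zip(prod_action, shadow_action))
        log.append({"divergence": div})
    except Exception as exc:
        log.append({"shadow_error": repr(exc)})  # shadow crash must not affect serving
    return prod_action

prod = lambda obs: [0.0, 1.0, 0.0]
cand = lambda obs: [0.0, 0.5, 0.0]
log = []
action = act_with_shadow({"img": None}, prod, cand, log)
print(action)                # [0.0, 1.0, 0.0] — production output unchanged
print(log[0]["divergence"])  # 0.5
```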
Phase 1.5 (90-180 days)
Hardening and DX polish on top of the wedge surface. Scoped from real user feedback once we have ~10 deployed teams.
- Live VLM-prefix observation feeding A2C2 (the paper assumes the head sees real per-image context)
- CLI flags for A2C2 thresholds (`--a2c2-latency-threshold-ms`, `--a2c2-success-threshold`)
- Auto-calibrate default-on (currently opt-in)
- `reflex eval` macOS local fallback
- `reflex eval` SimplerEnv suite
- Customer-data fine-tune composition with auto-calibrate
- Email + Slack adapters for Pro tier weekly reports
- Versioning system in docs (multi-version side-by-side)
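One way the two proposed A2C2 threshold flags could interact with the auto-skip behavior shipped in v0.7 is sketched below. The decision rule here is an assumption for illustration, not the documented semantics of those flags:

```python
def a2c2_should_skip(step_latency_ms, success_prob,
                     latency_threshold_ms=8.0, success_threshold=0.95):
    """Hypothetical auto-skip rule for a chunk-correction head: skip the
    correction pass when the control loop is already over its latency
    budget, or when the policy's own success estimate is high enough that
    correction is unlikely to help. The two thresholds correspond to the
    proposed --a2c2-latency-threshold-ms / --a2c2-success-threshold flags;
    the default values here are invented."""
    return step_latency_ms > latency_threshold_ms or success_prob >= success_threshold

print(a2c2_should_skip(12.0, 0.5))   # True  (over latency budget)
print(a2c2_should_skip(4.0, 0.99))   # True  (already confident)
print(a2c2_should_skip(4.0, 0.5))    # False (run the correction head)
```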
Pro tier
`reflex serve --pro` gates the continuous-learning loop:
- 4-stage loop: collect → distill → eval → swap
- 9-gate methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
- Atomic warm-swap via the policy-versioning router (≤ 60s SLA)
- 24-hour post-swap monitoring with auto-rollback
- Customer-specific HF Hub artifact storage
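The gate split in the loop above (SAFETY non-overridable, PERFORMANCE overridable with audit) can be sketched as a small decision function. This is a toy model of the policy, with invented gate names, not the shipped 9-gate methodology:

```python
from dataclasses import dataclass

@dataclass
class Gate:
    name: str
    kind: str      # "SAFETY" or "PERFORMANCE"
    passed: bool

def evaluate_swap(gates, overrides=frozenset(), audit=None):
    """Hypothetical swap decision: a failed SAFETY gate always blocks the
    swap; a failed PERFORMANCE gate may be overridden, but every override
    is written to the audit log."""
    audit = audit if audit is not None else []
    for g in gates:
        if g.passed:
            continue
        if g.kind == "SAFETY":
            return False, audit              # non-overridable: block the swap
        if g.name in overrides:
            audit.append(f"override: {g.name}")  # overridable, but audited
        else:
            return False, audit
    return True, audit

gates = [Gate("collision-rate", "SAFETY", True),     # invented gate names
         Gate("p99-latency", "PERFORMANCE", False)]
ok, audit = evaluate_swap(gates, overrides={"p99-latency"})
print(ok, audit)  # True ['override: p99-latency']
```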
Pricing: $99/mo. Gross margin: 37-57% at a typical $10-15/wk Modal burn.
See pricing for the full breakdown.
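The atomic warm-swap and rollback that the policy-versioning router provides can be sketched with a 2-slot structure: the candidate loads into the standby slot while the old policy keeps serving, then a single flip routes traffic. A minimal sketch with hypothetical names (not the shipped router):

```python
import threading

class TwoSlotRouter:
    """Hypothetical 2-slot policy router: one live slot, one standby slot.
    Warm-swap loads the candidate into standby, then atomically flips
    which slot serves; rollback is the same flip in reverse, cheap
    because the previous policy is still resident."""
    def __init__(self, policy):
        self._slots = [policy, None]
        self._live = 0
        self._lock = threading.Lock()

    def serve(self, obs):
        with self._lock:
            policy = self._slots[self._live]
        return policy(obs)

    def warm_swap(self, candidate):
        standby = 1 - self._live
        self._slots[standby] = candidate  # load while the old policy still serves
        with self._lock:
            self._live = standby          # atomic flip

    def rollback(self):
        with self._lock:
            self._live = 1 - self._live   # old policy never left memory

router = TwoSlotRouter(lambda obs: "v1-action")
router.warm_swap(lambda obs: "v2-action")
print(router.serve(None))  # v2-action
router.rollback()
print(router.serve(None))  # v1-action
```

The design point the sketch illustrates: rollback is as fast as the swap itself, which is what makes a 24-hour post-swap monitoring window with auto-rollback practical.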
The 5-phase arc
The longer-term plan that the v0.7 software is Phase 1 of:
| Phase | Window | Deliverable | KPI target |
|---|---|---|---|
| 1 — Software | 0-6 mo | Open-source CLI on PyPI. Seven wedges shipped. | 500 stars, 3 Pro subs, NVIDIA Inception |
| 2 — Hardware bundles | 6-12 mo | Pre-flashed Jetson kits via Seeed reComputer, Trossen, Connect Tech / ADLINK partnerships. Rev-share per unit. | 3 SKUs, 100+ units/mo, $20K MRR |
| 3 — Compute Pack | 12-24 mo | Branded Jetson appliance, $2-5K/unit. Sold direct to lab integrators + small robotics teams. | 500 units, $3M ARR |
| 4 — Custom silicon | 24-48 mo | VLA-specific inference ASIC. Flow-matching denoising loops have fixed shapes — perfect for an ASIC. | $15-30M Series A |
| 5 — Taiwan datacenter | 48+ mo | Own-silicon datacenter co-located with TSMC and Asian humanoid OEMs. Sell VLA inference as a utility. | $100M+ Series B |
The thread: the robotics inference market needs different silicon and tooling than the LLM inference market, and most of the value will accrue to whoever owns the deployment layer (Phase 1) early enough to compound through to silicon (Phase 4) and operating cost advantage (Phase 5).
What’s deliberately NOT on the roadmap
- Generic inference server. Triton, Ray Serve, vLLM win on this dimension. Reflex is VLA-only.
- Token-level autoregressive scheduling. vLLM’s pattern. Wrong tool for fixed-shape action chunks.
- Kubernetes operators. Run on K8s, don’t be K8s.
- Multi-language i18n in docs. No signal that the audience needs it.
- Browser-based playground. Long-term vision, but not on the 12-month plan.
Competitive landscape
Three methods landed in the research vault recently that overlap with or threaten the wedges Reflex differentiates on. Naming them here for honesty, and so the roadmap can react if any of them produces a stronger result than what Reflex ships:
| Method | Threatens | Status |
|---|---|---|
| Mean-Flow VLA — single-step action via mean-flow vector field, claims 8.7× faster than SmolVLA / 83.9× vs Diffusion Policy | SnapFlow's "first public 1-NFE pi0.5" headline | Reproducing in Q3; if competitive, ship as a `reflex train distill --method mean-flow` backend |
| FASTER — Horizon-Aware Schedule for flow VLAs, per-action-index NFE budgeting (1 step on leading action, more on the tail), 10× compression | A2C2 stack’s “leading action gets corrected” approach | Architectural pivot under research; A2C2 Phase 2 plan tracks this |
| AsyncVLA — per-token non-uniform denoising + confidence-rater self-correction, unified sync/async modes | — (complementary) | Reflex’s per-step expert export contract is the substrate AsyncVLA-class methods build on. We benefit from this work landing. |
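The core idea in the FASTER row — per-action-index NFE budgeting, with few denoise steps on the leading action and more on the tail — reduces to a schedule function. A toy sketch (the linear interpolation and the step counts are assumptions for illustration, not the paper's schedule):

```python
def nfe_schedule(chunk_len, lead_steps=1, tail_steps=4):
    """Hypothetical horizon-aware NFE schedule in the spirit of FASTER:
    the leading action (executed immediately, and correctable by a head
    like A2C2) gets few denoise steps, later actions in the chunk get
    more, interpolating linearly in between. lead_steps/tail_steps
    defaults are invented."""
    if chunk_len == 1:
        return [lead_steps]
    return [round(lead_steps + (tail_steps - lead_steps) * i / (chunk_len - 1))
            for i in range(chunk_len)]

schedule = nfe_schedule(8)
print(schedule)                       # [1, 1, 2, 2, 3, 3, 4, 4]
print(sum(schedule), "vs uniform", 4 * 8)  # 20 vs uniform 32
```

Any schedule of this shape slots naturally behind Reflex's per-step expert export contract, which is why the table treats FASTER as an architectural question rather than an export-format one.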
The thread: Reflex’s moat is the toolchain and the verified parity, not loyalty to any specific distillation method. As better methods land, they become backends.