
Roadmap

Shipped on PyPI, source-available under BSL 1.1. Working end-to-end on the four major open VLAs at machine-precision parity. The wedges currently shipped:

  • export — monolithic + decomposed ONNX, validated cos = +1.0
  • serve — FastAPI runtime, ORT-TensorRT EP default, composable wedges
  • distill — SnapFlow 1-step distillation (first public reproduction)
  • guard — ActionGuard with embodiment + safety-config layers
  • a2c2 — chunk correction head with auto-skip
  • cuda-graphs — ORT-native capture, tier-aware fallback
  • batching — cost-weighted scheduler with backpressure
  • slo — rolling p99 enforcement with 503 / log-only / degrade modes (sketched after this list)
  • fleet — per-robot Prometheus identity
  • otel — GenAI-semconv tracing
  • mcp — Model Context Protocol server
  • record-replay — JSONL trace + reflex replay
  • auto-calibrate — passive selection across hardware tiers
  • policy-versioning — 2-slot routing for A/B + rollback
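
As a concrete picture of one wedge's mechanism, here is a minimal sketch of the slo wedge's rolling-p99 enforcement. All names are hypothetical, not the Reflex internals; only the three response modes (503 / log-only / degrade) come from the list above.

```python
# Minimal sketch, not the Reflex implementation: a rolling-window p99
# tracker with the three enforcement modes named in the wedge list.
import collections
import statistics
from enum import Enum


class SloMode(Enum):
    REJECT = "503"         # shed load: answer HTTP 503
    LOG_ONLY = "log-only"  # record the breach, serve the request anyway
    DEGRADE = "degrade"    # serve, but signal the runtime to cut cost


class RollingP99:
    def __init__(self, budget_ms: float, mode: SloMode, window: int = 1000):
        self.budget_ms = budget_ms
        self.mode = mode
        self.samples: collections.deque[float] = collections.deque(maxlen=window)

    def observe(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p99(self) -> float:
        if len(self.samples) < 100:
            return 0.0  # too few samples to trust a tail estimate
        return statistics.quantiles(self.samples, n=100)[98]

    def check(self) -> SloMode | None:
        """Return the enforcement action if the rolling p99 is over budget."""
        return self.mode if self.p99() > self.budget_ms else None
```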

Phase 1 KPIs targeted by end-of-quarter: 500+ GitHub stars, 3 paying Pro subscribers, NVIDIA Inception membership.

The target ship date is rolling and the order approximate; we ship each item as it lands.

| Item | Why | Status |
| --- | --- | --- |
| First-class ROS2 transport (reflex serve --transport ros2) | Folds the legacy ros2-serve alias into the main serve surface | Designed |
| Bearer-token auth (auth-bearer wedge) | /act currently has no auth — fine for local use, but a blocker for regulated-industry deployments (hypothetical sketch after this table) | Designed |
| Shadow inference (--shadow-policy) | Run a candidate policy alongside production without affecting traffic | Designed |
| Adaptive denoise validation in production | The --adaptive-steps flag exists; Phase 1 ships telemetry but not real early exit; v0.8 wires the cutoff | In progress |
| OpenVLA full coverage | Currently a passing reference; should be a first-class supported model | Designed |
| Blackwell support | RTX 5090 and B200 segfault at startup today (ORT-bundled cuBLAS/cuDNN lack sm_100 kernels). Tracking ORT upstream and investigating a TensorRT-LLM path | Tracking upstream |
| Per-policy calibration in 2-policy mode | Auto-calibrate composes with policy-versioning's per-policy state | Designed |
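
The auth-bearer wedge is still in the Designed column, so the following is a hypothetical sketch of the shape it could take on the FastAPI runtime: a bearer check as a dependency in front of the existing /act route. Everything here except the /act path is an assumption, including the env-var name.

```python
# Hypothetical sketch of an auth-bearer wedge; not the shipped API.
import os
import secrets

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI()
bearer = HTTPBearer()
EXPECTED = os.environ["REFLEX_BEARER_TOKEN"]  # hypothetical env var


def require_token(
    creds: HTTPAuthorizationCredentials = Depends(bearer),
) -> None:
    # Constant-time comparison to avoid a timing side channel.
    if not secrets.compare_digest(creds.credentials, EXPECTED):
        raise HTTPException(status_code=401, detail="invalid bearer token")


@app.post("/act", dependencies=[Depends(require_token)])
def act(observation: dict) -> dict:
    ...  # the existing inference path, unchanged
```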

Hardening and DX polish on top of the wedge surface. Scoped from real user feedback once we have ~10 deployed teams.

  • Live VLM-prefix observations feeding A2C2 (the paper assumes the head sees real per-image context)
  • CLI flags for A2C2 thresholds (--a2c2-latency-threshold-ms, --a2c2-success-threshold; sketched after this list)
  • Auto-calibrate default-on (currently opt-in)
  • reflex eval macOS local fallback
  • reflex eval SimplerEnv suite
  • Customer-data fine-tune composition with auto-calibrate
  • Email + Slack adapters for Pro tier weekly reports
  • Versioning system in docs (multi-version side-by-side)
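
For the A2C2 threshold flags above, a hypothetical sketch of how auto-skip could consume them. The flag names come from the list; the decision criteria are assumptions, not the wedge's actual logic.

```python
# Hypothetical auto-skip decision, parameterized by the two planned flags.
def should_skip_correction(
    predicted_success: float,        # head's confidence in the base chunk
    correction_latency_ms: float,    # measured cost of running the head
    success_threshold: float = 0.95,     # --a2c2-success-threshold (planned)
    latency_threshold_ms: float = 10.0,  # --a2c2-latency-threshold-ms (planned)
) -> bool:
    # Skip when the base chunk already looks good enough, or when the
    # correction head would cost more latency than the budget allows.
    return (
        predicted_success >= success_threshold
        or correction_latency_ms > latency_threshold_ms
    )
```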

reflex serve --pro gates the continuous-learning loop:

  • 4-stage loop: collect → distill → eval → swap
  • 9-gate methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit; sketched after this list)
  • Atomic warm-swap via the policy-versioning router (≤ 60s SLA)
  • 24-hour post-swap monitoring with auto-rollback
  • Customer-specific HF Hub artifact storage
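
A minimal sketch of how the 9-gate check could compose. The gate model and names are hypothetical; the one rule taken from the roadmap is the override policy: SAFETY gates are never overridable, PERFORMANCE gates are, with an audit record.

```python
# Sketch only: hypothetical gate model for the pre-swap check.
from dataclasses import dataclass


@dataclass
class Gate:
    name: str
    kind: str      # "SAFETY" (3 gates) or "PERFORMANCE" (6 gates)
    passed: bool


def may_swap(gates: list[Gate], overrides: set[str], audit: list[str]) -> bool:
    """Decide whether the collect -> distill -> eval -> swap loop proceeds."""
    for gate in gates:
        if gate.passed:
            continue
        if gate.kind == "SAFETY":
            return False  # non-overridable, no exceptions
        if gate.name in overrides:
            audit.append(f"override: {gate.name}")  # allowed, but audited
            continue
        return False
    return True
```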

Pricing: $99/mo. Gross margin: 37-57% at a typical $10-15/wk Modal burn.
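
The band is plain arithmetic; a quick check, with billing weeks per month as the unstated assumption that moves the endpoints a few points:

```python
# Gross margin at $99/mo against $10-15/wk Modal burn.
price = 99.0
for weekly_burn in (10.0, 15.0):      # $/wk, from the line above
    for weeks in (4.0, 4.33):         # billing weeks per month (assumed)
        margin = (price - weekly_burn * weeks) / price
        print(f"${weekly_burn:.0f}/wk x {weeks} wk/mo -> {margin:.0%}")
# Prints roughly 34-60%, bracketing the quoted 37-57% band.
```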

See pricing for the full breakdown.

The longer-term plan, of which the v0.7 software is Phase 1:

| Phase | Window | Deliverable | KPI target |
| --- | --- | --- | --- |
| 1 — Software | 0-6 mo | Open-source CLI on PyPI. Seven wedges shipped. | 500 stars, 3 Pro subs, NVIDIA Inception |
| 2 — Hardware bundles | 6-12 mo | Pre-flashed Jetson kits via Seeed reComputer, Trossen, Connect Tech / ADLINK partnerships. Rev-share per unit. | 3 SKUs, 100+ units/mo, $20K MRR |
| 3 — Compute Pack | 12-24 mo | Branded Jetson appliance, $2-5K/unit. Sold direct to lab integrators and small robotics teams. | 500 units, $3M ARR |
| 4 — Custom silicon | 24-48 mo | VLA-specific inference ASIC. Flow-matching denoising loops have fixed shapes — perfect for an ASIC. | $15-30M Series A |
| 5 — Taiwan datacenter | 48+ mo | Own-silicon datacenter co-located with TSMC and Asian humanoid OEMs. Sell VLA inference as a utility. | $100M+ Series B |

The thread: the robotics inference market needs different silicon and tooling than the LLM inference market, and most of the value will accrue to whoever owns the deployment layer (Phase 1) early enough to compound through to silicon (Phase 4) and operating cost advantage (Phase 5).

What Reflex is explicitly not building:

  • Generic inference server. Triton, Ray Serve, and vLLM win on this dimension. Reflex is VLA-only.
  • Token-level autoregressive scheduling. vLLM’s pattern. Wrong tool for fixed-shape action chunks.
  • Kubernetes operators. Run on K8s, don’t be K8s.
  • Multi-language i18n in docs. No signal that the audience needs it.
  • Browser-based playground. Long-term vision, but not on the 12-month plan.

Three methods that overlap with or threaten the wedges Reflex differentiates on landed in the research vault recently. They are named here for transparency, and so the roadmap can react if any of them produces a stronger result than what Reflex ships:

| Method | Threatens | Status |
| --- | --- | --- |
| Mean-Flow VLA — single-step action via a mean-flow vector field; claims 8.7× faster than SmolVLA and 83.9× vs Diffusion Policy | SnapFlow's "first public 1-NFE pi0.5" headline | Reproducing in Q3; if competitive, ship as a reflex train distill --method mean-flow backend |
| FASTER — horizon-aware schedule for flow VLAs; per-action-index NFE budgeting (1 step on the leading action, more on the tail); 10× compression (schematic after this table) | The A2C2 stack's "leading action gets corrected" approach | Architectural pivot under research; the A2C2 Phase 2 plan tracks this |
| AsyncVLA — per-token non-uniform denoising plus confidence-rater self-correction; unified sync/async modes | — (complementary) | Reflex's per-step expert export contract is the substrate AsyncVLA-class methods build on; we benefit from this work landing |
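
For context on the FASTER row, a schematic (illustrative numbers, not the paper's) of what per-action-index NFE budgeting means: spend one denoise step on the leading action and progressively more toward the tail of the chunk.

```python
# Schematic of a horizon-aware NFE schedule in the FASTER style.
def nfe_schedule(chunk_len: int, min_steps: int = 1, max_steps: int = 10) -> list[int]:
    """Per-action-index denoise budget: cheap up front, thorough on the tail."""
    if chunk_len == 1:
        return [min_steps]
    return [
        round(min_steps + (max_steps - min_steps) * i / (chunk_len - 1))
        for i in range(chunk_len)
    ]

# e.g. nfe_schedule(8) -> [1, 2, 4, 5, 6, 7, 9, 10]
```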

The thread: Reflex’s moat is the toolchain and the verified parity, not loyalty to any specific distillation method. As better methods land, they become backends.