
Roadmap

Tether ships on PyPI, source-available under BSL 1.1. Phase 1 closed 2026-05-07 with 16/16 features shipped or explicitly killed. It works end-to-end on the four major open flow-matching VLAs (SmolVLA, pi0, pi0.5, GR00T N1.6) at machine-precision parity (cos = +1.000000), and OpenVLA-7B is in the curated registry via the optimum-cli onnx path. The wedges currently shipped:

  • export — monolithic + decomposed ONNX, validated cos = +1.0; v0.8.0 added per-step expert ONNX (per_step_expert=True) that exposes the diffusion loop to Python
  • serve — FastAPI runtime, ORT-TensorRT EP default, composable wedges
  • distill — SnapFlow 1-step distillation (first OSS reproduction)
  • guard — ActionGuard with embodiment + safety-config layers
  • a2c2 — chunk correction head with auto-skip (Phase 1 fix shipped; Phase 2 ON > OFF positive delta is research-revisit)
  • cuda-graphs — ORT-native capture, tier-aware fallback (1.30× per-chunk on A100, 1.18× expert-only on A10G with p99 -35%)
  • action-similarity fast-path — FlashVLA pattern, 1.24× wall-clock on Modal A100 production smoke
  • batching — cost-weighted scheduler with backpressure
  • slo — rolling p99 enforcement with 503 / log-only / degrade modes
  • fleet — per-robot Prometheus identity
  • otel — GenAI-semconv tracing
  • mcp — Model Context Protocol server
  • record-replay — JSONL trace + tether replay, plus v0.9.0’s tether traces query / tether traces summary aggregation
  • auto-calibrate — passive selection across hardware tiers
  • policy-versioning — 2-slot routing for A/B + rollback
  • curate — uncertainty scoring + 4-quadrant classifier shipped in v0.9.0; full Tether Curate (formerly Reflex Curate) data-sale wedge ADR locked 2026-05-05 (build queue)
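The parity figures quoted above (cos = +1.0) read most naturally as cosine similarity between a reference model's output and the exported model's output. Below is a minimal sketch of such a check, assuming that reading and assuming outputs are flattened to plain lists of floats. The names `cosine_similarity` and `passes_parity` and the default threshold are illustrative, not part of Tether's API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, flattened output vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def passes_parity(
    reference: Sequence[float],
    exported: Sequence[float],
    min_cos: float = 0.999999,  # assumed tolerance, chosen for illustration only
) -> bool:
    """True when the exported output points in (nearly) the same direction as the reference."""
    return cosine_similarity(reference, exported) >= min_cos
```

Cosine similarity is scale-invariant, so a check like this would not catch a uniformly rescaled output; pairing it with a maximum absolute error bound is a common complement.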

Phase 1 KPIs targeted by end-of-quarter: 500+ GitHub stars, 3 paying Pro subscribers, NVIDIA Inception membership.

The target ship date for v1.0 is rolling. The order is approximate; each item ships as it lands.

| Item | Why | Status |
| --- | --- | --- |
| First-class ROS2 transport (tether serve --transport ros2) | Folds the legacy ros2-serve alias into the main serve surface | Designed |
| Bearer-token auth (auth-bearer wedge) | Currently /act has no auth, which is fine for local use but blocks regulated-industry deployments | Designed |
| Shadow inference (--shadow-policy) | Run a candidate policy alongside production without affecting traffic | Designed |
| Adaptive denoise validation in production | --adaptive-steps flag exists; Phase 1 ships telemetry but not real early-exit; v1.0 wires the cutoff | In progress |
| OpenVLA full parity | v0.9.6 surfaces OpenVLA in the registry at cos ≥ 0.999 via optimum-cli; v1.0 brings it to the same bit-exact parity tier as the flow-matching four | In progress |
| Blackwell production-tier validation | Shipped in v0.9.2 (onnxruntime-gpu>=1.25.1) with a doctor guard in v0.9.3. v1.0 smoke-validates on RTX 5090 hardware end-to-end and clears the multi-threaded codepath blocked by open ORT issue #27621 | Shipped, awaiting end-to-end RTX 5090 validation |
| Per-policy calibration in 2-policy mode | Auto-calibrate composes with policy-versioning’s per-policy state | Designed |
| Tether Curate (formerly Reflex Curate) Phase 1 | Tiered-consent data collection from production traces, cleaned, anonymized, and sold to model labs with customer rev share. ADR locked 2026-05-05 (Cloudflare R2, two-layer anonymization, sequential format converters across LeRobot v3 / RLDS / Open X-Embodiment / HDF5) | Designed |
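The shadow-inference item above (run a candidate alongside production without affecting traffic) can be pictured roughly as follows. This is a sketch under stated assumptions: `ShadowRunner` and its fields are hypothetical, policies are modeled as plain callables, and none of this reflects how the planned --shadow-policy flag is actually implemented.

```python
import threading
from typing import Callable, List, Sequence, Tuple

# Hypothetical policy type: maps an observation vector to an action vector.
Policy = Callable[[Sequence[float]], List[float]]


class ShadowRunner:
    """Serve the production policy; run a candidate on the side and record divergence."""

    def __init__(self, production: Policy, shadow: Policy) -> None:
        self.production = production
        self.shadow = shadow
        self.divergences: List[float] = []  # max abs action difference per request
        self.shadow_errors = 0
        self._lock = threading.Lock()

    def _run_shadow(self, observation: Sequence[float], served: List[float]) -> None:
        try:
            candidate = self.shadow(observation)
            diff = max(abs(p - c) for p, c in zip(served, candidate))
            with self._lock:
                self.divergences.append(diff)
        except Exception:
            # A failing candidate is only counted; it never reaches the caller.
            with self._lock:
                self.shadow_errors += 1

    def act(self, observation: Sequence[float]) -> Tuple[List[float], threading.Thread]:
        # The production action is computed first and is the only one returned.
        action = self.production(observation)
        worker = threading.Thread(
            target=self._run_shadow, args=(observation, action), daemon=True
        )
        worker.start()
        return action, worker  # the thread is returned so callers can join it in tests
```

The key design choice in this sketch is that the candidate runs after the production action is computed and on a separate thread, so a slow or crashing candidate changes only the recorded metrics, not the served response.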

Hardening and DX polish on top of the wedge surface. These items will be scoped from real user feedback once roughly 10 teams have deployed.

  • Live VLM-prefix observation feeding A2C2 (paper assumption that the head sees real per-image context)
  • CLI flags for A2C2 thresholds (--a2c2-latency-threshold-ms, --a2c2-success-threshold)
  • Auto-calibrate default-on (currently opt-in)
  • tether eval macOS local fallback
  • tether eval SimplerEnv suite
  • Customer-data fine-tune composition with auto-calibrate
  • Email + Slack adapters for Pro tier weekly reports
  • Versioning system in docs (multi-version side-by-side)

tether serve --pro gates the continuous-learning loop:

  • 4-stage loop: collect → distill → eval → swap
  • 9-gate methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
  • Atomic warm-swap via the policy-versioning router (≤ 60s SLA)
  • 24-hour post-swap monitoring with auto-rollback
  • Customer-specific HF Hub artifact storage
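One way to read the gate rule above (SAFETY failures always block the swap, PERFORMANCE failures can be overridden with an audit record) is sketched below. The gate names, `GateResult`, `SwapDecision`, and `decide_swap` are hypothetical and chosen for illustration; this is not Tether's actual gate implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class GateKind(Enum):
    SAFETY = "safety"            # a failure always blocks the swap
    PERFORMANCE = "performance"  # a failure may be overridden, leaving an audit record


@dataclass(frozen=True)
class GateResult:
    name: str
    kind: GateKind
    passed: bool


@dataclass
class SwapDecision:
    approved: bool
    blocking: List[str] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)


def decide_swap(results: List[GateResult], overrides: Dict[str, str]) -> SwapDecision:
    """Approve a swap only if every failed gate is an overridden PERFORMANCE gate.

    `overrides` maps a gate name to the human-readable reason for overriding it.
    """
    decision = SwapDecision(approved=True)
    for result in results:
        if result.passed:
            continue
        reason = overrides.get(result.name)
        if result.kind is GateKind.PERFORMANCE and reason:
            decision.audit_log.append(f"override {result.name}: {reason}")
        else:
            # Safety failures land here even when an override was supplied.
            decision.approved = False
            decision.blocking.append(result.name)
    return decision
```

In this sketch an override supplied for a SAFETY gate is simply ignored, which is what makes those gates non-overridable.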

Pricing: $99/mo. Gross margin: 37-57% at a typical Modal burn of $10-15/wk.

See pricing for the full breakdown.

The v0.9 software is Phase 1 of a longer-term plan:

| Phase | Window | Deliverable | KPI target |
| --- | --- | --- | --- |
| 1: Software | 0-6 mo | Open-source CLI on PyPI. Seven wedges shipped. | 500 stars, 3 Pro subs, NVIDIA Inception |
| 2: Hardware bundles | 6-12 mo | Pre-flashed Jetson kits via Seeed reComputer, Trossen, Connect Tech / ADLINK partnerships. Rev-share per unit. | 3 SKUs, 100+ units/mo, $20K MRR |
| 3: Compute Pack | 12-24 mo | Branded Jetson appliance, $2-5K/unit. Sold direct to lab integrators and small robotics teams. | 500 units, $3M ARR |
| 4: Custom silicon | 24-48 mo | VLA-specific inference ASIC. Flow-matching denoising loops have fixed shapes, which suits an ASIC. | $15-30M Series A |
| 5: Taiwan datacenter | 48+ mo | Own-silicon datacenter co-located with TSMC and Asian humanoid OEMs. Sell VLA inference as a utility. | $100M+ Series B |

The thread: the robotics inference market needs different silicon and tooling than the LLM inference market, and most of the value will accrue to whoever owns the deployment layer (Phase 1) early enough to compound through to silicon (Phase 4) and operating cost advantage (Phase 5).

What Tether is not building:

  • Generic inference server. Triton, Ray Serve, and vLLM win on this dimension. Tether is VLA-only.
  • Token-level autoregressive scheduling. vLLM’s pattern. Wrong tool for fixed-shape action chunks.
  • Kubernetes operators. Run on K8s, don’t be K8s.
  • Multi-language i18n in docs. No signal that the audience needs it.
  • Browser-based playground. Long-term vision, but not on the 12-month plan.

Three methods that recently landed in the research vault overlap with or threaten the wedges Tether differentiates on. They are named here for honesty, and so the roadmap can react if any of them produces a stronger result than what Tether ships:

| Method | Threatens | Status |
| --- | --- | --- |
| Mean-Flow VLA: single-step action via mean-flow vector field, claims 8.7× faster than SmolVLA / 83.9× vs Diffusion Policy | SnapFlow’s “first public 1-NFE pi0.5” headline | Reproducing in Q3; if competitive, ship as a tether train distill --method mean-flow backend |
| FASTER: Horizon-Aware Schedule for flow VLAs, per-action-index NFE budgeting (1 step on leading action, more on the tail), 10× compression | A2C2 stack’s “leading action gets corrected” approach | Architectural pivot under research; A2C2 Phase 2 plan tracks this |
| AsyncVLA: per-token non-uniform denoising + confidence-rater self-correction, unified sync/async modes | None (complementary) | Tether’s per-step expert export contract is the substrate AsyncVLA-class methods build on. We benefit from this work landing. |

The thread: Tether’s moat is the toolchain and the verified parity, not loyalty to any specific distillation method. As better methods land, they become backends.