distill — SnapFlow 1-step
`tether train distill` runs SnapFlow 1-step distillation: starting from a 10-step flow-matching teacher (pi0, pi0.5, SmolVLA), it trains a 1-step student that retains, or in our reproduction exceeds, the teacher’s task success rate.
The headline result
Student beats teacher on libero_object, N=50: 64% vs 56%.
This is the first public open-source reproduction we’re aware of. The student exports as a separate ONNX artifact with 10× lower inference cost (1 forward pass vs 10).
| Stage | Modal cost | Time |
|---|---|---|
| Distill (A100, 10K steps) | ~$8-12 | 60-90 min |
| Eval against LIBERO N=50 | ~$3-5 | 30 min |
Quick start
```bash
# Distill a customer-specific student from a pi0.5 teacher
tether train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --runtime modal

# Validate the student's parity against the teacher
tether validate export ./student-export/ \
  --reference ./teacher-export/ \
  --threshold 1e-3

# Eval on LIBERO
tether eval ./student-export/ --suite libero --num-episodes 50
```
How it works
SnapFlow is self-distillation, not teacher-supervised. The trick: the teacher and student share a backbone. The student is a single Euler step initialized from the teacher’s velocity field, then trained to match the teacher’s full unrolled 10-step trajectory.
Key implementation notes:
- Teacher in eval mode with frozen weights: gradients flow through the student only
- Loss: L2 on the integrated trajectory, not per-step velocity
- Mix ratio 50/50: half the batches are teacher rollouts, half are noise samples (regularizes against a narrow training distribution)
- State-out distillation: the student takes state as an explicit input, not as part of the language conditioning. Keeping state out of the prompt keeps the language prefix stable across control steps, which unlocks prefix KV cache hits in production.
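The trajectory-level loss described above can be illustrated with a toy example. This is a minimal sketch, not Tether's training code: the velocity fields are hypothetical one-line functions, the integration runs from t=0 to t=1 with fixed-step Euler, and all function names are made up for illustration. It shows why a student initialized from the teacher's velocity field still needs training: one Euler step with the teacher's own field does not land where ten steps do.

```python
def euler_rollout(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def trajectory_l2_loss(student_fn, teacher_fn, noise, teacher_steps=10):
    """Mean squared error between the student's 1-step endpoint and the
    teacher's unrolled endpoint. The teacher output is a fixed target."""
    target = euler_rollout(teacher_fn, noise, teacher_steps)  # 10 evaluations
    pred = euler_rollout(student_fn, noise, 1)                # 1 evaluation
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(noise)


# Hypothetical toy teacher: a contracting flow v(x, t) = -x.
def teacher_velocity(x, t):
    return [-xi for xi in x]


# Hypothetical toy student family: v(x, t) = -scale * x.
def make_student(scale):
    def student_velocity(x, t):
        return [-scale * xi for xi in x]
    return student_velocity


if __name__ == "__main__":
    noise = [1.0, -2.0, 0.5]
    # Student copied from the teacher (scale=1): one big step overshoots.
    print(trajectory_l2_loss(make_student(1.0), teacher_velocity, noise))
    # Student tuned so one step matches ten: scale = 1 - 0.9**10.
    print(trajectory_l2_loss(make_student(1 - 0.9 ** 10), teacher_velocity, noise))
```

In this toy setup the loss is clearly non-zero for the copied student and essentially zero for the tuned one, which is the gap that distillation training is meant to close.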
CLI surface (Phase 1)
| Flag | Default | Notes |
|---|---|---|
| `<teacher_export>` | required | Directory with teacher ONNX + config |
| `--output` | `./student-export/` | Where the student lands |
| `--steps` | 10000 | Distill iterations |
| `--mix-ratio` | 0.5 | Fraction of teacher-rollout batches |
| `--runtime` | modal | `modal` or `local` |
| `--lr` | 1e-4 | AdamW learning rate |
| `--batch-size` | 32 | Per-step batch |
| `--validate-after` | false | Run `tether validate export` against teacher post-distill |
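When scripting distill runs, it can help to keep the flags in one typed object. The sketch below is a hypothetical helper, not part of Tether: `DistillConfig` and `to_argv` are invented names, the defaults are copied from the table above, and it assumes `--validate-after` is a bare switch that is passed only when enabled.

```python
from dataclasses import dataclass


@dataclass
class DistillConfig:
    """Hypothetical mirror of the flag table; not a Tether API."""
    teacher_export: str
    output: str = "./student-export/"
    steps: int = 10000
    mix_ratio: float = 0.5
    runtime: str = "modal"
    lr: float = 1e-4
    batch_size: int = 32
    validate_after: bool = False

    def __post_init__(self):
        # Basic sanity checks on the documented value ranges.
        if not 0.0 <= self.mix_ratio <= 1.0:
            raise ValueError("mix_ratio must be a fraction in [0, 1]")
        if self.runtime not in ("modal", "local"):
            raise ValueError("runtime must be 'modal' or 'local'")
        if self.steps <= 0 or self.batch_size <= 0:
            raise ValueError("steps and batch_size must be positive")

    def to_argv(self):
        """Build an argument list suitable for subprocess.run."""
        argv = [
            "tether", "train", "distill", self.teacher_export,
            "--output", self.output,
            "--steps", str(self.steps),
            "--mix-ratio", str(self.mix_ratio),
            "--runtime", self.runtime,
            "--lr", str(self.lr),
            "--batch-size", str(self.batch_size),
        ]
        if self.validate_after:
            # Assumption: the flag is a switch with no value.
            argv.append("--validate-after")
        return argv


if __name__ == "__main__":
    print(" ".join(DistillConfig("./teacher-export/", runtime="local").to_argv()))
```

Validating values before launching avoids paying for a remote run that fails on a typo.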
Pro tier: continuous self-distillation
`tether serve --pro --collect-data --distill-schedule nightly` runs the entire loop continuously: collect production traffic, distill a customer-specific student every N hours, gate it through a 9-gate methodology, and hot-swap the new student. See self-distilling serve for the full Pro surface.
Reproducing our LIBERO numbers
```bash
modal run scripts/modal_distill_pi05_libero.py
```

This runs a 10K-step distill against lerobot/pi05_libero_finetuned_v044 on a Modal A100, then evaluates the student on libero_object N=50. Expected: student matches or beats teacher within statistical noise.
Reproduce the run with the Modal script above; results land alongside eval_output/. The student-beats-teacher result on pi05_libero_finetuned_v044 was published in the v0.7 changelog.
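The "within statistical noise" caveat can be made concrete with a quick check. The sketch below assumes N=50 means 50 episodes per policy, so 64% and 56% correspond to 32 and 28 successes. It uses a standard pooled two-proportion z-statistic and Wilson score intervals; neither is claimed to be the analysis used for the published numbers.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-statistic for the difference in success rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Assumed counts: 64% and 56% of 50 episodes each.
    student, teacher, n = 32, 28, 50
    print("z =", round(two_proportion_z(student, n, teacher, n), 3))
    print("student 95% interval:", wilson_interval(student, n))
    print("teacher 95% interval:", wilson_interval(teacher, n))
```

Under these assumptions the z-statistic is roughly 0.8, well below the usual 1.96 threshold, and the two intervals overlap. A single N=50 run is therefore consistent with "matches or beats" rather than a statistically clear win; more episodes or seeds would be needed to separate the two policies.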
What’s NOT in the open-source distill
The open-source path covers offline distillation: you supply a teacher, you get a student. The Pro tier adds:
- 4-stage continuous loop (collect / distill / eval / swap)
- 9-gate eval methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
- Atomic warm-swap via the policy-versioning router
- 24-hour post-swap monitoring with auto-rollback
- Customer-specific HF Hub artifact storage
See self-distilling serve — same wedge, Pro license.
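The split between non-overridable SAFETY gates and overridable PERFORMANCE gates can be sketched as a small decision function. Everything here is hypothetical: the gate names, the `GateResult` type, and `decide_swap` are illustrative and do not describe the Pro implementation. The only behavior taken from the list above is that a failed SAFETY gate always blocks the swap, while a failed PERFORMANCE gate can pass if an audited override is recorded.

```python
from dataclasses import dataclass
from enum import Enum


class GateKind(Enum):
    SAFETY = "safety"            # never overridable
    PERFORMANCE = "performance"  # overridable with an audit note


@dataclass(frozen=True)
class GateResult:
    name: str
    kind: GateKind
    passed: bool


def decide_swap(results, overrides=None):
    """Return (approved, audit_log).

    overrides maps a gate name to a non-empty audit note (hypothetical format).
    """
    overrides = overrides or {}
    audit_log = []
    for result in results:
        if result.passed:
            continue
        if result.kind is GateKind.SAFETY:
            return False, audit_log  # override is ignored for safety gates
        note = overrides.get(result.name)
        if not note:
            return False, audit_log  # failed performance gate, no override
        audit_log.append((result.name, note))
    return True, audit_log


if __name__ == "__main__":
    results = [
        GateResult("collision_rate", GateKind.SAFETY, True),      # made-up gate
        GateResult("latency_p99", GateKind.PERFORMANCE, False),   # made-up gate
    ]
    print(decide_swap(results))
    print(decide_swap(results, {"latency_p99": "approved by on-call"}))
```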
Source paper + competitive landscape
The wedge is built on SnapFlow (released 2026-04). Three concurrent or follow-up methods are worth knowing about; they target the same single-step inference goal via different math:
| Paper | Approach | Implication for Tether |
|---|---|---|
| Mean-Flow VLA | Single-step action via a learned mean-flow vector field — no consistency constraints | Different math, similar single-step outcome. Threatens SnapFlow’s “first public 1-NFE pi0.5” headline. |
| FASTER (Horizon-Aware Schedule for flow VLAs) | Per-action-index NFE — 1 step on the leading action, more on the tail. 10× compression on pi0.5 / X-VLA. | Reframes the problem the A2C2 wedge tries to solve. Phase 2 architectural pivot tracks this. |
| AsyncVLA | Per-token non-uniform denoising + confidence-rater self-correction | Built on the per-step expert export contract that v0.8.0 ships. Tether’s decomposed export unblocks AsyncVLA-style methods on existing pi0/pi05 weights without retraining. |
Tether’s distill wedge will track these methods — when one of them produces a stronger student than SnapFlow on the same benchmarks, we’ll add it as a backend (tether train distill --method mean-flow, etc.). The shipping commitment is to whichever method produces the best 1-NFE result at the time, not loyalty to SnapFlow specifically.