# distill — SnapFlow 1-step
`reflex train distill` runs SnapFlow 1-step distillation: given a 10-step flow-matching teacher (pi0, pi0.5, SmolVLA), it trains a 1-step student that retains the teacher's task success rate, and in our reproduction exceeds it.
## The headline result

The student beats the teacher on libero_object at N=50: 64% vs 56%.
To our knowledge this is the first public open-source reproduction. The student is a separately exportable ONNX model with 10× lower inference cost (1 forward pass vs 10).
| Stage | Modal cost | Time |
|---|---|---|
| Distill (A100, 10K steps) | ~$8-12 | 60-90 min |
| Eval against LIBERO N=50 | ~$3-5 | 30 min |
## Quick start
```bash
# Distill a customer-specific student from a pi0.5 teacher
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --runtime modal
```
```bash
# Validate the student's parity against the teacher
reflex validate export ./student-export/ \
  --reference ./teacher-export/ \
  --threshold 1e-3
```
```bash
# Eval on LIBERO
reflex eval ./student-export/ --suite libero --num-episodes 50
```

## How it works
SnapFlow is self-distillation, not teacher-supervised distillation: the teacher and student share a backbone. The student is a single Euler step initialized from the teacher's velocity field, then trained to match the teacher's full unrolled 10-step trajectory.
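A minimal sketch of the two samplers, assuming a generic flow-matching convention (noise at t=0, actions at t=1) and a hypothetical `velocity_field(x, t, obs_emb)` callable; the real pi0/pi0.5/SmolVLA interfaces are model-specific:

```python
import torch

@torch.no_grad()
def teacher_rollout(velocity_field, obs_emb, noise, num_steps=10):
    """10-step Euler integration of the teacher's velocity field,
    from noise at t=0 to an action chunk at t=1."""
    x, dt = noise, 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt)
        x = x + dt * velocity_field(x, t, obs_emb)
    return x

def student_action(velocity_field, obs_emb, noise):
    """SnapFlow student: one Euler step of size 1.0 through the
    shared backbone, whose weights start as a copy of the teacher's."""
    t0 = torch.zeros(noise.shape[0])
    return noise + velocity_field(noise, t0, obs_emb)
```

At inference time the student therefore costs one forward pass where the teacher costs ten.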
Key implementation notes (a loss sketch follows the list):
- Teacher in eval mode with frozen weights; gradients flow through the student only
- Loss: L2 on the integrated trajectory, not per-step velocity
- Mix ratio 50/50: half the batches are teacher rollouts, half are fresh noise samples (regularizes against a narrow distribution)
- State-out distillation: the student takes state as an explicit input, not as part of the language conditioning. This unlocks prefix KV cache hits in production.
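Putting those notes together, a sketch of the loss, reusing `teacher_rollout` and `student_action` from above; the 50/50 batch mix (one reading: noise taken from logged teacher rollouts vs sampled fresh) and the AdamW step live in the surrounding training loop:

```python
import torch
import torch.nn.functional as F

def distill_loss(student, teacher, obs_emb, noise):
    """L2 between the student's single Euler step and the teacher's
    full 10-step unroll from the same noise. The loss is on the
    integrated trajectory endpoint, not on per-step velocities."""
    teacher.eval()  # frozen teacher; no gradients flow into it
    with torch.no_grad():
        target = teacher_rollout(teacher, obs_emb, noise)
    pred = student_action(student, obs_emb, noise)
    return F.mse_loss(pred, target)
```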
## CLI surface (Phase 1)

| Flag | Default | Notes |
|---|---|---|
| `<teacher_export>` | required | Directory with teacher ONNX + config |
| `--output` | `./student-export/` | Where the student lands |
| `--steps` | 10000 | Distillation iterations |
| `--mix-ratio` | 0.5 | Fraction of teacher-rollout batches |
| `--runtime` | `modal` | `modal` or `local` |
| `--lr` | 1e-4 | AdamW learning rate |
| `--batch-size` | 32 | Per-step batch size |
| `--validate-after` | false | Run `reflex validate export` against the teacher post-distill |
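For reference, an invocation that exercises the full Phase 1 surface, with post-distill validation switched on (assuming `--validate-after` is a bare boolean switch):

```bash
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --mix-ratio 0.5 \
  --runtime modal \
  --lr 1e-4 \
  --batch-size 32 \
  --validate-after
```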
## Pro tier: continuous self-distillation

```bash
reflex serve --pro --collect-data --distill-schedule nightly
```

This runs the entire loop continuously: collect production traffic, distill a customer-specific student every N hours, gate via a 9-gate methodology, hot-swap the new student. See self-distilling serve for the full Pro surface.
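Conceptually the cycle looks like the sketch below. This is an illustration only; all four stage callables are hypothetical stand-ins, not the Pro implementation:

```python
import time

def continuous_loop(collect, distill, run_gates, hot_swap, period_s=86_400):
    """Illustrative collect -> distill -> gate -> swap cycle.
    run_gates stands in for the 9-gate methodology; any gate failure
    leaves the currently served student in place."""
    while True:
        candidate = distill(collect())   # customer-specific student
        if run_gates(candidate):         # safety + performance gates
            hot_swap(candidate)          # atomic warm-swap
        time.sleep(period_s)             # e.g. nightly
```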
## Reproducing our LIBERO numbers

```bash
modal run scripts/modal_distill_pi05_libero.py
```

This runs a 10K-step distill against `lerobot/pi05_libero_finetuned_v044` on a Modal A100, then evaluates the student on libero_object at N=50. Expected: the student matches or beats the teacher within statistical noise.
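To put a number on "within statistical noise", a quick binomial check at N=50 episodes (assuming episodes are independent):

```python
import math

# 95% confidence half-width of a success rate estimated from 50
# episodes, at the teacher's (56%) and student's (64%) observed rates.
n = 50
for p in (0.56, 0.64):
    se = math.sqrt(p * (1 - p) / n)
    print(f"p={p:.2f}  SE={se:.3f}  95% half-width ~ {1.96 * se:.3f}")
# Both half-widths come out around 0.13-0.14, so the observed
# 8-point gap sits inside a single interval's width.
```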
Full experiment notes: `reflex_context/03_experiments/2026-04-26-self-distilling-serve-libero-regression-student-beats-teacher.md`.
## What's NOT in the open-source distill

The open-source path covers offline distillation: you supply a teacher, you get a student. The Pro tier adds:
- 4-stage continuous loop (collect / distill / eval / swap)
- 9-gate eval methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
- Atomic warm-swap via the policy-versioning router
- 24-hour post-swap monitoring with auto-rollback
- Customer-specific HF Hub artifact storage
See self-distilling serve — same wedge, Pro license.
## Source paper

SnapFlow, released 2026-04. Local reading notes at `reflex_context/02_research/papers/2604.05656-snapflow.md`.