distill — SnapFlow 1-step

reflex train distill runs SnapFlow 1-step distillation: from a 10-step flow-matching teacher (pi0, pi0.5, SmolVLA), train a 1-step student that retains — or in our reproduction, exceeds — the teacher’s task success rate.

Student beats teacher on libero_object, N=50: 64% vs 56%.

To our knowledge, this is the first public open-source reproduction of SnapFlow. The student exports as a standalone ONNX artifact with 10× lower inference cost (1 forward pass vs 10).

| Stage | Modal cost | Time |
| --- | --- | --- |
| Distill (A100, 10K steps) | ~$8-12 | 60-90 min |
| Eval against LIBERO (N=50) | ~$3-5 | 30 min |
```bash
# Distill a customer-specific student from a pi0.5 teacher
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --runtime modal

# Validate the student's parity against the teacher
reflex validate export ./student-export/ \
  --reference ./teacher-export/ \
  --threshold 1e-3

# Eval on LIBERO
reflex eval ./student-export/ --suite libero --num-episodes 50
```
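
The `--threshold 1e-3` check asserts numerical parity between the student's and the teacher's action outputs on the same inputs. Below is a minimal sketch of what such a comparison can look like with onnxruntime; it is illustrative only, not the reflex validate export implementation, and the input/output names, observation shape, and the assumption that the teacher ONNX wraps its 10-step unroll internally are all placeholders.

```python
# Hypothetical parity check between student and teacher exports.
# The "observation" input name, output index 0, and the observation shape are placeholders.
import numpy as np
import onnxruntime as ort

def parity_check(student_path, teacher_path, obs, threshold=1e-3):
    student = ort.InferenceSession(student_path)
    teacher = ort.InferenceSession(teacher_path)
    # Run both exports on the same observation batch.
    s_out = student.run(None, {"observation": obs})[0]
    t_out = teacher.run(None, {"observation": obs})[0]
    diff = float(np.max(np.abs(s_out - t_out)))
    return diff, diff <= threshold

obs = np.random.randn(1, 256).astype(np.float32)  # placeholder observation
diff, ok = parity_check("student-export/model.onnx", "teacher-export/model.onnx", obs)
print(f"max |student - teacher| = {diff:.2e}, within threshold: {ok}")
```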

SnapFlow is self-distillation, not teacher-supervised. The trick: the teacher and student share a backbone — the student is a single Euler step initialized from the teacher’s velocity field, then trained to match the teacher’s full unrolled 10-step trajectory.

Key implementation notes (a training-step sketch follows the list):

  • Teacher in eval mode, frozen weights — gradients flow through the student only
  • Loss: L2 on the integrated trajectory, not per-step velocity
  • Mix ratio 50/50 — half the batches are teacher rollouts, half are noise samples (regularizes against narrow distribution)
  • State-out distillation — the student takes state as an explicit input, not as part of the language conditioning. This unlocks prefix KV cache hits in production.
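
Putting those notes together, here is a minimal PyTorch sketch of the training step. It is illustrative only, not the reflex implementation: the teacher/student call signatures, tensor shapes, and the source of observation batches (in the recipe above, half from teacher rollouts and half from noise samples) are assumptions.

```python
# Illustrative SnapFlow-style distillation step (placeholder interfaces, not reflex code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_unroll(teacher, z, obs, n_steps=10):
    """Integrate the frozen teacher's velocity field with n_steps Euler steps."""
    x, dt = z, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * teacher(x, t, obs)  # teacher(x, t, obs) returns a velocity
    return x

def distill_step(teacher, student, optimizer, obs, action_dim):
    teacher.eval()                                   # frozen, eval-mode teacher
    z = torch.randn(obs.shape[0], action_dim, device=obs.device)

    target = teacher_unroll(teacher, z, obs)         # endpoint of the full 10-step unroll
    t0 = torch.zeros(obs.shape[0], device=obs.device)
    pred = z + student(z, t0, obs)                   # single Euler step with dt = 1

    loss = F.mse_loss(pred, target)                  # L2 on the integrated trajectory
    optimizer.zero_grad()
    loss.backward()                                  # gradients reach the student only
    optimizer.step()
    return loss.item()
```

Before the first step, the student would be initialized as a copy of the teacher (e.g. student.load_state_dict(teacher.state_dict())), matching the "single Euler step initialized from the teacher's velocity field" note above.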
| Flag | Default | Notes |
| --- | --- | --- |
| `<teacher_export>` | required | Directory with teacher ONNX + config |
| --output | ./student-export/ | Where the student lands |
| --steps | 10000 | Distill iterations |
| --mix-ratio | 0.5 | Fraction of teacher-rollout batches |
| --runtime | modal | modal or local |
| --lr | 1e-4 | AdamW learning rate |
| --batch-size | 32 | Per-step batch size |
| --validate-after | false | Run reflex validate export against the teacher post-distill |
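
For instance, a local run that sets the mix ratio explicitly and validates the student afterwards could look like the following; this assumes --validate-after is a boolean switch, so check reflex train distill --help for the exact syntax:

```bash
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --mix-ratio 0.5 \
  --runtime local \
  --validate-after
```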

reflex serve --pro --collect-data --distill-schedule nightly runs the entire loop continuously: collect production traffic, distill a customer-specific student every N hours, gate it with the 9-gate eval methodology, and hot-swap in the new student. See self-distilling serve for the full Pro surface.

```bash
modal run scripts/modal_distill_pi05_libero.py
```

This runs a 10K-step distill against lerobot/pi05_libero_finetuned_v044 on a Modal A100, then evaluates the student on libero_object (N=50). Expected result: the student matches or beats the teacher within statistical noise.
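
If you want to adapt the script to another teacher or suite, its overall shape on Modal is roughly the sketch below. This is a structural sketch only, not the actual scripts/modal_distill_pi05_libero.py: the app name, image contents, and function body are assumptions.

```python
# Rough structural sketch of a Modal distill-and-eval script (placeholder names and body).
import modal

app = modal.App("snapflow-distill")
image = modal.Image.debian_slim().pip_install("torch", "onnx", "onnxruntime")

@app.function(gpu="A100", image=image, timeout=3 * 60 * 60)
def distill_and_eval(steps: int = 10_000, num_episodes: int = 50) -> dict:
    # 1. pull the teacher checkpoint (e.g. lerobot/pi05_libero_finetuned_v044)
    # 2. run the 1-step SnapFlow distillation for `steps` iterations
    # 3. export the student to ONNX and roll out libero_object for `num_episodes`
    return {"steps": steps, "num_episodes": num_episodes}  # placeholder result

@app.local_entrypoint()
def main():
    print(distill_and_eval.remote())
```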

Full experiment notes: reflex_context/03_experiments/2026-04-26-self-distilling-serve-libero-regression-student-beats-teacher.md.

The open-source path covers offline distillation: you supply a teacher, you get a student. The Pro tier adds:

  • 4-stage continuous loop (collect / distill / eval / swap)
  • 9-gate eval methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
  • Atomic warm-swap via the policy-versioning router
  • 24-hour post-swap monitoring with auto-rollback
  • Customer-specific HF Hub artifact storage

See self-distilling serve — same wedge, Pro license.

SnapFlow — released 2026-04. Local reading notes at reflex_context/02_research/papers/2604.05656-snapflow.md.