distill — SnapFlow 1-step

reflex train distill runs SnapFlow 1-step distillation: from a 10-step flow-matching teacher (pi0, pi0.5, SmolVLA), train a 1-step student that retains — or in our reproduction, exceeds — the teacher’s task success rate.

Student beats teacher on libero_object, N=50: 64% vs 56%.

To our knowledge, this is the first public open-source reproduction of SnapFlow. The student exports as a standalone ONNX artifact with 10× lower inference cost (1 forward pass vs 10).

| Stage | Modal cost | Time |
| --- | --- | --- |
| Distill (A100, 10K steps) | ~$8-12 | 60-90 min |
| Eval against LIBERO (N=50) | ~$3-5 | 30 min |
```bash
# Distill a customer-specific student from a pi0.5 teacher
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --runtime modal

# Validate the student's parity against the teacher
reflex validate export ./student-export/ \
  --reference ./teacher-export/ \
  --threshold 1e-3

# Eval on LIBERO
reflex eval ./student-export/ --suite libero --num-episodes 50
```
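
The `--threshold 1e-3` check asserts numerical parity between the student's and the teacher's action outputs on the same inputs. Below is a minimal sketch of what such a comparison can look like with onnxruntime; it is illustrative only, not the reflex validate export implementation, and the input/output names, observation shape, and the assumption that the teacher ONNX wraps its 10-step unroll internally are all placeholders.

```python
# Hypothetical parity check between student and teacher exports.
# The "observation" input name, output index 0, and the observation shape are placeholders.
import numpy as np
import onnxruntime as ort

def parity_check(student_path, teacher_path, obs, threshold=1e-3):
    student = ort.InferenceSession(student_path)
    teacher = ort.InferenceSession(teacher_path)
    # Run both exports on the same observation batch.
    s_out = student.run(None, {"observation": obs})[0]
    t_out = teacher.run(None, {"observation": obs})[0]
    diff = float(np.max(np.abs(s_out - t_out)))
    return diff, diff <= threshold

obs = np.random.randn(1, 256).astype(np.float32)  # placeholder observation
diff, ok = parity_check("student-export/model.onnx", "teacher-export/model.onnx", obs)
print(f"max |student - teacher| = {diff:.2e}, within threshold: {ok}")
```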

SnapFlow is self-distillation, not teacher-supervised. The trick: the teacher and student share a backbone — the student is a single Euler step initialized from the teacher’s velocity field, then trained to match the teacher’s full unrolled 10-step trajectory.

Key implementation notes (a training-step sketch follows the list):

  • Teacher in eval mode, frozen weights — gradients flow through the student only
  • Loss: L2 on the integrated trajectory, not per-step velocity
  • Mix ratio 50/50 — half the batches are teacher rollouts, half are noise samples (regularizes against narrow distribution)
  • State-out distillation — the student takes state as an explicit input, not as part of the language conditioning. This unlocks prefix KV cache hits in production.
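
Putting those notes together, here is a minimal PyTorch sketch of the training step. It is illustrative only, not the reflex implementation: the teacher/student call signatures, tensor shapes, and the source of observation batches (in the recipe above, half from teacher rollouts and half from noise samples) are assumptions.

```python
# Illustrative SnapFlow-style distillation step (placeholder interfaces, not reflex code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_unroll(teacher, z, obs, n_steps=10):
    """Integrate the frozen teacher's velocity field with n_steps Euler steps."""
    x, dt = z, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * teacher(x, t, obs)  # teacher(x, t, obs) returns a velocity
    return x

def distill_step(teacher, student, optimizer, obs, action_dim):
    teacher.eval()                                   # frozen, eval-mode teacher
    z = torch.randn(obs.shape[0], action_dim, device=obs.device)

    target = teacher_unroll(teacher, z, obs)         # endpoint of the full 10-step unroll
    t0 = torch.zeros(obs.shape[0], device=obs.device)
    pred = z + student(z, t0, obs)                   # single Euler step with dt = 1

    loss = F.mse_loss(pred, target)                  # L2 on the integrated trajectory
    optimizer.zero_grad()
    loss.backward()                                  # gradients reach the student only
    optimizer.step()
    return loss.item()
```

Before the first step, the student would be initialized as a copy of the teacher (e.g. student.load_state_dict(teacher.state_dict())), matching the "single Euler step initialized from the teacher's velocity field" note above.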
| Flag | Default | Notes |
| --- | --- | --- |
| `<teacher_export>` | required | Directory with teacher ONNX + config |
| --output | ./student-export/ | Where the student lands |
| --steps | 10000 | Distill iterations |
| --mix-ratio | 0.5 | Fraction of teacher-rollout batches |
| --runtime | modal | modal or local |
| --lr | 1e-4 | AdamW learning rate |
| --batch-size | 32 | Per-step batch size |
| --validate-after | false | Run reflex validate export against the teacher post-distill |
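
For instance, a local run that sets the mix ratio explicitly and validates the student afterwards could look like the following; this assumes --validate-after is a boolean switch, so check reflex train distill --help for the exact syntax:

```bash
reflex train distill ./teacher-export/ \
  --output ./student-export/ \
  --steps 10000 \
  --mix-ratio 0.5 \
  --runtime local \
  --validate-after
```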

reflex serve --pro --collect-data --distill-schedule nightly runs the entire loop continuously: collect production traffic, distill a customer-specific student every N hours, gate it with the 9-gate eval methodology, and hot-swap in the new student. See self-distilling serve for the full Pro surface.

```bash
modal run scripts/modal_distill_pi05_libero.py
```

This runs a 10K-step distill against lerobot/pi05_libero_finetuned_v044 on a Modal A100, then evaluates the student on libero_object (N=50). Expected result: the student matches or beats the teacher within statistical noise.
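
If you want to adapt the script to another teacher or suite, its overall shape on Modal is roughly the sketch below. This is a structural sketch only, not the actual scripts/modal_distill_pi05_libero.py: the app name, image contents, and function body are assumptions.

```python
# Rough structural sketch of a Modal distill-and-eval script (placeholder names and body).
import modal

app = modal.App("snapflow-distill")
image = modal.Image.debian_slim().pip_install("torch", "onnx", "onnxruntime")

@app.function(gpu="A100", image=image, timeout=3 * 60 * 60)
def distill_and_eval(steps: int = 10_000, num_episodes: int = 50) -> dict:
    # 1. pull the teacher checkpoint (e.g. lerobot/pi05_libero_finetuned_v044)
    # 2. run the 1-step SnapFlow distillation for `steps` iterations
    # 3. export the student to ONNX and roll out libero_object for `num_episodes`
    return {"steps": steps, "num_episodes": num_episodes}  # placeholder result

@app.local_entrypoint()
def main():
    print(distill_and_eval.remote())
```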

Full experiment notes: reflex_context/03_experiments/2026-04-26-self-distilling-serve-libero-regression-student-beats-teacher.md.

The open-source path covers offline distillation: you supply a teacher, you get a student. The Pro tier adds:

  • 4-stage continuous loop (collect / distill / eval / swap)
  • 9-gate eval methodology (3 SAFETY non-overridable, 6 PERFORMANCE overridable with audit)
  • Atomic warm-swap via the policy-versioning router
  • 24-hour post-swap monitoring with auto-rollback
  • Customer-specific HF Hub artifact storage

See self-distilling serve — same wedge, Pro license.

SnapFlow — released 2026-04. Local reading notes at reflex_context/02_research/papers/2604.05656-snapflow.md.