Fleet telemetry
When you deploy Reflex one process per robot, --robot-id gives each process a human-readable identity that Prometheus + Grafana can pivot on. Per-robot p99, per-robot error rates, per-robot safety violations — all in one dashboard.
Zero cost when you’re not using it. Single-robot deploys see no extra cardinality.
Quick start
    # On robot A:
    reflex serve ./my-export/ --robot-id warehouse-01 --port 8000
    # On robot B:
    reflex serve ./my-export/ --robot-id warehouse-02 --port 8000
    # On robot C:
    reflex serve ./my-export/ --robot-id arm-prototype-alpha --port 8000

Your Prometheus config scrapes each instance. In Grafana, import dashboards/reflex_fleet.json and select one or more robots from the dropdown.
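One way to point Prometheus at each instance is file-based service discovery, which reads target groups from a JSON file. The sketch below generates such a file for the three robots above. The hostnames are hypothetical placeholders; only the robot IDs and port 8000 come from the quick start. Note that robot_id is deliberately not attached as a target label here, since it is expected to arrive through reflex_robot_info instead.

```python
import json

# Hypothetical hostnames, keyed by the robot IDs used in the quick start.
FLEET = {
    "warehouse-01": "robot-a.local",
    "warehouse-02": "robot-b.local",
    "arm-prototype-alpha": "robot-c.local",
}


def file_sd_targets(fleet, port=8000):
    """Return a single file_sd target group with one host:port per robot."""
    return [{"targets": [f"{host}:{port}" for host in fleet.values()]}]


# Write this output to a file referenced by file_sd_configs in prometheus.yml.
print(json.dumps(file_sd_targets(FLEET), indent=2))
```

By default Prometheus derives the instance label from each host:port target, which is the label the dashboard join relies on.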
How it works
Each reflex serve process exports a single info-style gauge:

    reflex_robot_info{robot_id="warehouse-01",embodiment="franka",model_id="pi0-libero"} 1

Grafana joins hot metrics to this gauge via instance:

    histogram_quantile(0.99, sum by (le, instance) (rate(reflex_act_latency_seconds_bucket[5m])))
      * on (instance) group_left(robot_id) reflex_robot_info

robot_id appears as a label on p99 latency even though the underlying histogram doesn’t carry it. Cardinality stays flat: one series per process on reflex_robot_info, not one per request on every histogram.
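To make the join concrete, here is a small Python sketch that mimics what `* on (instance) group_left(robot_id) reflex_robot_info` does to a set of per-instance p99 values. It is an illustration of the matching semantics only, not how Prometheus or Reflex implements it; the instance addresses and latency numbers are made up.

```python
def join_robot_id(p99_by_instance, robot_info):
    """Attach robot_id to each p99 sample by matching on the instance label.

    p99_by_instance: {instance: p99 latency in seconds}
    robot_info: label dicts of the reflex_robot_info series (value is always 1,
                so multiplying by it leaves the latency unchanged)
    """
    robot_by_instance = {s["instance"]: s["robot_id"] for s in robot_info}
    joined = []
    for instance, p99 in p99_by_instance.items():
        # As in PromQL, samples with no matching info series are dropped.
        if instance in robot_by_instance:
            joined.append(
                {"instance": instance, "robot_id": robot_by_instance[instance], "p99": p99}
            )
    return joined


# Hypothetical scrape results: the third process runs without --robot-id.
p99 = {"10.0.0.11:8000": 0.12, "10.0.0.12:8000": 0.25, "10.0.0.13:8000": 0.08}
info = [
    {"instance": "10.0.0.11:8000", "robot_id": "warehouse-01"},
    {"instance": "10.0.0.12:8000", "robot_id": "warehouse-02"},
]
for row in join_robot_id(p99, info):
    print(row)
```

One consequence worth noticing: a process started without --robot-id emits no reflex_robot_info series, so it has nothing to match and disappears from the joined result.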
Why not put robot_id as a label on every metric?
A fleet of 1,000 robots × 3 embodiments × 6 models × N metrics comes to hundreds of thousands of series. Prometheus handles that but pays memory for it, and most of the slicing operators actually want, per robot rather than per (robot × embodiment), is available from the info-metric join.
We keep the existing label set tight (embodiment, model_id, violation_kind, etc.) and let operators opt into per-robot slicing via --robot-id + the info join.
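The back-of-envelope product above can be checked with a few lines of Python. The metric count of 20 is an assumption chosen for illustration (the text leaves N open), and the first function reproduces the worst-case product as stated rather than measuring a real deployment.

```python
def series_with_robot_label(robots, embodiments, models, metrics):
    """Worst-case series count if robot_id were a label on every metric."""
    return robots * embodiments * models * metrics


def extra_series_with_info_join(robots):
    """Series added by reflex_robot_info: one per reflex serve process."""
    return robots


N_METRICS = 20  # assumed value for N; not stated in the text

print(series_with_robot_label(1000, 3, 6, N_METRICS))  # 360000
print(extra_series_with_info_join(1000))               # 1000
```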
Endpoints that expose robot_id
Every reflex serve process surfaces the robot_id via:
- GET /health: "robot_id": "warehouse-01" in the JSON body
- GET /config: same key
- GET /metrics: reflex_robot_info{robot_id="warehouse-01",...} (when set)
When --robot-id is unset, robot_id is "" on /health and /config, and no reflex_robot_info series is emitted.
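A client that wants to discover a process's identity could read it from either response. The sketch below parses already-fetched response bodies, so it makes no assumptions about hosts or HTTP libraries; the helper names are hypothetical, and the metrics parser is a simple regex that assumes label values contain no `}` characters.

```python
import json
import re

_INFO_LINE = re.compile(r"^reflex_robot_info\{(?P<labels>[^}]*)\}\s+\S+$", re.MULTILINE)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def robot_id_from_health(body):
    """Return robot_id from a /health or /config JSON body, or None if unset ("")."""
    return json.loads(body).get("robot_id") or None


def robot_id_from_metrics(text):
    """Return the robot_id label of reflex_robot_info, or None if the series is absent."""
    match = _INFO_LINE.search(text)
    if match is None:
        return None
    labels = dict(_LABEL.findall(match.group("labels")))
    return labels.get("robot_id")


sample = (
    "# TYPE reflex_robot_info gauge\n"
    'reflex_robot_info{robot_id="warehouse-01",embodiment="franka",model_id="pi0-libero"} 1\n'
)
print(robot_id_from_metrics(sample))
print(robot_id_from_health('{"robot_id": ""}'))
```

Mapping the empty string to None gives both endpoints the same "unset" representation on the client side.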
Alerting on a single robot
    - alert: ReflexRobotLatencyHigh
      expr: |
        histogram_quantile(0.99,
          sum by (le, instance) (rate(reflex_act_latency_seconds_bucket[5m]))
        ) * on (instance) group_left(robot_id) reflex_robot_info{robot_id="warehouse-01"}
        > 0.2
      for: 3m
      labels: { severity: page, robot_id: warehouse-01 }
      annotations:
        summary: "Reflex on {{ $labels.robot_id }} over p99=200ms"

Drop the robot_id= filter to alert on any robot in the fleet. In that case also remove the static robot_id entry under labels, so each alert keeps the robot_id supplied by the join.
Deployment patterns
One process per robot (recommended)

    # systemd unit per robot; the %H hostname specifier gives each robot its own identity
    ExecStart=/usr/local/bin/reflex serve /opt/reflex/export \
        --robot-id %H \
        --port 8000 \
        --slo p99=150ms

Central aggregator
Don’t. Reflex does per-process inference; a central aggregator adds network latency that violates the real-time invariant. Instead, scrape each robot’s /metrics from a central Prometheus and render one dashboard against the aggregate.