HTTP /act endpoint

reflex serve listens on :8000 (configurable) with four endpoints. /act is the inference path; the rest are observability.

Send {instruction, state, image?, episode_id?}, get back a 50-step action chunk.

{
"instruction": "pick up the red cup",
"state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
"image_b64": "<base64-encoded JPEG/PNG>",
"episode_id": "ep-2026-05-01-001"
}

| Field | Type | Required | Notes |
|---|---|---|---|
| instruction | string | Yes | Natural-language task spec. ≤ 512 chars (Phase 1). |
| state | array of float | Yes | Proprioceptive state vector. Length must match the embodiment's mean_state. |
| image_b64 | string | Optional | Base64-encoded JPEG or PNG. Required when the model is multi-modal (most are). |
| image | array | Optional | Alternative to image_b64: raw H×W×C pixel array. Use base64 for HTTP. |
| episode_id | string | Recommended | Stable identifier for the current task. Required for cache locality, RTC carry-over, and policy-versioning routing. |
| request_id | string | Auto | Set if you want client-side request tracing. Falls back to a UUID. |
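
As a rough illustration of the request fields above, the following sketch assembles a request body in Python. The helper name `build_act_payload` and the placeholder image bytes are hypothetical; only the field names and the 512-character limit come from the table.

```python
import base64
import json
from typing import Optional, Sequence


def build_act_payload(
    instruction: str,
    state: Sequence[float],
    image_bytes: Optional[bytes] = None,
    episode_id: Optional[str] = None,
) -> dict:
    """Assemble an /act request body from the documented fields (hypothetical helper)."""
    if len(instruction) > 512:
        raise ValueError("instruction exceeds the documented 512-character limit")
    payload = {"instruction": instruction, "state": [float(x) for x in state]}
    if image_bytes is not None:
        # image_b64 carries the encoded JPEG/PNG file bytes as base64 text.
        payload["image_b64"] = base64.b64encode(image_bytes).decode("ascii")
    if episode_id is not None:
        payload["episode_id"] = episode_id
    return payload


payload = build_act_payload(
    "pick up the red cup",
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    image_bytes=b"\x89PNG placeholder bytes",  # stand-in, not a real image
    episode_id="ep-2026-05-01-001",
)
body = json.dumps(payload)  # the JSON text to send
```

Optional fields are simply left out when not supplied, so the same helper covers text-only and multi-modal requests.
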

A successful response looks like this:

{
"actions": [[0.01, -0.02, 0.0, 0.0, 0.0, 0.0, 0.5], ...],
"num_actions": 50,
"action_dim": 7,
"latency_ms": 11.9,
"hz": 84.0,
"denoising_steps": 10,
"inference_mode": "onnx_trt_fp16",
"guard_clamped": false,
"guard_violations": []
}

| Field | Type | Notes |
|---|---|---|
| actions | 2D array | num_actions × action_dim action chunk. Pre-clamped to embodiment ranges. |
| num_actions | int | Length of the chunk. Usually 50 for flow-matching models. |
| action_dim | int | Per-action dimensionality (7 for Franka 6-DOF + gripper). |
| latency_ms | float | Wall-clock time from request receipt to response. |
| hz | float | 1000 / latency_ms. |
| denoising_steps | int | Actual denoise steps used (may be fewer than max if --adaptive-steps engaged). |
| inference_mode | string | One of onnx_trt_fp16, onnx_cuda, onnx_cpu, decomposed, monolithic. |
| guard_clamped | bool | True if ActionGuard clamped any action in the chunk. |
| guard_violations | array | Per-axis violations when guard_clamped is true. |
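
The relationships between the response fields can be checked on the client side. The sketch below is one possible sanity check, assuming the response has already been decoded into a dict; `check_act_response` is a hypothetical helper, and the 1% tolerance on `hz` is an arbitrary allowance for rounding.

```python
import math


def check_act_response(resp: dict) -> list:
    """Return a list of inconsistencies between documented fields (empty if none)."""
    problems = []
    actions = resp["actions"]
    if len(actions) != resp["num_actions"]:
        problems.append("len(actions) != num_actions")
    if any(len(row) != resp["action_dim"] for row in actions):
        problems.append("row length != action_dim")
    # hz is documented as 1000 / latency_ms; tolerate rounding in the reported value.
    if not math.isclose(resp["hz"], 1000.0 / resp["latency_ms"], rel_tol=0.01):
        problems.append("hz != 1000 / latency_ms")
    return problems


# Illustrative response shaped like the example above (all-zero actions).
sample = {
    "actions": [[0.0] * 7 for _ in range(50)],
    "num_actions": 50,
    "action_dim": 7,
    "latency_ms": 11.9,
    "hz": 84.0,
}
problems = check_act_response(sample)
```
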

Response (telemetry — when wedges are active)

When you enable wedges, additional fields appear:

{
"actions": [...],
"latency_ms": 45.2,
// a2c2 (when --a2c2-checkpoint is set)
"a2c2_applied": true,
"a2c2_reason": "applied",
"a2c2_correction_magnitude": 0.073,
// record (when --record is set)
"record_seq": 1842,
// policy-versioning (when --policy-a/--policy-b are set)
"policy_slot": "a",
// robot identity (when --robot-id is set)
"robot_id": "warehouse-01"
}

These fields are additive — clients ignoring unknown fields stay backwards-compatible.
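
One way a client might stay tolerant of these additive fields is to read the core fields it needs and collect whichever optional ones are present. This is a minimal sketch; `split_response` is a hypothetical helper, and the field list is taken from the example above.

```python
# Optional telemetry fields shown in the example above.
OPTIONAL_FIELDS = (
    "a2c2_applied",
    "a2c2_reason",
    "a2c2_correction_magnitude",
    "record_seq",
    "policy_slot",
    "robot_id",
)


def split_response(resp: dict) -> tuple:
    """Separate the core fields from whichever telemetry fields are present."""
    core = {"actions": resp["actions"], "latency_ms": resp["latency_ms"]}
    # Fields outside OPTIONAL_FIELDS are ignored rather than treated as errors.
    telemetry = {key: resp[key] for key in OPTIONAL_FIELDS if key in resp}
    return core, telemetry


core, telemetry = split_response(
    {"actions": [], "latency_ms": 45.2, "record_seq": 1842, "future_field": 1}
)
```
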

When 2-policy mode is active, the response also carries headers:

X-Reflex-Policy-Slot: a
X-Reflex-Model-Version: pi0-libero-v1@<hash>
X-Reflex-Routing-Key: ep_xyz
X-Reflex-Routing-Degraded: false
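
A client that wants to log routing decisions could read these headers as sketched below. The helper `routing_info` is hypothetical, and treating the degraded flag as the literal strings "true"/"false" is an assumption based on the example values.

```python
from typing import Mapping, Optional


def routing_info(headers: Mapping[str, str]) -> Optional[dict]:
    """Extract the 2-policy routing headers, or None when they are absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    slot = lowered.get("x-reflex-policy-slot")
    if slot is None:
        return None  # headers absent: 2-policy mode is presumably not active
    degraded = lowered.get("x-reflex-routing-degraded", "false")
    return {
        "policy_slot": slot,
        "model_version": lowered.get("x-reflex-model-version"),
        "routing_key": lowered.get("x-reflex-routing-key"),
        "degraded": degraded.strip().lower() == "true",
    }


info = routing_info(
    {
        "X-Reflex-Policy-Slot": "a",
        "X-Reflex-Routing-Key": "ep_xyz",
        "X-Reflex-Routing-Degraded": "false",
    }
)
```
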

On failure, the response body is a JSON error object, for example:

{
"error": "queue_full",
"message": "policy runtime queue at capacity",
"policy_id": "prod",
"max_queue": 1000
}

Status codes:

| Code | Meaning |
|---|---|
| 200 | Success |
| 400 | Malformed request (missing field, wrong type, image decode failure) |
| 422 | Schema-valid but semantically invalid (e.g. state length doesn't match embodiment) |
| 503 | Server unavailable (warming up, queue full, SLO violation, circuit breaker tripped) |
| 500 | Internal error (model crashed; check /health and the audit log) |

Every 503 response carries a Retry-After header indicating when to retry.
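
A client-side retry loop built on these status codes might look like the sketch below. It assumes Retry-After is given in seconds and takes the transport as an injected `send` callable (hypothetical) returning `(status, headers, body)`, so the logic can be exercised without a server. Only 503 is retried; retrying other codes is left out of this sketch.

```python
import time
from typing import Callable, Mapping, Tuple

Reply = Tuple[int, Mapping[str, str], dict]


def call_with_retry(
    send: Callable[[], Reply],
    max_attempts: int = 5,
    default_wait_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns 200, waiting out 503s; raise on other codes."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 200:
            return body
        if status != 503:
            # 400, 422 and 500 are surfaced to the caller instead of retried.
            raise RuntimeError(f"request failed with HTTP {status}: {body}")
        if attempt == max_attempts:
            break
        try:
            wait_s = float(headers.get("Retry-After", default_wait_s))
        except ValueError:
            wait_s = default_wait_s  # e.g. an HTTP-date value, which this sketch ignores
        sleep(max(wait_s, 0.0))
    raise TimeoutError(f"still receiving 503 after {max_attempts} attempts")


# Simulated exchange: one 503 with Retry-After, then success.
replies = iter(
    [
        (503, {"Retry-After": "2"}, {"error": "queue_full"}),
        (200, {}, {"actions": []}),
    ]
)
waits = []
result = call_with_retry(lambda: next(replies), sleep=waits.append)
```
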

curl http://localhost:8000/health
{
"status": "ready",
"model_loaded": true,
"model_version": "pi0-libero-v1@7a8b3c1d",
"inference_mode": "onnx_trt_fp16",
"uptime_seconds": 1245,
"robot_id": "warehouse-01",
"cuda_graphs_active": true
}

status is one of:

  • warming — first cold-start (10-70 sec). Returns HTTP 503.
  • ready — operational. HTTP 200.
  • degraded — circuit breaker tripped or both 2-policy slots failed. HTTP 503.

Load balancers should treat 503 as “skip this instance” — Reflex never returns 503 for transient single-request failures, only durable instance state.
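
The readiness rules above can be expressed as a small predicate plus a polling loop. This is a sketch under assumptions: `probe` is a hypothetical callable returning the HTTP status and decoded body of a /health request, and the default of 40 polls at 2-second intervals is chosen only to cover the 10-70 second cold start mentioned above.

```python
import time
from typing import Callable, Tuple


def instance_available(http_status: int, body: dict) -> bool:
    """True only for HTTP 200 with status 'ready'; warming and degraded return 503."""
    return http_status == 200 and body.get("status") == "ready"


def wait_until_ready(
    probe: Callable[[], Tuple[int, dict]],
    max_polls: int = 40,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the instance reports ready, or give up after max_polls."""
    for _ in range(max_polls):
        status, body = probe()
        if instance_available(status, body):
            return True
        sleep(interval_s)
    return False


# Simulated cold start: two warming replies, then ready.
states = iter(
    [
        (503, {"status": "warming"}),
        (503, {"status": "warming"}),
        (200, {"status": "ready"}),
    ]
)
pauses = []
became_ready = wait_until_ready(lambda: next(states), sleep=pauses.append)
```
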

Returns the saved reflex_config.json from the loaded export. Useful for verifying which model + embodiment + version is actually serving.

Metrics are served in Prometheus exposition format. A 15-second scrape interval is typical.

Key metrics:

| Metric | Type | Labels |
|---|---|---|
| reflex_act_latency_seconds | Histogram | embodiment, policy_slot, inference_mode |
| reflex_act_total | Counter | embodiment, status |
| reflex_guard_clamped_total | Counter | embodiment |
| reflex_cache_hit_total / reflex_cache_miss_total | Counter | embodiment, policy_slot |
| reflex_in_flight_requests | Gauge | policy_slot |
| reflex_robot_info | Gauge | robot_id, embodiment, model_id |
| reflex_slo_violations_total | Counter | embodiment, kind |
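
As an example of consuming these metrics without a Prometheus server, the sketch below derives a cache hit rate from scraped exposition text. The sample text and its label values are invented for illustration, and the parser is deliberately minimal: it assumes sample lines carry no trailing timestamp.

```python
from collections import defaultdict


def sum_samples(exposition_text: str) -> dict:
    """Sum sample values per metric name across all label sets."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(None, 1)  # assumes no trailing timestamp
        name = series.split("{", 1)[0]
        totals[name] += float(value)
    return dict(totals)


def cache_hit_rate(exposition_text: str) -> float:
    """hits / (hits + misses), or 0.0 when no cache samples are present."""
    totals = sum_samples(exposition_text)
    hits = totals.get("reflex_cache_hit_total", 0.0)
    misses = totals.get("reflex_cache_miss_total", 0.0)
    return hits / (hits + misses) if hits + misses else 0.0


# Invented sample; real output will differ.
SAMPLE = """\
# TYPE reflex_cache_hit_total counter
reflex_cache_hit_total{embodiment="franka",policy_slot="a"} 90
reflex_cache_hit_total{embodiment="franka",policy_slot="b"} 30
# TYPE reflex_cache_miss_total counter
reflex_cache_miss_total{embodiment="franka",policy_slot="a"} 30
reflex_cache_miss_total{embodiment="franka",policy_slot="b"} 10
"""
rate = cache_hit_rate(SAMPLE)
```

In a real deployment the same ratio would more commonly be computed in PromQL over a time window; the point here is only to show how the two counters relate.
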

Default: Access-Control-Allow-Origin: * (open). Override with --cors-origins https://app.example.com for restricted origins. CORS is on for browser-based control loops; never disable in non-browser deploys.

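A Python client is provided:

from reflex.client import ReflexClient

# numpy_frame is a placeholder for the current camera frame (HxWxC array).
with ReflexClient("http://localhost:8000") as client:
    # All calls made through ep share one episode.
    with client.episode() as ep:
        result = ep.act(image=numpy_frame, state=[0.1, 0.2, ...])
        print(result["actions"])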

The client handles 503 retries, episode_id management, and image base64 encoding. Other languages: just use any HTTP client; the wire format is plain JSON.
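
For readers not using the bundled client, here is a minimal sketch of a raw call using only the Python standard library. It assumes /act accepts a JSON POST (the method is not stated above), and the function names are hypothetical. Building the request is separated from sending it so the former can be inspected without a running server.

```python
import json
import urllib.request


def make_act_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /act. POST is an assumption."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/act",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def act(base_url: str, payload: dict, timeout_s: float = 5.0) -> dict:
    """Send the request and decode the JSON reply; requires a running server."""
    request = make_act_request(base_url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


request = make_act_request(
    "http://localhost:8000",
    {"instruction": "pick up the red cup", "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]},
)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller of `act` would catch that exception to inspect the status code and the Retry-After header.
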