Skip to content

Getting started

This guide assumes a Linux box with an NVIDIA GPU. CPU-only deployments work too — every example below applies, just install with [serve,onnx] instead of [serve,gpu].

The fastest path: tether go does everything in one command.

Terminal window
# Smallest, fastest to download — good for first try
tether go --model smolvla-base
# pi0 — Physical Intelligence's flagship 3.5B model
tether go --model pi0-base
# pi0.5 — newer pi0 variant with AdaRMSNorm time conditioning
tether go --model pi05-base
# GR00T N1.6 — NVIDIA's humanoid model (3.29B)
tether go --model gr00t-n1.6
# OpenVLA — 7B model (added to registry in v0.9.6)
tether go --model openvla-7b

Under the hood, tether go:

  1. Probes hardware (tether doctor) and picks per-platform tuning (FP16 on Orin, FP8 on Thor, etc.)
  2. Downloads the checkpoint from HuggingFace (cached after first run)
  3. Runs the model-specific exporter to ONNX
  4. Validates the ONNX numerically against the PyTorch reference (cos = +1.000000)
  5. If TensorRT is installed, builds and caches an engine
  6. Starts an HTTP server on :8000

After the first run, your export cache looks like:

~/.cache/tether/exports/smolvla-base/
├── expert_stack.onnx # the graph
├── expert_stack.onnx.data # the weights (external-data file; ~1.3 GB for pi0 / smaller for SmolVLA)
├── tether_config.json # model meta — used by serve
└── expert_stack.trt # TRT engine (only if trtexec was available)

tether go left a server running on http://localhost:8000 with three endpoints:

  • POST /act — send {instruction, state, image?}, get back a 50-step action chunk
  • GET /health — returns {status, model_loaded, inference_mode}
  • GET /config — returns the saved tether_config

Test it:

Terminal window
curl -X POST http://localhost:8000/act \
-H 'content-type: application/json' \
-d '{"instruction":"pick up the red cup","state":[0.1,0.2,0.3,0.4,0.5,0.6]}'

You’ll get back:

{
"actions": [[...], [...], ...],
"num_actions": 50,
"action_dim": 32,
"latency_ms": 47.3,
"hz": 21.1,
"denoising_steps": 10,
"inference_mode": "onnx_trt_fp16",
"guard_clamped": false
}

tether go without an embodiment returns raw, unscaled actions — fine for a smoke test. For real deployment, add --embodiment:

Terminal window
tether go --model pi05-base --embodiment franka

Tether ships embodiment configs for franka, so100, ur5, and quadcopter (gripperless). Custom robots use --custom-embodiment-config <path>.

The embodiment maps action ranges to robot joint limits and engages ActionGuard for clamping unsafe outputs. The response now includes guard_clamped and guard_violations fields.

For non-experts, tether chat is a natural-language wrapper around the entire CLI:

Terminal window
tether chat
you › what version am I running and what hardware can I deploy to?
→ show_version({}) → tether --version → "tether 0.12.0"
→ list_targets({}) → tether inspect targets → [orin-nano, orin, orin-64, thor, desktop]
You're running tether 0.12.0. Supported targets:
- orin-nano — Jetson Orin Nano: 8 GB, fp16
- orin — Jetson AGX Orin 32GB: 32 GB, fp16
- orin-64 — Jetson AGX Orin 64GB: 64 GB, fp16
- thor — Jetson Thor: 128 GB, fp8
- desktop — Desktop GPU (RTX 4090 / A100 / H100 / RTX 5090): 24 GB, fp16

Chat understands 17 commands and runs them as subprocess on your behalf. Powered by GPT-5 Mini through a hosted proxy at chat.fastcrest.com — free tier is 100 calls/day per machine, no signup, no API key.