
Monitor and operate

/state, /metrics, /logs. Health endpoints, Prometheus, alert recipes. The relay you actually run.

The relay has three admin endpoints, all gated by a bearer token. Enable them once:

docker run -d \
  -e RELAY_ENABLE_ADMIN_HTTP=1 \
  -e RELAY_ADMIN_TOKEN="$(openssl rand -hex 32)" \
  ghcr.io/viewportai/relay:latest

Store the token in your secret manager. It grants full visibility into who's connected and recent activity.

/health

No auth. Lightweight liveness check:

curl https://relay.your-co.com/health
{
  "ok": true,
  "service": "relay",
  "uptimeMs": 123456,
  "tlsEnabled": true,
  "relayMode": "single",
  "relayId": "relay-a"
}

Use it for your load balancer's health check. If ok is false, the container is broken; restart it.
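
If you run the relay on Kubernetes, a probe sketch like this works (the container port 7781 and the HTTP scheme are assumptions; match them to how you expose the relay):

# Sketch: point liveness/readiness at /health.
# Assumes the relay serves plain HTTP on container port 7781; use scheme: HTTPS if TLS terminates in the container.
livenessProbe:
  httpGet:
    path: /health
    port: 7781
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 7781
  periodSeconds: 5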

/state

Requires Authorization: Bearer $RELAY_ADMIN_TOKEN and returns the in-memory connection registry.
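
For example, mirroring the /health call above (same host placeholder):

curl -H "Authorization: Bearer $RELAY_ADMIN_TOKEN" https://relay.your-co.com/state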

{
  "uptimeMs": 6789012,
  "totalConnections": 248,
  "workspaces": [
    {
      "workspaceId": "01J…",
      "daemons": [
        { "installId": "ins_…", "connectedAt": "…", "framesIn": 1240, "framesOut": 8920 }
      ],
      "clients": 3
    }
  ]
}

Useful for:

  • Confirming a specific daemon is connected.
  • Counting active workspaces.
  • Spotting load distribution before scaling.
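
The first check above is easy to script; a minimal sketch assuming jq is installed (the installId value is a placeholder):

# Exit 0 and print "connected" if the given installId appears anywhere in /state.
curl -s -H "Authorization: Bearer $RELAY_ADMIN_TOKEN" https://relay.your-co.com/state \
  | jq -e --arg id "ins_example" '.workspaces[].daemons[] | select(.installId == $id)' > /dev/null \
  && echo "connected" || echo "not connected"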

/logs

Returns the recent in-memory log buffer (last ~1000 entries by default):

curl -H "Authorization: Bearer $RELAY_ADMIN_TOKEN" https://relay.your-co.com/logs
curl -H "Authorization: Bearer $RELAY_ADMIN_TOKEN" "https://relay.your-co.com/logs?summary=1"

summary=1 returns counts by log level instead of the entries themselves. Useful for a quick "is there any error noise?" check.

For persistent logs, capture stdout / stderr of the container. The relay writes structured JSON logs to stdout.
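
If you're running plain Docker and just want local rotation, a sketch (swap in your own log driver or log shipper as needed):

# Rotate the relay's JSON stdout logs on the host instead of letting them grow unbounded.
# Keep the same -e flags and image as the run command at the top of this page.
docker run -d \
  --log-driver json-file \
  --log-opt max-size=50m \
  --log-opt max-file=5 \
  ghcr.io/viewportai/relay:latest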

/metrics

Prometheus exposition. Set up scraping in your Prometheus config:

scrape_configs:
  - job_name: viewport-relay
    # Prometheus does not expand env vars in config files; template this value in,
    # or point authorization.credentials_file at a file holding the token.
    bearer_token: "${RELAY_ADMIN_TOKEN}"
    static_configs:
      - targets: ['relay.your-co.com:7781']

Key metrics:

Metric                                     Type        Meaning
relay_connections_total                    gauge       Currently connected daemons + clients.
relay_handshakes_total                     counter     Successful handshakes.
relay_handshake_failures_total{reason}     counter     Auth failures, malformed frames, etc.
relay_frames_in_total{workspace_id}        counter     Frames from daemon.
relay_frames_out_total{workspace_id}       counter     Frames to clients.
relay_frame_size_bytes                     histogram   Frame size distribution.
relay_jwt_validate_duration_seconds        histogram   Latency of the platform's JWT validate call.
relay_backplane_publish_seconds            histogram   Latency to publish onto the backplane (server / redis mode).
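
For a quick view of per-workspace traffic, a sample PromQL query over the counters above:

# Frames per second arriving from daemons, broken down by workspace, over the last 5 minutes.
sum by (workspace_id) (rate(relay_frames_in_total[5m]))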

Alert recipes

Alert: relay unreachable.

- alert: ViewportRelayDown
  expr: up{job="viewport-relay"} == 0
  for: 2m
  labels: { severity: critical }
  annotations:
    summary: "Viewport relay {{ $labels.instance }} is down"

Alert: high handshake failure rate.

- alert: ViewportRelayHandshakeFailing
  expr: rate(relay_handshake_failures_total[5m]) > 0.5
  for: 5m

0.5 failures/sec sustained for 5 minutes usually means platform JWT validate is slow or your JWKS URL is wrong.
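
To see which failure reason dominates before digging further, a sample PromQL breakdown:

# Handshake failure rate split by the reason label.
sum by (reason) (rate(relay_handshake_failures_total[5m]))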

Alert: JWT validate slow.

- alert: ViewportRelayJwtSlow
  expr: histogram_quantile(0.95, rate(relay_jwt_validate_duration_seconds_bucket[5m])) > 1
  for: 10m

If p95 JWT validate latency goes over 1s sustained, the platform-side validate endpoint is degraded. Check on the platform first.

Alert: backplane lag (server / redis modes).

- alert: ViewportRelayBackplaneSlow
  expr: histogram_quantile(0.95, rate(relay_backplane_publish_seconds_bucket[5m])) > 0.2
  for: 10m

Restart, redeploy, upgrade

The relay is stateless. Restart any time. Connected daemons and clients see a clean disconnect and reconnect within a few seconds (configurable backoff on the daemon side).

Upgrading:

  1. Pull the new image: docker pull ghcr.io/viewportai/relay:latest.
  2. Restart your container.
  3. Daemons reconnect automatically.
  4. Watch relay_handshakes_total to confirm activity returns.
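
For step 4, one way to watch it (sample PromQL; any recent-rate window works):

# Handshake rate after the restart; it should climb back above zero within a minute or two.
rate(relay_handshakes_total[2m])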

Rolling deploys (multiple replicas): just kubectl rollout restart or your equivalent.
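
For example, assuming the deployment is named viewport-relay (adjust to your manifest):

kubectl rollout restart deployment/viewport-relay
kubectl rollout status deployment/viewport-relay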
