Skip to main content

Health check

curl http://localhost:8080/v1/health
Returns {"status": "healthy"} when the control plane is running.

System status

curl -H "Authorization: Bearer <token>" http://localhost:8080/v1/status
Returns system-wide statistics scoped to the user’s organization: app count, machine count, database count, and host status.

Prometheus metrics

The control plane exposes a Prometheus-compatible metrics endpoint:
curl http://localhost:8080/metrics
Metrics include HTTP request duration and count (by method, path, status), VM count, and host resource usage.

Scrape config

scrape_configs:
  - job_name: "zwrmd"
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: "/metrics"

Host status

Monitor host health across the cluster:
zwrm host list
StatusMeaning
onlineHealthy, receiving heartbeats
offlineNo heartbeat for 30+ seconds, machines marked failed
drainingMigrating VMs off the host
maintenanceDrain complete, host deactivated

Background services

These services run inside zwrmd and are monitored via logs:
ServiceIntervalDescription
Local host heartbeat10sUpdates last_heartbeat in database
Host health monitor15sMarks hosts offline after 30s, fails their machines
VM cleanup30sDetects orphaned Firecracker processes, schedules restarts
Restart manageron demandProcesses restart queue with exponential backoff
Cache cleanupperiodicEvicts stale build cache entries
Proxy health checker10sChecks backend health, rotates unhealthy VMs out

Logs

Control plane

sudo journalctl -u zwrmd -f
sudo journalctl -u zwrmd --since "1 hour ago"
The control plane logs every request as METHOD PATH STATUS DURATION BYTES.

Host agent

sudo journalctl -u zwrm-agent -f

VM console logs

# Via CLI
zwrm logs --app <name>

# Via API
curl -H "Authorization: Bearer <token>" \
  "http://localhost:8080/v1/machines/{id}/logs?lines=100"