
[services.*]

A service is an always-on process that RunWisp keeps alive. It exits, it gets restarted. It crashes, it gets restarted. A service stays down only when you stop it yourself, from the Web UI (the Stop Service button) or the TUI (s on a service execution). That stop flag lives in memory only; restart the daemon and the service comes back up on its own.

The TOML key (api-worker in [services.api-worker]) is the service name. It shares one namespace with [tasks.*] — names must be unique across both kinds. Each instance’s run shows up in the same history view as task runs, with its own ULID, log file, and lifecycle.
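
As a rough illustration of the shared namespace, the fragment below declares the same name under both kinds, which is not allowed. The name nightly-sync, the binary path and its flags are hypothetical, and the sketch assumes [tasks.*] entries take the same run key as services.

```toml
# Not allowed: "nightly-sync" is declared as both a task and a service.
[tasks.nightly-sync]
run = "/usr/local/bin/sync --once"      # assumption: tasks use the same `run` key

[services.nightly-sync]                 # same name as the task above
run = "/usr/local/bin/sync --daemon"

# Fine: give each entry its own name instead, for example
# [services.sync-daemon]
# run = "/usr/local/bin/sync --daemon"
```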

[services.metrics-collector]
run = "/usr/local/bin/metrics-agent"

That’s a complete service — one instance, restarted forever with bounded exponential backoff.

| Key | Default | What it does |
| --- | --- | --- |
| [services.*] | required | The service name (the TOML table key). Used in CLI, API, and log paths. |
| run | required | Shell command. Multi-line OK with TOML triple-quotes. |
| description | (empty) | Human-readable description shown in the UI and TUI. |
| group | "Services" | UI grouping label. |
| api_trigger | true | Allow manual trigger from CLI / API / UI. (Restart is the usual interaction for services.) |

[services.api-worker]
instances = 3
run = "/usr/local/bin/worker"

| Key | Default | What it does |
| --- | --- | --- |
| instances | 1 | Number of concurrent instances. Bounded 1 ≤ instances ≤ 64. |

Each instance is its own visible run with its own instance_index (0, 1, 2, …). They share configuration, logs are unified per service, and instances are restarted independently when their process exits.

| Key | Default | What it does |
| --- | --- | --- |
| restart_delay | 1s | Base delay between restarts. Go duration string. |
| restart_backoff | "exponential" | Curve applied to restart_delay: constant, linear, or exponential (shared with task retry_backoff). |
| backoff_reset_after | inherited | Instance must stay up at least this long before its restart counter resets. See [defaults] for the inherited value (default "60s"). |

Backoff is bounded, so even after a long flapping session the restart delay does not grow without limit. An instance that stays up for backoff_reset_after resets its backoff counter, so transient flapping doesn’t permanently slow restarts on a service that eventually stabilises.
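
A minimal sketch of the three restart keys used together. The service name and binary are hypothetical, and the comments stay qualitative because the exact growth step and upper bound of each curve are not stated in this section.

```toml
[services.feed-poller]               # hypothetical service
run = "/usr/local/bin/feed-poller"   # hypothetical binary
restart_delay = "2s"                 # base delay between restarts (Go duration string)
restart_backoff = "linear"           # one of: constant, linear, exponential
backoff_reset_after = "5m"           # restart counter resets only after 5 minutes of uptime
```

A longer backoff_reset_after suits a process that tends to fail a few minutes in: short-lived "successful" starts then do not reset the counter.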

restart = "always" is implicit and cannot be overridden — that’s the contract. If you want “run once and exit,” use a task.

| Key | Default | What it does |
| --- | --- | --- |
| on_overlap | "skip" | What happens when something tries to start a new run while one is going. |

Services default to on_overlap = "skip" because the supervisor keeps the instance count steady and overlap is unusual. Manually triggering a service that’s already running gets cleanly rejected. Services don’t have max_concurrent — instance count is governed by instances, not in-flight overlap.

| Key | Default | What it does |
| --- | --- | --- |
| graceful_stop | "5s" | SIGTERM grace period per instance before SIGKILL. Applies to manual stop, Restart Service, and daemon shutdown. |

graceful_stop is process-group-wide: SIGTERM goes to the instance’s process group, every descendant gets the same window, and any survivors are SIGKILL’d together. If graceful_stop exceeds [daemon] shutdown_timeout the daemon emits a boot-time warning; during whole-daemon shutdown each instance is bounded by the daemon cap regardless of its own setting.
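
The interaction with the daemon cap might look like the fragment below. The values are illustrative, batch-writer is a hypothetical service, and the sketch assumes shutdown_timeout takes a duration string in the same form as graceful_stop.

```toml
[daemon]
shutdown_timeout = "30s"   # assumption: same duration-string form as graceful_stop

[services.api-worker]
run = "/usr/local/bin/worker"
graceful_stop = "20s"      # below shutdown_timeout, so no boot-time warning

[services.batch-writer]    # hypothetical service
run = "/usr/local/bin/batch-writer"
graceful_stop = "45s"      # exceeds shutdown_timeout: warning at boot,
                           # and bounded by the daemon cap during daemon shutdown
```

A manual stop or Restart Service on batch-writer would still get the full 45 seconds; only whole-daemon shutdown is bounded by the cap.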

The log story is identical to tasks — same fields, same defaults. The [defaults] section is what keep_runs and keep_for inherit from when omitted here.

| Key | Default |
| --- | --- |
| log_max_size | 100MB |
| log_on_full | "drop_old" |
| keep_runs | inherits [defaults] keep_runs |
| keep_for | inherits [defaults] keep_for |

The same accept/reject rules apply as on tasks: positive numbers cap, omitting inherits, and bare 0 / negative values are rejected at config load.

A service’s run history can grow much faster than a task’s because each crash is a new run row. Set keep_runs defensively — 200 is a reasonable starting point for a flap-prone service.
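
A small sketch of inheritance versus an explicit cap. The [defaults] value of 50 and the flaky-bridge service are illustrative assumptions, not documented defaults.

```toml
[defaults]
keep_runs = 50              # illustrative value, inherited by entries that omit keep_runs

[services.flaky-bridge]     # hypothetical flap-prone service
run = "/usr/local/bin/bridge"
keep_runs = 200             # explicit cap for this service

[services.metrics-collector]
run = "/usr/local/bin/metrics-agent"
# keep_runs omitted: inherits 50 from [defaults]
# keep_runs = 0 would be rejected at config load
```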

See Logs & retention for the underlying behaviour.

| Key | Default | What it does |
| --- | --- | --- |
| notify_on_failure | (none) | Notifier IDs to alert when an instance exits with failed / crashed. |
| notify_on_success | (none) | Notifier IDs to alert on run.succeeded (a clean instance shutdown). |

Same shape and semantics as on [tasks.*], including the implicit addition of [notify] global_notifiers (default ["inapp"]). The shared reference lives at Per-task notifications; the [tasks.*] notifications section is the mirror entry. A failed instance in a [services.*] block notifies the same channels a failed [tasks.*] run would.
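
For instance, a service that alerts one extra channel on top of the global notifiers could be written roughly as follows. The sketch assumes a notifier with the ID slack-ops is defined elsewhere in the config.

```toml
[notify]
global_notifiers = ["inapp"]          # the default, shown explicitly here

[services.api-worker]
run = "/usr/local/bin/worker"
notify_on_failure = ["slack-ops"]     # failed / crashed exits, in addition to "inapp"
notify_on_success = ["slack-ops"]     # run.succeeded, i.e. a clean instance shutdown
```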

In practice: trap SIGTERM in your run command and exit cleanly. The example file’s pattern is a good starting point:

trap 'echo "SIGTERM — shutting down"; exit 0' TERM INT
while true; do
  # do work
  sleep 1  # placeholder for real work; the loop body must contain a command
done

An instance that exits cleanly via SIGTERM records end_reason = stopped.
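
If the process has cleanup to do on the way out, the same trap pattern can call a function. This is only a sketch: the service name, the import-batch binary and the lock-file path are hypothetical, and it assumes run is executed by a POSIX-compatible shell.

```toml
[services.tail-importer]        # hypothetical service
graceful_stop = "10s"           # cleanup below must finish within this window
run = """
cleanup() {
  echo "SIGTERM received, cleaning up"
  rm -f /tmp/tail-importer.lock   # hypothetical lock file
  exit 0                          # clean exit after the stop signal
}
trap cleanup TERM INT

touch /tmp/tail-importer.lock
while true; do
  /usr/local/bin/import-batch     # hypothetical unit of work
  sleep 5
done
"""
```

Because SIGTERM is delivered to the whole process group, the foreground command receives it as well; the shell runs the handler once that command returns, so keep each unit of work short relative to graceful_stop.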

The following are rejected in a [services.*] block:

  • cron, catch_up: services aren’t cron-driven.
  • retry_attempts, retry_delay, retry_backoff: services restart instead of retrying. Use restart_delay / restart_backoff.
  • max_concurrent, queue_max: instance count is set by instances; services don’t queue.
  • A name shared with a [tasks.*] entry.
  • Empty or missing run.
  • instances outside [1, 64].

A complete example:

[services.api-worker]
description = "Three always-on workers consuming the same job queue"
instances = 3
restart_delay = "2s"
restart_backoff = "exponential"
backoff_reset_after = "2m" # this one needs longer to call "stable"
graceful_stop = "20s" # leave time to finish the in-flight job
keep_runs = 500
notify_on_failure = ["slack-ops"]
run = """
trap 'echo "SIGTERM — draining and exiting"; exit 0' TERM INT
echo "[$(date -Iseconds)] worker starting up..."
while true; do
/usr/local/bin/consume-job
done
"""