[services.*]

A service is an always-on process that RunWisp keeps alive. It exits, it gets restarted. It crashes, it gets restarted. The only way a service stays stopped is if you stop it yourself, from the Web UI (the Stop Service button) or the TUI (press `s` on a service execution). Even then, the stop flag lives in memory only; restart the daemon and the service comes back up on its own.

The TOML key (api-worker in [services.api-worker]) is the service name. It shares one namespace with [tasks.*] — names must be unique across both kinds. Each instance’s run shows up in the same history view as task runs, with its own ULID, log file, and lifecycle.
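
As a sketch of the shared namespace, the fragment below defines one task and one service with distinct names, plus a commented-out service that would collide. The task name `nightly-report`, its `cron` expression syntax, and all three commands are hypothetical; only the uniqueness rule comes from the text above.

```toml
# Hypothetical task: name, schedule, and command are illustrative only.
[tasks.nightly-report]
cron = "0 3 * * *"
run  = "/usr/local/bin/build-report"

# Accepted: "api-worker" is not used by any [tasks.*] entry.
[services.api-worker]
run = "/usr/local/bin/worker"

# Would be rejected: the name collides with the task above.
# [services.nightly-report]
# run = "/usr/local/bin/report-daemon"
```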

Minimum example

[services.metrics-collector]
run = "/usr/local/bin/metrics-agent"

That’s a complete service — one instance, restarted forever with bounded exponential backoff.

Identity & metadata

Key	Default	What it does
`[services.*]`	required	The service name (the TOML table key). Used in CLI, API, and log paths.
`run`	required	Shell command. Multi-line OK with TOML triple-quotes.
`description`	(empty)	Human-readable description shown in the UI and TUI.
`group`	`"Services"`	UI grouping label.
`api_trigger`	`true`	Allow manual trigger from CLI / API / UI. (Restart is the usual interaction for services.)
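
A minimal sketch of the metadata keys on a single service. The description text and the `"Observability"` group label are made-up values chosen for illustration.

```toml
[services.metrics-collector]
description = "Ships host metrics to the central collector"  # shown in the UI and TUI
group       = "Observability"  # replaces the default "Services" grouping label
api_trigger = false            # disallow manual trigger from CLI / API / UI
run         = "/usr/local/bin/metrics-agent"
```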

Instances

[services.api-worker]
instances = 3
run       = "/usr/local/bin/worker"

Key	Default	What it does
`instances`	`1`	Number of concurrent instances. Bounded `1 ≤ instances ≤ 64`.

Each instance is its own visible run with its own instance_index (0, 1, 2, …). They share configuration, logs are unified per service, and instances are restarted independently when their process exits.

Restart behaviour

Key	Default	What it does
`restart_delay`	`"1s"`	Base delay between restarts. Go duration string.
`restart_backoff`	`"exponential"`	Curve applied to `restart_delay`: `constant`, `linear`, or `exponential` (shared with task `retry_backoff`).
`backoff_reset_after`	inherited	Instance must stay up at least this long before its restart counter resets. See `[defaults]` for the inherited value (default `"60s"`).

Backoff is bounded, so even after a long flapping episode the restart delay stops growing once it reaches its cap. An instance that stays up for `backoff_reset_after` resets its backoff counter, so transient flapping doesn't permanently slow restarts on a service that eventually stabilises.
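
For contrast with the exponential default, here is a sketch of a service that opts out of a growing delay. The values are illustrative, and the comment on `"constant"` assumes the natural reading that the delay stays at the base value on every restart.

```toml
[services.metrics-collector]
run                 = "/usr/local/bin/metrics-agent"
restart_delay       = "5s"        # base delay between restarts
restart_backoff     = "constant"  # assumed: every restart waits the base delay; "linear" and "exponential" grow it
backoff_reset_after = "5m"        # uptime required before the restart counter resets
```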

restart = "always" is implicit and cannot be overridden — that’s the contract. If you want “run once and exit,” use a task.

Concurrency

Key	Default	What it does
`on_overlap`	`"skip"`	What happens when something tries to start a new run while one is already running.

Services default to on_overlap = "skip" because the supervisor keeps the instance count steady and overlap is unusual. Manually triggering a service that’s already running gets cleanly rejected. Services don’t have max_concurrent — instance count is governed by instances, not in-flight overlap.

Graceful shutdown

Key	Default	What it does
`graceful_stop`	`"5s"`	SIGTERM grace period per instance before SIGKILL. Applies to manual stop, `Restart Service`, and daemon shutdown.

graceful_stop is process-group-wide: SIGTERM goes to the instance’s process group, every descendant gets the same window, and any survivors are SIGKILL’d together. If graceful_stop exceeds [daemon] shutdown_timeout the daemon emits a boot-time warning; during whole-daemon shutdown each instance is bounded by the daemon cap regardless of its own setting.
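
The interaction with the daemon cap can be sketched as follows. The `shutdown_timeout` value and its duration-string format are assumptions, and `slow-drainer` with its command is a hypothetical service.

```toml
[daemon]
shutdown_timeout = "30s"   # assumed value; caps every instance during whole-daemon shutdown

[services.api-worker]
run           = "/usr/local/bin/worker"
graceful_stop = "20s"      # below the daemon cap, so no boot-time warning

[services.slow-drainer]    # hypothetical service
run           = "/usr/local/bin/drain"
graceful_stop = "45s"      # exceeds the cap: boot-time warning, bounded by the daemon cap on daemon shutdown
```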

Logs & retention

Logging works exactly as it does for tasks: same fields, same defaults. When `keep_runs` and `keep_for` are omitted here, they inherit from the `[defaults]` section.

Key	Default
`log_max_size`	`100MB`
`log_on_full`	`"drop_old"`
`keep_runs`	inherits `[defaults] keep_runs`
`keep_for`	inherits `[defaults] keep_for`

The same accept/reject rules apply as on tasks: positive numbers cap, omitting inherits, and bare 0 / negative values are rejected at config load.

A service’s run history can grow much faster than a task’s because each crash is a new run row. Set keep_runs defensively — 200 is a reasonable starting point for a flap-prone service.
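
A sketch of a defensive retention setting next to an inherited one. The `[defaults]` value of 50 and the `flappy-agent` service are hypothetical.

```toml
[defaults]
keep_runs = 50            # assumed value; inherited by anything that omits keep_runs

[services.flappy-agent]   # hypothetical flap-prone service
run       = "/usr/local/bin/agent"
keep_runs = 200           # explicit cap, overrides the inherited value
# keep_for is omitted, so it inherits [defaults] keep_for.
# keep_runs = 0           # would be rejected at config load
```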

See Logs & retention for the underlying behaviour.

Notifications

Key	Default	What it does
`notify_on_failure`	(none)	Notifier IDs to alert when an instance exits with `failed` / `crashed`.
`notify_on_success`	(none)	Notifier IDs to alert on `run.succeeded` (a clean instance shutdown).

Same shape and semantics as on `[tasks.*]`, including the implicit addition of `[notify] global_notifiers` (default `["inapp"]`). The shared reference lives at Per-task notifications; the `[tasks.*]` notifications section is the mirror entry. A failed instance in a `[services.*]` block notifies the same channels a failed `[tasks.*]` run would.
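
A short sketch of both keys on one service. `pagerduty` is a hypothetical notifier ID, and both IDs are assumed to be defined elsewhere in the config.

```toml
[services.api-worker]
run               = "/usr/local/bin/worker"
notify_on_failure = ["slack-ops", "pagerduty"]  # alerted when an instance ends failed / crashed
notify_on_success = ["slack-ops"]               # alerted on a clean instance shutdown
# [notify] global_notifiers (default ["inapp"]) are added implicitly.
```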

Cooperating with `graceful_stop`

In practice: trap SIGTERM in your run command and exit cleanly. The example file’s pattern is a good starting point:

trap 'echo "SIGTERM — shutting down"; exit 0' TERM INT
while true; do
  # do work; the trap runs once the current foreground command returns
  sleep 1
done

An instance that exits cleanly via SIGTERM records end_reason = stopped.

What’s rejected on services

cron, catch_up — services aren’t cron-driven.
retry_attempts, retry_delay, retry_backoff — services restart instead of retry. Use restart_delay / restart_backoff.
max_concurrent, queue_max — instance count is instances; services don’t queue.
A name shared with a [tasks.*] entry.
Empty or missing run.
instances outside [1, 64].
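
To make the list concrete, here is a sketch of a valid service with several rejected keys left commented out. The specific values are illustrative; uncommenting any one of these lines is expected to make the service fail validation.

```toml
[services.api-worker]
run       = "/usr/local/bin/worker"
instances = 3                      # accepted: within [1, 64]

# cron           = "*/5 * * * *"   # rejected: services aren't cron-driven
# retry_attempts = 3               # rejected: use restart_delay / restart_backoff
# max_concurrent = 2               # rejected: instance count comes from `instances`
# instances      = 128             # rejected if used in place of the value above: outside [1, 64]
```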

Worked example: 3 queue workers

[services.api-worker]
description         = "Three always-on workers consuming the same job queue"
instances           = 3
restart_delay       = "2s"
restart_backoff     = "exponential"
backoff_reset_after = "2m"     # this one needs longer to call "stable"
graceful_stop       = "20s"    # leave time to finish the in-flight job
keep_runs           = 500
notify_on_failure   = ["slack-ops"]
run = """
trap 'echo "SIGTERM — draining and exiting"; exit 0' TERM INT
echo "[$(date -Iseconds)] worker starting up..."
while true; do
  /usr/local/bin/consume-job
done
"""

Where to next

[tasks.*] reference — the run-and-exit counterpart.
Tasks vs Services — picking the right kind.
Retries & timeouts — why retry and restart are different things.