[services.*]
A service is an always-on process that RunWisp keeps alive. It exits, it
gets restarted. It crashes, it gets restarted. The only way a service
stops permanently is by you stopping it from the Web UI (the Stop
Service button) or the TUI (s on a service execution). That stop
flag lives in memory only; restart the daemon and the service comes
back up on its own.
The TOML key (api-worker in [services.api-worker]) is the service
name. It shares one namespace with [tasks.*] — names must be unique
across both kinds. Each instance’s run shows up in the same history view
as task runs, with its own ULID, log file, and lifecycle.
Minimum example

    [services.metrics-collector]
    run = "/usr/local/bin/metrics-agent"

That’s a complete service: one instance, restarted forever with bounded exponential backoff.
Identity & metadata
| Key | Default | What it does |
|---|---|---|
| [services.*] | required | The service name (the TOML table key). Used in CLI, API, and log paths. |
| run | required | Shell command. Multi-line OK with TOML triple-quotes. |
| description | (empty) | Human-readable description shown in the UI and TUI. |
| group | "Services" | UI grouping label. |
| api_trigger | true | Allow manual trigger from CLI / API / UI. (Restart is the usual interaction for services.) |
Instances

    [services.api-worker]
    instances = 3
    run = "/usr/local/bin/worker"

| Key | Default | What it does |
|---|---|---|
| instances | 1 | Number of concurrent instances. Bounded 1 ≤ instances ≤ 64. |
Each instance is its own visible run with its own instance_index
(0, 1, 2, …). They share configuration, logs are unified per
service, and instances are restarted independently when their process
exits.
Restart behaviour
| Key | Default | What it does |
|---|---|---|
| restart_delay | 1s | Base delay between restarts. Go duration string. |
| restart_backoff | "exponential" | Curve applied to restart_delay: constant, linear, or exponential (shared with task retry_backoff). |
| backoff_reset_after | inherited | Instance must stay up at least this long before its restart counter resets. See [defaults] for the inherited value (default "60s"). |
Backoff is bounded, so even after a long flapping session the restart
delay doesn’t keep growing forever. An instance that stays up for
backoff_reset_after resets its backoff counter, so transient
flapping doesn’t permanently slow restarts on a service that
eventually stabilises.
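
To make the three curves and the reset rule concrete, here is a small shell sketch of how a delay could be derived from restart_delay. It is illustrative only: this page does not state RunWisp's growth factor, its upper bound, or how it counts attempts, so the doubling, the cap argument, and the helper names next_delay and next_attempt are all assumptions.

```shell
#!/bin/sh
# Illustrative only: the doubling factor and the cap are assumptions,
# not values taken from RunWisp.

# next_delay CURVE BASE_SECONDS ATTEMPT CAP_SECONDS
# Prints the delay in whole seconds before restart number ATTEMPT (1-based).
next_delay() {
  nd_curve=$1; nd_base=$2; nd_attempt=$3; nd_cap=$4
  case $nd_curve in
    constant) nd_delay=$nd_base ;;
    linear)   nd_delay=$(($nd_base * $nd_attempt)) ;;
    exponential)
      if [ "$nd_attempt" -gt 30 ]; then
        nd_delay=$nd_cap   # skip the shift to avoid integer overflow
      else
        nd_delay=$(($nd_base * (1 << ($nd_attempt - 1))))
      fi ;;
    *) echo "unknown curve: $nd_curve" >&2; return 1 ;;
  esac
  # Bounded backoff: never wait longer than the (assumed) cap.
  if [ "$nd_delay" -gt "$nd_cap" ]; then nd_delay=$nd_cap; fi
  echo "$nd_delay"
}

# next_attempt PREVIOUS_ATTEMPT UPTIME_SECONDS RESET_AFTER_SECONDS
# Starts over at 1 once the instance stayed up for backoff_reset_after.
next_attempt() {
  if [ "$2" -ge "$3" ]; then echo 1; else echo $(($1 + 1)); fi
}
```

Under these assumptions, a base of 1 second with a 60 second cap would give exponential waits of 1, 2, 4, 8, 16, 32, 60 for attempts 1 through 7, and an instance that ran for 90 seconds with a 60 second reset window would start again from attempt 1.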
restart = "always" is implicit and cannot be overridden — that’s
the contract. If you want “run once and exit,” use a task.
Concurrency
| Key | Default | What it does |
|---|---|---|
| on_overlap | "skip" | What happens when something tries to start a new run while one is going. |
Services default to on_overlap = "skip" because the supervisor keeps
the instance count steady and overlap is unusual. Manually triggering a
service that’s already running gets cleanly rejected. Services don’t
have max_concurrent — instance count is governed by instances, not
in-flight overlap.
Graceful shutdown
| Key | Default | What it does |
|---|---|---|
| graceful_stop | "5s" | SIGTERM grace period per instance before SIGKILL. Applies to manual stop, Restart Service, and daemon shutdown. |
graceful_stop is process-group-wide: SIGTERM goes to the instance’s
process group, every descendant gets the same window, and any
survivors are SIGKILL’d together. If graceful_stop exceeds
[daemon] shutdown_timeout
the daemon emits a boot-time warning; during whole-daemon shutdown
each instance is bounded by the daemon cap regardless of its own
setting.
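
The capping rule during whole-daemon shutdown amounts to taking the smaller of the two values. A minimal sketch, assuming both durations have already been converted to whole seconds; the helper name effective_grace and the warning text are hypothetical and not RunWisp output:

```shell
#!/bin/sh
# effective_grace SERVICE_SECONDS DAEMON_SECONDS
# Prints the grace period (whole seconds) an instance would get during
# whole-daemon shutdown: its own graceful_stop, capped by shutdown_timeout.
effective_grace() {
  eg_service=$1; eg_daemon=$2
  if [ "$eg_service" -gt "$eg_daemon" ]; then
    # Stands in for the boot-time warning; the wording here is made up.
    echo "warning: graceful_stop ${eg_service}s exceeds shutdown_timeout ${eg_daemon}s" >&2
    echo "$eg_daemon"
  else
    echo "$eg_service"
  fi
}
```

For example, a service with graceful_stop = "45s" under a 30 second daemon cap would get 30 seconds on daemon shutdown, while a manual stop of the same service would still use the full 45 seconds.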
Logs & retention
The log story is identical to tasks: same fields, same defaults. The
[defaults] section is what keep_runs and keep_for inherit from when
omitted here.
| Key | Default |
|---|---|
| log_max_size | 100MB |
| log_on_full | "drop_old" |
| keep_runs | inherits [defaults] keep_runs |
| keep_for | inherits [defaults] keep_for |
The same accept/reject rules apply as on tasks: positive numbers cap,
omitting inherits, and bare 0 / negative values are rejected at
config load.
A service’s run history can grow much faster than a task’s because each
crash is a new run row. Set keep_runs defensively — 200 is a
reasonable starting point for a flap-prone service.
See Logs & retention for the underlying behaviour.
Notifications
| Key | Default | What it does |
|---|---|---|
| notify_on_failure | (none) | Notifier IDs to alert when an instance exits with failed / crashed. |
| notify_on_success | (none) | Notifier IDs to alert on run.succeeded (a clean instance shutdown). |
Same shape and semantics as on [tasks.*], including the
implicit addition of [notify] global_notifiers (default ["inapp"]).
The shared reference lives at
Per-task notifications; the
[tasks.*] notifications section
is the mirror entry. A failed instance in a [services.*] block notifies
the same channels a failed [tasks.*] run would.
Cooperating with graceful_stop
In practice: trap SIGTERM in your run command and exit cleanly.
The example file’s pattern is a good starting point:

    trap 'echo "SIGTERM — shutting down"; exit 0' TERM INT
    while true; do
      # do work
      sleep 1  # placeholder so the loop body is valid shell
    done

An instance that exits cleanly via SIGTERM records end_reason = stopped.
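
One caveat with the trap pattern: a shell runs a trap handler only after the current foreground command returns, so a long-running foreground command delays the handler. A common workaround is to start the command in the background and block on wait, which a trapped signal interrupts immediately. The sketch below is a generic shell pattern rather than anything RunWisp-specific; the names supervise and on_term are hypothetical.

```shell
#!/bin/sh
# Generic pattern, not RunWisp-specific: run the real work as a background
# child so the TERM handler can fire while the wrapper is blocked in `wait`.
child=""

on_term() {
  echo "SIGTERM received, stopping child"
  # RunWisp already signals the whole process group, so this kill is
  # usually a harmless duplicate; it matters when the wrapper is signalled alone.
  if [ -n "$child" ]; then kill -TERM "$child" 2>/dev/null || true; fi
  wait "$child" 2>/dev/null || true   # reap the child before exiting
  exit 0                              # clean exit from the wrapper
}

# supervise COMMAND [ARGS...]
supervise() {
  trap on_term TERM INT
  "$@" &
  child=$!
  wait "$child"
}
```

A run command could then end with a line such as supervise /usr/local/bin/worker. Keep whatever cleanup the handler performs comfortably shorter than graceful_stop, since survivors are SIGKILL’d when the window closes.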
What’s rejected on services
- cron, catch_up: services aren’t cron-driven.
- retry_attempts, retry_delay, retry_backoff: services restart instead of retry. Use restart_delay / restart_backoff.
- max_concurrent, queue_max: instance count is instances; services don’t queue.
- A name shared with a [tasks.*] entry.
- Empty or missing run.
- instances outside [1, 64].
Worked example: 3 queue workers

    [services.api-worker]
    description = "Three always-on workers consuming the same job queue"
    instances = 3
    restart_delay = "2s"
    restart_backoff = "exponential"
    backoff_reset_after = "2m"   # this one needs longer to call "stable"
    graceful_stop = "20s"        # leave time to finish the in-flight job
    keep_runs = 500
    notify_on_failure = ["slack-ops"]
    run = """
    trap 'echo "SIGTERM — draining and exiting"; exit 0' TERM INT
    echo "[$(date -Iseconds)] worker starting up..."
    while true; do
      /usr/local/bin/consume-job
    done
    """

Where to next
- [tasks.*] reference: the run-and-exit counterpart.
- Tasks vs Services: picking the right kind.
- Retries & timeouts: why retry and restart are different things.