Notifications model

When a task fails, you want to know. RunWisp has two layers for that:

  1. The bell in the Web UI and the alert line in the TUI. Always on, no config required.
  2. Outbound channels — off by default. Add one with a [[notifier]] block. See the Providers section for the channels that ship today.

Everything below ships in the binary. No plugins, no remote service to sign up for. A laptop with the network unplugged still shows a row in the bell when a task fails.

Every task and service that ends with run.failed, run.timeout, or run.crashed writes one row in SQLite and streams it to the Web UI bell and the TUI footer. The row survives a daemon restart, so the badge count is still right after a reboot.

That’s the default. You don’t write any TOML to get it.

Three steps, the same shape for every provider:

  1. Declare the channel in a [[notifier]] block. One per channel.
  2. Wire it up in one of two ways:
    • Per-task notifications — notify_on_failure directly on the task. Best when one task has its own destination.
    • Notification rules — a [[notification_route]] block. Best when one rule covers many tasks.
  3. Test it by triggering a task that fails on purpose (a throwaway example follows below).

[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url_env = "RUNWISP_SLACK_OPS_URL"

[tasks.backup-postgres]
cron = "30 2 * * *"
run = "/usr/local/bin/backup.sh"
notify_on_failure = ["slack-ops"]

That is a working setup. The same shape works for every provider — see the Providers section: Slack · Telegram.
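
For step 3, a throwaway task that always fails is enough to confirm delivery end to end. A minimal sketch: the task name here is arbitrary, and /bin/false is simply a command that exits non-zero, so the run should end as run.failed:

[tasks.notify-smoke-test]
cron = "* * * * *" # fire every minute while testing, then remove
run = "/bin/false" # always exits 1
notify_on_failure = ["slack-ops"]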

Every notifier needs:

| Key | Type | Required | What it does |
| --- | --- | --- | --- |
| id | string | yes | Name you use to refer to the channel from routes and per-task fields. Must be unique. "inapp" is reserved. Cannot contain : (reserved for inline target overrides). |
| type | enum | yes | "slack" or "telegram" for now. More drivers will land later. |

The rest of the fields depend on the type — see the provider page for the full list.

[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url_env = "RUNWISP_SLACK_OPS_URL"
channel = "#ops-alerts" # optional
[[notifier]]
id = "tg-oncall"
type = "telegram"
bot_token_env = "RUNWISP_TG_TOKEN"
chat_id = "-1001234567890"

Each notifier needs a credential — for example a webhook URL or a bot token. You can supply it three ways:

  • Env var (recommended): set webhook_url_env = "RUNWISP_SLACK_URL" and put the value in your shell or systemd unit. Works the same everywhere.
  • File: set webhook_url_file = "secrets/slack.url". Relative paths resolve under the data directory. Useful when a secrets manager writes the value to disk for you. Make the file chmod 600.
  • Inline: put the value straight in runwisp.toml. Convenient, but TOML files are often committed to git or shared in chat. Use inline values only for local experiments.

Set exactly one of the three per secret. Setting two of them is a config-load error — the loader does not pick a winner; it stops the daemon.
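
The same secret, expressed each of the three ways (set only one). A sketch: the env and file keys follow the examples above, but the inline key name webhook_url is an assumption, so check the Slack provider page for the exact field:

[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url_env = "RUNWISP_SLACK_OPS_URL" # env var (recommended)
# webhook_url_file = "secrets/slack.url"  # file; relative paths resolve under the data directory
# webhook_url = "https://hooks.slack.com/..." # inline (assumed key name) — local experiments only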

When a delivery error is reported in the bell or the daemon log, the secret is replaced with [redacted] before the message is written. A 5xx response from the provider will not leak your webhook URL into a log file.

There are two ways to point events at a channel. Both are first-class TOML; neither is a derivative of the other. They can be used together.

  • Per-task notifications — notify_on_failure = ["slack-ops"] on one [tasks.*] or [services.*] block. Best when one task has its own destination, and you want the setting to live next to the task definition.
  • Notification rules — a [[notification_route]] block matches many tasks by name pattern and matches any combination of event kinds. Best when one rule should cover many tasks (see the sketch just below).
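
For example, one rule that sends every backup task's failures to the ops channel. A sketch: kind matches the field used in the delivery-failure route further down, but the name-pattern key spelled task here is an assumption; check the routes reference for the exact field name:

[[notification_route]]
# "task" as the name-pattern key is hypothetical
match = { task = ["backup-*"], kind = ["run.failed", "run.timeout", "run.crashed"] }
notify = ["slack-ops"]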

The two work together. A task with notify_on_failure = ["slack-ops"] that is also matched by a rule whose notify list includes slack-ops still sends one message per failure — duplicate channels are removed.

global_notifiers, under [notify], is the single setting that controls which channels receive every failure regardless of what is on the task. The built-in default is ["inapp"], which is why the bell works with zero TOML.

[notify]
global_notifiers = ["inapp"] # default — bell on
# global_notifiers = [] # silence the bell
# global_notifiers = ["slack-ops"] # send every failure to this channel
# global_notifiers = ["inapp", "slack-ops"] # both

Every id in this list is added to the channels named on each task, with duplicates removed. It also acts as the catch-all for tasks that do not name any channel. So global_notifiers = ["inapp"] plus notify_on_failure = ["slack-ops"] sends to slack-ops and adds a row to the bell — not slack-ops instead of the bell.
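
Concretely, combining the default with the per-task field from the quick-start example:

[notify]
global_notifiers = ["inapp"]

[tasks.backup-postgres]
cron = "30 2 * * *"
run = "/usr/local/bin/backup.sh"
notify_on_failure = ["slack-ops"]

# a failed run of backup-postgres produces a bell row (inapp) and a slack-ops message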

"inapp" is the only id that does not need a [[notifier]] block. Every other id in global_notifiers must point at a declared notifier.

A task that fails every minute could send 60 messages an hour without help. RunWisp groups repeated failures by dedup key — the combination of task name, event kind, and end reason.

Inside the coalescing window (default 1h), repeats with the same dedup key update the same bell row instead of writing a new one. The row’s count goes up, and the last N timestamps are kept in an occurrence ring (default size 10).

[notify]
coalesce_window = "30m"
occurrence_ring = 5

Outbound channels coalesce on the same key by default: the first failure in a window is sent immediately. The next ones are held back until either occurrence_ring events accumulate (the Nth is sent as a “check-in”) or the window expires (one closing “summary” is sent). In the channel you see the first failure, periodic check-ins, and one summary — not 60 separate messages.

[notify]
# coalesce_outbound = false # send one message per event

A */1 * * * * task that starts failing at T+0, with defaults (coalesce_window = 1h, occurrence_ring = 10, coalesce_outbound = true):

| When | Failure # | Bell row | Outbound delivery |
| --- | --- | --- | --- |
| T+0 | 1 | new row, count=1 | first — sent immediately |
| T+1m | 2 | same row, count=2 | held |
| T+2m..+9m | 3 … 10 | same row, count keeps rising | held |
| T+10m | 11 | same row, count=11 | check-in (coalesced_count=10) |
| T+11m..+19m | 12 … 20 | same row | held |
| T+20m | 21 | same row, count=21 | check-in (coalesced_count=10) |
| T+30m | task stops failing | | (nothing yet) |
| T+1h10m | | row stays as-is | summary (coalesced_summary=true, count = events held since last check-in) |
| T+25h | (new failure) | new row in the next window | first again |

The window resets each time something is sent. After a check-in or summary, the next failure starts a fresh window — the “first” cadence repeats as long as failures keep arriving.

Rule of thumb for a task that fails repeatedly: at most one outbound message every coalesce_window / occurrence_ring. Defaults give one message every six minutes. Raise occurrence_ring for fewer messages, lower coalesce_window for a faster recovery summary.
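
As a sketch, the two knobs pulled in the directions the rule of thumb suggests (pick whichever matters for the task at hand):

[notify]
# fewer check-ins while a task keeps failing:
occurrence_ring = 20
# faster summary once failures stop:
# coalesce_window = "30m"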

If every failure is independently meaningful — for example a CI build agent — set coalesce_outbound = false. The bell still coalesces (the row count would otherwise grow without bound); only outbound deliveries change.

If the channel returns a 5xx or the network is down, the notifier retries with exponential backoff (1s base, 60s cap, 5-minute total budget) and respects Retry-After on 429 responses.

When retries run out, the daemon creates a notify.delivery_failed event carrying the original event’s metadata. That event is sent only to the bell — never back through the outbound router. Retrying the same channel to tell you that channel is down would just make things worse. You see a yellow warning in the bell with the original task and kind.

You can still route notify.delivery_failed through a different channel — for example, “if one channel is down, send a message to the on-call channel on another”:

[[notification_route]]
match = { kind = ["notify.delivery_failed"] }
notify = ["tg-oncall"]
  • Secrets stay local. They live in env vars, files under the data dir, or inline TOML — and they never travel anywhere except the HTTPS request body to the channel endpoint.
  • Secrets are never logged. Webhook URLs and bot tokens are redacted from any error message before it reaches the daemon log or the bell.
  • Secrets are never sent to the optional control-plane integration.