Notifications model

When a task fails, you want to know. RunWisp has two layers for that:

  1. The bell in the Web UI and the alert line in the TUI. Always on, no config required.
  2. Outbound channels — off by default. Add one with a [[notifier]] block. See the Providers section for the channels that ship today.

All of this is baked into the binary. No plugins, no remote service to sign up for. Unplug the network on a laptop and a failed task still shows up in the bell.

Any task or service that ends in run.failed, run.timeout, or run.crashed, or that misses a scheduled run while the daemon is down (run.missed), writes a row in SQLite and pushes it straight to the Web UI bell and the TUI footer. That row persists across a daemon restart, so the badge count is still correct after a reboot.

That’s the default behaviour — you don’t write a line of TOML to get it.

Adding an outbound channel takes three steps, and the shape is the same for every provider:

  1. Declare the channel in a [[notifier]] block. One per channel.
  2. Wire it up in one of two ways:
    • Per-task notifications: notify_on_failure directly on the task. Best when one task has its own destination.
    • Notification rules — a [[notification_route]] block. Best when one rule covers many tasks.
  3. Test it by triggering a task that fails on purpose.

[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url = "${RUNWISP_SLACK_OPS_URL}"

[tasks.backup-postgres]
cron = "30 2 * * *"
run = "/usr/local/bin/backup.sh"
notify_on_failure = ["slack-ops"]

That’s a working setup right there. The same shape carries over to every provider — see the Providers section: Slack · Discord · Telegram · Email (SMTP) · Webhook.

Every notifier needs two things:

| Key | Type | Required | What it does |
| --- | --- | --- | --- |
| id | string | yes | Name you use to refer to the channel from routes and per-task fields. Must be unique. "inapp" is reserved. Cannot contain : (reserved for inline target overrides). |
| type | enum | yes | "slack", "discord", "telegram", "smtp", or "webhook". |

Everything beyond those two depends on the type — the provider page has the full list for each.
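
The id and type rules above are easy to check mechanically. Here's a minimal Python sketch of such a check. It is not RunWisp's own validator: the function name validate_notifier_ids, the dict shape, and the error wording are all assumptions made up for illustration.

```python
RESERVED_IDS = {"inapp"}  # the built-in bell channel, never declared by hand
KNOWN_TYPES = {"slack", "discord", "telegram", "smtp", "webhook"}


def validate_notifier_ids(notifiers):
    """Return a list of problems; an empty list means every id and type is valid.

    `notifiers` is a list of dicts such as {"id": "slack-ops", "type": "slack"}.
    Hypothetical helper: names and messages are illustrative only.
    """
    problems = []
    seen = set()
    for notifier in notifiers:
        nid, ntype = notifier.get("id"), notifier.get("type")
        if not nid:
            problems.append("notifier is missing an id")
            continue
        if nid in RESERVED_IDS:
            problems.append(f"{nid!r} is reserved")
        if ":" in nid:
            problems.append(f"{nid!r} contains ':'")
        if nid in seen:
            problems.append(f"{nid!r} is declared more than once")
        seen.add(nid)
        if ntype not in KNOWN_TYPES:
            problems.append(f"{nid!r} has unknown type {ntype!r}")
    return problems
```

An empty result means the two required keys pass; type-specific fields such as webhook_url are out of scope for this sketch.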

[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url = "${RUNWISP_SLACK_OPS_URL}"
channel = "#ops-alerts" # optional

[[notifier]]
id = "tg-oncall"
type = "telegram"
bot_token = "${RUNWISP_TG_TOKEN}"
chat_id = "-1001234567890"

Every notifier needs a credential of some kind — a webhook URL, a bot token, that sort of thing. It’s one field (webhook_url, bot_token, password), and ${...} substitution decides where the value comes from:

  • Env var (the one to reach for): set webhook_url = "${RUNWISP_SLACK_URL}" and put the actual value in your shell or systemd unit. Works the same way everywhere.
  • File: set webhook_url = "${file:secrets/slack.url}". Relative paths resolve next to runwisp.toml. Handy when a secrets manager drops the value on disk for you — just chmod 600 the file.
  • Inline: drop the value straight into runwisp.toml. Convenient, sure, but TOML files have a way of ending up in git or pasted into chat. Keep inline values to local experiments.

With either substitution form the value resolves once, at config load: an unset variable or a missing file stops the daemon at boot with an error naming the field, instead of failing silently at delivery time.
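
To make the resolve-once, fail-at-load behaviour concrete, here's a small Python sketch of how ${VAR} and ${file:path} references could be expanded. It illustrates the idea only and is not RunWisp's code: resolve_secret, ConfigError, and the choice to strip trailing whitespace from a file's contents are assumptions.

```python
import os
import re
from pathlib import Path

# Matches ${NAME} and ${file:some/path}.
_PLACEHOLDER = re.compile(r"\$\{(file:)?([^}]+)\}")


class ConfigError(Exception):
    """Raised at load time so a bad reference never survives until delivery."""


def resolve_secret(field, raw, config_dir, env=None):
    """Expand placeholders in `raw`; inline values pass through unchanged."""
    env = os.environ if env is None else env

    def substitute(match):
        is_file, name = match.group(1), match.group(2)
        if is_file:
            path = Path(name)
            if not path.is_absolute():
                # Relative paths resolve next to the config file.
                path = Path(config_dir) / path
            try:
                # Assumption: a trailing newline in the file is not part of the secret.
                return path.read_text().strip()
            except OSError as exc:
                raise ConfigError(f"{field}: cannot read {path}") from exc
        if name not in env:
            raise ConfigError(f"{field}: environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, raw)
```

Calling this once per credential field while the config loads means every failure surfaces at boot and carries the field name.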

And if a delivery ever errors out, the secret is swapped for [redacted] before anything hits the bell or the daemon log. A 5xx from the provider won’t drag your webhook URL into a log file.
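
The redaction step can be pictured as a plain string replacement applied to an error message before it is stored or logged. This is a minimal Python sketch under that assumption. The helper name redact is hypothetical, and RunWisp's real redaction may work differently.

```python
def redact(message, secrets):
    """Replace every occurrence of each secret in `message` with "[redacted]"."""
    # Longest first, so a secret that contains another one is replaced whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        message = message.replace(secret, "[redacted]")
    return message
```

For example, an error text like "POST <webhook URL> returned 503" comes out as "POST [redacted] returned 503".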

There are two ways to point events at a channel. Both are first-class TOML — neither is built on top of the other — and you can use them together.

  • Per-task notifications: notify_on_failure = ["slack-ops"] right on a [tasks.*] or [services.*] block. Best when a task has its own destination and you want that setting to sit next to the task definition.
  • Notification rules — a [[notification_route]] block that matches many tasks by name pattern and any mix of event kinds. Best when one rule should blanket a whole set of tasks.

And they cooperate. A task with notify_on_failure = ["slack-ops"] that also gets matched by a rule routing to slack-ops still sends just one message — duplicate channels get collapsed.

[notify] global_notifiers is the one setting that decides which channels hear about every failure, no matter what an individual task says. Its built-in default is ["inapp"] — which is exactly why the bell just works with no TOML at all.

[notify]
global_notifiers = ["inapp"] # default — bell on
# global_notifiers = [] # silence the bell
# global_notifiers = ["slack-ops"] # send every failure to this channel
# global_notifiers = ["inapp", "slack-ops"] # both

Every id in this list gets added to whatever channels a task already names, duplicates removed — and it’s also the catch-all for tasks that don’t name any channel at all. So global_notifiers = ["inapp"] plus a task’s notify_on_failure = ["slack-ops"] means the failure goes to slack-ops and lands a row in the bell. It’s “and,” not “instead of.”
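
The "and, not instead of" behaviour amounts to an order-preserving union of three lists. Here's a minimal Python sketch of it. The function name resolve_targets and the ordering (task first, then rules, then globals) are assumptions; the source only says that duplicates collapse.

```python
def resolve_targets(task_notifiers, route_notifiers, global_notifiers):
    """Merge every channel id for one failure event, dropping duplicates.

    Hypothetical helper: first-seen order is kept, which is an assumption.
    """
    merged = []
    for nid in [*task_notifiers, *route_notifiers, *global_notifiers]:
        if nid not in merged:
            merged.append(nid)
    return merged
```

With notify_on_failure = ["slack-ops"], a rule that also routes to slack-ops, and the default global_notifiers = ["inapp"], the result is ["slack-ops", "inapp"]: one Slack message and one bell row.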

"inapp" is the one id that doesn’t need its own [[notifier]] block. Every other id you put in global_notifiers has to point at a notifier you’ve declared.

A task that fails every minute would fire off 60 messages an hour if nothing stopped it. So RunWisp groups repeated failures by dedup key — task name, event kind, and end reason taken together.

Within the coalescing window (default 1h), repeats that share a dedup key just update the existing bell row instead of spawning a new one. The row’s count ticks up, and the last few timestamps are kept in an occurrence ring (default size 10).

[notify]
coalesce_window = "30m"
occurrence_ring = 5
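
Here's a rough Python sketch of how bell coalescing could work: one row per dedup key, a running count, and a bounded ring of recent timestamps. It is an illustration and not RunWisp's storage code. The class names, the in-memory dict (the real rows live in SQLite), and measuring the window from the row's first event are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BellRow:
    first_seen: float
    count: int = 0
    recent: deque = field(default_factory=deque)  # last few timestamps


class Bell:
    def __init__(self, coalesce_window=3600.0, occurrence_ring=10):
        self.window = coalesce_window
        self.ring = occurrence_ring
        self.rows = {}    # dedup key -> currently open row
        self.closed = []  # rows whose window has passed

    def record(self, task, kind, end_reason, now):
        """Fold one failure into its row, opening a new row when needed."""
        key = (task, kind, end_reason)  # the dedup key
        row = self.rows.get(key)
        if row is None or now - row.first_seen >= self.window:
            if row is not None:
                self.closed.append(row)
            row = BellRow(first_seen=now, recent=deque(maxlen=self.ring))
            self.rows[key] = row
        row.count += 1
        row.recent.append(now)  # deque(maxlen=...) drops the oldest entry itself
        return row
```

Twelve failures one minute apart end up as a single row with count 12 and only the ten most recent timestamps kept.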

Outbound channels coalesce on that same key by default. The first failure in a window goes out right away. After that, the rest are held back until one of two things happens: either occurrence_ring events pile up (and the Nth goes out as a “check-in”), or the window runs out (and a single closing “summary” goes out). So in the channel you get the first failure, the occasional check-in, and one summary — not 60 separate pings.

[notify]
# coalesce_outbound = false # send one message per event

Consider a */1 * * * * task (one run per minute) that starts failing at T+0, with defaults (coalesce_window = 1h, occurrence_ring = 10, coalesce_outbound = true):

| When | Failure # | Bell row | Outbound delivery |
| --- | --- | --- | --- |
| T+0 | 1 | new row, count=1 | first, sent immediately |
| T+1m | 2 | same row, count=2 | held |
| T+2m..+9m | 3 … 10 | same row, count keeps rising | held |
| T+10m | 11 | same row, count=11 | check-in, coalesced_count=10 |
| T+11m..+19m | 12 … 20 | same row | held |
| T+20m | 21 | same row, count=21 | check-in, coalesced_count=10 |
| T+30m | task stops failing | | (nothing yet) |
| T+1h10m | | row stays as-is | summary, coalesced_summary=true, count = events held since last check-in |
| T+25h | (new failure) | new row in the next window | first again |

The window resets every time something goes out. After a check-in or a summary, the next failure kicks off a fresh window — so the “first” cadence just repeats for as long as failures keep rolling in.
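
The first / held / check-in / summary sequence from the worked example can be sketched as a tiny state machine. This Python version is a simplified illustration, not the daemon's implementation. The class and method names and the "type" key are hypothetical, and the window timer is left to the caller, who calls on_window_expired() when the window runs out. Whether a summary is sent when nothing was held is not stated in the text, so the sketch assumes it is skipped.

```python
class OutboundCoalescer:
    """Decide, per failure, whether an outbound message goes out or is held."""

    def __init__(self, occurrence_ring=10):
        self.ring = occurrence_ring
        self.open = False  # True once the "first" message of a window has gone out
        self.held = 0      # failures absorbed since the last outbound message

    def on_failure(self):
        """Return the message to send for this failure, or None if it is held."""
        if not self.open:
            self.open = True
            return {"type": "first"}
        self.held += 1
        if self.held == self.ring:
            self.held = 0
            return {"type": "check-in", "coalesced_count": self.ring}
        return None

    def on_window_expired(self):
        """Close the window; return a summary, or None if nothing was held."""
        summary = None
        if self.open and self.held:
            summary = {"type": "summary", "coalesced_summary": True, "count": self.held}
        self.open = False
        self.held = 0
        return summary
```

Feeding it 30 failures one minute apart (minutes 0 through 29) produces a first at minute 0 and check-ins at minutes 10 and 20. Closing the window afterwards yields a summary with count 9, and the next failure is a first again.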

A handy rule of thumb for a task that’s failing repeatedly: after the first message, you’ll get one check-in for every occurrence_ring failures. With the defaults and a task that fails once a minute, that’s one message every ten minutes. Want fewer? Raise occurrence_ring. Want the recovery summary sooner? Lower coalesce_window.

If every single failure genuinely matters on its own — a CI build agent, say — set coalesce_outbound = false. The bell keeps coalescing regardless (otherwise its row count would just grow forever); only the outbound deliveries change.

If the channel hands back a 5xx, or the network’s down, the notifier retries with exponential backoff (1s base, 60s cap, a 5-minute total budget) and honours Retry-After on a 429.
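
To get a feel for what those numbers mean, here's a short Python sketch of the delay schedule. It makes several assumptions the text doesn't confirm: delays double each attempt, there is no jitter, the 5-minute budget counts only time spent waiting, and Retry-After arrives as a number of seconds. The function names are hypothetical.

```python
def backoff_delays(base=1.0, cap=60.0, budget=300.0):
    """Yield successive retry delays in seconds until the budget would be exceeded."""
    spent = 0.0
    delay = base
    while spent + delay <= budget:
        yield delay
        spent += delay
        delay = min(delay * 2, cap)  # double each time, never above the cap


def next_delay(planned, retry_after=None):
    """Prefer the server's Retry-After (in seconds) over the planned delay."""
    return float(retry_after) if retry_after is not None else planned
```

Under these assumptions the schedule is 1, 2, 4, 8, 16, 32, 60, 60, 60 seconds: nine retries and 243 seconds of waiting before the notifier gives up.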

When the retries are exhausted, the daemon raises a notify.delivery_failed event carrying the original event’s metadata. That one goes only to the bell — never back out through the outbound router. Retrying the same dead channel just to announce that the channel is dead would only dig the hole deeper. What you see is a yellow warning in the bell naming the original task and kind.

notify.delivery_failed events bypass the notification route engine entirely — they are delivered directly to the in-app channel. A [[notification_route]] rule with match.kind = ["notify.delivery_failed"] will never fire, no matter what notify channel you point it at. If you need a backup path, add a second notifier ID to the original notify_on_failure list instead — that way both channels get the same event and if one goes down the other still sees it.

  • Secrets stay local. They live in env vars, files next to runwisp.toml, or inline TOML, and the only place they ever travel is the HTTPS request going to the channel endpoint.
  • Secrets are never logged. Webhook URLs and bot tokens get redacted from any error message before it reaches the daemon log or the bell.
  • Secrets are never handed to the optional control-plane integration.