Notifications model
When a task fails, you want to know. RunWisp has two layers for that:
- The bell in the Web UI and the alert line in the TUI. Always on, no config required.
- Outbound channels, off by default. Add one with a `[[notifier]]` block. See the Providers section for the channels that ship today.
All of this is baked into the binary. No plugins, no remote service to sign up for. Unplug the network on a laptop and a failed task still shows up in the bell.
What you get with zero config
Any task or service that ends in `run.failed`, `run.timeout`, or `run.crashed`, or that misses a scheduled run while the daemon was down (`run.missed`), writes a row in SQLite and pushes it straight to the Web UI bell and the TUI footer. That row sticks around across a daemon restart, so the badge count is still correct after a reboot.
That’s the default behaviour — you don’t write a line of TOML to get it.
Adding an outbound channel
It’s three steps, and the shape is the same for every provider:
- Declare the channel in a `[[notifier]]` block. One per channel.
- Wire it up in one of two ways:
  - Per-task notifications: `notify_on_failure` directly on the task. Best when one task has its own destination.
  - Notification rules: a `[[notification_route]]` block. Best when one rule covers many tasks.
- Test it by triggering a task that fails on purpose.

```toml
[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url = "${RUNWISP_SLACK_OPS_URL}"

[tasks.backup-postgres]
cron = "30 2 * * *"
run = "/usr/local/bin/backup.sh"
notify_on_failure = ["slack-ops"]
```

That’s a working setup right there. The same shape carries over to every provider; see the Providers section: Slack · Discord · Telegram · Email (SMTP) · Webhook.
[[notifier]] — declaring a channel
Every notifier needs two things:
| Key | Type | Required | What it does |
|---|---|---|---|
| `id` | string | yes | Name you use to refer to the channel from routes and per-task fields. Must be unique. `"inapp"` is reserved. Cannot contain `:` (reserved for inline target overrides). |
| `type` | enum | yes | `"slack"`, `"discord"`, `"telegram"`, `"smtp"`, or `"webhook"`. |
Everything beyond those two depends on the type — the provider page
has the full list for each.

```toml
[[notifier]]
id = "slack-ops"
type = "slack"
webhook_url = "${RUNWISP_SLACK_OPS_URL}"
channel = "#ops-alerts"  # optional

[[notifier]]
id = "tg-oncall"
type = "telegram"
bot_token = "${RUNWISP_TG_TOKEN}"
chat_id = "-1001234567890"
```

Storing the secret
Every notifier needs a credential of some kind: a webhook URL, a bot token, that sort of thing. It’s one field (`webhook_url`, `bot_token`, `password`), and `${...}` substitution decides where the value comes from:
- Env var (the one to reach for): set `webhook_url = "${RUNWISP_SLACK_URL}"` and put the actual value in your shell or systemd unit. Works the same way everywhere.
- File: set `webhook_url = "${file:secrets/slack.url}"`. Relative paths resolve next to `runwisp.toml`. Handy when a secrets manager drops the value on disk for you; just `chmod 600` the file.
- Inline: drop the value straight into `runwisp.toml`. Convenient, sure, but TOML files have a way of ending up in git or pasted into chat. Keep inline values to local experiments.
Whichever source you pick, the value resolves once, at config load. An unset variable or a missing file stops the daemon at boot with an error naming the field, instead of failing silently at delivery time.
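To make the load-time behaviour concrete, here is a minimal Python sketch of that kind of substitution. It is not RunWisp's implementation: the function name `resolve_secret`, the `ConfigError` type, the placeholder pattern, and the error wording are all assumptions made for illustration.

```python
import os
import re
from pathlib import Path

# Matches ${NAME} and ${file:some/path}. The exact grammar is an assumption.
_PLACEHOLDER = re.compile(r"\$\{(file:)?([^}]+)\}")


class ConfigError(Exception):
    """Raised while loading config, so a bad reference never reaches delivery."""


def resolve_secret(field, raw, config_dir, env=None):
    """Expand every ${...} placeholder in raw; inline values pass through."""
    env = os.environ if env is None else env

    def expand(match):
        is_file, name = match.group(1), match.group(2)
        if is_file:
            # Relative paths are resolved next to the config file.
            path = Path(config_dir) / name
            try:
                return path.read_text().strip()
            except OSError as exc:
                raise ConfigError(f"{field}: cannot read {path}") from exc
        if name not in env:
            raise ConfigError(f"{field}: environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(expand, raw)
```

Calling `resolve_secret("webhook_url", "${RUNWISP_SLACK_URL}", ".")` with the variable unset raises a `ConfigError` that names `webhook_url`, which mirrors the fail-at-boot behaviour described above.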
And if a delivery ever errors out, the secret is swapped for
[redacted] before anything hits the bell or the daemon log. A 5xx from
the provider won’t drag your webhook URL into a log file.
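As a rough illustration of that redaction step, the sketch below scrubs known secret values out of an error string. The `redact` helper and the example URL are hypothetical; only the `[redacted]` marker comes from the text above.

```python
def redact(message, secrets):
    """Replace every known secret value in message with the [redacted] marker."""
    # Longest first, so a secret that contains another one is replaced whole.
    # Empty strings are skipped because replacing "" would mangle the message.
    for secret in sorted(filter(None, secrets), key=len, reverse=True):
        message = message.replace(secret, "[redacted]")
    return message


error = "POST https://hooks.example.invalid/T000/B000 returned 503"
print(redact(error, ["https://hooks.example.invalid/T000/B000"]))
# prints: POST [redacted] returned 503
```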
Routing events to channels
There are two ways to point events at a channel. Both are first-class TOML (neither is built on top of the other), and you can use them together.
- Per-task notifications: `notify_on_failure = ["slack-ops"]` right on a `[tasks.*]` or `[services.*]` block. Best when a task has its own destination and you want that setting to sit next to the task definition.
- Notification rules: a `[[notification_route]]` block that matches many tasks by name pattern and any mix of event kinds. Best when one rule should blanket a whole set of tasks.
And they cooperate. A task with notify_on_failure = ["slack-ops"] that
also gets matched by a rule routing to slack-ops still sends just one
message — duplicate channels get collapsed.
global_notifiers — the always-on list
`[notify] global_notifiers` is the one setting that decides which channels hear about every failure, no matter what an individual task says. Its built-in default is `["inapp"]`, which is exactly why the bell just works with no TOML at all.

```toml
[notify]
global_notifiers = ["inapp"]                 # default: bell on
# global_notifiers = []                      # silence the bell
# global_notifiers = ["slack-ops"]           # send every failure to this channel
# global_notifiers = ["inapp", "slack-ops"]  # both
```

Every id in this list gets added to whatever channels a task already names, duplicates removed, and it’s also the catch-all for tasks that don’t name any channel at all. So `global_notifiers = ["inapp"]` plus a task’s `notify_on_failure = ["slack-ops"]` means the failure goes to slack-ops and lands a row in the bell. It’s “and,” not “instead of.”
"inapp" is the one id that doesn’t need its own [[notifier]] block.
Every other id you put in global_notifiers has to point at a notifier
you’ve declared.
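The merge rule can be pictured as an order-preserving union. The Python below is a sketch under assumptions, not the daemon's code: `effective_channels` is a hypothetical helper, and applying the "must be declared" check to every source rather than only `global_notifiers` is a simplification of this example.

```python
def effective_channels(task_channels, route_channels, global_notifiers, declared):
    """Union of every source of channel ids, duplicates dropped, order kept."""
    merged = []
    for channel in [*task_channels, *route_channels, *global_notifiers]:
        # "inapp" is built in; every other id must name a declared notifier.
        if channel != "inapp" and channel not in declared:
            raise ValueError(f"unknown notifier id: {channel}")
        if channel not in merged:
            merged.append(channel)
    return merged


# A task naming slack-ops, a rule that also routes to slack-ops, default globals:
print(effective_channels(["slack-ops"], ["slack-ops"], ["inapp"], {"slack-ops"}))
# prints: ['slack-ops', 'inapp']
```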
Coalescing
A task that fails every minute would fire off 60 messages an hour if nothing stopped it. So RunWisp groups repeated failures by dedup key: task name, event kind, and end reason taken together.
Within the coalescing window (default 1h), repeats that share a
dedup key just update the existing bell row instead of spawning a new
one. The row’s count ticks up, and the last few timestamps are kept in
an occurrence ring (default size 10).
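One way to picture the bell row is a counter plus a bounded ring of timestamps, keyed by the dedup key. The sketch below is illustrative only: the `Bell` and `BellRow` names are invented, and measuring the window from the moment the row was created is an assumption of this example.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class BellRow:
    opened_at: datetime
    count: int = 0
    recent: deque = field(default_factory=deque)  # the occurrence ring


class Bell:
    def __init__(self, window=timedelta(hours=1), ring_size=10):
        self.window = window
        self.ring_size = ring_size
        self.open_rows = {}  # dedup key -> row still inside its window
        self.rows = []       # every row ever created, oldest first

    def record(self, task, kind, end_reason, at):
        key = (task, kind, end_reason)  # the dedup key
        row = self.open_rows.get(key)
        if row is None or at - row.opened_at >= self.window:
            # No row yet, or the window has passed: start a new row.
            row = BellRow(opened_at=at, recent=deque(maxlen=self.ring_size))
            self.open_rows[key] = row
            self.rows.append(row)
        row.count += 1
        row.recent.append(at)  # deque(maxlen=...) drops the oldest timestamp
        return row
```

With the defaults, twelve failures one minute apart leave a single row with `count == 12` and only the ten most recent timestamps in `recent`.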

```toml
[notify]
coalesce_window = "30m"
occurrence_ring = 5
```

Outbound channels coalesce on that same key by default. The first failure in a window goes out right away. After that, the rest are held back until one of two things happens: either `occurrence_ring` events pile up (and the Nth goes out as a “check-in”), or the window runs out (and a single closing “summary” goes out). So in the channel you get the first failure, the occasional check-in, and one summary, not 60 separate pings.

```toml
[notify]
# coalesce_outbound = false  # send one message per event
```

Worked timeline
A `*/1 * * * *` task that starts failing at T+0, with defaults (`coalesce_window = 1h`, `occurrence_ring = 10`, `coalesce_outbound = true`):
| When | Failure # | Bell row | Outbound delivery |
|---|---|---|---|
| T+0 | 1 | new row, count=1 | first: sent immediately |
| T+1m | 2 | same row, count=2 | held |
| T+2m..+9m | 3 … 10 | same row, count keeps rising | held |
| T+10m | 11 | same row, count=11 | check-in: coalesced_count=10 |
| T+11m..+19m | 12 … 20 | same row | held |
| T+20m | 21 | same row, count=21 | check-in: coalesced_count=10 |
| T+30m | task stops failing | (nothing yet) | |
| T+1h10m | - | row stays as-is | summary: coalesced_summary=true, count = events held since last check-in |
| T+25h | (new failure) | new row in the next window | first again |
The window resets every time something goes out. After a check-in or a summary, the next failure kicks off a fresh window — so the “first” cadence just repeats for as long as failures keep rolling in.
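The first / held / check-in cadence in the table can be reproduced with a few lines of Python. This is a sketch of the rule as described here, not RunWisp's code: `classify_outbound` is a hypothetical name, and the closing summary is deliberately left out because it fires on a timer rather than on a failure.

```python
from datetime import timedelta


def classify_outbound(failure_times, window=timedelta(hours=1), ring=10):
    """Label each failure as 'first', 'held', or 'check-in'."""
    labels = []
    window_start = None  # when the last outbound message went out
    held = 0             # failures held back since that message
    for t in failure_times:
        if window_start is None or t - window_start >= window:
            labels.append("first")
            window_start, held = t, 0
            continue
        held += 1
        if held >= ring:
            # The ring is full: this failure goes out as a check-in.
            labels.append("check-in")
            window_start, held = t, 0
        else:
            labels.append("held")
    return labels


# One failure a minute for 30 minutes, then one more at T+25h.
times = [timedelta(minutes=m) for m in range(30)] + [timedelta(hours=25)]
labels = classify_outbound(times)
print(labels[0], labels[10], labels[20], labels[30])
# prints: first check-in check-in first
```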
A handy rule of thumb for a task that’s failing repeatedly: you’ll get
at most one outbound message every coalesce_window / occurrence_ring.
With the defaults, that’s one message every six minutes. Want fewer?
Raise occurrence_ring. Want the recovery summary sooner? Lower
coalesce_window.
If every single failure genuinely matters on its own — a CI build agent,
say — set coalesce_outbound = false. The bell keeps coalescing
regardless (otherwise its row count would just grow forever); only the
outbound deliveries change.
Delivery failures
If the channel hands back a 5xx, or the network’s down, the notifier retries with exponential backoff (1s base, 60s cap, a 5-minute total budget) and honours `Retry-After` on a 429.
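For a feel of what those numbers mean in practice, here is a small Python sketch of such a schedule. The doubling factor, the absence of jitter, counting only wait time against the budget, and the helper names `backoff_delays` and `next_delay` are all assumptions; the text above only fixes the base, the cap, and the total budget.

```python
def backoff_delays(base=1.0, cap=60.0, budget=300.0, factor=2.0):
    """Return the waits between attempts that fit inside the total budget."""
    delays, total, delay = [], 0.0, base
    while total + delay <= budget:
        delays.append(delay)
        total += delay
        delay = min(delay * factor, cap)  # grow exponentially up to the cap
    return delays


def next_delay(scheduled, retry_after=None):
    """On a 429, wait at least as long as the Retry-After value (in seconds)."""
    return scheduled if retry_after is None else max(scheduled, retry_after)


print(backoff_delays())
# prints: [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0, 60.0]
```

Under these assumptions the schedule allows nine retries and spends 243 seconds waiting before the budget runs out.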
When the retries are exhausted, the daemon raises a
notify.delivery_failed event carrying the original event’s metadata.
That one goes only to the bell — never back out through the outbound
router. Retrying the same dead channel just to announce that the channel
is dead would only dig the hole deeper. What you see is a yellow warning
in the bell naming the original task and kind.
notify.delivery_failed events bypass the notification route engine
entirely — they are delivered directly to the in-app channel. A
[[notification_route]] rule with match.kind = ["notify.delivery_failed"]
will never fire, no matter what notify channel you point it at. If you
need a backup path, add a second notifier ID to the original
notify_on_failure list instead — that way both channels get the same
event and if one goes down the other still sees it.
Trust model
- Secrets stay local. They live in env vars, files next to `runwisp.toml`, or inline TOML, and the only place they ever travel is the HTTPS request body going to the channel endpoint.
- Secrets are never logged. Webhook URLs and bot tokens get redacted from any error message before it reaches the daemon log or the bell.
- Secrets are never handed to the optional control-plane integration.