# Retries & timeouts
A task that exits non-zero can be retried automatically. A run that takes
too long can be killed. Both are configured per task, with sensible
defaults from `[defaults]` if you leave them off.

Retries apply only to tasks. Services don’t retry; they restart instead.
## retry_attempts

```toml
[tasks.publish-feed]
cron = "*/15 * * * *"
run = "/usr/local/bin/publish.sh"
retry_attempts = 3
retry_delay = "30s"
retry_backoff = "exponential"
```

- Default: `0` (no retries).
- Semantics: the number of additional attempts after the first failure.
- Triggers a retry: the run ended with `failed` (non-zero exit), `timeout`, `crashed`, or `log_overflow` (cancelled by `log_on_full = "kill_task"` after exceeding `log_max_size`).
- Does not trigger a retry: `success`, `stopped` (a manual stop via the API/CLI/UI, or a sibling run cancelled it via `on_overlap = "terminate"`), or `skipped` (`on_overlap = "skip"` rejected the firing because another run was still going). Stopped is a deliberate human action; skipped means the original run is still running and another attempt would just race it.
## retry_delay and retry_backoff

`retry_delay` is a duration string (`"5s"`, `"2m"`, `"1h"`). When
retries are enabled, `retry_delay` defaults to `"5s"`.
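For example, enabling retries without setting a delay picks up that
5-second default (a minimal sketch; the task name and script path are
illustrative):

```toml
[tasks.nightly-report]
cron = "0 2 * * *"
run = "/usr/local/bin/report.sh"
retry_attempts = 1   # retry_delay is unset, so the retry waits "5s"
```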
`retry_backoff` chooses how the wait between retries grows. The same
values are used by services’ `restart_backoff`, so they move between
the two contexts cleanly.
| Value | Wait before attempt N (1-indexed retries) | With `retry_delay = "10s"` |
|---|---|---|
| `constant` (or unset) | constant delay | 10s, 10s, 10s … |
| `linear` | delay × N | 10s, 20s, 30s, 40s … |
| `exponential` | delay × 2^(N-1) | 10s, 20s, 40s, 80s, 160s, 300s … |
All schedules are capped at 5 minutes. `exponential` with a short base
hits the cap and stays there.
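The growth rules and the cap are easy to pin down in a few lines. Here
is a minimal Python sketch of the schedule; it mirrors the table above
and is not the daemon’s actual implementation:

```python
def wait_before_retry(n: int, base_s: float, backoff: str = "constant") -> float:
    """Seconds to wait before retry n (1-indexed), per the rules above."""
    if backoff == "linear":
        wait = base_s * n
    elif backoff == "exponential":
        wait = base_s * 2 ** (n - 1)
    else:  # "constant" (or unset)
        wait = base_s
    return min(wait, 300.0)  # every schedule is capped at 5 minutes

# retry_delay = "10s", retry_backoff = "exponential"
print([wait_before_retry(n, 10.0, "exponential") for n in range(1, 7)])
# [10.0, 20.0, 40.0, 80.0, 160.0, 300.0]  (the 320s step is capped at 300s)
```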
## What retries look like in history

Each attempt is a separate run, with its own ID, exit code, and
captured log file. The Web UI lists every attempt under the task’s run
history, numbered by `retry_attempt` (0 for the first try, 1 for
the first retry, and so on). That’s deliberate: if attempt 1 silently
corrupted state and attempt 2 succeeded, you can still go back and
read attempt 1’s stderr.
## timeout

```toml
[tasks.heavy-job]
cron = "0 3 * * *"
run = "/usr/local/bin/heavy-job.sh"
timeout = "30m"
```

- Same duration syntax as `retry_delay` (`"30s"`, `"5m"`, `"1h"`).
- Default: inherited from `[defaults]` `timeout` if set; otherwise no timeout, and the run is allowed to take as long as it likes.
- Scope: per attempt. A retry gets a fresh `timeout` window; time spent waiting in `retry_delay` doesn’t count against it.
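Because the default is inherited, one line in `[defaults]` puts a
ceiling on every task that doesn’t set its own:

```toml
[defaults]
timeout = "1h"   # any task without its own timeout inherits this
```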
When the deadline hits, the daemon SIGTERMs the run’s process group,
waits up to the task’s `graceful_stop` (default `"5s"`), then SIGKILLs
any survivors and records the run with end reason `timeout`. The same
SIGTERM-then-wait flow applies to `on_overlap = "terminate"`, manual
stops, and daemon shutdown; `graceful_stop` is the single knob.
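A job that needs more than the default five seconds to shut down
cleanly can raise `graceful_stop` next to its `timeout` (a minimal
sketch; the values are illustrative):

```toml
[tasks.heavy-job]
cron = "0 3 * * *"
run = "/usr/local/bin/heavy-job.sh"
timeout = "30m"
graceful_stop = "20s"  # wait 20s after SIGTERM before SIGKILLing survivors
```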
## Interactions

- `on_overlap = "terminate"` plus retries. If a new firing terminates the running attempt, that attempt records end reason `stopped`, which blocks any further retries. The new run from the terminate policy is a fresh execution, not a retry.
- Manual stop. Same story: stopping a run from the API/UI records `stopped` and ends the retry chain.
- `max_concurrent > 1`. Retries don’t count against `max_concurrent`. A retry only fires after its predecessor has finished, so there’s no overlap to evaluate.
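If you want overlapping firings handled without ever spawning a
redundant attempt, the combination that follows from the rules above
looks like this (a sketch; the task name and script path are
illustrative):

```toml
[tasks.sync]
cron = "*/5 * * * *"
run = "/usr/local/bin/sync.sh"
on_overlap = "skip"  # a firing rejected here ends as skipped, never retried
retry_attempts = 2   # only failed, timeout, or crashed runs are retried
```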
## Services don’t retry

Services have a different model because they’re meant to stay up.
Tasks and services share the same backoff vocabulary
(`constant` / `linear` / `exponential`), so one rule is easier to
remember. Tasks add `retry_attempts` (services run forever); services
add `restart_delay` (the supervisor owns the cadence).
| Field | Tasks | Services |
|---|---|---|
| `retry_attempts` | ✅ default `0` | ❌ rejected |
| `retry_delay` | ✅ default `"5s"` | ❌ rejected |
| `retry_backoff` | ✅ `constant` / `linear` / `exponential`, default `constant` | ❌ rejected |
| `restart_delay` | ❌ rejected | ✅ default `"1s"` |
| `restart_backoff` | ❌ rejected | ✅ `constant` / `linear` / `exponential`, default `exponential` |
A service supervisor restarts a replica forever (with bounded exponential backoff) until you stop it explicitly.
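Spelled out as config, the service side of that table looks like this
(a sketch; the service name and binary path are illustrative, and both
values shown are the defaults):

```toml
[services.api]
run = "/usr/local/bin/api-server"
restart_delay = "1s"             # default
restart_backoff = "exponential"  # default; the supervisor restarts forever
```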
## Backoff reset: backoff_reset_after

A replica that stays up at least `backoff_reset_after` (default `60s`)
resets its restart counter, so a service that fails repeatedly at
first isn’t stuck with long restart delays once it stabilises.
Configure it in `[defaults]` (applies to every service that doesn’t
override it) or per-service:

```toml
[defaults]
backoff_reset_after = "30s"  # global default

[services.flaky-worker]
run = "/usr/local/bin/worker"
backoff_reset_after = "2m"   # this one needs longer to call "stable"
```

If you want “finite, escalating wait” semantics on a service, model it
as a task with `cron`/`retry_attempts` instead. If you want
“indefinite self-healing supervision” on a workload, that’s
`[services.*]`.
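Concretely, the first option might look like this (a sketch; the
worker path and its `--once` flag are illustrative, and the command
must exit on its own for retries to apply):

```toml
[tasks.bounded-worker]
cron = "0 * * * *"
run = "/usr/local/bin/worker --once"
retry_attempts = 5
retry_delay = "10s"
retry_backoff = "exponential"  # waits 10s, 20s, 40s, 80s, 160s, then gives up
```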
## Where to next

- Concurrency policies: what stops a retry chain when `on_overlap = "terminate"` fires.
- Notifications model: coalescing repeated failure alerts so retries don’t spam your channel.
- `[tasks.*]` reference: the exact schema for every retry and timeout field.