Configuring Resiliency
Configure retry policies and timeouts to handle transient failures in your workflows. These settings control how Graph Compose recovers from network issues, service outages, and rate limits.
Retry policy
Add a retry policy to any node via activityConfig.retryPolicy, or with withRetries() in the SDK. When a node fails with a retryable error, Graph Compose waits and tries again according to your configuration.
graph
  .node('call_api')
    .get('https://api.example.com/data')
    .withRetries({
      maximumAttempts: 5,
      initialInterval: '2s',
      backoffCoefficient: 2,
      maximumInterval: '30s'
    })
  .end()
| Field | Description |
|---|---|
| maximumAttempts | Total number of attempts (including the initial attempt). Set to 1 for no retries. |
| initialInterval | Delay before the first retry. Accepts duration strings: "1s", "500ms", "2m". |
| backoffCoefficient | Multiplier applied to each subsequent retry interval. Minimum: 1. |
| maximumInterval | Cap on the retry interval. Prevents unbounded growth from exponential backoff. |
Backoff coefficient
The backoff coefficient controls how quickly the delay between retries grows. The formula is:
delay = initialInterval * (backoffCoefficient ^ (retry - 1))
Here retry is the retry number, starting at 1 for the first retry (the second attempt overall).
For example, with initialInterval: "2s" and backoffCoefficient: 2:
| Attempt | Delay before retry |
|---|---|
| 1st retry | 2s (2 * 2^0) |
| 2nd retry | 4s (2 * 2^1) |
| 3rd retry | 8s (2 * 2^2) |
| 4th retry | 16s (2 * 2^3) |
| 5th retry | 32s (2 * 2^4) |
If maximumInterval is set, the calculated delay is capped at that value. For example, with maximumInterval: "10s", the 4th and 5th retries above would each wait 10s instead of 16s and 32s. The 3rd retry, at 8s, is below the cap and is unaffected.
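The formula and the cap can be checked with a few lines of TypeScript. This is a minimal sketch rather than SDK code: retryDelays and the RetryPolicy shape are hypothetical names, and intervals are plain numbers in seconds instead of duration strings.

```typescript
// Hypothetical helper, not part of the Graph Compose SDK.
// All intervals are in seconds to keep the arithmetic visible.
interface RetryPolicy {
  maximumAttempts: number      // total attempts, including the first
  initialInterval: number      // seconds before the first retry
  backoffCoefficient: number   // multiplier per retry, >= 1
  maximumInterval?: number     // optional cap in seconds
}

function retryDelays(policy: RetryPolicy): number[] {
  const retries = Math.max(0, policy.maximumAttempts - 1)
  const delays: number[] = []
  for (let retry = 1; retry <= retries; retry++) {
    const raw = policy.initialInterval * Math.pow(policy.backoffCoefficient, retry - 1)
    const capped = policy.maximumInterval === undefined
      ? raw
      : Math.min(raw, policy.maximumInterval)
    delays.push(capped)
  }
  return delays
}

// Six attempts allow five retries, matching the five rows of the table.
console.log(retryDelays({ maximumAttempts: 6, initialInterval: 2, backoffCoefficient: 2 }))
// [ 2, 4, 8, 16, 32 ]
console.log(retryDelays({ maximumAttempts: 6, initialInterval: 2, backoffCoefficient: 2, maximumInterval: 10 }))
// [ 2, 4, 8, 10, 10 ]
```

Because maximumAttempts counts the initial attempt, a policy with maximumAttempts: 5 produces only four delays.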
Timeouts
Two timeout settings control how long a node can run:
- startToCloseTimeout: Maximum time for a single execution attempt. If the attempt exceeds this duration, it is considered failed. If retries remain, the next attempt starts after the retry delay.
- scheduleToCloseTimeout: Maximum total time from when the node is scheduled until it completes. This includes queue time, all execution attempts, and all retry delays. If this timeout is exceeded, the node fails regardless of remaining retries.
graph
  .node('slow_api')
    .post('https://api.example.com/process')
    .withStartToCloseTimeout('30s')
    .withScheduleToCloseTimeout('5m')
    .withRetries({
      maximumAttempts: 3,
      initialInterval: '5s'
    })
  .end()
Duration strings accept flexible formats: "500ms", "30s", "5m", "1h", "2d". Long forms like "30 seconds" and "5 minutes" are also supported.
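To make the accepted shapes concrete, the sketch below converts such strings to milliseconds. It is an illustration only: parseDurationMs and its unit table are hypothetical, and the grammar Graph Compose actually accepts may be broader or stricter than this regular expression.

```typescript
// Hypothetical parser: illustrates the listed formats, not the real Graph Compose grammar.
const UNIT_MS = new Map<string, number>([
  ['ms', 1], ['millisecond', 1], ['milliseconds', 1],
  ['s', 1000], ['second', 1000], ['seconds', 1000],
  ['m', 60000], ['minute', 60000], ['minutes', 60000],
  ['h', 3600000], ['hour', 3600000], ['hours', 3600000],
  ['d', 86400000], ['day', 86400000], ['days', 86400000]
])

function parseDurationMs(input: string): number {
  // A number, optional whitespace, then a unit made of letters.
  const match = /^(\d+(?:\.\d+)?)\s*([a-z]+)$/i.exec(input.trim())
  if (!match) {
    throw new Error(`Unrecognized duration: "${input}"`)
  }
  const factor = UNIT_MS.get(match[2].toLowerCase())
  if (factor === undefined) {
    throw new Error(`Unknown unit in duration: "${input}"`)
  }
  return Number(match[1]) * factor
}

console.log(parseDurationMs('500ms'))     // 500
console.log(parseDurationMs('30s'))       // 30000
console.log(parseDurationMs('5 minutes')) // 300000
console.log(parseDurationMs('2d'))        // 172800000
```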
Workflow execution timeout
In addition to per-node timeouts, you can set a timeout on the entire workflow. The workflowExecutionTimeout caps the total duration of the workflow, including all nodes, retries, and waiting time.
Workflow timeout (SDK)
import { GraphCompose } from '@graph-compose/client'

const graph = new GraphCompose({
  token: process.env.GRAPH_COMPOSE_TOKEN
})

graph.withWorkflowConfig({
  workflowExecutionTimeout: '10m'
})
Workflow timeout (REST)
{
  "workflowConfig": {
    "workflowExecutionTimeout": "10m"
  },
  "nodes": []
}
Default values
If you do not specify activityConfig, these defaults apply:
| Field | Default |
|---|---|
| maximumAttempts | 1 (no retries) |
| initialInterval | 100 milliseconds |
| maximumInterval | 1 second |
| startToCloseTimeout | 30 seconds |
| scheduleToCloseTimeout | 30 seconds |
The default maximumAttempts is 1, which means no retries. You must explicitly configure a retry policy to enable retries.
Retryable and non-retryable errors
Not all errors trigger retries. Graph Compose distinguishes between transient failures (which are retried) and permanent failures (which fail immediately).
Retryable errors (use your retry policy):
- Network issues: connection timeouts, DNS resolution failures
- HTTP 5xx responses: 500 Internal Server Error, 503 Service Unavailable
- Temporary service failures
Non-retryable errors (fail immediately, skip retries):
- HTTP 4xx responses: 401 Unauthorized, 400 Bad Request, 404 Not Found
- Configuration errors: invalid URLs, malformed request bodies
- Resolution failures: unresolved secrets ($secret('name')), invalid template expressions
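The two lists can be summarized as a small decision function. This is a sketch of the rules as described here, not Graph Compose's internal logic: the Failure type and isRetryable are hypothetical names introduced for illustration.

```typescript
// Hypothetical classifier mirroring the lists above.
// Graph Compose makes this decision internally; this only restates the rules.
type Failure =
  | { kind: 'network' }                 // connection timeout, DNS resolution failure
  | { kind: 'temporaryService' }        // temporary service failure
  | { kind: 'http'; status: number }    // a response arrived with an error status
  | { kind: 'configuration' }           // invalid URL, malformed request body
  | { kind: 'resolution' }              // unresolved secret, invalid template expression

function isRetryable(failure: Failure): boolean {
  switch (failure.kind) {
    case 'network':
    case 'temporaryService':
      return true
    case 'http':
      // 5xx is treated as transient; 4xx fails immediately.
      return failure.status >= 500 && failure.status <= 599
    case 'configuration':
    case 'resolution':
      return false
  }
}

console.log(isRetryable({ kind: 'http', status: 503 })) // true
console.log(isRetryable({ kind: 'http', status: 404 })) // false
console.log(isRetryable({ kind: 'resolution' }))        // false
```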
Common patterns
Quick retries for flaky APIs
Use a short initial interval and many attempts for services that fail intermittently but recover quickly.
graph
  .node('flaky_api')
    .get('https://api.example.com/data')
    .withRetries({
      maximumAttempts: 10,
      initialInterval: '100ms',
      backoffCoefficient: 1.5,
      maximumInterval: '1s'
    })
    .withStartToCloseTimeout('5s')
  .end()
Careful retries for critical operations
Use a longer initial interval and fewer attempts for services where each attempt is expensive or has side effects.
graph
  .node('process_payment')
    .post('https://api.example.com/payments')
    .withRetries({
      maximumAttempts: 3,
      initialInterval: '5s',
      backoffCoefficient: 2,
      maximumInterval: '1m'
    })
    .withStartToCloseTimeout('2m')
    .withScheduleToCloseTimeout('10m')
  .end()
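When both timeouts are set alongside a retry policy, it is worth checking that scheduleToCloseTimeout leaves room for every attempt, since the node fails once that limit is reached even if retries remain. The sketch below adds up the worst case for the process_payment settings. worstCaseSeconds is a hypothetical helper, values are in seconds, and queue time is ignored, so the real figure can be higher.

```typescript
// Hypothetical helper, not part of the Graph Compose SDK. Values are in seconds.
// Worst case: every attempt runs until startToCloseTimeout, with a capped
// backoff delay between consecutive attempts. Queue time is not included.
function worstCaseSeconds(
  maximumAttempts: number,
  initialInterval: number,
  backoffCoefficient: number,
  maximumInterval: number,
  startToCloseTimeout: number
): number {
  let total = maximumAttempts * startToCloseTimeout
  for (let retry = 1; retry < maximumAttempts; retry++) {
    const delay = initialInterval * Math.pow(backoffCoefficient, retry - 1)
    total += Math.min(delay, maximumInterval)
  }
  return total
}

// process_payment: 3 attempts, 5s initial interval, coefficient 2, 1m cap, 2m per attempt
const worst = worstCaseSeconds(3, 5, 2, 60, 120)
console.log(worst)            // 375, i.e. 6m 15s
console.log(worst <= 10 * 60) // true: fits within the 10m scheduleToCloseTimeout
```

Here three 2m attempts plus delays of 5s and 10s total 6m 15s, which leaves margin under the 10m limit for time spent in the queue.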
Fail fast (no retries)
Set maximumAttempts: 1 to disable retries entirely. The node fails on the first error.
No retries
graph
  .node('one_shot')
    .post('https://api.example.com/webhook')
    .withRetries({ maximumAttempts: 1 })
    .withStartToCloseTimeout('15s')
  .end()