Configuring Resiliency

Configure retry policies and timeouts to handle transient failures in your workflows. These settings control how Graph Compose recovers from network issues, service outages, and rate limits.

Retry policy

Add a retry policy to any node via activityConfig.retryPolicy. When a node fails with a retryable error, Graph Compose waits and tries again according to your configuration.

graph
  .node('call_api')
    .get('https://api.example.com/data')
    .withRetries({
      maximumAttempts: 5,
      initialInterval: '2s',
      backoffCoefficient: 2,
      maximumInterval: '30s'
    })
  .end()
Fields:

  • maximumAttempts: Total number of attempts (including the initial attempt). Set to 1 for no retries.
  • initialInterval: Delay before the first retry. Accepts duration strings: "1s", "500ms", "2m".
  • backoffCoefficient: Multiplier applied to each subsequent retry interval. Minimum: 1.
  • maximumInterval: Cap on the retry interval. Prevents unbounded growth from exponential backoff.
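Duration strings like "1s", "500ms", and "2m" can be normalized to milliseconds when you need to compare or do arithmetic on them. A minimal sketch; the helper name and the exact set of accepted units are assumptions based on the examples in this document, not part of the SDK:

```typescript
// Parse duration strings such as "500ms", "2s", "2m" into milliseconds.
// The supported units here (ms, s, m, h) are an assumption.
function parseDuration(input: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(input.trim())
  if (!match) throw new Error(`Invalid duration: ${input}`)
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60_000, h: 3_600_000 }
  return Number(match[1]) * unitMs[match[2]]
}

console.log(parseDuration('500ms')) // 500
console.log(parseDuration('2s'))    // 2000
console.log(parseDuration('2m'))    // 120000
```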

Backoff coefficient

The backoff coefficient controls how quickly the delay between retries grows. The formula is:

delay = initialInterval * (backoffCoefficient ^ (attempt - 1))

For example, with initialInterval: "2s" and backoffCoefficient: 2:

  • 1st retry: 2s (2 * 2^0)
  • 2nd retry: 4s (2 * 2^1)
  • 3rd retry: 8s (2 * 2^2)
  • 4th retry: 16s (2 * 2^3)
  • 5th retry: 32s (2 * 2^4)

If maximumInterval is set, the calculated delay is capped at that value. For example, with maximumInterval: "10s", the 3rd through 5th retries above would all wait 10s instead of 8s, 16s, and 32s.
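The formula and the cap can be expressed directly in code. A sketch that reproduces the table above; the function name is illustrative, not an SDK API:

```typescript
// Delay (in seconds) before retry number `attempt` (1-based), applying
// delay = initialInterval * backoffCoefficient^(attempt - 1),
// capped at maximumInterval when one is set.
function retryDelay(
  initialInterval: number,
  backoffCoefficient: number,
  attempt: number,
  maximumInterval?: number
): number {
  const delay = initialInterval * Math.pow(backoffCoefficient, attempt - 1)
  return maximumInterval === undefined ? delay : Math.min(delay, maximumInterval)
}

// initialInterval 2s, backoffCoefficient 2, no cap — matches the table above.
const uncapped = [1, 2, 3, 4, 5].map(n => retryDelay(2, 2, n))
console.log(uncapped) // [2, 4, 8, 16, 32]

// With maximumInterval 10s, the later retries are capped.
const capped = [1, 2, 3, 4, 5].map(n => retryDelay(2, 2, n, 10))
console.log(capped)   // [2, 4, 8, 10, 10]
```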

Timeouts

Two timeout settings control how long a node can run:

  • startToCloseTimeout: Maximum time for a single execution attempt. If the attempt exceeds this duration, it is considered failed. If retries remain, the next attempt starts after the retry delay.
  • scheduleToCloseTimeout: Maximum total time from when the node is scheduled until it completes. This includes queue time, all execution attempts, and all retry delays. If this timeout is exceeded, the node fails regardless of remaining retries.
graph
  .node('slow_api')
    .post('https://api.example.com/process')
    .withStartToCloseTimeout('30s')
    .withScheduleToCloseTimeout('5m')
    .withRetries({
      maximumAttempts: 3,
      initialInterval: '5s'
    })
  .end()
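It can be useful to check that a scheduleToCloseTimeout actually leaves room for all attempts plus retry delays. A rough upper bound for the slow_api node above, ignoring queue time; the helper is illustrative, and since the example does not set backoffCoefficient, a coefficient of 2 is assumed here for the arithmetic:

```typescript
// Upper bound (in seconds) on total node time: every attempt runs to its
// startToCloseTimeout, and every retry waits its full backoff delay.
function worstCaseSeconds(
  maximumAttempts: number,
  startToClose: number,
  initialInterval: number,
  backoffCoefficient: number,
  maximumInterval = Infinity
): number {
  let total = maximumAttempts * startToClose
  for (let retry = 1; retry < maximumAttempts; retry++) {
    total += Math.min(initialInterval * backoffCoefficient ** (retry - 1), maximumInterval)
  }
  return total
}

// slow_api: 3 attempts of up to 30s each, plus retry delays of 5s and 10s.
console.log(worstCaseSeconds(3, 30, 5, 2)) // 105
```

At 105 seconds, the worst case fits comfortably inside the 5m (300s) scheduleToCloseTimeout.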

Workflow execution timeout

In addition to per-node timeouts, you can set a timeout on the entire workflow. The workflowExecutionTimeout caps the total duration of the workflow, including all nodes, retries, and waiting time.

Workflow timeout (SDK)

import { GraphCompose } from '@graph-compose/client'

const graph = new GraphCompose({
  token: process.env.GRAPH_COMPOSE_TOKEN
})

graph.withWorkflowConfig({
  workflowExecutionTimeout: '10m'
})

Workflow timeout (REST)

{
  "workflowConfig": {
    "workflowExecutionTimeout": "10m"
  },
  "nodes": []
}

Default values

If you do not specify activityConfig, these defaults apply:

  • maximumAttempts: 1 (no retries)
  • initialInterval: 100 milliseconds
  • maximumInterval: 1 second
  • startToCloseTimeout: 30 seconds
  • scheduleToCloseTimeout: 30 seconds
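Conceptually, a partial retry policy is merged over these defaults. A sketch of that merge; the helper is illustrative, and since the defaults table does not list a backoffCoefficient, the value 2 used below is an assumption:

```typescript
interface RetryPolicy {
  maximumAttempts?: number
  initialInterval?: string
  backoffCoefficient?: number
  maximumInterval?: string
}

// Defaults from the table above; backoffCoefficient's default is an assumption.
const DEFAULTS = {
  maximumAttempts: 1,
  initialInterval: '100ms',
  maximumInterval: '1s',
  backoffCoefficient: 2
}

// Fields you set win; everything else falls back to the defaults.
function withDefaults(policy: RetryPolicy = {}): Required<RetryPolicy> {
  return { ...DEFAULTS, ...policy }
}

console.log(withDefaults({ maximumAttempts: 5 }).initialInterval) // "100ms"
```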

Retryable and non-retryable errors

Not all errors trigger retries. Graph Compose distinguishes between transient failures (which are retried) and permanent failures (which fail immediately).

Retryable errors (use your retry policy):

  • Network issues: connection timeouts, DNS resolution failures
  • HTTP 5xx responses: 500 Internal Server Error, 503 Service Unavailable
  • Temporary service failures

Non-retryable errors (fail immediately, skip retries):

  • HTTP 4xx responses: 401 Unauthorized, 400 Bad Request, 404 Not Found
  • Configuration errors: invalid URLs, malformed request bodies
  • Resolution failures: unresolved secrets ($secret('name')), invalid template expressions
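For HTTP responses, the split above comes down to the status-code class. A sketch of that check; this is an illustration of the rule as documented, not the engine's actual classification logic:

```typescript
// Per the lists above: 5xx responses are retried under the retry policy,
// while 4xx responses fail immediately and skip retries.
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599
}

console.log(isRetryableStatus(503)) // true  — retried
console.log(isRetryableStatus(404)) // false — fails immediately
```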

Common patterns

Quick retries for flaky APIs

Use a short initial interval and many attempts for services that fail intermittently but recover quickly.

graph
  .node('flaky_api')
    .get('https://api.example.com/data')
    .withRetries({
      maximumAttempts: 10,
      initialInterval: '100ms',
      backoffCoefficient: 1.5,
      maximumInterval: '1s'
    })
    .withStartToCloseTimeout('5s')
  .end()
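Even with 10 attempts, this policy spends only a few seconds waiting in total, because the 1s cap kicks in quickly. A standalone arithmetic check of the 9 retry delays:

```typescript
// Sum the 9 retry delays for maximumAttempts: 10, initialInterval 100ms,
// backoffCoefficient 1.5, maximumInterval 1s (all values in milliseconds).
let totalMs = 0
for (let retry = 1; retry <= 9; retry++) {
  totalMs += Math.min(100 * 1.5 ** (retry - 1), 1000)
}
console.log(Math.round(totalMs)) // 5078 — about 5.1s of total retry wait
```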

Careful retries for critical operations

Use a longer initial interval and fewer attempts for services where each attempt is expensive or has side effects.

graph
  .node('process_payment')
    .post('https://api.example.com/payments')
    .withRetries({
      maximumAttempts: 3,
      initialInterval: '5s',
      backoffCoefficient: 2,
      maximumInterval: '1m'
    })
    .withStartToCloseTimeout('2m')
    .withScheduleToCloseTimeout('10m')
  .end()

Fail fast (no retries)

Set maximumAttempts: 1 to disable retries entirely. The node fails on the first error.

No retries

graph
  .node('one_shot')
    .post('https://api.example.com/webhook')
    .withRetries({ maximumAttempts: 1 })
    .withStartToCloseTimeout('15s')
  .end()

Next steps