Configuring Resiliency
How Graph Compose Handles Durability
Graph Compose ensures your workflows are durable and resilient by leveraging Temporal under the hood. This means:
- Automatic State Persistence: Every step of your workflow is automatically persisted
- Seamless Recovery: If a node fails, we automatically resume from the last successful state
- Distributed Execution: Your workflows are executed across multiple machines for high availability
- Built-in Monitoring: Track the health and performance of your workflows
All you need to do is provide the configuration - we handle all the complex infrastructure for you!
TLDR: Add retry policies to nodes and workflows to handle failures:
const node = {
  activityConfig: {
    retryPolicy: {
      initialInterval: "1s", // Wait 1 second before the first retry
      maximumInterval: "10s", // Maximum wait time between retries is 10 seconds
      maximumAttempts: 3, // Attempt the operation up to 3 times
      backoffCoefficient: 2.0 // Double the wait time after each failed attempt (1s, 2s, 4s...)
    },
    startToCloseTimeout: "30s" // Each attempt has 30 seconds to complete before timing out
  }
}
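Before submitting a configuration like the one above, it can help to sanity-check it locally. The sketch below is only an illustration: parseDuration and checkActivityConfig are hypothetical helpers, not part of Graph Compose, the duration format (ms, s, m) is inferred from the examples in this guide, and the rules checked are assumptions rather than the validation the API actually performs.

```javascript
// Hypothetical helper, not part of Graph Compose: converts "100ms", "30s", or "2m"
// to milliseconds. Only the units used in this guide are recognized.
const UNIT_MS = { ms: 1, s: 1000, m: 60000 };

function parseDuration(text) {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) throw new Error(`Unrecognized duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Assumed sanity rules; the rules the API actually enforces may differ.
function checkActivityConfig({ retryPolicy, startToCloseTimeout }) {
  const problems = [];
  if (parseDuration(retryPolicy.initialInterval) > parseDuration(retryPolicy.maximumInterval)) {
    problems.push("initialInterval is longer than maximumInterval");
  }
  if (retryPolicy.backoffCoefficient < 1) {
    problems.push("backoffCoefficient below 1 makes each wait shorter than the last");
  }
  if (!Number.isInteger(retryPolicy.maximumAttempts) || retryPolicy.maximumAttempts < 1) {
    problems.push("maximumAttempts should be a whole number of at least 1");
  }
  if (parseDuration(startToCloseTimeout) <= 0) {
    problems.push("startToCloseTimeout should be greater than zero");
  }
  return problems;
}

// Same values as the TLDR example.
const node = {
  activityConfig: {
    retryPolicy: {
      initialInterval: "1s",
      maximumInterval: "10s",
      maximumAttempts: 3,
      backoffCoefficient: 2.0
    },
    startToCloseTimeout: "30s"
  }
};

// An empty array means none of the assumed rules were violated.
const problems = checkActivityConfig(node.activityConfig);
console.log(problems.length === 0 ? "config looks consistent" : problems.join("; "));
```

Returning a list of problems instead of throwing on the first one makes it easier to report every issue in a configuration at once.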
Retry Policies
Why Retries Matter
In distributed systems, temporary failures are common:
- Network hiccups
- Service outages
- Rate limiting
- Transient errors
Graph Compose automatically handles these issues by retrying failed operations according to your configuration.
Retry Options
The options for configuring retry behavior are defined in a schema that is automatically generated from our API specification. That schema is the definitive source of truth for all available parameters and their descriptions. The examples on this page use four of them: initialInterval, maximumInterval, maximumAttempts, and backoffCoefficient.
Understanding Backoff Coefficient
The backoff coefficient determines how quickly the retry interval grows between attempts. If attempt is the number of the attempt that just failed, the wait before the next attempt is:
wait_time = min(initialInterval * (backoffCoefficient ^ (attempt - 1)), maximumInterval)
Let's look at a real-world example of retrying an API call to a payment processor:
{
  activityConfig: {
    retryPolicy: {
      initialInterval: "2s", // Wait 2 seconds before the first retry
      maximumInterval: "1m", // Maximum wait time between retries is 1 minute
      maximumAttempts: 5, // Attempt the operation up to 5 times (1 initial + 4 retries)
      backoffCoefficient: 2.0 // Double the wait time after each failed attempt
    },
    startToCloseTimeout: "90s" // Each attempt has 90 seconds to complete
  }
}
Here's what happens after each failed attempt:
- First attempt fails → Wait 2s (initial interval)
- Second attempt fails → Wait 4s (2s × 2.0¹)
- Third attempt fails → Wait 8s (2s × 2.0²)
- Fourth attempt fails → Wait 16s (2s × 2.0³)
- Fifth attempt fails → No further wait: maximumAttempts (5) has been reached, so the failure is final
This exponential backoff is ideal for:
- Payment processing retries (allowing time for bank processing)
- Waiting for external API rate limits to reset
- Database connection recovery
- Resource cleanup or provisioning
Note: The actual wait time will never exceed maximumInterval, even if the calculated value is larger.
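The formula and the maximumInterval cap can be sketched as a small function that lists the wait before each retry. This is an illustration rather than Graph Compose's implementation: retryDelays is a hypothetical helper, intervals are passed as milliseconds instead of duration strings, and it assumes maximumAttempts counts the initial attempt, so five attempts leave room for four waits.

```javascript
// Hypothetical helper: returns the wait (in ms) before each retry.
function retryDelays({ initialIntervalMs, maximumIntervalMs, maximumAttempts, backoffCoefficient }) {
  const delays = [];
  // Assumption: maximumAttempts includes the first attempt, so there are
  // at most maximumAttempts - 1 waits.
  for (let failedAttempt = 1; failedAttempt < maximumAttempts; failedAttempt++) {
    const uncapped = initialIntervalMs * Math.pow(backoffCoefficient, failedAttempt - 1);
    // The wait never exceeds maximumInterval.
    delays.push(Math.min(uncapped, maximumIntervalMs));
  }
  return delays;
}

// The payment processor policy from the example above.
const paymentPolicy = {
  initialIntervalMs: 2000,
  maximumIntervalMs: 60000,
  maximumAttempts: 5,
  backoffCoefficient: 2.0
};

const paymentDelays = retryDelays(paymentPolicy);
// paymentDelays is [2000, 4000, 8000, 16000]

// Lowering the cap to 10 seconds clamps the last wait.
const cappedDelays = retryDelays({ ...paymentPolicy, maximumIntervalMs: 10000 });
// cappedDelays is [2000, 4000, 8000, 10000]

console.log(paymentDelays, cappedDelays);
```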
Let's trace the execution flow if the operation fails repeatedly:
- Initial Attempt: Runs immediately. Fails.
- Wait: Waits for initialInterval (2 seconds).
- Retry 1: Runs. Fails.
- Wait: Waits for initialInterval * backoffCoefficient^1 (2s * 2.0^1 = 4 seconds).
- Retry 2: Runs. Fails.
- Wait: Waits for initialInterval * backoffCoefficient^2 (2s * 2.0^2 = 8 seconds).
- Retry 3: Runs. Fails.
- Wait: Waits for initialInterval * backoffCoefficient^3 (2s * 2.0^3 = 16 seconds).
- Retry 4: Runs. Fails.
- Final Failure: Since maximumAttempts is 5 (1 initial + 4 retries), there is no further wait or retry, and the workflow proceeds to handle the failure.
The total time spent waiting between attempts in this scenario is 2 + 4 + 8 + 16 = 30 seconds. Remember that each attempt also has its own startToCloseTimeout of 90 seconds, so the waits are only part of the total elapsed time.
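To budget for the scenario traced above, the waits can be combined with the per-attempt timeout. The sketch below is a rough upper bound under stated assumptions: worstCaseElapsed is a hypothetical helper, values are in milliseconds, every attempt is assumed to run for its full startToCloseTimeout, maximumAttempts is assumed to include the initial attempt, and scheduling overhead is ignored.

```javascript
// Hypothetical helper: estimates the longest a node could take before its
// final failure, assuming every attempt uses its full timeout.
function worstCaseElapsed(policy) {
  let totalWaitMs = 0;
  for (let failedAttempt = 1; failedAttempt < policy.maximumAttempts; failedAttempt++) {
    const uncapped = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, failedAttempt - 1);
    totalWaitMs += Math.min(uncapped, policy.maximumIntervalMs);
  }
  const totalAttemptMs = policy.maximumAttempts * policy.startToCloseTimeoutMs;
  return { totalWaitMs, totalAttemptMs, totalMs: totalWaitMs + totalAttemptMs };
}

// The payment processor example: 2s initial, 1m cap, 5 attempts, 90s timeout.
const estimate = worstCaseElapsed({
  initialIntervalMs: 2000,
  maximumIntervalMs: 60000,
  maximumAttempts: 5,
  backoffCoefficient: 2.0,
  startToCloseTimeoutMs: 90000
});

// 30 s of waiting plus 5 attempts of up to 90 s each: 480 s in the worst case.
console.log(estimate.totalWaitMs / 1000, estimate.totalAttemptMs / 1000, estimate.totalMs / 1000);
```

An estimate like this is useful when a caller or downstream step has its own deadline, since the retry budget should fit inside it.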
Common Patterns
Aggressive Retries for Flaky APIs
{
  activityConfig: {
    retryPolicy: {
      initialInterval: "100ms", // Wait 100 milliseconds before the first retry
      maximumInterval: "1s", // Maximum wait time between retries is 1 second
      maximumAttempts: 10, // Attempt the operation up to 10 times (1 initial + 9 retries)
      backoffCoefficient: 1.5 // Increase wait time by 50% after each failed attempt
    },
    startToCloseTimeout: "5s" // Each attempt has 5 seconds to complete
  }
}
Careful Retries for Critical Operations
{
  activityConfig: {
    retryPolicy: {
      initialInterval: "5s", // Wait 5 seconds before the first retry
      maximumInterval: "1m", // Maximum wait time between retries is 1 minute
      maximumAttempts: 3, // Attempt the operation up to 3 times (1 initial + 2 retries)
      backoffCoefficient: 2.0 // Double the wait time after each failed attempt
    },
    startToCloseTimeout: "2m" // Each attempt has 2 minutes to complete
  }
}
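To see how the two patterns above differ in practice, the sketch below counts the retries and sums the waits for each. It is an illustration under assumptions: toMs and summarize are hypothetical helpers, the duration format is inferred from the examples in this guide, and maximumAttempts is assumed to include the initial attempt.

```javascript
// Hypothetical helper: converts "100ms", "5s", or "1m" to milliseconds.
// Throws a TypeError if the text does not match that assumed format.
const toMs = (text) => {
  const [, value, unit] = /^(\d+)(ms|s|m)$/.exec(text);
  return Number(value) * { ms: 1, s: 1000, m: 60000 }[unit];
};

// Hypothetical helper: number of retries and total time spent waiting.
function summarize({ retryPolicy }) {
  const first = toMs(retryPolicy.initialInterval);
  const cap = toMs(retryPolicy.maximumInterval);
  let totalWaitMs = 0;
  for (let failedAttempt = 1; failedAttempt < retryPolicy.maximumAttempts; failedAttempt++) {
    totalWaitMs += Math.min(first * Math.pow(retryPolicy.backoffCoefficient, failedAttempt - 1), cap);
  }
  return { retries: retryPolicy.maximumAttempts - 1, totalWaitMs };
}

const aggressive = {
  retryPolicy: { initialInterval: "100ms", maximumInterval: "1s", maximumAttempts: 10, backoffCoefficient: 1.5 },
  startToCloseTimeout: "5s"
};
const careful = {
  retryPolicy: { initialInterval: "5s", maximumInterval: "1m", maximumAttempts: 3, backoffCoefficient: 2.0 },
  startToCloseTimeout: "2m"
};

const aggressiveSummary = summarize(aggressive); // 9 retries, roughly 5 seconds of waiting
const carefulSummary = summarize(careful);       // 2 retries, 15 seconds of waiting
console.log(aggressiveSummary, carefulSummary);
```

The aggressive pattern packs many retries into a few seconds, which suits brief glitches. The careful pattern retries less often and waits longer, which puts less pressure on a struggling dependency.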
Under the Hood: Temporal Integration
Graph Compose uses Temporal to provide enterprise-grade durability:
- Business Continuity: Your workflows continue even during infrastructure updates
- Exactly-Once Execution: No duplicate operations or lost work
- Complete Audit Trail: Track every step and retry of your workflow
- Zero Data Loss: All progress is automatically persisted
You get all these benefits automatically - just focus on your workflow logic and let us handle the rest!
Ready to make your workflows more resilient? Let's go!