Configuring Resiliency ⚙️

How Graph Compose Handles Durability 🛡️

Graph Compose ensures your workflows are durable and resilient by leveraging Temporal under the hood. This means:

  • 💾 Automatic State Persistence: Every step of your workflow is automatically persisted
  • 🔄 Seamless Recovery: If a node fails, we automatically resume from the last successful state
  • 🌐 Distributed Execution: Your workflows are executed across multiple machines for high availability
  • 📊 Built-in Monitoring: Track the health and performance of your workflows

All you need to do is provide the configuration - we handle all the complex infrastructure for you!

Interactive Retry Calculator ⚡

The retry calculator takes the following settings and shows when retries would occur if a node fails:

  • retryPolicy.initialInterval: "1s" (in seconds)
  • retryPolicy.maximumInterval: "60s" (in seconds; 0 for no limit)
  • retryPolicy.maximumAttempts: 5 (1 initial attempt + up to 4 retries if failures occur)
  • retryPolicy.backoffCoefficient: 2 (multiplier for increasing delay between retries)
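
The calculation behind these settings can be sketched in TypeScript. This is a minimal sketch, not Graph Compose code: RetryPolicy, parseDuration, and retryDelays are hypothetical names, the parser only handles the ms, s, and m suffixes that appear on this page, and treating a zero maximumInterval as "no limit" follows the calculator's own hint.

```typescript
// Hypothetical shapes and helpers for illustration; not the Graph Compose API.
interface RetryPolicy {
  initialInterval: string;    // e.g. "1s"
  maximumInterval: string;    // e.g. "60s"; a zero value is treated as "no limit"
  maximumAttempts: number;    // total attempts, including the initial one
  backoffCoefficient: number; // multiplier applied after each failed attempt
}

// Parse a duration such as "100ms", "2s", or "1m" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m)$/.exec(text.trim());
  if (!match) {
    throw new Error("Unsupported duration: " + text);
  }
  const unitMs: Record<string, number> = { ms: 1, s: 1000, m: 60000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Delay in milliseconds before each retry, capped at maximumInterval.
function retryDelays(policy: RetryPolicy): number[] {
  const initial = parseDuration(policy.initialInterval);
  const cap = parseDuration(policy.maximumInterval);
  const delays: number[] = [];
  // One wait per retry, so maximumAttempts - 1 waits in total.
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const raw = initial * Math.pow(policy.backoffCoefficient, retry);
    delays.push(cap > 0 ? Math.min(raw, cap) : raw);
  }
  return delays;
}

const delays = retryDelays({
  initialInterval: "1s",
  maximumInterval: "60s",
  maximumAttempts: 5,
  backoffCoefficient: 2,
});
console.log(delays); // 1000, 2000, 4000, 8000 (milliseconds)
```

With the settings listed above, the four retries would follow waits of 1, 2, 4, and 8 seconds.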

Retry Policies 🔄

Why Retries Matter

In distributed systems, temporary failures are common:

  • 🌐 Network hiccups
  • 🔌 Service outages
  • 🕒 Rate limiting
  • 💫 Transient errors

Graph Compose automatically handles these issues by retrying failed operations according to your configuration.

Retry Options

The specific options available for configuring retry behavior are defined in the schema below. This schema is automatically generated from our API specification and serves as the definitive source of truth for all available parameters and their descriptions.

Understanding Backoff Coefficient 📈

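The backoff coefficient determines how quickly the retry interval grows between attempts. The wait after a failed attempt is:

wait_time = initialInterval * (backoffCoefficient ^ (attempt - 1))

Here attempt is the number of the attempt that just failed, counting the initial attempt as 1.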

Let's look at a real-world example of retrying an API call to a payment processor:

{
  activityConfig: {
    retryPolicy: {
      initialInterval: "2s",    // Wait 2 seconds before the first retry
      maximumInterval: "1m",    // Maximum wait time between retries is 1 minute
      maximumAttempts: 5,       // Up to 5 attempts in total (1 initial + 4 retries)
      backoffCoefficient: 2.0   // Double the wait time after each failed attempt
    },
    startToCloseTimeout: "90s" // Each attempt has 90 seconds to complete
  }
}

Here's what happens after each failed attempt:

  1. First attempt fails → Wait 2s (initial interval)
  2. Second attempt fails → Wait 4s (2s × 2.0¹)
  3. Third attempt fails → Wait 8s (2s × 2.0²)
  4. Fourth attempt fails → Wait 16s (2s × 2.0³)
  5. Fifth attempt fails → No further wait; maximumAttempts is reached and the workflow proceeds to handle the failure

This exponential backoff is ideal for:

  • 💳 Payment processing retries (allowing time for bank processing)
  • 🌐 Waiting for external API rate limits to reset
  • 🔄 Database connection recovery
  • 📦 Resource cleanup or provisioning

Note: The actual wait time will never exceed maximumInterval, even if the calculated value is larger.

Let's trace the execution flow if the operation fails repeatedly:

  1. Initial Attempt: Runs immediately. Fails.
  2. Wait: Waits for initialInterval (2 seconds).
  3. Retry 1: Runs. Fails.
  4. Wait: Waits for initialInterval * backoffCoefficient^1 (2s * 2.0^1 = 4 seconds).
  5. Retry 2: Runs. Fails.
  6. Wait: Waits for initialInterval * backoffCoefficient^2 (2s * 2.0^2 = 8 seconds).
  7. Retry 3: Runs. Fails.
  8. Wait: Waits for initialInterval * backoffCoefficient^3 (2s * 2.0^3 = 16 seconds).
  9. Retry 4: Runs. Fails.
  10. Final Failure: Since maximumAttempts is 5 (1 initial + 4 retries), no further wait or retry is scheduled, and the workflow proceeds to handle the failure.

The total minimum time spent waiting between retries in this scenario is 2 + 4 + 8 + 16 = 30 seconds. Remember that each attempt also has its own startToCloseTimeout of 90 seconds.
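
To put the waits and the per-attempt timeout together, here is a rough TypeScript sketch of the arithmetic. It assumes that maximumAttempts counts the initial attempt, that no wait follows the final attempt, and that an attempt which hits startToCloseTimeout is retried like any other failure. The names waitsBetweenAttempts and sum are illustrative, not Graph Compose functions.

```typescript
// All durations are plain numbers in seconds; helper names are illustrative only.
function waitsBetweenAttempts(
  initialInterval: number,
  maximumInterval: number,
  maximumAttempts: number,
  backoffCoefficient: number,
): number[] {
  const waits: number[] = [];
  // A wait follows every failed attempt except the last one.
  for (let failed = 1; failed < maximumAttempts; failed++) {
    const raw = initialInterval * Math.pow(backoffCoefficient, failed - 1);
    waits.push(Math.min(raw, maximumInterval)); // never exceed maximumInterval
  }
  return waits;
}

function sum(values: number[]): number {
  return values.reduce((total, value) => total + value, 0);
}

// Values from the payment processor example: 2s initial, 1m cap, 5 attempts, coefficient 2.0.
const waits = waitsBetweenAttempts(2, 60, 5, 2.0);
const totalWait = sum(waits);

// Assumed worst case: every attempt runs until its 90-second startToCloseTimeout.
const startToCloseTimeout = 90;
const worstCase = totalWait + 5 * startToCloseTimeout;

console.log({ waits, totalWait, worstCase });
```

Under these assumptions the waits are 2, 4, 8, and 16 seconds (30 seconds in total), and the worst case, where all five attempts run to their timeout, is 30 + 5 × 90 = 480 seconds.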

Common Patterns 🎯

Aggressive Retries for Flaky APIs

{
  activityConfig: {
    retryPolicy: {
      initialInterval: "100ms",  // Wait 100 milliseconds before the first retry
      maximumInterval: "1s",     // Maximum wait time between retries is 1 second
      maximumAttempts: 10,       // Up to 10 attempts in total (1 initial + 9 retries)
      backoffCoefficient: 1.5    // Increase wait time by 50% after each failed attempt
    },
    startToCloseTimeout: "5s"    // Each attempt has 5 seconds to complete
  }
}

Careful Retries for Critical Operations

{
  activityConfig: {
    retryPolicy: {
      initialInterval: "5s",     // Wait 5 seconds before the first retry
      maximumInterval: "1m",     // Maximum wait time between retries is 1 minute
      maximumAttempts: 3,        // Up to 3 attempts in total (1 initial + 2 retries)
      backoffCoefficient: 2.0    // Double the wait time after each failed attempt
    },
    startToCloseTimeout: "2m"   // Each attempt has 2 minutes to complete
  }
}
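
To compare how the two patterns behave, the following sketch lists the waits each one would produce. It is only an illustration: Pattern and schedule are hypothetical names, the intervals are converted to milliseconds by hand, and maximumAttempts is assumed to include the initial attempt.

```typescript
// Hypothetical shape for comparing policies; intervals are in milliseconds.
interface Pattern {
  name: string;
  initialMs: number;
  maxMs: number;
  maximumAttempts: number;
  backoffCoefficient: number;
}

// Waits between attempts, each capped at maxMs.
function schedule(p: Pattern): number[] {
  const waitsMs: number[] = [];
  for (let retry = 0; retry < p.maximumAttempts - 1; retry++) {
    const raw = p.initialMs * Math.pow(p.backoffCoefficient, retry);
    waitsMs.push(Math.min(raw, p.maxMs));
  }
  return waitsMs;
}

const patterns: Pattern[] = [
  { name: "aggressive", initialMs: 100, maxMs: 1000, maximumAttempts: 10, backoffCoefficient: 1.5 },
  { name: "careful", initialMs: 5000, maxMs: 60000, maximumAttempts: 3, backoffCoefficient: 2.0 },
];

for (const p of patterns) {
  const waitsMs = schedule(p);
  const totalMs = waitsMs.reduce((total, value) => total + value, 0);
  console.log(p.name, waitsMs, "total ms:", totalMs);
}
```

Under these assumptions the aggressive policy spends about 5.1 seconds waiting across nine retries and reaches the 1-second cap from the seventh retry on, while the careful policy waits 5 seconds and then 10 seconds before giving up.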

Under the Hood: Temporal Integration 🔧

Graph Compose uses Temporal to provide enterprise-grade durability:

  • 🏢 Business Continuity: Your workflows continue even during infrastructure updates
  • 🔒 Exactly-Once Execution: No duplicate operations or lost work
  • 📝 Complete Audit Trail: Track every step and retry of your workflow
  • ⚡ Zero Data Loss: All progress is automatically persisted

You get all these benefits automatically - just focus on your workflow logic and let us handle the rest!

Ready to make your workflows more resilient? Let's go! 🚀