Designing a Retry Mechanism for Resilient Spring Boot Applications

"How To" with YugabyteDB

Amit Chauhan

February 27, 2024

Building a resilient application means being prepared for the unexpected. When it comes to database operations, this means ensuring that transient errors, like temporary network glitches or brief database unavailability, don’t disrupt my application’s functionality. In this guide, I’ll show you how to set up a robust retry mechanism and manage transactions effectively in a Spring Boot application with YugabyteDB as the database.

What is a Retry Mechanism?

A retry mechanism is essential to many modern software systems. It enables a system to quickly recover from transient errors or network outages by automatically retrying failed operations. Its goal is to ensure that the system continues to operate smoothly.

Learn more about transaction retries in YSQL.

Why Are Retry Mechanisms Necessary?

Let me answer this question with an analogy. Imagine an application is a busy coffee shop. Customers constantly place orders (transactions) and expect their coffee (successful database operations). If there’s a sudden, brief disruption to the coffee supply (a transient database error), I wouldn’t close the shop. Instead, I’d wait a bit and try to serve the order again. That’s exactly what a retry mechanism does for an application. It ensures that temporary issues don’t result in failed operations but are gracefully handled by pausing and retrying.

Setting Up the Retry Mechanism

Let me set the stage for my application to handle these brief interruptions.

Figure: High-level architecture (Spring Boot apps).

Understanding the Configuration

In RetryConfigProperties, I specify the number of retries and the delay between them. This is similar to deciding how often I’ll check back for the coffee supply before telling the customer there’s an issue.

@ConfigurationProperties("spring.retry")
public class RetryConfigProperties {
    // ... (Properties and their getters/setters)
    private int maxAttempts = 3;               // total attempts, including the first call
    private int backoffInitialInterval = 3500; // milliseconds
    private int backoffMultiplier = 3;
    private int backoffMaxInterval = 30000;    // milliseconds

    public int getMaxAttempts() {
        return maxAttempts;
    }

    public void setMaxAttempts(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    public int getBackoffInitialInterval() {
        return backoffInitialInterval;
    }

    public void setBackoffInitialInterval(int backoffInitialInterval) {
        this.backoffInitialInterval = backoffInitialInterval;
    }

    public int getBackoffMultiplier() {
        return backoffMultiplier;
    }

    public void setBackoffMultiplier(int backoffMultiplier) {
        this.backoffMultiplier = backoffMultiplier;
    }

    public int getBackoffMaxInterval() {
        return backoffMaxInterval;
    }

    public void setBackoffMaxInterval(int backoffMaxInterval) {
        this.backoffMaxInterval = backoffMaxInterval;
    }
}
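
To get a feel for what these defaults mean in practice, here is a small standalone sketch in plain Java (not Spring Retry itself) that lists the un-jittered delays an exponential backoff would produce. The names BackoffSchedule and delays are my own, made up for illustration. The backoff policy configured later in this guide adds random jitter, so real waits will differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: lists the un-jittered delays implied by a backoff configuration. */
final class BackoffSchedule {

    /**
     * Returns the delay (in ms) before each retry. The interval starts at
     * initialInterval, is multiplied after every retry, and is capped at maxInterval.
     */
    static List<Long> delays(long initialInterval, double multiplier, long maxInterval, int retries) {
        List<Long> result = new ArrayList<>();
        double interval = initialInterval;
        for (int i = 0; i < retries; i++) {
            result.add(Math.min((long) interval, maxInterval));
            interval *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from RetryConfigProperties: maxAttempts = 3 leaves room for at most 2 retries.
        System.out.println(delays(3500, 3, 30000, 2)); // [3500, 10500]
        // With more attempts, the 30000 ms cap kicks in from the third retry onward.
        System.out.println(delays(3500, 3, 30000, 4)); // [3500, 10500, 30000, 30000]
    }
}
```

Read this way, the defaults mean a failing request could wait roughly 14 seconds in total before the error is surfaced, which is worth keeping in mind when choosing values for a latency-sensitive service.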

Crafting the Retry Policy

In RetryConfig, I’ll set the rules for retrying, using patterns to identify errors. This is akin to saying, “This type of error is like running out of coffee beans; it’s temporary, so let’s try again in a bit.”

You can use the list of PostgreSQL error codes and their descriptions on the PostgreSQL documentation site (YugabyteDB uses the same codes). Add the codes for which you would like your transactions to be retried to the pattern variable.

@Configuration
@EnableConfigurationProperties(RetryConfigProperties.class)
public class RetryConfig {
    private static final Logger LOGGER = LoggerFactory.getLogger(RetryConfig.class);

    // 40001 - serialization failure (optimistic locking conflict or leader change aborted the transaction)
    // 40P01 - deadlock detected
    // 08006 - connection failure
    // 57P01 - admin shutdown (pooled connections invalidated because of node failure, etc.)
    // XX000 - internal error (not classified); still in the pattern below, remove it if you do not want it retried
    // 42804 - datatype mismatch; also in the pattern below, confirm it is retryable for your workload
    private static final Pattern SQL_STATE_PATTERN = Pattern.compile("^(40001|40P01|57P01|08006|XX000|42804)$");

    /**
     * Configures a Spring Retry BackOffPolicy based on a randomized exponential backoff.
     * Exponential backoff uses a multiplier factor to determine the delay for the next retry.
     *
     * This behaves nicely as it assumes the first failure is likely something minor that will
     * be resolved with the next connection and, if it fails again, that it may take longer with
     * each successive retry.
     *
     * The addition of a randomized "jitter" helps reduce the impact of synchronized retry loops
     * all colliding with each other and making the problem worse.
     *
     * As a general rule of thumb, set the initial interval low so that a single retry does not
     * add too much latency to the original request (assuming a single retry will resolve it 99.9% of
     * the time). The multiplier should be fairly small as well, but not so small that all the
     * retries are exhausted in < 3 seconds, as this should cover the exceptional case of complete
     * network failure and tablet leader re-election in another zone/region.
     *
     * @return a configured BackOffPolicy
     */
    @Bean
    public BackOffPolicy exponentialRandomBackOffPolicy(RetryConfigProperties retryProperties) {
        ExponentialRandomBackOffPolicy randomBackOffPolicy = new ExponentialRandomBackOffPolicy();
        randomBackOffPolicy.setInitialInterval(retryProperties.getBackoffInitialInterval());
        randomBackOffPolicy.setMultiplier(retryProperties.getBackoffMultiplier());
        // max interval will set the upper bounds of any calculated interval so that no
        // single retry loop will ever wait longer than this value.
        randomBackOffPolicy.setMaxInterval(retryProperties.getBackoffMaxInterval());
        return randomBackOffPolicy;
    }

    /**
     * Configures a Spring Retry policy that handles nested exceptions and is specifically
     * designed to catch and retry specific SQL exceptions. Since this cannot be determined
     * entirely by exception class, this retry policy also uses the SQL state to determine
     * whether an exception is retryable, using a regular expression. For any other class of
     * exception, a never-retry policy will be used.
     *
     * @return a configured RetryPolicy
     */
    @Bean
    public RetryPolicy exceptionClassifierRetryPolicy(RetryConfigProperties retryProperties) {
        ExceptionClassifierRetryPolicy retryPolicy = new ExceptionClassifierRetryPolicy();

        // delegate retry policies based on the type of exception/sql state
        SimpleRetryPolicy simpleRetryPolicy = new SimpleRetryPolicy(retryProperties.getMaxAttempts());
        NeverRetryPolicy neverRetryPolicy = new NeverRetryPolicy();

        // Unroll the exception stack looking for:
        // SQLRecoverableException or SQLTransientConnectionException
        // OR any other SQLException that has a SqlState matching the
        // pattern of known retryable errors.  Otherwise, use a never-retry policy.

        retryPolicy.setExceptionClassifier(classifiable -> {
            while (classifiable != null) {
                if (classifiable instanceof SQLRecoverableException || classifiable instanceof SQLTransientConnectionException) {
                    return simpleRetryPolicy;
                } else if (classifiable instanceof SQLException) {
                    SQLException ex = (SQLException) classifiable;
                    LOGGER.warn("SQLState: {} ErrorCode: {} Message: {}", ex.getSQLState(), ex.getErrorCode(), ex.getMessage());
                    // A missing SQLState is treated as retryable; otherwise the state must
                    // match the pattern of known retryable codes.
                    if (ex.getSQLState() == null || SQL_STATE_PATTERN.matcher(ex.getSQLState()).matches()) {
                        return simpleRetryPolicy;
                    }
                }
                classifiable = classifiable.getCause();
            }

            return neverRetryPolicy; // never retry on anything else
        });

        return retryPolicy;
    }
}
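
The classification idea is easier to try out in isolation. Below is a minimal sketch in plain Java, without Spring Retry, that walks an exception's cause chain and decides whether it looks retryable. CauseChainClassifier and RETRYABLE_STATES are hypothetical names of mine, and the four codes are only an example set. Unlike the configuration above, this sketch treats a missing SQLState as non-retryable.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTransientConnectionException;
import java.util.Set;

/** Illustrative only: decides retryability by unwrapping nested exceptions. */
final class CauseChainClassifier {

    // Example set of SQLSTATE codes considered transient (an assumption for this sketch).
    static final Set<String> RETRYABLE_STATES = Set.of("40001", "40P01", "57P01", "08006");

    static boolean isRetryable(Throwable error) {
        Throwable current = error;
        while (current != null) {
            // Some exception types are retryable regardless of their SQLSTATE.
            if (current instanceof SQLRecoverableException
                    || current instanceof SQLTransientConnectionException) {
                return true;
            }
            // Any other SQLException is retryable only if its SQLSTATE is in the set.
            if (current instanceof SQLException) {
                String state = ((SQLException) current).getSQLState();
                if (state != null && RETRYABLE_STATES.contains(state)) {
                    return true;
                }
            }
            current = current.getCause(); // keep unwrapping
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable conflict = new RuntimeException("wrapper", new SQLException("conflict", "40001"));
        Throwable duplicate = new RuntimeException("wrapper", new SQLException("duplicate key", "23505"));
        System.out.println(isRetryable(conflict));  // true
        System.out.println(isRetryable(duplicate)); // false
    }
}
```

Frameworks usually wrap the driver's SQLException in their own exception types, which is why the loop keeps following getCause() instead of inspecting only the outermost exception.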

Integrating Retry Policies

In RetryInterceptor, I’ll equip the application with the retry policy. This is akin to giving the barista (the application) instructions on how to handle the situation when there’s an issue with the coffee supply (the transient error).

@Component
@EnableRetry
public class RetryInterceptor {
    // ... (RetryTemplate and RetryOperationsInterceptor configuration)
    private final RetryPolicy retryPolicy;
    private final BackOffPolicy backOffPolicy;

    public RetryInterceptor(RetryPolicy retryPolicy, BackOffPolicy backOffPolicy) {
        this.retryPolicy = retryPolicy;
        this.backOffPolicy = backOffPolicy;
    }

    /**
     * Creates and configures a RetryTemplate object.
     *
     * @return a RetryTemplate object with the configured retry policy and
     * back-off policy
     */
    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(retryPolicy);
        retryTemplate.setBackOffPolicy(backOffPolicy);
        return retryTemplate;
    }

    /**
     * Returns a RetryOperationsInterceptor for use in methods annotated with
     * {@code @Retryable(interceptor = "ysqlRetryInterceptor")}. The behavior of
     * this interceptor is affected by the configuration of both the retry and
     * back-off policies.
     *
     * @return a RetryOperationsInterceptor bean named "ysqlRetryInterceptor"
     */
    @Bean("ysqlRetryInterceptor")
    public RetryOperationsInterceptor ysqlRetryInterceptor() {
        return RetryInterceptorBuilder.stateless()
                .retryPolicy(retryPolicy)
                .backOffPolicy(backOffPolicy)
                .build();
    }
}
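
If it helps to picture what a retry policy combined with a back-off policy does at runtime, here is a simplified stand-in written in plain Java. It is not how Spring Retry is implemented. SimpleRetryLoop and its parameters are hypothetical names of mine, and the jitter formula (a random wait between the current interval and the interval times the multiplier, capped at the maximum) is one common choice that I am assuming for illustration.

```java
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Illustrative only: a retry loop with capped, jittered exponential backoff. */
final class SimpleRetryLoop {
    private final int maxAttempts;          // total attempts, including the first
    private final long initialIntervalMs;
    private final double multiplier;        // expected to be >= 1
    private final long maxIntervalMs;
    private final Predicate<RuntimeException> retryable;
    private final Random random = new Random();

    SimpleRetryLoop(int maxAttempts, long initialIntervalMs, double multiplier,
                    long maxIntervalMs, Predicate<RuntimeException> retryable) {
        this.maxAttempts = maxAttempts;
        this.initialIntervalMs = initialIntervalMs;
        this.multiplier = multiplier;
        this.maxIntervalMs = maxIntervalMs;
        this.retryable = retryable;
    }

    <T> T execute(Supplier<T> operation) {
        long interval = initialIntervalMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException ex) {
                // Give up when attempts are exhausted or the error is not retryable.
                if (attempt >= maxAttempts || !retryable.test(ex)) {
                    throw ex;
                }
                sleep(jitter(interval));
                interval = (long) Math.min(interval * multiplier, (double) maxIntervalMs);
            }
        }
    }

    /** Random wait in [interval, interval * multiplier], never above maxIntervalMs. */
    long jitter(long interval) {
        double factor = 1 + random.nextDouble() * (multiplier - 1);
        return Math.min((long) (interval * factor), maxIntervalMs);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during backoff", e);
        }
    }

    public static void main(String[] args) {
        SimpleRetryLoop loop = new SimpleRetryLoop(3, 10, 3.0, 100,
                ex -> ex instanceof IllegalStateException);
        int[] calls = {0};
        String result = loop.execute(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient error");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " calls"); // ok after 3 calls
    }
}
```

The jitter matters most when many clients fail at the same moment: without it, they would all retry in lockstep and hit the recovering database together.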

Managing Transactions: Keeping Orders in Check

Transactions mirror customer orders; they must be complete and accurate. Just as a coffee order must be fully prepared before serving, database operations should all succeed before committing to maintain consistency and reliability.

Using @Transactional Annotation

Using @Transactional is like having an assistant who ensures that a coffee order is either fully prepared and served or, in case of an issue, it’s as though the order never happened, keeping the process simple and clean.

 
@Repository
public class RetryExampleWorkload {
    // ... (Other fields and methods)

    @Transactional
    public void execTransactionsWithRetryTemplateAndTransactionalAnnotation() {
        // ... (Transactional operations with retry logic)
        retryTemplate.execute(context -> {
            // Check if retry is happening
            if (RetrySynchronizationManager.getContext().getRetryCount() > 0) {
                System.out.println("RETRY COUNT:[" + RetrySynchronizationManager.getContext().getRetryCount() + "] ");
            }
            // Your transactional logic here
            jdbcTemplate.update(...); // Transaction 1
            jdbcTemplate.update(...); // Transaction 2
            jdbcTemplate.update(...); // Transaction 3
            return null;
        });
    }
}

Using TransactionTemplate for More Control

Sometimes a hands-on approach is necessary, especially for customizing parts of an order or addressing unique situations. TransactionTemplate offers this level of control, enabling precise definitions of how transactions should be managed.

First, let’s set up TransactionConfig:

@Configuration
public class TransactionConfig {

    @Bean
    public TransactionTemplate transactionTemplate(PlatformTransactionManager transactionManager) {
        return new TransactionTemplate(transactionManager);
    }

}

Then run the transactional work through the TransactionTemplate inside the retry callback:

public void execTransactionsWithRetryTemplateAndTransactionTemplate() {
    // ... (Programmatic transaction management)
    retryTemplate.execute(context -> {
        // Check if retry is happening
        if (RetrySynchronizationManager.getContext().getRetryCount() > 0) {
            System.out.println("RETRY COUNT:[" + RetrySynchronizationManager.getContext().getRetryCount() + "]");
        }

        transactionTemplate.execute(new TransactionCallbackWithoutResult() {
            protected void doInTransactionWithoutResult(TransactionStatus status) {
                try {
                    // Your transactional logic here
                    jdbcTemplate.update(...); // Transaction 1
                    jdbcTemplate.update(...); // Transaction 2
                    jdbcTemplate.update(...); // Transaction 3
                }
                catch (Exception ex) {
                    System.out.println("Going to rollback the transaction");
                    status.setRollbackOnly();
                    throw ex;
                }
            }
        });
        return null;
    });
}
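
The shape of this method, with the retry on the outside and the transaction on the inside, means every attempt starts a fresh transaction and a failed attempt leaves nothing behind. The sketch below imitates that with an in-memory list instead of a database. RetryAroundTransaction, inTransaction, and withRetry are hypothetical names of mine, and the staging list is only a stand-in for real rollback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative only: retry wraps the transaction, so each attempt starts clean. */
final class RetryAroundTransaction {

    /** In-memory stand-in for a table; only committed rows end up here. */
    final List<String> committed = new ArrayList<>();

    /** Runs work against a staging buffer and publishes it only if work completes. */
    void inTransaction(Consumer<List<String>> work) {
        List<String> staged = new ArrayList<>();
        work.accept(staged);      // an exception here discards 'staged' (the "rollback")
        committed.addAll(staged); // the "commit"
    }

    /** Retries the whole transaction up to maxAttempts times. */
    void withRetry(int maxAttempts, Consumer<List<String>> work) {
        for (int attempt = 1; ; attempt++) {
            try {
                inTransaction(work);
                return;
            } catch (IllegalStateException transientError) {
                if (attempt >= maxAttempts) {
                    throw transientError;
                }
            }
        }
    }

    public static void main(String[] args) {
        RetryAroundTransaction db = new RetryAroundTransaction();
        int[] attempts = {0};
        db.withRetry(3, tx -> {
            tx.add("order-1");
            if (++attempts[0] == 1) {
                throw new IllegalStateException("simulated transient failure");
            }
            tx.add("order-2");
        });
        // The first attempt's partial write was discarded, so order-1 appears once.
        System.out.println(db.committed); // [order-1, order-2]
    }
}
```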

Using TransactionManager Directly for Full Control

In complex scenarios, such as managing large catering orders with various special requests, full control is essential. Using TransactionManager directly allows you to oversee every order detail, ensuring everything is just right.

public void execTranxWithRetryTemplateAndTrxManager() {
    retryTemplate.execute(context -> {
        // Check if retry is happening
        if (RetrySynchronizationManager.getContext().getRetryCount() > 0) {
            System.out.println("RETRY COUNT:[" + RetrySynchronizationManager.getContext().getRetryCount() + "]");
        }

        DefaultTransactionDefinition def = new DefaultTransactionDefinition();
        // explicitly setting the transaction name is something that can be done only programmatically
        def.setName("TxnName:"+uuid);
        def.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRED);

        TransactionStatus status = txManager.getTransaction(def);
        try {
            // Your transactional logic here
            jdbcTemplate.update(...); // Transaction 1
            jdbcTemplate.update(...); // Transaction 2
            jdbcTemplate.update(...); // Transaction 3
        } catch (Exception ex) {
            System.out.println("Going to rollback the transaction");
            txManager.rollback(status);
            throw ex;
        }
        txManager.commit(status);

        return null;
    });
}
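
For comparison, the same begin, commit, or rollback sequence can be written against plain JDBC. This is only a sketch of the pattern under my own assumptions, not what Spring's transaction manager does internally. ManualJdbcTransaction, SqlWork, and inTransaction are hypothetical names, and the Connection is assumed to come from your own data source.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Illustrative only: manual transaction demarcation on a JDBC connection. */
final class ManualJdbcTransaction {

    /** The unit of work to run inside the transaction. */
    @FunctionalInterface
    interface SqlWork {
        void run(Connection connection) throws SQLException;
    }

    /** Runs work in one transaction: commit on success, rollback on any failure. */
    static void inTransaction(Connection connection, SqlWork work) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // begin: statements no longer commit individually
        try {
            work.run(connection);
            connection.commit();
        } catch (SQLException | RuntimeException ex) {
            connection.rollback();       // undo everything done by this unit of work
            throw ex;                    // rethrow so an outer retry loop can decide what to do
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Rethrowing after the rollback is the important part: it is what lets a surrounding retry mechanism see the failure and start a new attempt.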

Investigate the Retry Mechanism with YugabyteDB and Our Sample App

Install YugabyteDB

Use the YugabyteDB quick start guide to walk you through installation.

Download the sample Workload Simulator App

Download the sample app JAR file from GitHub.

wget https://github.com/YugabyteDB-Samples/yb-workload-simulator/releases/download/v0.0.8/yb-workload-sim-0.0.8.jar

Start the sample application

Navigate to the directory containing the downloaded JAR file and run the following command, plugging in your node IP address, database username, and password.

java -Dspring.workload=retryExampleWorkload \
    -Dnode=<node-ip-address> \
    -Ddbuser=<db-username> \
    -Ddbpassword=<db-password> \
    -jar yb-workload-sim-0.0.8.jar

For example:

java -Dspring.workload=retryExampleWorkload \
    -Dnode=127.0.0.1 \
    -Ddbuser=yugabyte \
    -Ddbpassword=yugabyte \
    -jar yb-workload-sim-0.0.8.jar

Additional parameters for this app are listed in the README file on GitHub.

Start and run simulations from the app UI

Open the following link in a browser:

http://<app-host>:8080

Example:

http://localhost:8080

This should bring up the app UI like this:

App UI for RetryExample workload simulation. Using the UI, we will trigger the simulation.

Click the hamburger menu in the top left corner to bring up the app options. Choose “Usable Operations” and then select “Create Tables.” Clicking “Run Create Tables Workload” creates two tables, “products” and “orders,” in the YugabyteDB database. We will use these tables to run our simulations.

Figure: App UI showing the initial table creation process (Workload Management, Retry Example).

Once the database tables are created, choose the “Seed Data” option to insert some dummy data in your tables:

App UI showing the initial data inserts.

Start the workload:

Three options will be listed in the “Test Type” dropdown. Choose one of them and click the “Run Test Retries on Transactions Workload” button. You will start seeing the throughput and latency metrics on the UI.

To test retries, stop the database node and bring it back up after a few seconds. You can do this by running the following commands:

yugabyted stop

yugabyted start

Figure: Screenshot showing the yugabyted stop and start operation.

If you navigate to your app logs, you will see the entries for “retries”:

Figure: Screenshot of logs showing that retries happen when the cluster disruption occurs.

Since this is a single-node cluster, the UI will also reflect the throughput falling to 0 but then quickly rising back to normal once the database is brought back up again.

Figure: Throughput falling to 0 and then quickly rising back to normal.

Try different “Retry options” from the simulation dropdown and observe the logs. I have provided a working code example based on the above methodology: Retry Example Workload Simulator

Wrapping Up

To build resilient applications, you need to be ready for anything, similar to how a coffee shop prepares for a rush. Setting up a reliable retry mechanism and mastering transaction management are key to ensuring that your app, much like a well-run coffee shop, delivers a smooth, consistent user experience, no matter what comes its way.

Just as every coffee order is unique, so is each application and situation. You have the flexibility to adjust retry settings and choose the transaction management strategy that best fits your needs. For detailed instructions, refer to the Spring Retry and programmatic transaction management sections in the Spring Framework documentation.

Consider these mechanisms as your blueprint (or recipe!) for success in your journey to build resilient applications. They ensure that every ‘order’—or transaction—your application processes is handled efficiently and dependably every single time.
