@Retryable in Spring Boot
A comprehensive guide to implementing resilient retry logic
Overview
@Retryable is a Spring Retry annotation that automatically retries a method when it throws specified exceptions. It comes from the Spring Retry project (a separate spring-retry dependency that also needs Spring AOP on the classpath, not Spring Boot core) and provides declarative retry logic without cluttering your code with boilerplate.
@Retryable(
    value = {IOException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2.0)
)
public String callExternalApi() {
    // Attempted up to 3 times in total (initial call + 2 retries), with exponential backoff
    return httpClient.fetch(); // hypothetical I/O call that may throw IOException
}
How @Retryable Uses AOP (Advanced)
Understanding the Magic Behind @Retryable
@Retryable uses Aspect-Oriented Programming (AOP) to intercept method calls and wrap them with retry logic. Let's break down how it works:
Step-by-Step Execution Flow
| Step | What Happens | Who Does It |
|---|---|---|
| 1. Bean Creation | Spring detects @Retryable annotation during bean initialization | Spring's BeanPostProcessor |
| 2. Proxy Creation | Spring wraps the bean with a CGLIB proxy that intercepts method calls | Spring AOP Engine |
| 3. Method Call | When you call the method, the proxy intercepts it (not the original) | CGLIB Proxy |
| 4. Aspect Logic | Aspect applies retry logic: try → catch → wait → retry | RetryTemplate Aspect |
| 5. Exception Handling | If max retries exceeded, exception propagates to caller | Aspect / Original Exception |
Visual Diagram: How AOP Wraps Your Method
WITHOUT AOP (What you wrote):
┌─────────────────────┐
│ myMethod() │
│ { │
│ // Your code │
│ } │
└─────────────────────┘
WITH AOP PROXY (What actually executes):
┌────────────────────────────────────────────┐
│ CGLIB Proxy │
│ ┌──────────────────────────────────────┐ │
│ │ RetryAspect (Added by Spring) │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ for (int attempt = 0; ...) │ │ │
│ │ │ try { │ │ │
│ │ │ return myMethod() // ACTUAL │ │ │
│ │ │ } catch (Exception e) { │ │ │
│ │ │ if (shouldRetry) { │ │ │
│ │ │ sleep(backoff) │ │ │
│ │ │ continue // retry │ │ │
│ │ │ } else throw │ │ │
│ │ │ } │ │ │
│ │ └──────────────────────────────────┘ │ │
│ └──────────────────────────────────────┘ │
└────────────────────────────────────────────┘
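The proxied loop in the diagram can be reproduced as a runnable sketch in plain Java, with no Spring at all (the helper name `retry` and its signature are illustrative, not Spring Retry API):

```java
import java.util.concurrent.Callable;

public class RetryLoopSketch {

    /** Calls the task up to maxAttempts times, sleeping backoffMillis between failed attempts. */
    public static <T> T retry(Callable<T> task, int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();              // the "actual" method the proxy wraps
            } catch (Exception e) {
                last = e;                        // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // backoff before the next attempt
                }
            }
        }
        throw last;                              // max attempts exceeded: propagate to caller
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds -- like a transient network error.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

This is exactly the shape of logic the proxy injects around your method; Spring just generates it for you from the annotation's attributes.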
Example 1: What AOP Actually Does Under the Hood
// YOUR CODE (what you write):
@Service
public class PaymentService {
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000)
)
public PaymentResult processPayment(Order order) {
logger.info("Processing payment for order {}", order.getId());
return paymentGateway.charge(order); // External API call
}
}
// WHAT SPRING EFFECTIVELY CREATES (conceptual -- the real proxy is generated bytecode):
public class PaymentServiceProxy extends PaymentService { // CGLIB subclass proxy
    private final PaymentService target; // the original bean

    PaymentServiceProxy(PaymentService target) {
        this.target = target;
    }

    @Override
    public PaymentResult processPayment(Order order) {
        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3,
                Map.of(SocketTimeoutException.class, true))); // retry only this exception
        FixedBackOffPolicy backOff = new FixedBackOffPolicy();
        backOff.setBackOffPeriod(1000);
        retryTemplate.setBackOffPolicy(backOff);
        return retryTemplate.execute(context -> {
            logger.info("Attempt {} of {}", context.getRetryCount() + 1, 3);
            // Any SocketTimeoutException thrown here triggers a retry;
            // RetryTemplate tracks the count and rethrows after attempt 3.
            return target.processPayment(order); // Call actual method
        });
    }
}
Example 2: Seeing the AOP Proxy in Action
@SpringBootApplication
@EnableRetry // IMPORTANT: Enables AOP for @Retryable
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
}
@Component
public class DebugAopExample {
@Autowired
private PaymentService paymentService;
public void demonstrateAop() {
// Check if this is actually a proxy
System.out.println("Class: " + paymentService.getClass().getName());
// Output: Class: com.example.PaymentService$$EnhancerBySpringCGLIB$$12345
// (on Spring Framework 6 the suffix is $$SpringCGLIB$$ instead)
// The generated suffix shows this is the proxy, not your class!
System.out.println("Is Proxy: " + AopUtils.isAopProxy(paymentService));
// Output: Is Proxy: true
// The proxy intercepts all method calls
paymentService.processPayment(order); // Goes through proxy first!
}
}
Example 3: Execution Timeline with Logging
// This is what happens when you call a @Retryable method:
paymentService.processPayment(order);
↓
[AOP Proxy Intercepts]
↓
[RetryTemplate starts loop: attempt 1]
↓
Logger: "Processing payment for order 123"
↓
paymentGateway.charge(order) // ACTUAL METHOD CALL
↓
[SocketTimeoutException thrown!]
↓
[AOP Catches exception]
↓
Logger: "Timeout occurred, will retry"
↓
[Sleep 1000ms - backoff delay]
↓
[RetryTemplate attempts retry: attempt 2]
↓
Logger: "Processing payment for order 123"
↓
paymentGateway.charge(order) // RETRY CALL
↓
[Success! Returns PaymentResult]
↓
[AOP stops looping, returns result to caller]
Example 4: AOP + @Recover (Error Recovery)
@Service
public class PaymentService {
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000)
)
public PaymentResult processPayment(Order order) {
return paymentGateway.charge(order);
}
// AOP automatically calls this if all retries fail
@Recover
public PaymentResult recoverPayment(
SocketTimeoutException ex,
Order order) {
// This is called by AOP when max retries exceeded
logger.error("All retries failed. Queuing for async retry", ex);
asyncQueue.add(order);
return PaymentResult.QUEUED; // Return degraded response
}
}
// WHAT HAPPENS INTERNALLY:
// If processPayment() fails 3 times:
// 1. AOP catches final exception
// 2. AOP finds @Recover method
// 3. AOP calls recoverPayment() automatically
// 4. Returns: PaymentResult.QUEUED instead of throwing
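Stripped of the framework, the @Recover flow is just a retry loop with a fallback function. A minimal plain-Java sketch (the names here are illustrative, not Spring Retry API):

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

public class RecoverSketch {

    /** Retries the task; if every attempt fails, hands the last exception to the recover function. */
    public static <T> T retryWithRecover(Callable<T> task, int maxAttempts,
                                         Function<Exception, T> recover) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
            }
        }
        return recover.apply(last); // degraded response instead of a thrown exception
    }

    public static void main(String[] args) {
        String result = RecoverSketch.<String>retryWithRecover(
                () -> { throw new RuntimeException("gateway down"); }, // always fails
                3,
                e -> "QUEUED");                                        // the @Recover analogue
        System.out.println(result); // QUEUED
    }
}
```

The aspect does the same matching step Spring Retry does for real: it picks the @Recover method whose parameters fit the exception type and the original arguments.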
Important: Why @EnableRetry is Required
// WITHOUT @EnableRetry:
@SpringBootApplication
public class Application {
// @Retryable annotations are IGNORED
// No proxy is created, no retries happen
}
// WITH @EnableRetry:
@SpringBootApplication
@EnableRetry // ← This tells Spring to create AOP proxies for @Retryable
public class Application {
// @Retryable annotations work!
// Proxies are created, retries work
}
// What @EnableRetry does:
// 1. Registers Spring Retry's annotation-aware retry advisor
// 2. Has Spring create AOP proxies (CGLIB or JDK dynamic) for beans with @Retryable methods
// 3. Routes intercepted calls through a RetryTemplate-backed interceptor (the aspect)
// 4. Enables method interception for retry
Key AOP Concepts You Should Know
| Concept | What It Does | In @Retryable |
|---|---|---|
| Join point | The point in execution where an aspect can be applied | Each call to your @Retryable method |
| Pointcut | Selects which join points the aspect applies to | Methods annotated with @Retryable |
| Aspect | The module bundling a pointcut with its advice | Spring Retry's retry interceptor |
| Proxy | The wrapper object that intercepts calls | PaymentService$$EnhancerBySpringCGLIB |
| Advice | The code that runs around the method | The try-catch-backoff-retry loop (RetryTemplate) |
Benefits
1. Improved Resilience
Handles transient failures gracefully without user intervention. Network glitches, temporary service unavailability, and rate limiting are automatically recovered from.
2. Clean Code
Eliminates manual try-catch loops and retry counters. Your business logic remains focused and readable.
3. Configurable Backoff Strategies
Supports multiple backoff strategies:
- Fixed Delay: Wait the same duration between retries
- Exponential Backoff: Increase wait time exponentially (1s, 2s, 4s, 8s...)
- Random Delay: Add jitter to prevent thundering herd
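The delay sequences these strategies produce can be computed directly. A plain-Java sketch of exponential backoff with a cap and full jitter (this mirrors what Spring Retry's ExponentialBackOffPolicy and its randomized variant do, but it is not their code):

```java
import java.util.Random;

public class BackoffDelays {

    /** Delay before retry number `attempt` (0-based) for exponential backoff capped at maxDelay. */
    public static long delay(long initialMillis, double multiplier, long maxDelayMillis, int attempt) {
        double d = initialMillis * Math.pow(multiplier, attempt);
        return (long) Math.min(d, maxDelayMillis);
    }

    /** The same delay with full jitter: a random value in [0, delay), spreading clients apart. */
    public static long jittered(long initialMillis, double multiplier, long maxDelayMillis,
                                int attempt, Random rng) {
        return (long) (rng.nextDouble() * delay(initialMillis, multiplier, maxDelayMillis, attempt));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("delay " + i + ": " + delay(1000, 2.0, 30000, i) + "ms");
        }
        // 1000ms, 2000ms, 4000ms, 8000ms, 16000ms; later attempts would cap at 30000ms
    }
}
```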
4. Recovery Callbacks
Use @Recover to handle final failures gracefully with fallback logic.
5. Declarative & Consistent
Consistent retry behavior across your application with minimal configuration.
Pitfalls & Dangers
Retrying operations that modify state can cause duplicate writes, double-charging customers, or corrupted data.
// 🚨 DANGEROUS - Will charge multiple times if retried
@Retryable(value = TimeoutException.class)
public void chargeCustomer(String customerId, double amount) {
paymentService.charge(customerId, amount);
// If this times out and retries, customer charged twice!
}
Multiple clients retrying in sync causes the thundering herd problem: all requests hit your service at the same moment, making the outage worse.
// 🚨 BAD - All clients retry at 1s, 2s, 4s (synchronized)
@Retryable(backoff = @Backoff(delay = 1000, multiplier = 2.0))
public String callService() { }
// ✅ GOOD - Add jitter to randomize retry times
@Retryable(backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 30000,
random = true // Add randomness
))
public String callService() { }
Some errors are permanent (401 Unauthorized, 403 Forbidden, validation errors). Retrying them is a waste of time and resources.
// 🚨 BAD - Retries 401 Unauthorized 3 times (pointless)
@Retryable(value = HttpClientErrorException.class)
public String callApi() {
return restTemplate.getForObject("https://api.example.com/data", String.class);
}
// ✅ GOOD - Only retry transient errors
@Retryable(
value = {HttpServerErrorException.class, TimeoutException.class},
exclude = {HttpClientErrorException.class}
)
public String callApi() { }
If service A retries → calls service B → service B retries → calls service C with retries, you get exponential retry storms.
Example: 3 attempts × 3 attempts × 3 attempts = up to 27 calls to the bottom service from a single request!
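The arithmetic behind that multiplication, as a quick sketch:

```java
public class RetryStorm {

    /** Worst-case calls reaching the deepest service when every layer retries independently. */
    public static long worstCaseAttempts(int... attemptsPerLayer) {
        long total = 1;
        for (int attempts : attemptsPerLayer) {
            total *= attempts; // layers multiply, they do not add
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseAttempts(3, 3, 3));    // 27
        System.out.println(worstCaseAttempts(3, 3, 3, 3)); // 81 -- one more retrying layer
    }
}
```

One more retrying layer triples the storm again, which is why only one layer in a call chain should own retries.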
Without proper exception handling or max attempts, retries can loop forever, hanging threads and exhausting resources.
Without logging/metrics on retries, you won't know if your system is constantly failing and recovering in the background.
Cost of Usage
CPU & Memory Cost
- Minimal overhead: the AOP proxy adds a small per-call cost, usually negligible next to the I/O the method performs
- Thread blocking: during backoff delays the calling thread sleeps (consider a reactive or async approach if this is critical)
- Increased latency: with three retries and exponential backoff from 1s, the worst case adds 1 + 2 + 4 = 7 seconds of waiting
Resource Cost
- Connection timeouts: Each retry consumes database/network connections during backoff
- Cascading failures: Can multiply load on failing services by retry factor
- Log volume: Each retry generates logs, potentially doubling/tripling log volume
Business Cost
- Degraded UX: 7-second request latency frustrates users
- Failed operations: If all retries fail, user still gets error (but after waiting)
- Rate limiting: Your retries might trigger rate limits on downstream APIs
| Configuration | Latency (Best Case) | Added Delay (Worst Case) | Cost Assessment |
|---|---|---|---|
| 3 attempts, 1s fixed delay | ~100ms | ~2 seconds | Acceptable for most APIs |
| 3 attempts, exponential backoff (1s, 2s) | ~100ms | ~3 seconds | Borderline for user-facing APIs |
| 5 attempts, exponential backoff (1s to 8s) | ~100ms | ~15 seconds | Unacceptable for user-facing APIs |
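The worst-case figures are just the sum of the backoff delays between attempts (there is always one delay fewer than there are attempts). A sketch of the calculation:

```java
public class RetryLatency {

    /** Sum of backoff delays across maxAttempts attempts (one delay fewer than attempts). */
    public static long totalDelayMillis(int maxAttempts, long initialMillis,
                                        double multiplier, long maxDelayMillis) {
        long total = 0;
        for (int i = 0; i < maxAttempts - 1; i++) {
            total += (long) Math.min(initialMillis * Math.pow(multiplier, i), maxDelayMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalDelayMillis(3, 1000, 1.0, 1000));  // 2000: two fixed 1s delays
        System.out.println(totalDelayMillis(3, 1000, 2.0, 30000)); // 3000: 1s + 2s
        System.out.println(totalDelayMillis(5, 1000, 2.0, 30000)); // 15000: 1s + 2s + 4s + 8s
    }
}
```

Add the expected per-attempt call time on top of these delays to get the full user-visible latency.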
Senior Architecture Thinking
Principle #1: Separate Concerns by Failure Domain
Not all services fail the same way. Your retry strategy should differ:
// For flaky internal services: Be aggressive
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 100)
)
public User getUserFromInternalCache() { }
// For external APIs: Be conservative
@Retryable(
value = {HttpServerErrorException.class},
maxAttempts = 2,
backoff = @Backoff(delay = 500)
)
public String callThirdPartyAPI() { }
// For databases: No retries (let connection pool handle it)
public User getUserFromDatabase() { }
Principle #2: Use Circuit Breakers, Not Just Retries
Retries alone don't protect against systemic failures. Combine with circuit breaker pattern (e.g., Resilience4j):
@Retryable(maxAttempts = 2)
@CircuitBreaker(name = "paymentAPI") // Stops retries if API is down
public void processPayment(Order order) {
paymentGateway.charge(order);
}
Why? If a service is completely down, retrying just wastes time and resources. Circuit breaker detects this and fast-fails after a threshold.
Principle #3: Idempotency is Prerequisite
Before adding @Retryable to ANY method, ask: "Can I safely call this 3 times?"
- Idempotent (safe to retry): GET requests, read operations, queries with unique IDs
- NOT idempotent (dangerous to retry): POST payments, CREATE orders, DELETE operations
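One common way to make a write safe to retry is an idempotency key: the server remembers the result per key and replays it instead of executing again. A minimal in-memory sketch (a real service would persist the keys, e.g. in a database):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class IdempotencyGuard {

    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();

    /** Executes the charge once per key; repeated calls with the same key replay the stored result. */
    public String chargeOnce(String idempotencyKey, Supplier<String> charge) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> charge.get());
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        int[] charges = {0};
        Supplier<String> charge = () -> { charges[0]++; return "txn-" + charges[0]; };

        String first  = guard.chargeOnce("order-123", charge); // executes the charge
        String second = guard.chargeOnce("order-123", charge); // replayed -- no double charge
        System.out.println(first + " " + second + " charges=" + charges[0]); // txn-1 txn-1 charges=1
    }
}
```

With this guard in place, a retried POST becomes effectively idempotent and safe to put behind @Retryable.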
Principle #4: Let Upstream Handle Retries
If your service calls another service, don't retry both. One should handle it:
// Pattern A: Service A retries calling Service B
@Retryable // Service A handles failure
public void callServiceB() {
serviceB.process(); // Service B doesn't retry (simple)
}
// Pattern B: Service B retries internally
public void callServiceB() {
serviceB.process(); // Service B has @Retryable internally
}
// ❌ Anti-pattern: Both retry (creates exponential storms)
@Retryable // Service A retries...
public void callServiceB() {
    serviceB.process(); // ...while Service B's process() is ALSO @Retryable internally
}
Principle #5: Observability is Non-Negotiable
You must know when and why retries happen. Add logging and metrics:
@Retryable(
    value = TimeoutException.class,
    maxAttempts = 3
)
public String callAPI() {
    log.info("Attempting to call external API");
    return restTemplate.getForObject("...", String.class);
}
@Recover // Called after all retries fail
public String recover(TimeoutException e) {
log.error("All retries exhausted for API call", e);
metrics.increment("api.call.failed");
return "fallback-response";
}
Best Practices
1. Keep Retry Logic Minimal
Only retry at the exact point of failure, not at high-level business logic:
// ✅ GOOD - Retry where the real failure can happen
@Service
public class PaymentService {
@Retryable(value = TimeoutException.class)
public PaymentResponse callPaymentGateway(Order order) {
return gateway.charge(order); // Only this fails
}
public void processOrder(Order order) {
PaymentResponse response = callPaymentGateway(order);
// Rest of business logic (no retries here)
}
}
2. Use Appropriate Backoff Strategies
// For external APIs: Exponential backoff with jitter
@Retryable(backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 10000,
random = true
))
// For internal services: Fixed short delay
@Retryable(backoff = @Backoff(delay = 100))
// For batch jobs: Long exponential backoff
@Retryable(backoff = @Backoff(
delay = 5000,
multiplier = 3.0,
maxDelay = 60000,
random = true
))
3. Be Specific About Exceptions
Only retry exceptions that MAY be transient:
// ✅ GOOD
@Retryable(value = {
SocketTimeoutException.class,
HttpServerErrorException.class, // 5xx errors
ServiceUnavailableException.class
})
// ❌ BAD
@Retryable(value = Exception.class) // Retries EVERYTHING including bugs!
4. Always Provide Recovery Logic
Never let retries fail silently:
@Retryable(maxAttempts = 3)
public String fetchUserData(String userId) {
return userService.getUser(userId);
}
@Recover
public String recoverFetchUserData(Throwable e, String userId) {
log.error("Failed to fetch user {} after retries", userId);
// Option 1: Return cached data
return cache.getOrDefault(userId, "{}");
// Option 2: Return sensible default
// return buildDefaultUser(userId);
// Option 3: Rethrow with better context
// throw new UserServiceException("Could not fetch user", e);
}
5. Test Retry Behavior Explicitly
@Test
public void testRetryOnTimeout() {
when(externalApi.call())
.thenThrow(new TimeoutException())
.thenThrow(new TimeoutException())
.thenReturn("success");
String result = service.callWithRetry();
assertEquals("success", result);
verify(externalApi, times(3)).call(); // Verify it retried exactly
}
Guardrails
With 1s exponential backoff, more than 3 retries means 7+ seconds of added waiting. That's unacceptable UX. Instead, implement async processing or queuing.
Require idempotency keys (unique request IDs) before retrying any POST/PUT/DELETE:
// Require idempotency key
@Retryable
public void transferMoney(TransferRequest request) {
// Service must track request.idempotencyKey to prevent duplicates
}
Set up alerts if retry rate exceeds threshold:
// If retries > 5% of calls, something is wrong
if (metrics.getRetryRate() > 0.05) {
alerts.critical("High retry rate detected");
}
Never retry without a per-attempt timeout, and budget the total before choosing values:
// 3 attempts × 2s timeout + 2 × 1s delay = up to ~8 seconds total
@Retryable(maxAttempts = 3, backoff = @Backoff(delay = 1000))
@Timeout(2000) // Illustrative: Spring Retry has no @Timeout; use client/socket timeouts or Resilience4j's TimeLimiter
public String callApi() { }
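The total budget is attempts × per-attempt timeout, plus the delays between attempts. A sketch of that arithmetic:

```java
public class RetryBudget {

    /** Worst-case wall-clock time: every attempt runs to its timeout, plus fixed delays between. */
    public static long worstCaseMillis(int maxAttempts, long perAttemptTimeoutMillis, long fixedDelayMillis) {
        return maxAttempts * perAttemptTimeoutMillis + (maxAttempts - 1) * fixedDelayMillis;
    }

    public static void main(String[] args) {
        // 3 attempts, 2s timeout each, 1s delay between attempts
        System.out.println(worstCaseMillis(3, 2000, 1000) + "ms"); // 8000ms
    }
}
```

If that worst case exceeds what callers (or upstream load balancers) will tolerate, reduce attempts or move the work to an async queue.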
Add comments explaining why this specific method needs retries:
/**
* Calls Stripe API to process payment.
* Retried because Stripe occasionally returns 5xx during peak load.
* Uses exponential backoff to avoid overwhelming their service.
* Requires idempotency key to prevent duplicate charges.
*/
@Retryable(value = {HttpServerErrorException.class}, maxAttempts = 3)
public StripeResponse processPayment(PaymentRequest req) { }
Real-World Examples
Example 1: Calling External Payment API
@Service
public class PaymentService {
@Retryable(
value = {HttpServerErrorException.class}, // 5xx errors only
maxAttempts = 2, // Don't retry too much (UX)
backoff = @Backoff(delay = 500)
)
public PaymentResponse charge(Order order) {
try {
return stripeClient.charge(order.getId(), order.getAmount());
} catch (HttpServerErrorException e) {
log.warn("Stripe returned 5xx, will retry", e);
throw e;
}
}
@Recover
public PaymentResponse chargeRecover(Throwable e, Order order) {
log.error("Payment failed for order {} after retries", order.getId(), e);
// Fallback: Mark as pending and retry later via async job
pendingPayments.add(order);
return new PaymentResponse("pending", order.getId());
}
}
Example 2: Database Connection Retrieval
@Service
public class UserRepository {
@Retryable(
value = {CannotGetJdbcConnectionException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 100) // Short delay (DB pool recovering)
)
public User findById(Long userId) {
return jdbcTemplate.queryForObject(
    "SELECT * FROM users WHERE id = ?",
    userRowMapper,
    userId
);
}
@Recover
public User findByIdRecover(Throwable e, Long userId) {
log.error("Database connection failed for user {}", userId);
// Cache miss is acceptable for reads
return cache.getIfPresent(userId);
}
}
Example 3: Message Queue Publishing (Idempotent)
@Service
public class EventPublisher {
// Safe to retry because message IDs ensure deduplication
@Retryable(
value = {BrokerNotAvailableException.class},
maxAttempts = 5,
backoff = @Backoff(delay = 1000, multiplier = 2.0)
)
public void publishEvent(DomainEvent event) {
// Event has unique messageId for idempotency
messageBroker.publish(event.getMessageId(), event);
}
@Recover
public void publishEventRecover(Throwable e, DomainEvent event) {
log.error("Failed to publish event {}, will be retried by async processor", event.getMessageId());
deadLetterQueue.add(event);
}
}