@Retryable in Spring Boot
A comprehensive guide to implementing resilient retry logic
Overview
@Retryable is a Spring Retry annotation that automatically retries a method when it throws specified exceptions. It comes from the Spring Retry project (a separate spring-retry dependency that also needs Spring AOP on the classpath, not Spring Boot core) and provides declarative retry logic without cluttering your code with boilerplate.
@Retryable(
    value = {IOException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2.0)
)
public String callExternalApi() {
    // Attempted up to 3 times in total (initial call + 2 retries), with exponential backoff
    return httpClient.fetch(); // hypothetical I/O call that may throw IOException
}
How @Retryable Uses AOP (Advanced)
Understanding the Magic Behind @Retryable
@Retryable uses Aspect-Oriented Programming (AOP) to intercept method calls and wrap them with retry logic. Let's break down how it works:
Step-by-Step Execution Flow
| Step | What Happens | Who Does It |
|---|---|---|
| 1. Bean Creation | Spring detects @Retryable annotation during bean initialization | Spring's BeanPostProcessor |
| 2. Proxy Creation | Spring wraps the bean with a CGLIB proxy that intercepts method calls | Spring AOP Engine |
| 3. Method Call | When you call the method, the proxy intercepts it (not the original) | CGLIB Proxy |
| 4. Aspect Logic | Aspect applies retry logic: try → catch → wait → retry | RetryTemplate Aspect |
| 5. Exception Handling | If max retries exceeded, exception propagates to caller | Aspect / Original Exception |
Visual Diagram: How AOP Wraps Your Method
WITHOUT AOP (What you wrote):
┌─────────────────────┐
│ myMethod() │
│ { │
│ // Your code │
│ } │
└─────────────────────┘
WITH AOP PROXY (What actually executes):
┌────────────────────────────────────────────┐
│ CGLIB Proxy │
│ ┌──────────────────────────────────────┐ │
│ │ RetryAspect (Added by Spring) │ │
│ │ ┌──────────────────────────────────┐ │ │
│ │ │ for (int attempt = 0; ...) │ │ │
│ │ │ try { │ │ │
│ │ │ return myMethod() // ACTUAL │ │ │
│ │ │ } catch (Exception e) { │ │ │
│ │ │ if (shouldRetry) { │ │ │
│ │ │ sleep(backoff) │ │ │
│ │ │ continue // retry │ │ │
│ │ │ } else throw │ │ │
│ │ │ } │ │ │
│ │ └──────────────────────────────────┘ │ │
│ └──────────────────────────────────────┘ │
└────────────────────────────────────────────┘
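The proxied loop in the diagram can be reproduced as a runnable sketch in plain Java, with no Spring at all (the helper name `retry` and its signature are illustrative, not Spring Retry API):

```java
import java.util.concurrent.Callable;

public class RetryLoopSketch {

    /** Calls the task up to maxAttempts times, sleeping backoffMillis between failed attempts. */
    public static <T> T retry(Callable<T> task, int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();              // the "actual" method the proxy wraps
            } catch (Exception e) {
                last = e;                        // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // backoff before the next attempt
                }
            }
        }
        throw last;                              // max attempts exceeded: propagate to caller
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds -- like a transient network error.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException("transient failure");
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

This is exactly the shape of logic the proxy injects around your method; Spring just generates it for you from the annotation's attributes.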
Example 1: What AOP Actually Does Under the Hood
// YOUR CODE (what you write):
@Service
public class PaymentService {
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000)
)
public PaymentResult processPayment(Order order) {
logger.info("Processing payment for order {}", order.getId());
return paymentGateway.charge(order); // External API call
}
}
// WHAT SPRING EFFECTIVELY CREATES (conceptual -- the real proxy is generated bytecode):
public class PaymentServiceProxy extends PaymentService { // CGLIB subclass proxy
    private final PaymentService target; // the original bean

    PaymentServiceProxy(PaymentService target) {
        this.target = target;
    }

    @Override
    public PaymentResult processPayment(Order order) {
        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3,
                Map.of(SocketTimeoutException.class, true))); // retry only this exception
        FixedBackOffPolicy backOff = new FixedBackOffPolicy();
        backOff.setBackOffPeriod(1000);
        retryTemplate.setBackOffPolicy(backOff);
        return retryTemplate.execute(context -> {
            logger.info("Attempt {} of {}", context.getRetryCount() + 1, 3);
            // Any SocketTimeoutException thrown here triggers a retry;
            // RetryTemplate tracks the count and rethrows after attempt 3.
            return target.processPayment(order); // Call actual method
        });
    }
}
Example 2: Seeing the AOP Proxy in Action
@SpringBootApplication
@EnableRetry // IMPORTANT: Enables AOP for @Retryable
public class Application {
public static void main(String[] args) {
SpringApplication.run(Application.class, args);
}
}
@Component
public class DebugAopExample {
@Autowired
private PaymentService paymentService;
public void demonstrateAop() {
// Check if this is actually a proxy
System.out.println("Class: " + paymentService.getClass().getName());
// Output: Class: com.example.PaymentService$$EnhancerBySpringCGLIB$$12345
// (on Spring Framework 6 the suffix is $$SpringCGLIB$$ instead)
// The generated suffix shows this is the proxy, not your class!
System.out.println("Is Proxy: " + AopUtils.isAopProxy(paymentService));
// Output: Is Proxy: true
// The proxy intercepts all method calls
paymentService.processPayment(order); // Goes through proxy first!
}
}
Example 3: Execution Timeline with Logging
// This is what happens when you call a @Retryable method:
paymentService.processPayment(order);
↓
[AOP Proxy Intercepts]
↓
[RetryTemplate starts loop: attempt 1]
↓
Logger: "Processing payment for order 123"
↓
paymentGateway.charge(order) // ACTUAL METHOD CALL
↓
[SocketTimeoutException thrown!]
↓
[AOP Catches exception]
↓
Logger: "Timeout occurred, will retry"
↓
[Sleep 1000ms - backoff delay]
↓
[RetryTemplate attempts retry: attempt 2]
↓
Logger: "Processing payment for order 123"
↓
paymentGateway.charge(order) // RETRY CALL
↓
[Success! Returns PaymentResult]
↓
[AOP stops looping, returns result to caller]
Example 4: AOP + @Recover (Error Recovery)
@Service
public class PaymentService {
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000)
)
public PaymentResult processPayment(Order order) {
return paymentGateway.charge(order);
}
// AOP automatically calls this if all retries fail
@Recover
public PaymentResult recoverPayment(
SocketTimeoutException ex,
Order order) {
// This is called by AOP when max retries exceeded
logger.error("All retries failed. Queuing for async retry", ex);
asyncQueue.add(order);
return PaymentResult.QUEUED; // Return degraded response
}
}
// WHAT HAPPENS INTERNALLY:
// If processPayment() fails 3 times:
// 1. AOP catches final exception
// 2. AOP finds @Recover method
// 3. AOP calls recoverPayment() automatically
// 4. Returns: PaymentResult.QUEUED instead of throwing
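Stripped of the framework, the @Recover flow is just a retry loop with a fallback function. A minimal plain-Java sketch (the names here are illustrative, not Spring Retry API):

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

public class RecoverSketch {

    /** Retries the task; if every attempt fails, hands the last exception to the recover function. */
    public static <T> T retryWithRecover(Callable<T> task, int maxAttempts,
                                         Function<Exception, T> recover) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
            }
        }
        return recover.apply(last); // degraded response instead of a thrown exception
    }

    public static void main(String[] args) {
        String result = RecoverSketch.<String>retryWithRecover(
                () -> { throw new RuntimeException("gateway down"); }, // always fails
                3,
                e -> "QUEUED");                                        // the @Recover analogue
        System.out.println(result); // QUEUED
    }
}
```

The aspect does the same matching step Spring Retry does for real: it picks the @Recover method whose parameters fit the exception type and the original arguments.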
Important: Why @EnableRetry is Required
// WITHOUT @EnableRetry:
@SpringBootApplication
public class Application {
// @Retryable annotations are IGNORED
// No proxy is created, no retries happen
}
// WITH @EnableRetry:
@SpringBootApplication
@EnableRetry // ← This tells Spring to create AOP proxies for @Retryable
public class Application {
// @Retryable annotations work!
// Proxies are created, retries work
}
// What @EnableRetry does:
// 1. Registers Spring Retry's annotation-aware retry advisor
// 2. Has Spring create AOP proxies (CGLIB or JDK dynamic) for beans with @Retryable methods
// 3. Routes intercepted calls through a RetryTemplate-backed interceptor (the aspect)
// 4. Enables method interception for retry
Key AOP Concepts You Should Know
| Concept | What It Does | In @Retryable |
|---|---|---|
| Join point | The point in execution where an aspect can be applied | Each call to your @Retryable method |
| Pointcut | Selects which join points the aspect applies to | Methods annotated with @Retryable |
| Aspect | The module bundling a pointcut with its advice | Spring Retry's retry interceptor |
| Proxy | The wrapper object that intercepts calls | PaymentService$$EnhancerBySpringCGLIB |
| Advice | The code that runs around the method | The try-catch-backoff-retry loop (RetryTemplate) |
Benefits
1. Improved Resilience
Handles transient failures gracefully without user intervention. Network glitches, temporary service unavailability, and rate limiting are automatically recovered from.
2. Clean Code
Eliminates manual try-catch loops and retry counters. Your business logic remains focused and readable.
3. Configurable Backoff Strategies
Supports multiple backoff strategies:
- Fixed Delay: Wait the same duration between retries
- Exponential Backoff: Increase wait time exponentially (1s, 2s, 4s, 8s...)
- Random Delay: Add jitter to prevent thundering herd
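The delay sequences these strategies produce can be computed directly. A plain-Java sketch of exponential backoff with a cap and full jitter (this mirrors what Spring Retry's ExponentialBackOffPolicy and its randomized variant do, but it is not their code):

```java
import java.util.Random;

public class BackoffDelays {

    /** Delay before retry number `attempt` (0-based) for exponential backoff capped at maxDelay. */
    public static long delay(long initialMillis, double multiplier, long maxDelayMillis, int attempt) {
        double d = initialMillis * Math.pow(multiplier, attempt);
        return (long) Math.min(d, maxDelayMillis);
    }

    /** The same delay with full jitter: a random value in [0, delay), spreading clients apart. */
    public static long jittered(long initialMillis, double multiplier, long maxDelayMillis,
                                int attempt, Random rng) {
        return (long) (rng.nextDouble() * delay(initialMillis, multiplier, maxDelayMillis, attempt));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            System.out.println("delay " + i + ": " + delay(1000, 2.0, 30000, i) + "ms");
        }
        // 1000ms, 2000ms, 4000ms, 8000ms, 16000ms; later attempts would cap at 30000ms
    }
}
```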
4. Recovery Callbacks
Use @Recover to handle final failures gracefully with fallback logic.
5. Declarative & Consistent
Consistent retry behavior across your application with minimal configuration.
Pitfalls & Dangers
Retrying operations that modify state can cause duplicate writes, double-charging customers, or corrupted data.
// 🚨 DANGEROUS - Will charge multiple times if retried
@Retryable(value = TimeoutException.class)
public void chargeCustomer(String customerId, double amount) {
paymentService.charge(customerId, amount);
// If this times out and retries, customer charged twice!
}
Multiple clients retrying in sync causes the thundering herd problem: all requests hit your service at the same moment, making the outage worse.
// 🚨 BAD - All clients retry at 1s, 2s, 4s (synchronized)
@Retryable(backoff = @Backoff(delay = 1000, multiplier = 2.0))
public String callService() { }
// ✅ GOOD - Add jitter to randomize retry times
@Retryable(backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 30000,
random = true // Add randomness
))
public String callService() { }
Some errors are permanent (401 Unauthorized, 403 Forbidden, validation errors). Retrying them is a waste of time and resources.
// 🚨 BAD - Retries 401 Unauthorized 3 times (pointless)
@Retryable(value = HttpClientErrorException.class)
public String callApi() {
return restTemplate.getForObject("https://api.example.com/data", String.class);
}
// ✅ GOOD - Only retry transient errors
@Retryable(
value = {HttpServerErrorException.class, TimeoutException.class},
exclude = {HttpClientErrorException.class}
)
public String callApi() { }
If service A retries → calls service B → service B retries → calls service C with retries, you get exponential retry storms.
Example: 3 attempts × 3 attempts × 3 attempts = up to 27 calls to the bottom service from a single request!
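The arithmetic behind that multiplication, as a quick sketch:

```java
public class RetryStorm {

    /** Worst-case calls reaching the deepest service when every layer retries independently. */
    public static long worstCaseAttempts(int... attemptsPerLayer) {
        long total = 1;
        for (int attempts : attemptsPerLayer) {
            total *= attempts; // layers multiply, they do not add
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseAttempts(3, 3, 3));    // 27
        System.out.println(worstCaseAttempts(3, 3, 3, 3)); // 81 -- one more retrying layer
    }
}
```

One more retrying layer triples the storm again, which is why only one layer in a call chain should own retries.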
Without proper exception handling or max attempts, retries can loop forever, hanging threads and exhausting resources.
Without logging/metrics on retries, you won't know if your system is constantly failing and recovering in the background.
Cost of Usage
CPU & Memory Cost
- Minimal overhead: the AOP proxy adds a small per-call cost, usually negligible next to the I/O the method performs
- Thread blocking: during backoff delays the calling thread sleeps (consider a reactive or async approach if this is critical)
- Increased latency: with three retries and exponential backoff from 1s, the worst case adds 1 + 2 + 4 = 7 seconds of waiting
Resource Cost
- Connection timeouts: Each retry consumes database/network connections during backoff
- Cascading failures: Can multiply load on failing services by retry factor
- Log volume: Each retry generates logs, potentially doubling/tripling log volume
Business Cost
- Degraded UX: 7-second request latency frustrates users
- Failed operations: If all retries fail, user still gets error (but after waiting)
- Rate limiting: Your retries might trigger rate limits on downstream APIs
| Configuration | Latency (Best Case) | Added Delay (Worst Case) | Cost Assessment |
|---|---|---|---|
| 3 attempts, 1s fixed delay | ~100ms | ~2 seconds | Acceptable for most APIs |
| 3 attempts, exponential backoff (1s, 2s) | ~100ms | ~3 seconds | Borderline for user-facing APIs |
| 5 attempts, exponential backoff (1s to 8s) | ~100ms | ~15 seconds | Unacceptable for user-facing APIs |
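The worst-case figures are just the sum of the backoff delays between attempts (there is always one delay fewer than there are attempts). A sketch of the calculation:

```java
public class RetryLatency {

    /** Sum of backoff delays across maxAttempts attempts (one delay fewer than attempts). */
    public static long totalDelayMillis(int maxAttempts, long initialMillis,
                                        double multiplier, long maxDelayMillis) {
        long total = 0;
        for (int i = 0; i < maxAttempts - 1; i++) {
            total += (long) Math.min(initialMillis * Math.pow(multiplier, i), maxDelayMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalDelayMillis(3, 1000, 1.0, 1000));  // 2000: two fixed 1s delays
        System.out.println(totalDelayMillis(3, 1000, 2.0, 30000)); // 3000: 1s + 2s
        System.out.println(totalDelayMillis(5, 1000, 2.0, 30000)); // 15000: 1s + 2s + 4s + 8s
    }
}
```

Add the expected per-attempt call time on top of these delays to get the full user-visible latency.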
Senior Architecture Thinking
Principle #1: Separate Concerns by Failure Domain
Not all services fail the same way. Your retry strategy should differ:
// For flaky internal services: Be aggressive
@Retryable(
value = {SocketTimeoutException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 100)
)
public User getUserFromInternalCache() { }
// For external APIs: Be conservative
@Retryable(
value = {HttpServerErrorException.class},
maxAttempts = 2,
backoff = @Backoff(delay = 500)
)
public String callThirdPartyAPI() { }
// For databases: No retries (let connection pool handle it)
public User getUserFromDatabase() { }
Principle #2: Use Circuit Breakers, Not Just Retries
Retries alone don't protect against systemic failures. Combine with circuit breaker pattern (e.g., Resilience4j):
@Retryable(maxAttempts = 2)
@CircuitBreaker(name = "paymentAPI") // Stops retries if API is down
public void processPayment(Order order) {
paymentGateway.charge(order);
}
Why? If a service is completely down, retrying just wastes time and resources. Circuit breaker detects this and fast-fails after a threshold.
Principle #3: Idempotency is Prerequisite
Before adding @Retryable to ANY method, ask: "Can I safely call this 3 times?"
- Idempotent (safe to retry): GET requests, read operations, queries with unique IDs
- NOT idempotent (dangerous to retry): POST payments, CREATE orders, DELETE operations
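One common way to make a write safe to retry is an idempotency key: the server remembers the result per key and replays it instead of executing again. A minimal in-memory sketch (a real service would persist the keys, e.g. in a database):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class IdempotencyGuard {

    private final Map<String, String> resultsByKey = new ConcurrentHashMap<>();

    /** Executes the charge once per key; repeated calls with the same key replay the stored result. */
    public String chargeOnce(String idempotencyKey, Supplier<String> charge) {
        return resultsByKey.computeIfAbsent(idempotencyKey, key -> charge.get());
    }

    public static void main(String[] args) {
        IdempotencyGuard guard = new IdempotencyGuard();
        int[] charges = {0};
        Supplier<String> charge = () -> { charges[0]++; return "txn-" + charges[0]; };

        String first  = guard.chargeOnce("order-123", charge); // executes the charge
        String second = guard.chargeOnce("order-123", charge); // replayed -- no double charge
        System.out.println(first + " " + second + " charges=" + charges[0]); // txn-1 txn-1 charges=1
    }
}
```

With this guard in place, a retried POST becomes effectively idempotent and safe to put behind @Retryable.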
Principle #4: Let Upstream Handle Retries
If your service calls another service, don't retry both. One should handle it:
// Pattern A: Service A retries calling Service B
@Retryable // Service A handles failure
public void callServiceB() {
serviceB.process(); // Service B doesn't retry (simple)
}
// Pattern B: Service B retries internally
public void callServiceB() {
serviceB.process(); // Service B has @Retryable internally
}
// ❌ Anti-pattern: Both retry (creates exponential storms)
@Retryable // Service A retries...
public void callServiceB() {
    serviceB.process(); // ...while Service B's process() is ALSO @Retryable internally
}
Principle #5: Observability is Non-Negotiable
You must know when and why retries happen. Add logging and metrics:
@Retryable(
    value = TimeoutException.class,
    maxAttempts = 3
)
public String callAPI() {
    log.info("Attempting to call external API");
    return restTemplate.getForObject("...", String.class);
}
@Recover // Called after all retries fail
public String recover(TimeoutException e) {
log.error("All retries exhausted for API call", e);
metrics.increment("api.call.failed");
return "fallback-response";
}
Best Practices
1. Keep Retry Logic Minimal
Only retry at the exact point of failure, not at high-level business logic:
// ✅ GOOD - Retry where the real failure can happen
@Service
public class PaymentService {
@Retryable(value = TimeoutException.class)
public PaymentResponse callPaymentGateway(Order order) {
return gateway.charge(order); // Only this fails
}
public void processOrder(Order order) {
PaymentResponse response = callPaymentGateway(order);
// Rest of business logic (no retries here)
}
}
2. Use Appropriate Backoff Strategies
// For external APIs: Exponential backoff with jitter
@Retryable(backoff = @Backoff(
delay = 1000,
multiplier = 2.0,
maxDelay = 10000,
random = true
))
// For internal services: Fixed short delay
@Retryable(backoff = @Backoff(delay = 100))
// For batch jobs: Long exponential backoff
@Retryable(backoff = @Backoff(
delay = 5000,
multiplier = 3.0,
maxDelay = 60000,
random = true
))
3. Be Specific About Exceptions
Only retry exceptions that MAY be transient:
// ✅ GOOD
@Retryable(value = {
SocketTimeoutException.class,
HttpServerErrorException.class, // 5xx errors
ServiceUnavailableException.class
})
// ❌ BAD
@Retryable(value = Exception.class) // Retries EVERYTHING including bugs!
4. Always Provide Recovery Logic
Never let retries fail silently:
@Retryable(maxAttempts = 3)
public String fetchUserData(String userId) {
return userService.getUser(userId);
}
@Recover
public String recoverFetchUserData(Throwable e, String userId) {
log.error("Failed to fetch user {} after retries", userId);
// Option 1: Return cached data
return cache.getOrDefault(userId, "{}");
// Option 2: Return sensible default
// return buildDefaultUser(userId);
// Option 3: Rethrow with better context
// throw new UserServiceException("Could not fetch user", e);
}
5. Test Retry Behavior Explicitly
@Test
public void testRetryOnTimeout() {
when(externalApi.call())
.thenThrow(new TimeoutException())
.thenThrow(new TimeoutException())
.thenReturn("success");
String result = service.callWithRetry();
assertEquals("success", result);
verify(externalApi, times(3)).call(); // Verify it retried exactly
}
Guardrails
With 1s exponential backoff, more than 3 retries means 7+ seconds of added waiting. That's unacceptable UX. Instead, implement async processing or queuing.
Require idempotency keys (unique request IDs) before retrying any POST/PUT/DELETE:
// Require idempotency key
@Retryable
public void transferMoney(TransferRequest request) {
// Service must track request.idempotencyKey to prevent duplicates
}
Set up alerts if retry rate exceeds threshold:
// If retries > 5% of calls, something is wrong
if (metrics.getRetryRate() > 0.05) {
alerts.critical("High retry rate detected");
}
Never retry without a per-attempt timeout, and budget the total before choosing values:
// 3 attempts × 2s timeout + 2 × 1s delay = up to ~8 seconds total
@Retryable(maxAttempts = 3, backoff = @Backoff(delay = 1000))
@Timeout(2000) // Illustrative: Spring Retry has no @Timeout; use client/socket timeouts or Resilience4j's TimeLimiter
public String callApi() { }
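The total budget is attempts × per-attempt timeout, plus the delays between attempts. A sketch of that arithmetic:

```java
public class RetryBudget {

    /** Worst-case wall-clock time: every attempt runs to its timeout, plus fixed delays between. */
    public static long worstCaseMillis(int maxAttempts, long perAttemptTimeoutMillis, long fixedDelayMillis) {
        return maxAttempts * perAttemptTimeoutMillis + (maxAttempts - 1) * fixedDelayMillis;
    }

    public static void main(String[] args) {
        // 3 attempts, 2s timeout each, 1s delay between attempts
        System.out.println(worstCaseMillis(3, 2000, 1000) + "ms"); // 8000ms
    }
}
```

If that worst case exceeds what callers (or upstream load balancers) will tolerate, reduce attempts or move the work to an async queue.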
Add comments explaining why this specific method needs retries:
/**
* Calls Stripe API to process payment.
* Retried because Stripe occasionally returns 5xx during peak load.
* Uses exponential backoff to avoid overwhelming their service.
* Requires idempotency key to prevent duplicate charges.
*/
@Retryable(value = {HttpServerErrorException.class}, maxAttempts = 3)
public StripeResponse processPayment(PaymentRequest req) { }
Real-World Examples
Example 1: Calling External Payment API
@Service
public class PaymentService {
@Retryable(
value = {HttpServerErrorException.class}, // 5xx errors only
maxAttempts = 2, // Don't retry too much (UX)
backoff = @Backoff(delay = 500)
)
public PaymentResponse charge(Order order) {
try {
return stripeClient.charge(order.getId(), order.getAmount());
} catch (HttpServerErrorException e) {
log.warn("Stripe returned 5xx, will retry", e);
throw e;
}
}
@Recover
public PaymentResponse chargeRecover(Throwable e, Order order) {
log.error("Payment failed for order {} after retries", order.getId(), e);
// Fallback: Mark as pending and retry later via async job
pendingPayments.add(order);
return new PaymentResponse("pending", order.getId());
}
}
Example 2: Database Connection Retrieval
@Service
public class UserRepository {
@Retryable(
value = {CannotGetJdbcConnectionException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 100) // Short delay (DB pool recovering)
)
public User findById(Long userId) {
return jdbcTemplate.queryForObject(
    "SELECT * FROM users WHERE id = ?",
    userRowMapper,
    userId
);
}
@Recover
public User findByIdRecover(Throwable e, Long userId) {
log.error("Database connection failed for user {}", userId);
// Cache miss is acceptable for reads
return cache.getIfPresent(userId);
}
}
Example 3: Message Queue Publishing (Idempotent)
@Service
public class EventPublisher {
// Safe to retry because message IDs ensure deduplication
@Retryable(
value = {BrokerNotAvailableException.class},
maxAttempts = 5,
backoff = @Backoff(delay = 1000, multiplier = 2.0)
)
public void publishEvent(DomainEvent event) {
// Event has unique messageId for idempotency
messageBroker.publish(event.getMessageId(), event);
}
@Recover
public void publishEventRecover(Throwable e, DomainEvent event) {
log.error("Failed to publish event {}, will be retried by async processor", event.getMessageId());
deadLetterQueue.add(event);
}
}