Spring Boot Microservices with Resilience4j Circuit Breakers
The most surprising thing about circuit breakers is that they don’t actually fix anything; they just prevent things from getting worse.
Imagine a microservice that calls another service to fetch user data. If the downstream user service is slow or unavailable, our microservice could get bogged down, exhausting its own resources trying to make calls that will inevitably fail. This is where a circuit breaker comes in.
Let’s say we have a UserService that calls a UserClient (a Feign client or similar) to get user details. We want to wrap the call to UserClient.getUserById(userId) with a Resilience4j circuit breaker.
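First, add the Resilience4j dependencies. The Spring Boot starter expects spring-boot-starter-aop (needed for the @CircuitBreaker annotation) and spring-boot-starter-actuator (needed for the health indicator) to already be on the classpath: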
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.7.0</version>
</dependency>
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-circuitbreaker</artifactId>
    <version>1.7.0</version>
</dependency>
Now, configure the circuit breaker in application.yml:
resilience4j.circuitbreaker:
  instances:
    userServiceClient:
      registerHealthIndicator: true
      slidingWindowType: COUNT_BASED
      slidingWindowSize: 10
      failureRateThreshold: 50
      slowCallRateThreshold: 100
      slowCallDurationThreshold: 5s
      permittedNumberOfCallsInHalfOpenState: 5
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 30s
      recordExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.util.concurrent.TimeoutException
        - java.io.IOException
This configuration defines a circuit breaker named userServiceClient.
- registerHealthIndicator: true: Registers a health indicator for this breaker with Spring Boot Actuator.
- slidingWindowType: COUNT_BASED: We'll track outcomes over a fixed number of calls rather than a time interval.
- slidingWindowSize: 10: The window will consider the last 10 calls.
- failureRateThreshold: 50: If the failure rate in the window is 50% or higher, the circuit opens. So, if at least 5 of the last 10 calls fail, it opens. The rate is only calculated once enough calls have been recorded (see minimumNumberOfCalls), so a single early failure does not trip the breaker.
- slowCallRateThreshold: 100: If 100% of calls in the window are slow, the circuit opens.
- slowCallDurationThreshold: 5s: A call is considered slow if it takes longer than 5 seconds.
- waitDurationInOpenState: 30s: Once open, the circuit stays open for 30 seconds before it moves to half-open.
- automaticTransitionFromOpenToHalfOpenEnabled: true: The breaker moves from open to half-open on its own when the wait duration elapses, instead of waiting for the next call to trigger the transition.
- permittedNumberOfCallsInHalfOpenState: 5: In the half-open state, 5 trial calls are allowed through. Once they complete, the circuit closes if their failure rate is below the threshold and opens again otherwise.
- recordExceptions: These are the exceptions that will be treated as failures. When this list is set, any other exception counts as a success unless it is listed under ignoreExceptions. HttpServerErrorException is thrown by RestTemplate; a Feign client raises feign.FeignException instead, so list whichever types your client actually throws.
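To make the window arithmetic concrete, here is a minimal sketch of a count-based window that tracks the last N outcomes in a ring buffer and reports a failure rate. It is an illustration only, not Resilience4j's internal implementation; the class name CountBasedWindow, its methods, and the -1 sentinel for "not enough data yet" are assumptions made for this example.

```java
// Illustrative sketch only; this is not Resilience4j's internal implementation.
final class CountBasedWindow {
    private final boolean[] failed;   // ring buffer holding the last N outcomes
    private int recorded;             // outcomes recorded so far, capped at N
    private int next;                 // slot that will be overwritten next
    private int failures;             // failures currently inside the window

    CountBasedWindow(int size) {
        this.failed = new boolean[size];
    }

    /** Records one call outcome, evicting the oldest outcome once the window is full. */
    void record(boolean callFailed) {
        if (recorded == failed.length) {
            if (failed[next]) failures--;   // the oldest outcome leaves the window
        } else {
            recorded++;
        }
        failed[next] = callFailed;
        if (callFailed) failures++;
        next = (next + 1) % failed.length;
    }

    /** Failure rate in percent, or -1 while the window is not yet full. */
    float failureRate() {
        if (recorded < failed.length) return -1f;
        return 100f * failures / recorded;
    }

    /** True when the window is full and the rate has reached the threshold. */
    boolean shouldOpen(float thresholdPercent) {
        float rate = failureRate();
        return rate >= 0 && rate >= thresholdPercent;
    }

    public static void main(String[] args) {
        CountBasedWindow window = new CountBasedWindow(10);
        for (int i = 0; i < 10; i++) {
            window.record(i % 2 == 0);      // 5 failures and 5 successes
        }
        System.out.println(window.failureRate());    // 50.0
        System.out.println(window.shouldOpen(50f));  // true
    }
}
```

With a window of 10 and a threshold of 50, five failures are exactly enough to reach the threshold, which matches the "5 out of the last 10" rule above.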
Now, let’s apply this to our UserService:
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private final UserClient userClient;

    @Autowired
    public UserService(UserClient userClient) {
        this.userClient = userClient;
    }

    // Protected call: runs through the "userServiceClient" circuit breaker.
    @CircuitBreaker(name = "userServiceClient", fallbackMethod = "getDefaultUser")
    public User getUserById(Long userId) {
        return userClient.getUserById(userId);
    }

    // Fallback: same parameters as getUserById plus the exception that triggered it.
    public User getDefaultUser(Long userId, Throwable t) {
        System.err.println("Circuit breaker is open or user service is unavailable. Returning default user for ID: " + userId);
        return new User("Default User", "default@example.com"); // Return a default or cached user
    }
}
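The @CircuitBreaker(name = "userServiceClient", fallbackMethod = "getDefaultUser") annotation tells Resilience4j to wrap the getUserById method with the circuit breaker instance named userServiceClient. The getDefaultUser method is invoked whenever the protected call ends in an exception: either the breaker is open and rejects the call with a CallNotPermittedException without executing it, or getUserById itself throws. Because the fallback's last parameter is Throwable, it handles every exception type, not only those listed under recordExceptions; that list only controls which exceptions count toward the failure rate. The fallback must live in the same class, take the same parameters as the original method plus one exception parameter, and return a compatible type.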
Here’s how it works mechanically:
- CLOSED State: The circuit breaker allows calls to pass through to the UserClient. It records the outcome and duration of each call.
- OPEN State: If the failure rate or slow call rate reaches the configured threshold within the sliding window, the circuit breaker "trips" and enters the OPEN state. In this state, it immediately rejects all further calls to getUserById without even attempting to execute them, and the fallbackMethod (getDefaultUser) runs instead. This gives the downstream service room to recover and keeps our own threads and connections from piling up on calls that would fail anyway.
- HALF-OPEN State: After waitDurationInOpenState (30 seconds in our example), the circuit breaker transitions to the HALF-OPEN state. It allows a limited number of calls (permittedNumberOfCallsInHalfOpenState, which is 5) to pass through.
  - If the failure rate and slow call rate of these calls stay below the thresholds, the circuit breaker assumes the downstream service has recovered and transitions back to the CLOSED state.
  - If either rate is at or above its threshold (with a 50% failure threshold, 3 or more failures out of 5), the circuit breaker returns to the OPEN state, starting the waitDurationInOpenState timer again.
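The three states can be sketched as a small state machine. This is a deliberately simplified, single-threaded illustration and not how Resilience4j is implemented: TinyBreaker and all of its members are hypothetical names, the CLOSED state evaluates calls in fixed batches instead of a true sliding window, slow calls are ignored, and trial calls in HALF_OPEN are judged against the same failure threshold as normal calls.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative sketch only: single-threaded and much simpler than Resilience4j.
final class TinyBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;          // calls evaluated together while CLOSED
    private final int halfOpenCalls;       // trial calls allowed while HALF_OPEN
    private final float thresholdPercent;  // failure rate that opens the circuit
    private final long waitMillis;         // time to stay OPEN before trial calls
    private final LongSupplier clock;      // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    TinyBreaker(int windowSize, int halfOpenCalls, float thresholdPercent,
                long waitMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.halfOpenCalls = halfOpenCalls;
        this.thresholdPercent = thresholdPercent;
        this.waitMillis = waitMillis;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    /** Runs the action through the breaker; uses the fallback on rejection or failure. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < waitMillis) {
                return fallback.get();        // rejected: the action is never executed
            }
            transitionTo(State.HALF_OPEN);    // wait elapsed: start trial calls
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        calls++;
        if (failed) failures++;
        int required = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        if (calls < required) return;         // not enough data to decide yet
        if (100f * failures / calls >= thresholdPercent) {
            openedAt = clock.getAsLong();
            transitionTo(State.OPEN);
        } else {
            transitionTo(State.CLOSED);
        }
    }

    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
    }

    public static void main(String[] args) {
        long[] now = {0L};                    // fake clock so the demo needs no sleeping
        TinyBreaker breaker = new TinyBreaker(10, 5, 50f, 30_000L, () -> now[0]);
        for (int i = 0; i < 10; i++) {
            breaker.call(() -> { throw new IllegalStateException("down"); }, () -> "fallback");
        }
        System.out.println(breaker.state());  // OPEN
        now[0] = 30_000L;                     // the 30-second wait has elapsed
        for (int i = 0; i < 5; i++) {
            breaker.call(() -> "live", () -> "fallback");
        }
        System.out.println(breaker.state());  // CLOSED
    }
}
```

The injectable clock is a design choice for the sketch: it lets the OPEN to HALF_OPEN transition be exercised without waiting 30 real seconds.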
The fallbackMethod is crucial. It provides a graceful degradation path when the primary service is unavailable. This could involve returning cached data, returning a default object, or performing an alternative, less resource-intensive operation.
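As one possible shape for the "return cached data" option, the sketch below remembers the last successful result per key and serves it when the primary call fails, falling back to a default when nothing has been cached. LastKnownGood and its methods are hypothetical names for this example and are not part of Resilience4j; the sketch also assumes failures surface as unchecked exceptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of Resilience4j.
final class LastKnownGood<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final V defaultValue;

    LastKnownGood(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** Calls the primary loader and remembers its result for later fallbacks. */
    V load(K key, Function<K, V> primary) {
        try {
            V value = primary.apply(key);
            if (value != null) {
                cache.put(key, value);      // ConcurrentHashMap rejects null values
            }
            return value;
        } catch (RuntimeException e) {
            return fallback(key);
        }
    }

    /** Last successful value for the key, or the default if none was ever cached. */
    V fallback(K key) {
        return cache.getOrDefault(key, defaultValue);
    }

    public static void main(String[] args) {
        LastKnownGood<Long, String> users = new LastKnownGood<>("Default User");
        System.out.println(users.load(1L, id -> "Alice"));   // Alice
        System.out.println(users.load(1L, id -> {
            throw new IllegalStateException("user service down");
        }));                                                 // Alice (served from cache)
        System.out.println(users.fallback(2L));              // Default User
    }
}
```

In the UserService above, a method like getDefaultUser could delegate to fallback(userId) on such a helper, provided successful responses are stored as they arrive. Stale data is the trade-off, so a real cache would likely need an expiry policy.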
This setup is particularly effective when dealing with external dependencies that are prone to transient failures or performance degradation. By failing fast instead of repeatedly calling a failing service, circuit breakers keep threads and connections free for work that can still succeed and limit how far a downstream outage spreads through your microservices architecture.
The next thing you’ll likely encounter when using circuit breakers is managing their state and monitoring their behavior in production.