Resilience4j’s circuit breaker doesn’t just stop calls to a failing service; it tracks the outcomes of recent calls to decide when to block traffic and when to let calls through again.

Let’s see it in action. Imagine a ProductServiceClient that calls an external product-service. We want to protect this call.

// Original service call
String productDetails = productServiceClient.getProductDetails(productId);

// With Resilience4j Circuit Breaker
String productDetails = circuitBreaker.executeCallable(() -> productServiceClient.getProductDetails(productId));

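The circuitBreaker.executeCallable() is where the magic happens. Resilience4j wraps the call: if productServiceClient.getProductDetails(productId) throws an exception (like a ConnectTimeoutException, or an HttpServerErrorException for a 500 response), the circuit breaker records it as a failure. Note that executeCallable() declares a checked Exception, so the enclosing method must handle or declare it; executeSupplier() is the variant for code that throws no checked exceptions.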

Here’s how the core components work together:

  • CLOSED State: Initially, the circuit breaker is closed, allowing calls to pass through to the product-service. If calls succeed, the failure rate stays low. If calls fail, the failure rate increases.
  • OPEN State: Once the failure rate exceeds a configured threshold (e.g., 50% of calls fail within a rolling window), the circuit breaker snaps open. All subsequent calls to product-service immediately fail with a CallNotPermittedException, without actually attempting to call the service. This gives the failing service time to recover.
  • HALF-OPEN State: After a configured wait in the OPEN state (e.g., 30 seconds), the circuit breaker transitions to HALF-OPEN. It allows a limited number of trial calls to the product-service (permittedNumberOfCallsInHalfOpenState). If the failure rate of those trial calls stays below the threshold, the circuit breaker assumes the service has recovered and transitions back to CLOSED. If the rate is at or above the threshold, it snaps back to OPEN and the wait starts over.
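
To make the transitions concrete, here is a minimal JDK-only sketch of the three-state machine. It is a toy for illustration, not Resilience4j's implementation: the class name ToyCircuitBreaker and all of its members are hypothetical, it counts every call since the last transition instead of using a sliding window, and it is not thread-safe.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Toy circuit breaker for illustration only; NOT how Resilience4j is implemented. */
final class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int minimumCalls;        // calls required in CLOSED before the rate is evaluated
    private final double thresholdPercent; // open when the failure rate is >= this value
    private final long waitNanos;          // how long to stay OPEN
    private final int halfOpenCalls;       // trial calls evaluated in HALF_OPEN
    private final LongSupplier clock;      // injected nanosecond clock (assumption, for testability)

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAt;

    ToyCircuitBreaker(int minimumCalls, double thresholdPercent, Duration wait,
                      int halfOpenCalls, LongSupplier clock) {
        this.minimumCalls = minimumCalls;
        this.thresholdPercent = thresholdPercent;
        this.waitNanos = wait.toNanos();
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    /** Returns true if a call may proceed; moves OPEN -> HALF_OPEN once the wait has elapsed. */
    boolean tryAcquire() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= waitNanos) {
            transitionTo(State.HALF_OPEN);
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a permitted call and re-evaluates the state. */
    void record(boolean success) {
        if (state == State.OPEN) {
            return; // nothing is recorded while calls are being rejected
        }
        calls++;
        if (!success) {
            failures++;
        }
        int required = (state == State.HALF_OPEN) ? halfOpenCalls : minimumCalls;
        if (calls < required) {
            return; // not enough data yet to make a decision
        }
        double failureRate = 100.0 * failures / calls;
        if (failureRate >= thresholdPercent) {
            transitionTo(State.OPEN);
        } else if (state == State.HALF_OPEN) {
            transitionTo(State.CLOSED);
        }
    }

    State state() {
        return state;
    }

    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
        if (next == State.OPEN) {
            openedAt = clock.getAsLong();
        }
    }
}
```

A caller would check tryAcquire() before each call and pass the outcome to record(). Injecting the clock (for example System::nanoTime in production code) lets the wait period be simulated without sleeping.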

This state machine is configured with parameters that define its behavior. A typical configuration might look like this in your application.yml:

resilience4j.circuitbreaker:
  instances:
    productService:
      registerHealthIndicator: true
      slidingWindowType: COUNT_BASED
      slidingWindowSize: 100
      minimumNumberOfCalls: 10
      waitDurationInOpenState: 5s
      failureRateThreshold: 50
      automaticTransitionFromOpenToHalfOpenEnabled: true
      permittedNumberOfCallsInHalfOpenState: 5
      recordExceptions:
        - org.springframework.web.client.HttpServerErrorException
        - java.util.concurrent.TimeoutException
        - java.io.IOException

Let’s break down what these mean for our productService circuit breaker:

  • slidingWindowType: COUNT_BASED: The circuit breaker tracks successes and failures over the last slidingWindowSize calls. TIME_BASED instead aggregates the calls made during the last slidingWindowSize seconds.
  • slidingWindowSize: 100: The window will consider the last 100 calls.
  • minimumNumberOfCalls: 10: At least 10 calls must occur within the window before a decision (opening the circuit) can be made. This prevents a few early failures from prematurely tripping the breaker.
  • waitDurationInOpenState: 5s: If the circuit opens, it will stay open for at least 5 seconds before considering a transition to HALF-OPEN.
  • failureRateThreshold: 50: If 50% or more of the calls within the window are failures, the circuit breaker will open.
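  • automaticTransitionFromOpenToHalfOpenEnabled: true: The circuit breaker moves to HALF-OPEN on its own as soon as waitDurationInOpenState has elapsed. If false, the transition happens only when the next call arrives after the wait has elapsed.
  • permittedNumberOfCallsInHalfOpenState: 5: When in HALF-OPEN, up to 5 trial calls are allowed to test the service. Once they complete, their failure rate is compared with failureRateThreshold: at or above it, the breaker goes back to OPEN; below it, the breaker closes.
  • recordExceptions: This is crucial. It specifies which exceptions are treated as failures. By default, every exception counts as a failure. Once you set recordExceptions, only the listed exceptions (and their subclasses) count; any other exception is treated as a success unless it appears in ignoreExceptions. Here, we count HTTP server errors, timeouts, and IO errors.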
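
If you are not using the Spring Boot starter, roughly the same settings can be expressed with the builder in resilience4j-circuitbreaker. This is a sketch under some assumptions: the class name ProductServiceBreakerFactory is hypothetical, the Spring-specific HttpServerErrorException is left out so the example depends only on the JDK and Resilience4j, and registerHealthIndicator has no counterpart here because it belongs to the Spring Boot integration.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;

/** Hypothetical factory mirroring the productService YAML instance. */
final class ProductServiceBreakerFactory {

    static CircuitBreaker create() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowType(SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(100)
                .minimumNumberOfCalls(10)
                .waitDurationInOpenState(Duration.ofSeconds(5))
                .failureRateThreshold(50)
                .automaticTransitionFromOpenToHalfOpenEnabled(true)
                .permittedNumberOfCallsInHalfOpenState(5)
                // Only these types (and their subclasses) are recorded as failures.
                .recordExceptions(TimeoutException.class, IOException.class)
                .build();

        // The registry hands out one named breaker instance per backend.
        return CircuitBreakerRegistry.of(config).circuitBreaker("productService");
    }
}
```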

When a CallNotPermittedException is thrown because the circuit is OPEN, you can catch this specific exception and provide an alternative response, like a cached result or a default value, instead of letting the downstream service be hammered.

try {
    // executeSupplier() declares no checked exception, so only the exceptions we care about need handling here
    String productDetails = circuitBreaker.executeSupplier(() -> productServiceClient.getProductDetails(productId));
    // Process productDetails
} catch (CallNotPermittedException e) {
    // Circuit is open, provide fallback response
    log.warn("Product service is unavailable, circuit breaker is open.");
    return getFallbackProductDetails(productId);
}

The recordExceptions list is critical because it defines what constitutes a "failure." Once the list is set, an exception that is neither listed nor a subclass of a listed type is recorded as a success (unless it appears in ignoreExceptions, in which case it is not counted at all). The breaker may therefore stay closed even while the service is failing in ways that don’t surface as these specific exceptions.
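
The matching rule can be pictured as a simple instance check. The sketch below is a JDK-only illustration of that rule, not Resilience4j's own code; the class RecordedExceptions and the method countsAsFailure are hypothetical names, and the list omits the Spring exception from the YAML to avoid extra dependencies.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeoutException;

/** Illustrates how a recordExceptions-style list classifies throwables. */
final class RecordedExceptions {

    // Assumed subset of the configured list, kept JDK-only for this sketch.
    static final List<Class<? extends Throwable>> RECORDED =
            List.of(TimeoutException.class, IOException.class);

    /** True if the throwable is an instance of a listed type or of one of its subclasses. */
    static boolean countsAsFailure(Throwable t) {
        return RECORDED.stream().anyMatch(type -> type.isInstance(t));
    }

    public static void main(String[] args) {
        // FileNotFoundException extends IOException, so it is recorded as a failure.
        System.out.println(countsAsFailure(new FileNotFoundException("missing"))); // true
        // IllegalStateException is not listed, so it would not raise the failure rate.
        System.out.println(countsAsFailure(new IllegalStateException("oops")));    // false
    }
}
```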

The most surprising thing about the circuit breaker’s HALF-OPEN state is that it can be configured to allow multiple calls, not just one. This is controlled by permittedNumberOfCallsInHalfOpenState (the library default is 10). The breaker lets that many trial calls through and rejects any additional ones with CallNotPermittedException until they complete. It then computes the failure rate of the trial calls: if the rate is at or above failureRateThreshold, the breaker returns to the OPEN state and the waitDurationInOpenState timer restarts; otherwise it closes. This allows for a more robust test of recovery without risking overwhelming the service if it’s still unstable.
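
As a worked example of that decision with the configuration above (5 trial calls, threshold 50), the arithmetic can be sketched as follows. HalfOpenDecision and nextState are hypothetical names used only for illustration.

```java
/** Worked arithmetic for the HALF-OPEN decision; an illustration, not library code. */
final class HalfOpenDecision {

    /** Returns "OPEN" if the trial failure rate is at or above the threshold, otherwise "CLOSED". */
    static String nextState(int failedTrialCalls, int permittedCalls, float failureRateThreshold) {
        float failureRate = 100f * failedTrialCalls / permittedCalls;
        return failureRate >= failureRateThreshold ? "OPEN" : "CLOSED";
    }

    public static void main(String[] args) {
        System.out.println(nextState(1, 5, 50f)); // 20% failures -> CLOSED
        System.out.println(nextState(2, 5, 50f)); // 40% failures -> CLOSED
        System.out.println(nextState(3, 5, 50f)); // 60% failures -> OPEN
    }
}
```

So a single failed trial call does not by itself reopen the breaker; with these settings it takes three failures out of five.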

After implementing circuit breakers, the next common challenge is managing distributed tracing across these protected calls.
