Time-limiter wrapping chained calls¶

Pattern¶

When a caller with its own SLA calls two or more downstreams in sequence, set each per-call timeout to that downstream's p99.9 latency and wrap the whole chain in an outer time limiter equal to the caller's SLA. The outer limiter enforces the SLA; the per-call timeouts keep the false-timeout rate near the natural tail rate (about 0.1% per call).

Zalando's canonical example:

"Option 2: Introduce a TimeLimiter for your API. Since different services will not simultaneously respond with the maximum delay, you can wrap the chained calls in a time limiter and set the maximum acceptable timeout for both services. In this case you could create a time limiter 1 sec and set a timeout 700 ms for downstream services." (Source: )

Shape¶

Caller SLA: 1000 ms

  ┌──── outer time limiter: 1000 ms ─────┐
  │                                      │
  │   Order Service        Payment       │
  │   request-timeout      request-      │
  │   = 700 ms (p99.9)     timeout       │
  │                        = 700 ms      │
  │                        (p99.9)       │
  └──────────────────────────────────────┘

Each per-call timeout is sized to the downstream's actual p99.9, which leaves full headroom for that call's tail. The outer limiter fires only when the calls together exceed the caller's SLA, which requires both downstreams to be slow in the same request.

The statistical bet¶

The pattern pays off because downstream tails are not fully correlated. Even if each downstream's p99.9 is 700 ms, the probability that both calls are slow enough in the same request to push the chain past 1000 ms is much less than 0.1% for independent tails, so the 1000 ms outer limiter rarely fires. In the Zalando worked example, budget sharing (500 ms + 500 ms) sets each per-call timeout below the downstream's p99.9 and so produces false timeouts up front, while the time-limiter wrap adds failures only when both calls are slow at once.

If tails are correlated (shared backend, shared infrastructure, global traffic spike), the bet degrades: the outer limiter fires more often, and time budget sharing becomes the safer choice because its hard per-call caps enforce the SLA without relying on the outer wrapper.
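The bet can be checked numerically. The following is a minimal Monte Carlo sketch, not taken from the Zalando post. It assumes a hypothetical log-normal latency model (median 150 ms, sigma 0.5, which puts p99.9 at roughly 700 ms) and a hypothetical correlation parameter rho between the two downstreams; the class and method names are illustrative only. It compares chain failure rates under budget sharing (500 ms caps) with the time-limiter wrap (700 ms per-call timeouts, 1000 ms outer limiter).

```java
import java.util.Random;

class TailBetSimulation {
    // Assumed latency model (not from the source): log-normal with a 150 ms
    // median and sigma 0.5, which puts p99.9 at roughly 700 ms.
    static final double MEDIAN_MS = 150.0;
    static final double SIGMA = 0.5;

    static double latencyMs(double z) {
        return MEDIAN_MS * Math.exp(SIGMA * z);
    }

    /**
     * Simulates a two-call chain. rho is the correlation between the two
     * underlying normal variables (0 = independent tails).
     * Returns {budgetSharingFailureRate, wrapFailureRate, outerLimiterFireRate}.
     */
    static double[] simulate(double rho, int trials, long seed) {
        Random rng = new Random(seed);
        double mix = Math.sqrt(1.0 - rho * rho);
        int sharingFailures = 0;
        int wrapFailures = 0;
        int outerFires = 0;
        for (int i = 0; i < trials; i++) {
            double z1 = rng.nextGaussian();
            double z2 = rho * z1 + mix * rng.nextGaussian();
            double order = latencyMs(z1);
            double payment = latencyMs(z2);

            // Budget sharing: each call is capped at 500 ms.
            if (order > 500 || payment > 500) {
                sharingFailures++;
            }

            // Time-limiter wrap: 700 ms per call, 1000 ms for the whole chain.
            if (order > 700 || payment > 700) {
                wrapFailures++;          // a per-call timeout fired
            } else if (order + payment > 1000) {
                wrapFailures++;          // only the outer limiter fired
                outerFires++;
            }
        }
        return new double[] {
            (double) sharingFailures / trials,
            (double) wrapFailures / trials,
            (double) outerFires / trials
        };
    }

    public static void main(String[] args) {
        for (double rho : new double[] {0.0, 0.5, 0.9}) {
            double[] r = simulate(rho, 1_000_000, 42L);
            System.out.printf("rho=%.1f sharing=%.4f%% wrap=%.4f%% outerFires=%.4f%%%n",
                    rho, 100 * r[0], 100 * r[1], 100 * r[2]);
        }
    }
}
```

Under this assumed model the wrap can never fail more often than budget sharing, because any chain whose total exceeds 1000 ms has at least one call above 500 ms. How often the outer limiter fires depends on rho and on the real latency distribution, so the output is illustrative rather than predictive.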

Strategy	Per-call timeout	SLA enforcement	False-timeout rate
Budget sharing	< p99.9	Per-call caps	Elevated per-call
Time-limiter wrap	≥ p99.9	Outer wrapper	Near p99.9 baseline

Both guarantee the caller's SLA. The time-limiter variant produces fewer per-call timeouts; budget sharing is more defensive against correlated downstream tails.

Java implementation¶

The Zalando post cites two JVM implementations:

1. CompletableFuture.orTimeout:

CompletableFuture
    .supplyAsync(() -> orderService.placeOrder(...))
    .thenApply(order -> paymentService.updateBalance(...))
    .orTimeout(1, TimeUnit.SECONDS);

The outer .orTimeout(1, TimeUnit.SECONDS) is the time limiter; each service's own request-timeout is set to p99.9 on its client configuration.
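For experimentation, the snippet above can be turned into a self-contained sketch. This version is an illustration under stated assumptions, not the Zalando code: slowCall and runChain are hypothetical names, the remote calls are simulated with Thread.sleep, and the per-call timeouts are modelled with orTimeout on each stage rather than in client configuration. It requires Java 9 or later. Note that orTimeout completes the future exceptionally with a TimeoutException but does not interrupt the task that is still running.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ChainedCallTimeLimiter {

    // Hypothetical stand-in for a remote call that takes `millis` to respond.
    static String slowCall(String name, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name;
    }

    // Runs order -> payment sequentially. Each call has its own timeout
    // (perCallMs) and the whole chain is wrapped in an outer limiter (outerMs).
    // Returns the combined result, or "timeout" if any limiter fired.
    static String runChain(long orderMs, long paymentMs, long perCallMs, long outerMs) {
        CompletableFuture<String> chain = CompletableFuture
            .supplyAsync(() -> slowCall("order", orderMs))
            .orTimeout(perCallMs, TimeUnit.MILLISECONDS)      // per-call timeout
            .thenCompose(order -> CompletableFuture
                .supplyAsync(() -> order + "+" + slowCall("payment", paymentMs))
                .orTimeout(perCallMs, TimeUnit.MILLISECONDS)) // per-call timeout
            .orTimeout(outerMs, TimeUnit.MILLISECONDS);       // outer time limiter
        try {
            return chain.join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "timeout";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Both calls fast: the chain finishes well inside every limit.
        System.out.println(runChain(100, 100, 700, 1000));
        // Each call stays under its 700 ms timeout, but the 1200 ms total
        // exceeds the 1000 ms outer limit.
        System.out.println(runChain(600, 600, 700, 1000));
    }
}
```

In the second scenario neither per-call timeout is reached, so only the outer limiter can end the request. That is the case the pattern relies on being rare.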

2. Resilience4j TimeLimiter:

TimeLimiter timeLimiter = TimeLimiter.of(Duration.ofSeconds(1));
timeLimiter.executeFutureSupplier(() -> callChain());

Resilience4j composes cleanly with its CircuitBreaker, Retry, and Bulkhead modules, so the outer time limiter can be one layer in a consistent resilience stack.

Trade-offs¶

Better: per-call false timeouts stay near the natural tail rate (about 0.1% at p99.9), and the SLA is still enforced.
Worse: when the outer limiter fires, the caller has held thread-seconds for the full SLA duration (not bounded by per-call caps). Under sustained downstream degradation this is more load than budget sharing.
Worse: debugging a time-limiter timeout requires knowing which per-call segment was slow. Per-call metrics / tracing are essential.
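One way to get the per-segment visibility mentioned in the last trade-off is to time each call as it runs. The following is a minimal sketch under assumptions: SegmentTimings and its methods are hypothetical names, not part of Resilience4j or the Zalando post, and a production system would more likely use its existing metrics or tracing library.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class SegmentTimings {
    private final Map<String, Long> startedNanos = new ConcurrentHashMap<>();
    private final Map<String, Long> elapsedMillis = new ConcurrentHashMap<>();

    // Runs one segment of the chain and records how long it took,
    // whether it returned normally or threw.
    <T> T timed(String segment, Supplier<T> call) {
        long start = System.nanoTime();
        startedNanos.put(segment, start);
        try {
            return call.get();
        } finally {
            elapsedMillis.put(segment, (System.nanoTime() - start) / 1_000_000);
        }
    }

    // Segments that have started but not yet finished. When the outer
    // limiter fires, these are the calls that were still running.
    Set<String> inFlight() {
        Set<String> pending = new TreeSet<>(startedNanos.keySet());
        pending.removeAll(elapsedMillis.keySet());
        return pending;
    }

    // Snapshot of finished segments and their durations in milliseconds.
    Map<String, Long> completed() {
        return new TreeMap<>(elapsedMillis);
    }
}
```

Each downstream call would be wrapped in timed(...) inside the chain, and the handler for the outer timeout would log completed() and inFlight() to show which segment consumed the budget.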

concepts/request-timeout — the per-call bound this pattern sets to p99.9.
concepts/time-budget-sharing — the alternative resolution.
concepts/false-timeout-rate — the metric this pattern keeps near the natural tail.
patterns/explicit-timeout-on-remote-calls — the house-style rule both time-limiter and budget-sharing implement.
systems/resilience4j / systems/java-completablefuture — JVM implementations.

Seen in¶

— canonical wiki home. Zalando's Option 2 for chained-call SLA budgeting.