CONCEPT Cited by 1 source
Time budget sharing
Definition
Time budget sharing is the chained-call timeout strategy where a caller's SLA is divided among its sequential downstream calls, so that per-call timeouts sum to at most the caller's own deadline. It guarantees SLA compliance by construction at the cost of under-sizing each per-call budget relative to its downstream's observed latency tail.
Zalando's timeouts post uses the canonical example:
"Imagine your service has SLA 1000 ms and it calls sequentially Order Service with p99.9 = 700 ms and then Payment Service with p99.9 = 700 ms. How to configure timeout and not breach the SLA?
Option 1: Share your time budget. One option would be to share your time budget (your SLA) between services and set timeouts accordingly 500 ms for Order Service and 500 ms for Payment Service. In this case, you have a guarantee that you will not breach your SLA but you might have some false positive timeouts." (Source: sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts)
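The split in the quoted example can be sketched as a small helper. This is a minimal illustration, not code from the Zalando post: the function name `share_budget` and the idea of passing weights are assumptions made here, with equal weights reproducing the 500 ms / 500 ms split.

```python
def share_budget(sla_ms: float, weights: list[float]) -> list[float]:
    """Split sla_ms among sequential calls in proportion to weights.

    Hypothetical helper: the returned per-call timeouts sum to sla_ms,
    so the chain of calls cannot run longer than the SLA.
    """
    if sla_ms <= 0:
        raise ValueError("sla_ms must be positive")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be non-empty and positive")
    total = sum(weights)
    return [sla_ms * w / total for w in weights]


# Equal split from the quoted example: 1000 ms SLA, two sequential calls.
timeouts = share_budget(1000, [1, 1])
order_timeout_ms, payment_timeout_ms = timeouts
print(order_timeout_ms, payment_timeout_ms)
```

Unequal weights are one way to give a slower downstream a larger share, but any split still leaves at least one budget below 700 ms when the two p99.9 values sum to 1400 ms against a 1000 ms SLA.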
The trade-off
Time budget sharing guarantees SLA compliance by construction: the per-call budgets sum to at most the SLA, so the chain of downstream calls cannot run past it.
The cost is a false-timeout rate above the baseline: each per-call budget is lower than the downstream's p99.9, so the caller times out on the downstream's natural tail more often than the 0.1% of calls the downstream itself would count as slow. Concretely, in the Zalando example the 500 ms cap sits below the 700 ms p99.9, so more than 0.1% of calls exceed the cap; how many more depends on how much of the latency distribution lies between 500 ms and 700 ms.
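To get a feel for the size of that increase, the following sketch computes the fraction of calls exceeding a cap under an assumed latency distribution. The source gives only the p99.9 of 700 ms; the lognormal shape and the 100 ms median are assumptions made here purely for illustration, and real services may look quite different.

```python
from math import log
from statistics import NormalDist


def exceed_probability(cap_ms: float, median_ms: float, p999_ms: float) -> float:
    """Fraction of calls slower than cap_ms, assuming lognormal latency.

    The lognormal is fitted from two points: the (hypothetical) median
    and the p99.9 quoted in the example.
    """
    std_normal = NormalDist()
    z999 = std_normal.inv_cdf(0.999)          # z-score of the 99.9th percentile
    mu = log(median_ms)                       # log-latency mean
    sigma = (log(p999_ms) - mu) / z999        # log-latency standard deviation
    return 1.0 - std_normal.cdf((log(cap_ms) - mu) / sigma)


# Baseline: timeout set at the downstream's p99.9 (700 ms).
rate_at_p999 = exceed_probability(700, median_ms=100, p999_ms=700)
# Budget sharing: timeout set at the shared budget (500 ms).
rate_at_cap = exceed_probability(500, median_ms=100, p999_ms=700)
print(f"timeout at 700 ms: {rate_at_p999:.3%}, timeout at 500 ms: {rate_at_cap:.3%}")
```

Under these assumptions the 500 ms cap is exceeded by roughly 0.5% of calls, several times the 0.1% baseline. A distribution with more mass between 500 ms and 700 ms would push the figure higher.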
Contrast: time-limiter wrap
The alternative resolution in the Zalando post is patterns/time-limiter-wrapping-chained-calls: leave each per-call timeout at p99.9 (or above) and wrap the whole chain in an outer time limiter equal to the SLA. This exploits the observation that both downstreams rarely tail simultaneously. The result is fewer per-call false timeouts while remaining SLA-safe, but enforcement relies on the outer wrapper firing when the aggregate budget is blown.
| Strategy | Per-call budget | SLA guarantee | False-timeout rate |
|---|---|---|---|
| Time budget sharing | < downstream p99.9 | Hard | Elevated |
| Time-limiter wrap | ≥ downstream p99.9 | Hard (via outer limit) | Near p99.9 baseline |
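The two rows of the table can be contrasted in a short `asyncio` sketch. Everything here is illustrative: `call_service` is a hypothetical stand-in that just sleeps, the latencies are made up, and the example's milliseconds are scaled down by `SCALE` so the script finishes in about a second.

```python
import asyncio

SCALE = 0.0005  # seconds of real sleep per illustrated millisecond


async def call_service(latency_ms: float) -> str:
    # Hypothetical stand-in for a remote call that takes latency_ms.
    await asyncio.sleep(latency_ms * SCALE)
    return "ok"


async def run_chain(latencies_ms, per_call_timeout_ms) -> str:
    # Sequential calls, each bounded by the same per-call timeout.
    for latency_ms in latencies_ms:
        await asyncio.wait_for(
            call_service(latency_ms), timeout=per_call_timeout_ms * SCALE
        )
    return "ok"


async def budget_sharing(latencies_ms, per_call_timeout_ms) -> str:
    # Per-call timeouts already sum to the SLA, so no outer limit is needed.
    return await run_chain(latencies_ms, per_call_timeout_ms)


async def time_limiter_wrap(latencies_ms, per_call_timeout_ms, sla_ms) -> str:
    # Generous per-call timeouts, with the whole chain bounded by the SLA.
    return await asyncio.wait_for(
        run_chain(latencies_ms, per_call_timeout_ms), timeout=sla_ms * SCALE
    )


async def outcome(coro) -> str:
    try:
        return await coro
    except asyncio.TimeoutError:
        return "timeout"


async def main() -> dict:
    one_tail = [600, 100]   # Order tails, Payment is fast: 700 ms in total
    both_tail = [600, 600]  # both tail: 1200 ms in total
    return {
        "sharing_one_tail": await outcome(budget_sharing(one_tail, 500)),
        "wrap_one_tail": await outcome(time_limiter_wrap(one_tail, 700, 1000)),
        "wrap_both_tail": await outcome(time_limiter_wrap(both_tail, 700, 1000)),
    }


results = asyncio.run(main())
print(results)
```

With these made-up latencies, the shared 500 ms budget times out on a chain that would have finished in 700 ms, while the wrap lets that chain through and fails only when both calls tail at once and the outer limit fires.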
When budget sharing is the right choice
- Downstream latencies sum to more than the caller's SLA even at p50 — there is no slack; the only correct choice is to declare budgets explicitly.
- Downstream tails are correlated (shared infrastructure, global resource pressure) so the time-limiter bet doesn't pay off.
- Downstream tails are fat enough that the chain often exceeds the SLA, so the outer time limiter fires frequently anyway.
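The effect of correlated tails on the time-limiter bet can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a measurement: both downstreams are modelled as lognormal with a hypothetical 100 ms median and the quoted 700 ms p99.9, per-call timeouts are ignored, and the function simply counts how often the chain total would exceed the 1000 ms SLA.

```python
import random
from math import exp, log
from statistics import NormalDist

SLA_MS = 1000
MEDIAN_MS = 100   # hypothetical median, not from the source
P999_MS = 700     # p99.9 from the quoted example
SIGMA = (log(P999_MS) - log(MEDIAN_MS)) / NormalDist().inv_cdf(0.999)


def outer_limit_rate(correlation: float, samples: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated requests whose chain total exceeds the SLA.

    correlation is the correlation between the two log-latencies:
    0.0 means independent downstreams, 1.0 means they tail together.
    """
    rng = random.Random(seed)
    fired = 0
    for _ in range(samples):
        z_order = rng.gauss(0, 1)
        independent_part = rng.gauss(0, 1)
        z_payment = correlation * z_order + (1 - correlation**2) ** 0.5 * independent_part
        order_ms = MEDIAN_MS * exp(SIGMA * z_order)
        payment_ms = MEDIAN_MS * exp(SIGMA * z_payment)
        if order_ms + payment_ms > SLA_MS:
            fired += 1
    return fired / samples


independent_rate = outer_limit_rate(correlation=0.0)
correlated_rate = outer_limit_rate(correlation=1.0)
print(f"independent: {independent_rate:.3%}, fully correlated: {correlated_rate:.3%}")
```

In this toy model the fully correlated case trips the outer limit noticeably more often than the independent case, which is the situation where declaring explicit per-call budgets becomes the more predictable choice.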
Seen in
- sources/2023-07-25-zalando-all-you-need-to-know-about-timeouts — canonical Zalando framing as Option 1 of two chained-call resolutions.
Related
- concepts/request-timeout — the per-call bound that budget sharing partitions.
- concepts/false-timeout-rate — the tunable whose value rises under budget sharing.
- concepts/tail-latency-at-scale — the distributional framing for why budget sharing vs. time-limiter differ in practice.
- patterns/time-limiter-wrapping-chained-calls — the Option-2 alternative in the same post.
- patterns/explicit-timeout-on-remote-calls — the house-style rule that forces the budget-sharing question in the first place.