PATTERN Cited by 2 sources
Partial return on SLO breach
Definition
Partial return on SLO breach is the server-side pattern of aborting an in-flight read once the request has breached its configured latency SLO and returning whatever data has been collected so far, flagged as partial and optionally accompanied by a continuation token, rather than continuing to grind toward a complete answer. The trade is latency over completeness: the caller gets a response within its deadline, accepts that the response may be incomplete, and can optionally fetch the remainder with a follow-up call.
This pattern is the TimeSeries-side variant of the broader SLO-aware early response pattern (canonicalised on the wiki via the 2024-09-19 Netflix KV DAL post). The 2026-06-03 TimeSeries dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) names Partial Returns as a mid-stack remedy for wide partitions that don't qualify for dynamic splitting.
Mechanism
"We implemented a 'Partial Return' feature, which aborts an inflight request if it has breached a configured latency SLO, while returning whatever data it has collected up until that point. This is a great option for clients who care more about latency than fetching all the data."
The shape:
- Client sends a query with an implicit or explicit latency budget.
- Server starts processing.
- Server tracks elapsed time and accumulates partial results.
- If elapsed > SLO, the server aborts the underlying storage read, packages the accumulated result, and returns it to the client with a flag indicating the response is partial (and optionally a continuation token).
- Client decides: accept the partial result, or call again to continue.
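The steps above can be sketched for a single partition scan. This is a minimal illustration, not Netflix's implementation: the `scan` callable, the `ReadResult` type, and the integer-offset continuation token are hypothetical stand-ins. The clock is injectable so the SLO check can be exercised deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical row shape: (event_time_ms, payload).
Row = Tuple[int, str]


@dataclass
class ReadResult:
    rows: List[Row]
    partial: bool                # explicit "this response is incomplete" flag
    continuation: Optional[int]  # hypothetical token: offset of the next unread row


def read_with_slo(
    scan: Callable[[int], Iterator[Row]],
    slo_seconds: float,
    start_at: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> ReadResult:
    """Read rows from start_at, aborting once elapsed time exceeds the SLO."""
    started = clock()
    rows: List[Row] = []
    position = start_at
    it = scan(start_at)
    try:
        for row in it:
            # Cheap mid-flight check: one clock read per arriving row, so
            # partial=True is only returned while at least one row is unread.
            if clock() - started > slo_seconds:
                # This row is dropped here and re-read via the continuation.
                return ReadResult(rows, partial=True, continuation=position)
            rows.append(row)
            position += 1
    finally:
        # Abort the underlying read (closing a generator stops it early).
        close = getattr(it, "close", None)
        if close is not None:
            close()
    return ReadResult(rows, partial=False, continuation=None)
```

Checking the deadline only when a row arrives keeps detection cheap, at the cost of overshooting the SLO by up to one row's fetch time; a production server would more likely also put a deadline on the storage call itself.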
In Netflix's TimeSeries shape this composes with bucketed partitioning — when a single read fans out across multiple partitions, the server can return data from the partitions that completed before the SLO and continue with the rest on a follow-up call.
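The fan-out case can be sketched with a thread pool: issue one read per time bucket, wait at most the SLO, return the buckets that finished, and list the rest as the continuation. This is a hedged illustration; `fan_out_read`, `read_bucket`, and the bucket-list continuation are hypothetical, and in this sketch reads that are already running keep going in the background rather than being cancelled at the storage layer.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FanOutResult:
    data: Dict[str, List[str]]    # bucket id -> events read from that bucket
    partial: bool                 # True if any bucket was not read in time
    remaining_buckets: List[str]  # hypothetical continuation: buckets still to read


def fan_out_read(
    buckets: List[str],
    read_bucket: Callable[[str], List[str]],
    slo_seconds: float,
    max_workers: int = 8,
) -> FanOutResult:
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = {pool.submit(read_bucket, b): b for b in buckets}
    # Block for at most the SLO; anything unfinished by then is left out.
    done, _not_done = wait(futures, timeout=slo_seconds)
    # Do not wait for stragglers; drop queued reads that never started (3.9+).
    pool.shutdown(wait=False, cancel_futures=True)
    data = {futures[f]: f.result() for f in done}
    # Preserve the caller's bucket order so the follow-up call resumes in order.
    remaining = [b for b in buckets if b not in data]
    return FanOutResult(data, partial=bool(remaining), remaining_buckets=remaining)
```

A follow-up call passes `remaining_buckets` back in as `buckets`, so each round-trip makes progress instead of restarting the whole fan-out.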
Why this is a remedy for wide partitions
When dynamic partition splitting hasn't (yet) caught up with a particular wide partition, reads against it produce seconds-scale tail latency. Three response options:
| Option | Trade |
|---|---|
| Wait for the read to complete | Client times out at the RPC layer; all server work is wasted; client has no partial data |
| Return partial data within SLO | Client gets bounded-latency answer; some data may be missing |
| Fail-fast with no data | Client retries from scratch; no progress; high amplification |
Partial return is the middle option. The client decides what to do with the incompleteness based on its own latency / completeness trade-off:
- Latency-prioritising clients (interactive UI, realtime decisions): take the partial answer and accept that some data is missing.
- Completeness-prioritising clients (batch aggregations, analytics, audit queries): use the continuation token to fetch the remainder over multiple round-trips.
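The two client strategies can be sketched against a generic paged call. The `Fetch` signature (continuation token in, rows and next token out, `None` meaning complete) is an assumption for illustration, not the TimeSeries API, and `max_round_trips` is a hypothetical safety cap.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical server call: takes a continuation token (None = start) and
# returns (rows, next_token); next_token is None once the read is complete.
Fetch = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def read_for_latency(fetch: Fetch) -> Tuple[List[str], bool]:
    """Interactive caller: one bounded-latency call, accept what came back."""
    rows, token = fetch(None)
    # Surface the partial flag so the caller can show the gap if it wants to.
    return rows, token is not None


def read_for_completeness(fetch: Fetch, max_round_trips: int = 100) -> List[str]:
    """Batch/audit caller: follow continuation tokens until the server is done."""
    rows: List[str] = []
    token: Optional[str] = None
    for _ in range(max_round_trips):
        page, token = fetch(token)
        rows.extend(page)
        if token is None:
            return rows
    raise TimeoutError(f"read still incomplete after {max_round_trips} round-trips")
```

Capping the round-trips keeps a completeness-prioritising client from looping forever on a partition that never finishes within the SLO.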
Where it sits in the wide-partition response hierarchy
The 2026-06-03 post enumerates the full hierarchy of remedies:
| Remedy | When |
|---|---|
| Do nothing | App-level metrics aren't impacted by the wide partition |
| Partial return on SLO breach (this pattern) | Clients prefer latency over completeness |
| Block adversarial IDs | Spam / test / known-bad IDs the system should refuse |
| Dynamic partition splitting | Valid + important IDs that legitimately need lots of events |
Partial return is the runtime, per-request option that requires no data movement: it applies when none of the other remedies (doing nothing, blocking, or splitting) fits the situation.
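One reading of the hierarchy can be encoded as a decision function. The signal names and the evaluation order are assumptions made for illustration; the post lists the remedies but does not prescribe this exact rule. The sketch also ignores that partial return can run alongside splitting to cover the gap before splits catch up.

```python
from dataclasses import dataclass
from enum import Enum


class Remedy(Enum):
    DO_NOTHING = "do nothing"
    BLOCK_ID = "block adversarial ID"
    DYNAMIC_SPLIT = "dynamic partition splitting"
    PARTIAL_RETURN = "partial return on SLO breach"


@dataclass
class WidePartitionSignals:  # hypothetical inputs, one per table row
    app_metrics_impacted: bool
    id_is_adversarial: bool  # spam / test / known-bad ID
    split_applicable: bool   # valid, important, and eligible for splitting


def choose_remedy(s: WidePartitionSignals) -> Remedy:
    if not s.app_metrics_impacted:
        return Remedy.DO_NOTHING
    if s.id_is_adversarial:
        return Remedy.BLOCK_ID
    if s.split_applicable:
        return Remedy.DYNAMIC_SPLIT
    # Runtime, per-request fallback that needs no data movement.
    return Remedy.PARTIAL_RETURN
```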
Trade-offs
| Pro | Con |
|---|---|
| Bounded-latency response under any circumstance | Partial responses must be representable in the API |
| Server-side decision avoids client-side timeout amplification | Clients must be able to handle partial responses (API + downstream consumers) |
| Composable with continuation-token pagination | Continuation token must encode enough state to resume |
| Reduces work-amplification on retried-after-timeout reads | Some clients (audit, analytics) genuinely need completeness — partial is wrong for them |
| Composes with dynamic partition splitting (covers the gap before splits catch up) | Detection of "SLO at risk" must be cheap (mid-flight server timing) |
| Applies to extreme partitions (post says 500 MB+ partitions paginated successfully via this mechanism) | Partial is less safe if the missing data is the part the client cared about |
Caveats
- Partial-vs-complete signalling must be explicit. A client that mistakes a partial response for a complete one will silently use incomplete data — see concepts/availability-vs-data-loss-tradeoff.
- The SLO should be settable per request, not just at the server. Different clients have different latency budgets, so the server should ideally accept a per-request SLO.
- Continuation token semantics are non-trivial. Resuming partial reads requires the token to encode partition position, ordering, and any in-flight state.
- Not appropriate for write paths. The same problem exists for slow writes, but the mid-flight abort semantics are very different (idempotency, partial commits, etc.).
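The first three caveats can be made concrete in a small sketch: an explicit `is_partial` field on the response, a per-request SLO clamped to a server ceiling, and an opaque continuation token that carries partition position, ordering, and a format version. Every name here (`SERVER_MAX_SLO_MS`, `ResumeState`, `encode_token`, and so on) and the base64-encoded JSON format are hypothetical choices, not the TimeSeries wire format; a real token would typically also be signed or encrypted so clients cannot tamper with it.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Optional

SERVER_MAX_SLO_MS = 5_000      # hypothetical server-wide ceiling
SERVER_DEFAULT_SLO_MS = 1_000  # hypothetical default when the caller sends none


def effective_slo_ms(requested_ms: Optional[int]) -> int:
    """Honour a per-request SLO, but never exceed the server's ceiling."""
    if requested_ms is None or requested_ms <= 0:
        return SERVER_DEFAULT_SLO_MS
    return min(requested_ms, SERVER_MAX_SLO_MS)


@dataclass(frozen=True)
class ResumeState:
    partition_id: str  # which partition to resume in
    last_key: str      # last clustering key already returned to the client
    descending: bool   # ordering must match the original request
    version: int = 1   # lets the server reject tokens in an unknown format


def encode_token(state: ResumeState) -> str:
    raw = json.dumps(asdict(state), sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_token(token: str) -> ResumeState:
    fields = json.loads(base64.urlsafe_b64decode(token.encode()))
    if fields.get("version") != 1:
        raise ValueError("unsupported continuation token version")
    return ResumeState(**fields)


@dataclass
class Response:
    events: list
    is_partial: bool  # explicit; never inferred from a short or empty page
    continuation: Optional[str] = None
```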
Seen in
- sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home for the wide-partition-remedy framing. Named explicitly as one of the mid-stack options for handling wide partitions when dynamic splitting isn't applicable (e.g. mutable partitions, ultra-extreme outliers). The 500 MB+ partition pagination example demonstrates partial returns trading 41 seconds of latency for availability — the alternative would be timing out and producing no result at all.
- sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — Earlier sibling: KV DAL's patterns/slo-aware-early-response applies the same shape at the byte-pagination boundary (returning the page filled so far when the server projects it will miss the SLO). Same primitive, different read API.
Related
- patterns/slo-aware-early-response — sibling at the KV DAL byte-pagination altitude.
- concepts/wide-partition-problem: the failure mode this mid-stack remedy addresses.
- concepts/tail-latency-at-scale — the broader latency concept this pattern responds to.
- concepts/availability-vs-data-loss-tradeoff — the deeper trade-off (returning partial vs nothing).
- systems/netflix-timeseries-abstraction — the canonical instance.
- systems/netflix-kv-dal — earlier sibling instance of the same shape.