PATTERN Cited by 2 sources

Partial return on SLO breach

Definition

Partial return on SLO breach is the server-side pattern of aborting an in-flight read mid-execution when the request has already breached its configured latency SLO and returning whatever data has been collected up to that point with a continuation token, rather than continuing to grind toward a complete answer. The trade is latency over completeness: the caller gets a response within their deadline, knowing the response may be incomplete, and can optionally fetch more on a follow-up.

This pattern is the TimeSeries-side variant of the broader SLO-aware early response pattern (canonicalised on the wiki via the 2024-09-19 Netflix KV DAL post). The 2026-06-03 TimeSeries dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) names Partial Returns as a mid-stack remedy for wide partitions that don't qualify for dynamic splitting.

Mechanism

"We implemented a 'Partial Return' feature, which aborts an inflight request if it has breached a configured latency SLO, while returning whatever data it has collected up until that point. This is a great option for clients who care more about latency than fetching all the data."

The shape:

Client sends a query with implicit or explicit latency budget.
Server starts processing.
Server tracks elapsed time and partial result accumulation.
If elapsed time exceeds the SLO, server aborts the underlying storage read, packages the accumulated result, and returns it to the client with a flag indicating that the response is partial (and optionally a continuation token).
Client decides: accept the partial result, or call again to continue.
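The steps above can be sketched as a small server-side read loop. This is a minimal illustration, not the TimeSeries implementation: `ReadResult`, `read_with_slo`, the integer continuation value, and the injectable `clock` parameter are all hypothetical names chosen for the example, and the "storage read" is just an in-memory sequence.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ReadResult:
    rows: List[str]
    partial: bool                 # explicit partial-vs-complete flag
    continuation: Optional[int]   # hypothetical token: index of the next unread row


def read_with_slo(
    rows: Sequence[str],
    slo_seconds: float,
    start_at: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> ReadResult:
    """Collect rows until the read finishes or the elapsed time exceeds the SLO."""
    started = clock()
    collected: List[str] = []
    for index in range(start_at, len(rows)):
        # Check the budget before each unit of work (assumed cheap).
        if clock() - started > slo_seconds:
            # SLO breached: stop reading and return what has been collected.
            return ReadResult(collected, partial=True, continuation=index)
        collected.append(rows[index])
    return ReadResult(collected, partial=False, continuation=None)
```

A caller that receives `partial=True` can pass `continuation` back as `start_at` to resume where the previous call stopped. Injecting the clock keeps the breach logic testable without real delays.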

In Netflix's TimeSeries shape this composes with bucketed partitioning — when a single read fans out across multiple partitions, the server can return data from the partitions that completed before the SLO and continue with the rest on a follow-up call.
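The fan-out case can be sketched with a thread pool: wait for the per-partition reads up to the SLO, return the ones that finished, and report the rest as pending. This is an illustration under assumptions, not the TimeSeries code: `fan_out_read`, `read_partition`, the bucket names, and `demo` are hypothetical. Python threads cannot be interrupted, so the sketch only stops waiting for slow reads, whereas a real server would abort the storage read itself.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple


def fan_out_read(
    read_partition: Callable[[str], List[str]],
    partition_ids: List[str],
    slo_seconds: float,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Read partitions in parallel and return those that finished within the SLO.

    Returns (results, pending_ids). A non-empty pending_ids means the response
    is partial and names the partitions a follow-up call should read.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(partition_ids)))
    futures = {pool.submit(read_partition, pid): pid for pid in partition_ids}
    done, not_done = wait(futures, timeout=slo_seconds)
    for future in not_done:
        future.cancel()  # only prevents reads that have not started yet
    pool.shutdown(wait=False)  # do not block the response on slow reads
    results = {futures[f]: f.result() for f in done}
    pending = [pid for pid in partition_ids if pid not in results]
    return results, pending


def demo() -> Tuple[Dict[str, List[str]], List[str]]:
    release = threading.Event()

    def fake_read(pid: str) -> List[str]:
        if pid == "bucket-slow":       # stands in for a wide partition
            release.wait(timeout=5)
        return [f"{pid}-event"]

    out = fan_out_read(fake_read, ["bucket-a", "bucket-slow", "bucket-b"], 0.5)
    release.set()                      # let the slow worker thread finish
    return out
```

In `demo`, the two fast buckets come back in `results` and `bucket-slow` is listed as pending, which is the information a follow-up call needs.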

Why this is a remedy for wide partitions

When dynamic partition splitting hasn't (yet) caught up with a particular wide partition, reads against it produce seconds-scale tail latency. Three response options:

Option	Trade
Wait for the read to complete	Client times out at the RPC layer; all server work is wasted; client has no partial data
Return partial data within SLO	Client gets bounded-latency answer; some data may be missing
Fail-fast with no data	Client retries from scratch; no progress; high amplification

Partial return is the middle option. The client decides what to do with the incompleteness based on its own latency / completeness trade-off:

Latency-prioritising clients (interactive UI, realtime decisions): take the partial answer and ignore the missing remainder.
Completeness-prioritising clients (batch aggregations, analytics, audit queries): use the continuation token to fetch the remainder over multiple round-trips.
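The two client behaviours can be sketched side by side. This is a hedged illustration: the `Page` shape, the `fetch` callable, and the string token are assumptions made for the example, not the TimeSeries client API, and `make_fake_server` only simulates a server that returns a fixed number of events per call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    events: List[str]
    partial: bool
    continuation: Optional[str]


Fetch = Callable[[Optional[str]], Page]


def read_latency_first(fetch: Fetch) -> Page:
    """Latency-prioritising client: one call, accept whatever came back."""
    return fetch(None)


def read_complete(fetch: Fetch, max_round_trips: int = 100) -> List[str]:
    """Completeness-prioritising client: follow continuation tokens to the end."""
    events: List[str] = []
    token: Optional[str] = None
    for _ in range(max_round_trips):
        page = fetch(token)
        events.extend(page.events)
        if not page.partial:
            return events
        if page.continuation is None:
            raise RuntimeError("partial response without a continuation token")
        token = page.continuation
    raise RuntimeError("read did not complete within the round-trip limit")


def make_fake_server(data: List[str], per_call: int) -> Fetch:
    """Simulated server that can return only per_call events within its SLO."""
    def fetch(token: Optional[str]) -> Page:
        start = int(token) if token else 0
        end = start + per_call
        partial = end < len(data)
        return Page(data[start:end], partial, str(end) if partial else None)
    return fetch
```

The round-trip limit is a design choice for the sketch: a completeness-prioritising client still needs some bound so that a misbehaving server cannot keep it paginating forever.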

Where it sits in the wide-partition response hierarchy

The 2026-06-03 post enumerates the full hierarchy of remedies:

Remedy	When
Do nothing	App-level metrics aren't impacted by the wide partition
Partial return on SLO breach (this pattern)	Clients prefer latency over completeness
Block adversarial IDs	Spam / test / known-bad IDs the system should refuse
Dynamic partition splitting	Valid + important IDs that legitimately need lots of events

Partial return is the runtime, per-request, no-data-movement option, for cases where doing nothing, blocking, and splitting all fail to fit the situation.

Trade-offs

Pro	Con
Bounded-latency response under any circumstance	Partial responses must be representable in the API
Server-side decision avoids client-side timeout amplification	Clients must be able to handle partial responses (API + downstream consumers)
Composable with continuation-token pagination	Continuation token must encode enough state to resume
Reduces work-amplification on retried-after-timeout reads	Some clients (audit, analytics) genuinely need completeness — partial is wrong for them
Composes with dynamic partition splitting (covers the gap before splits catch up)	Detection of "SLO at risk" must be cheap (mid-flight server timing)
Applies to extreme partitions (post says 500 MB+ partitions paginated successfully via this mechanism)	Partial is less safe if the missing data is the part the client cared about

Caveats

Partial-vs-complete signalling must be explicit. A client that mistakes a partial response for a complete one will silently use incomplete data — see concepts/availability-vs-data-loss-tradeoff.
SLO should be settable per request, not just at the server. Different clients have different latency budgets, so a single server-wide SLO will be too tight for some callers and too loose for others; ideally the server accepts a per-request SLO.
Continuation token semantics are non-trivial. Resuming partial reads requires the token to encode partition position, ordering, and any in-flight state.
Not appropriate for write paths. The same problem exists for slow writes, but the mid-flight abort semantics are very different (idempotency, partial commits, etc.).
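To make the continuation-token caveat above concrete, here is a minimal sketch of an opaque token that carries partition position and ordering. The field names (`partition_key`, `bucket`, `last_event_time`, `descending`) and the JSON-plus-base64 encoding are assumptions for illustration; the source does not describe the actual token format.

```python
import base64
import json
from typing import Any, Dict

# Hypothetical fields a resumable read might need.
REQUIRED_FIELDS = {"partition_key", "bucket", "last_event_time", "descending"}


def encode_token(partition_key: str, bucket: int,
                 last_event_time: int, descending: bool) -> str:
    """Pack the resume position into an opaque, URL-safe string."""
    state = {
        "partition_key": partition_key,
        "bucket": bucket,                    # which partition bucket was being read
        "last_event_time": last_event_time,  # position inside that bucket
        "descending": descending,            # ordering must match on resume
    }
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> Dict[str, Any]:
    """Unpack a token, rejecting anything malformed or incomplete."""
    try:
        state = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    except ValueError as exc:  # covers base64, JSON, and encoding errors
        raise ValueError("malformed continuation token") from exc
    if not isinstance(state, dict) or not REQUIRED_FIELDS <= state.keys():
        raise ValueError("continuation token is missing required fields")
    return state
```

Keeping the token opaque to clients lets the server change what it encodes later. The sketch does not sign or encrypt the token; whether that is needed depends on how far clients are trusted.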

Seen in

sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads — Canonical wiki home for the wide-partition-remedy framing. Named explicitly as one of the mid-stack options for handling wide partitions when dynamic splitting isn't applicable (e.g. mutable partitions, ultra-extreme outliers). The 500 MB+ partition pagination example demonstrates partial returns trading 41 seconds of latency for availability — the alternative would be timing out and producing no result at all.
sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer — Earlier sibling: KV DAL's patterns/slo-aware-early-response applies the same shape at the byte-pagination boundary (returning the page filled so far when the server projects it will miss the SLO). Same primitive, different read API.

patterns/slo-aware-early-response — sibling at the KV DAL byte-pagination altitude.
concepts/wide-partition-problem — the failure mode this mid-stack remedy addresses.
concepts/tail-latency-at-scale — the broader latency concept this pattern responds to.
concepts/availability-vs-data-loss-tradeoff — the deeper trade-off (returning partial vs nothing).
systems/netflix-timeseries-abstraction — the canonical instance.
systems/netflix-kv-dal — earlier sibling instance of the same shape.