PATTERN Cited by 2 sources

Partial return on SLO breach

Definition

Partial return on SLO breach is a server-side pattern: when an in-flight read has already breached its configured latency SLO, the server aborts it mid-execution and returns whatever data it has collected so far (optionally with a continuation token) instead of grinding on toward a complete answer. The trade is completeness for latency: the caller gets a response within its deadline, knows the response may be incomplete, and can fetch the rest on a follow-up call.

This pattern is the TimeSeries-side variant of the broader SLO-aware early response pattern (canonicalised on the wiki via the 2024-09-19 Netflix KV DAL post). The 2026-06-03 TimeSeries dynamic-partition-splitting post (Source: sources/2026-06-03-netflix-dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads) names Partial Returns as a mid-stack remedy for wide partitions that don't qualify for dynamic splitting.

Mechanism

"We implemented a 'Partial Return' feature, which aborts an inflight request if it has breached a configured latency SLO, while returning whatever data it has collected up until that point. This is a great option for clients who care more about latency than fetching all the data."

The shape:

  1. Client sends a query with an implicit or explicit latency budget.
  2. Server starts processing.
  3. Server tracks elapsed time and accumulates partial results as it goes.
  4. If elapsed time exceeds the SLO, the server aborts the underlying storage read, packages the accumulated result, and returns it to the client with a flag marking the response as partial (and optionally a continuation token).
  5. Client decides: accept the partial result, or call again to continue.
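
The steps above can be sketched roughly as follows. This is a minimal illustration, not Netflix's implementation: `ReadResult`, `read_with_slo`, the integer continuation offset, and the injectable `clock` parameter are all hypothetical names introduced here, and the "storage read" is simulated by iterating over an in-memory sequence.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class ReadResult:
    events: List[str] = field(default_factory=list)
    partial: bool = False               # explicit partial-vs-complete flag
    continuation: Optional[int] = None  # hypothetical token: offset of the next unread row


def read_with_slo(
    rows: Iterable[str],
    slo_seconds: float,
    start_at: int = 0,
    clock: Callable[[], float] = time.monotonic,
) -> ReadResult:
    """Collect rows until the source is exhausted or the latency budget is spent."""
    started = clock()
    result = ReadResult()
    for offset, row in enumerate(rows):
        if offset < start_at:
            continue  # stand-in for seeking past rows an earlier call already returned
        if clock() - started > slo_seconds:
            # Budget breached: stop reading and hand back what has been collected.
            result.partial = True
            result.continuation = offset
            return result
        result.events.append(row)
    return result
```

A caller that receives `partial=True` can pass `continuation` back as `start_at` to resume. A real server would seek in storage rather than skip rows, and would likely use an opaque token instead of a bare offset.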

In Netflix's TimeSeries shape this composes with bucketed partitioning — when a single read fans out across multiple partitions, the server can return data from the partitions that completed before the SLO and continue with the rest on a follow-up call.
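
One way to picture the fan-out case is a sketch using Python's `asyncio`, under the assumption that each bucket can be fetched independently. The function name `read_buckets`, the `fetch` callable, and the idea of returning outstanding bucket ids as the resume state are illustrative choices made here, not details from the source.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def read_buckets(
    bucket_ids: List[int],
    fetch: Callable[[int], Awaitable[List[str]]],
    slo_seconds: float,
) -> Tuple[Dict[int, List[str]], List[int]]:
    """Return (events per completed bucket, bucket ids still outstanding)."""
    if not bucket_ids:
        return {}, []
    tasks = {asyncio.create_task(fetch(b)): b for b in bucket_ids}
    # Wait for all buckets, but no longer than the latency budget.
    done, pending = await asyncio.wait(list(tasks), timeout=slo_seconds)
    for task in pending:
        task.cancel()  # abort reads that did not finish inside the budget
    if pending:
        # Let the cancellations settle before returning.
        await asyncio.gather(*pending, return_exceptions=True)
    # Note: t.result() re-raises if a completed fetch failed; a real server
    # would decide how to report per-bucket errors.
    completed = {tasks[t]: t.result() for t in done}
    remaining = sorted(tasks[t] for t in pending)
    return completed, remaining
```

The `remaining` list plays the role of a continuation here: a follow-up call would pass only those bucket ids.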

Why this is a remedy for wide partitions

When dynamic partition splitting hasn't (yet) caught up with a particular wide partition, reads against it produce seconds-scale tail latency. Three response options:

| Option | Trade |
|---|---|
| Wait for the read to complete | Client times out at the RPC layer; all server work is wasted; client has no partial data |
| Return partial data within SLO | Client gets bounded-latency answer; some data may be missing |
| Fail-fast with no data | Client retries from scratch; no progress; high amplification |

Partial return is the middle option. The client decides what to do with the incompleteness based on its own latency / completeness trade-off:

  • Latency-prioritising clients (interactive UI, realtime decisions): take the partial answer and proceed without the missing data.
  • Completeness-prioritising clients (batch aggregations, analytics, audit queries): use the continuation token to fetch the remainder over multiple round-trips.
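
The two client behaviours might look like this in code. This is a sketch under stated assumptions: `Page`, `Fetch`, `read_latency_first`, and `read_complete` are hypothetical names, and the real TimeSeries client API may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    events: List[str]
    partial: bool                       # True if the server cut the read short
    continuation: Optional[str] = None  # opaque token for the follow-up call


# Hypothetical client call: takes the previous token (or None), returns one page.
Fetch = Callable[[Optional[str]], Page]


def read_latency_first(fetch: Fetch) -> Page:
    """Interactive caller: one round-trip, accept whatever came back."""
    return fetch(None)


def read_complete(fetch: Fetch, max_round_trips: int = 100) -> List[str]:
    """Batch caller: follow continuation tokens until the server reports completion."""
    events: List[str] = []
    token: Optional[str] = None
    for _ in range(max_round_trips):
        page = fetch(token)
        events.extend(page.events)
        if not page.partial:
            return events
        if page.continuation is None:
            raise RuntimeError("partial response without a continuation token")
        token = page.continuation
    raise RuntimeError("gave up before the read completed")
```

The cap on round-trips is a design choice for the sketch: it keeps a completeness-prioritising caller from looping indefinitely against a partition that never finishes.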

Where it sits in the wide-partition response hierarchy

The 2026-06-03 post enumerates the full hierarchy of remedies:

| Remedy | When |
|---|---|
| Do nothing | App-level metrics aren't impacted by the wide partition |
| Partial return on SLO breach (this pattern) | Clients prefer latency over completeness |
| Block adversarial IDs | Spam / test / known-bad IDs the system should refuse |
| Dynamic partition splitting | Valid + important IDs that legitimately need lots of events |

Partial return is the runtime, per-request option that moves no data. It fits when doing nothing is not acceptable and neither blocking nor splitting applies.

Trade-offs

| Pro | Con |
|---|---|
| Bounded-latency response under any circumstance | Partial responses must be representable in the API |
| Server-side decision avoids client-side timeout amplification | Clients must be able to handle partial responses (API + downstream consumers) |
| Composable with continuation-token pagination | Continuation token must encode enough state to resume |
| Reduces work-amplification on retried-after-timeout reads | Some clients (audit, analytics) genuinely need completeness; partial is wrong for them |
| Composes with dynamic partition splitting (covers the gap before splits catch up) | Detection of "SLO at risk" must be cheap (mid-flight server timing) |
| Applies to extreme partitions (post says 500 MB+ partitions paginated successfully via this mechanism) | Partial is less safe if the missing data is the part the client cared about |

Caveats

  • Partial-vs-complete signalling must be explicit. A client that mistakes a partial response for a complete one will silently use incomplete data — see concepts/availability-vs-data-loss-tradeoff.
  • The SLO should be settable per request, not only server-wide. Different clients have different latency budgets, so the server should ideally accept a per-request SLO rather than apply a single server-level value to every caller.
  • Continuation token semantics are non-trivial. Resuming partial reads requires the token to encode partition position, ordering, and any in-flight state.
  • Not appropriate for write paths. The same problem exists for slow writes, but the mid-flight abort semantics are very different (idempotency, partial commits, etc.).
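
To make the continuation-token caveat concrete, here is one possible encoding, offered as an assumption-laden sketch. The fields of `ResumePoint` (bucket, last event id, scan direction) and the JSON-in-base64 format are illustrative; the source does not describe the actual token layout.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ResumePoint:
    bucket: int         # partition (time bucket) being read when the abort hit
    last_event_id: str  # last event already returned; resume strictly after it
    descending: bool    # scan direction, so the follow-up reads in the same order


def encode_token(point: ResumePoint) -> str:
    """Serialise the resume state into an opaque, URL-safe string."""
    raw = json.dumps(asdict(point), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: str) -> ResumePoint:
    """Recover the resume state from a token produced by encode_token."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return ResumePoint(**json.loads(raw))
```

A production token would probably also carry a format version and be signed or otherwise validated, since clients hand it back to the server; those concerns are left out of the sketch.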

Seen in
