
CONCEPT Cited by 1 source

Byte-size pagination

Byte-size pagination is the API-design choice of bounding a paged response by total byte size rather than by number of items. The motivation is predictable latency SLOs: when items are variable in size, capping by count makes page latency a function of what the caller happens to land — capping by bytes makes it a property of the page itself.

The Netflix anchor

Netflix's KV DAL uses bytes-per-page as the pagination limit and cites the SLO reason directly:

"We chose payload size in bytes as the limit per response page rather than the number of items because it allows us to provide predictable operation SLOs. For instance, we can provide a single-digit millisecond SLO on a 2 MiB page read. Conversely, using the number of items per page as the limit would result in unpredictable latencies due to significant variations in item size." (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)

A 10-item page of 1 KiB items and a 10-item page of 1 MiB items are the same API-surface request but ~1000× different in deserialization / network / downstream work — the caller has no way to bound their page-handling time without knowing row sizes.
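The serving-side mechanics can be sketched in a few lines. This is a hypothetical illustration, not KV DAL's actual code: accumulate rows until the next one would exceed the byte budget, then stop and return a cursor for the remainder.

```python
# Sketch of a byte-capped page read. All names are illustrative.
def read_page(rows, start, byte_budget):
    """Return (page, next_cursor); next_cursor is None when rows are exhausted."""
    page, used = [], 0
    i = start
    while i < len(rows):
        size = len(rows[i])            # serialized size of this row in bytes
        if page and used + size > byte_budget:
            break                      # budget full: stop before this row
        page.append(rows[i])
        used += size
        i += 1
    return page, (i if i < len(rows) else None)

# Variable-size rows: 300 B, 300 B, 900 B, 100 B against a 1 KiB budget.
rows = [b"x" * n for n in (300, 300, 900, 100)]
page, cursor = read_page(rows, 0, byte_budget=1024)    # 2 rows, cursor = 2
```

Note the `if page and ...` guard: a single row larger than the whole budget is still returned alone rather than stalling pagination forever, so the byte cap is a soft bound in that one edge case.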

The implementation friction

Most backing stores paginate by row count, not bytes: Cassandra's driver paging uses a rows-per-page fetch size, and DynamoDB's Limit is an item count (DynamoDB does cap pages at 1 MB, but that ceiling isn't a caller-tunable byte budget). KV DAL translates a caller's byte budget into backing-store queries using a static row-count limit, then either:

  • Items turn out smaller than expected → each query under-fills the byte budget → multiple backing-store round trips to reach it → read amplification.
  • Items turn out larger than expected → the query overshoots the byte budget → rows already pulled past the cap get discarded → the same read amplification, as wasted reads.

This is the precise problem concepts/adaptive-pagination addresses: use observed item size to tune the underlying-store limit adaptively.
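A minimal sketch of that adaptive tuning, assuming a simple exponentially weighted average of observed item sizes (the smoothing factor and all names here are assumptions, not KV DAL internals):

```python
# Derive the backing-store row limit from a running item-size estimate.
class AdaptiveLimit:
    def __init__(self, initial_avg_bytes=1024, alpha=0.2):
        self.avg = float(initial_avg_bytes)   # running mean item size (bytes)
        self.alpha = alpha                    # EWMA smoothing factor

    def row_limit(self, byte_budget):
        # How many average-sized rows fit in the caller's byte budget.
        return max(1, int(byte_budget // self.avg))

    def observe(self, total_bytes, n_rows):
        # Fold the mean item size of an actual page into the estimate.
        observed = total_bytes / n_rows
        self.avg += self.alpha * (observed - self.avg)

est = AdaptiveLimit()
limit = est.row_limit(2 * 1024 * 1024)            # 2 MiB budget -> 2048 rows
est.observe(total_bytes=4_000_000, n_rows=500)    # rows averaged ~8 KB
smaller = est.row_limit(2 * 1024 * 1024)          # estimate grew, limit shrank
```

After one page of larger-than-assumed rows, the next query asks for fewer rows, cutting both the extra round trips and the discarded overshoot.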

Trade-offs

  • Predictable response-size SLO. The page is bounded in bytes, so downstream code can plan buffers, network budgets, and GC.
  • Implementation complexity. Adds a server-side layer of size-estimation + loop / backfill logic. (Adaptive pagination is the standard partner.)
  • Partial-page pagination becomes normal. Callers can't pre-compute "total pages" — and shouldn't try to.
  • Server has to serialize enough to know the size. There's no way to byte-count cheaply without doing most of the work. But if the server stops early (see patterns/slo-aware-early-response), this is fine — work done so far gets returned.
  • Doesn't solve agent-context token budgets directly. Agent tools whose callers are LLMs need to bound by tokens, not bytes — see patterns/token-budget-pagination. Byte-size pagination and token-budget pagination are siblings: both reject row-count limits, but differ on the scarce-resource axis (bytes vs tokens, which are not linearly related).
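The sibling relationship in the last bullet can be made concrete: one paginator parameterized by a cost function covers both limit units. Everything below is illustrative (the ~4-chars-per-token cost is a crude proxy, not a real tokenizer).

```python
from typing import Callable, List, Optional, Tuple

# Budget-bounded pagination generic over the scarce resource.
def paginate(items: List[str], start: int, budget: int,
             cost: Callable[[str], int]) -> Tuple[List[str], Optional[int]]:
    page, used, i = [], 0, start
    while i < len(items):
        c = cost(items[i])
        if page and used + c > budget:
            break
        page.append(items[i])
        used += c
        i += 1
    return page, (i if i < len(items) else None)

byte_cost = lambda s: len(s.encode("utf-8"))
token_cost = lambda s: max(1, len(s) // 4)   # crude ~4 chars/token proxy

items = ["alpha" * 10, "beta" * 50, "gamma" * 5]   # 50 B, 200 B, 25 B
byte_page, _ = paginate(items, 0, 300, byte_cost)    # all 3 fit in 300 B
token_page, _ = paginate(items, 0, 60, token_cost)   # only 1 fits in 60 tokens
```

The same items paginate differently under the two measures, which is the point: the pattern is shared, but the budget unit is a design axis.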

Cross-pattern placement

| Scarce resource | Limit unit | Canonical instance |
| --- | --- | --- |
| End-to-end request latency at variable row size | bytes | Netflix KV DAL (this concept) |
| Agent token-context budget | tokens | Datadog MCP server (patterns/token-budget-pagination) |
| Linked-list / log traversal | records | Classic ?limit=N pagination (what Netflix + Datadog both reject) |
