CONCEPT Cited by 1 source

Adaptive pagination

Adaptive pagination tunes the page-fetch limit issued to a backing store based on observed item sizes, so that a server exposing a byte-budget SLO to its caller doesn't over- or under-fetch rows from an underlying store that only understands row-count limits.

The problem it solves

A server exposes byte-size pagination to its callers, but the storage engine (Cassandra, DynamoDB, etc.) paginates by row count. With a static row-count limit:

  • Big items → the server fetches many rows, only a fraction fits the byte budget, and the excess is discarded: wasted I/O.
  • Small items → the static count doesn't fill the byte budget, so the server issues more queries: multi-trip latency.
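A worked example makes the two failure modes concrete. The numbers here (a 64 KiB byte budget, a static limit of 100 rows, 10 KiB "big" items and 100-byte "small" items) are assumptions for illustration, not from the source:

```python
# Static row-count limit vs. a byte-budget SLO, under assumed numbers.
BYTE_BUDGET = 64 * 1024   # bytes the caller's SLO allows per page
STATIC_LIMIT = 100        # rows the server always asks the store for

# Big items: 10 KiB each. The store returns 100 rows, but only 6 fit
# the byte budget; the other 94 are fetched and discarded.
big_item = 10 * 1024
fits = BYTE_BUDGET // big_item        # rows that fit the budget
wasted_rows = STATIC_LIMIT - fits     # rows fetched for nothing

# Small items: 100 bytes each. 100 rows fill only ~10 KiB of the
# 64 KiB budget, so filling one page costs ~7 round trips.
small_item = 100
bytes_per_trip = STATIC_LIMIT * small_item
trips = -(-BYTE_BUDGET // bytes_per_trip)  # ceiling division

print(fits, wasted_rows, trips)  # 6 rows kept, 94 wasted, 7 trips
```

Either extreme shows read amplification: too many rows fetched, or too many trips to fetch enough rows.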

Either way you get read amplification, which hurts both cost and tail latency.

The mechanism

Two feedback paths, per Netflix KV DAL's description:

Path 1: follow-up pages use the previous page's observed size

"As the consumer processes these results, the system tracks the number of items consumed and the total size used. This data helps calculate an approximate item size, which is stored in the page token. For subsequent page requests, this stored information allows the server to apply the appropriate limits to the underlying storage, reducing unnecessary work and minimizing read amplification." (Source: sources/2024-09-19-netflix-netflixs-key-value-data-abstraction-layer)

The page token carries the per-item size estimate forward: page N+1 knows how many rows to ask the backing store for based on what page N actually saw.
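Path 1 can be sketched in a few lines. This is an illustrative reconstruction, not Netflix's API: the `PageToken` shape, the `headroom` slack factor, and the helper names are all assumptions; only the idea (token carries the previous page's observed average item size, the next request converts the byte budget into a row limit) is from the source.

```python
# Hypothetical sketch of Path 1: follow-up pages size their row-count
# limit from the average item size observed on the previous page.
from dataclasses import dataclass

@dataclass
class PageToken:
    cursor: str            # opaque position in the backing store
    avg_item_bytes: float  # observed on the previous page

def next_row_limit(byte_budget: int, token: PageToken,
                   headroom: float = 1.1) -> int:
    """Rows to request so roughly byte_budget bytes come back,
    with a little slack so short pages don't force an extra trip."""
    est = max(token.avg_item_bytes, 1.0)
    return max(1, int(byte_budget / est * headroom))

def finish_page(cursor: str, items: list[bytes]) -> PageToken:
    """After serving a page, fold the observed sizes into the token."""
    total = sum(len(i) for i in items)
    return PageToken(cursor, total / max(len(items), 1))
```

For example, after a page of four 500-byte items, a 10 000-byte budget translates into a request for 22 rows (20 plus 10% headroom) rather than a fixed count.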

Path 2: initial pages use a per-namespace rolling estimate

"In addition to storing item size information in the page token, the server also estimates the average item size for a given namespace and caches it locally. This cached estimate helps the server set a more optimal limit on the backing store for the initial request, improving efficiency."

First-page requests don't have a prior page token — so KV DAL keeps a local per-namespace average and uses it. "The server continuously adjusts this limit based on recent query patterns or other factors to keep it accurate."
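Path 2 can be sketched as a locally cached rolling average. The source says the server "continuously adjusts this limit based on recent query patterns" but doesn't specify the estimator, so the exponentially weighted moving average, the `alpha` decay factor, and the cold-start default below are assumptions:

```python
# Hypothetical sketch of Path 2: a per-namespace EWMA of item size,
# cached locally, used to size the first page of a fresh request.
class NamespaceSizeEstimator:
    def __init__(self, default_bytes: float = 1024.0, alpha: float = 0.2):
        self.alpha = alpha            # weight given to the newest sample
        self.default = default_bytes  # cold-start guess for new namespaces
        self.avg: dict[str, float] = {}  # namespace -> EWMA of item bytes

    def observe(self, namespace: str, page_bytes: int, page_items: int) -> None:
        """Fold one served page's observed average into the estimate."""
        if page_items == 0:
            return
        sample = page_bytes / page_items
        prev = self.avg.get(namespace, sample)
        self.avg[namespace] = (1 - self.alpha) * prev + self.alpha * sample

    def first_page_limit(self, namespace: str, byte_budget: int) -> int:
        """Row limit for a request that carries no page token yet."""
        est = self.avg.get(namespace, self.default)
        return max(1, int(byte_budget / est))
```

An unseen namespace falls back to the default; once pages are observed, first-page limits track the namespace's typical item size.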

Why it's more than a simple tuning heuristic

Together the two paths close a two-timescale feedback loop:

  • Per-query feedback via the page token — fast, precise to this id's size distribution.
  • Per-namespace feedback via the local estimate — slow, catches the typical-case average for first-page requests.

This separation matters because a namespace's item-size distribution can be heavy-tailed: one hot id with outlier-big items shouldn't recalibrate the first-page estimate for the whole namespace.
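The glue between the two timescales is simple precedence: the precise per-query estimate from the page token wins when present, and the slow per-namespace average only sizes token-less first pages. A minimal sketch, with illustrative names:

```python
# Hypothetical selection between the two feedback paths. A hot id's
# outlier sizes stay confined to its own token; the namespace average
# is only consulted when no token exists yet.
from typing import Optional

def choose_row_limit(byte_budget: int,
                     token_avg_bytes: Optional[float],
                     namespace_avg_bytes: float) -> int:
    est = token_avg_bytes if token_avg_bytes is not None else namespace_avg_bytes
    return max(1, int(byte_budget / max(est, 1.0)))
```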

Trade-offs

  • Cold start. A brand-new namespace has no estimate; the first few requests eat more read amplification while the estimate stabilizes.
  • Bimodal distributions — some ids small, some huge — degrade both paths' estimates. In practice, very large items live on the chunking path, which separates them out anyway.
  • Per-namespace adaptive state is new operational surface. If the cached estimate goes stale, reads amplify; if it's reset on a process bounce, the adaptive behaviour resets with it.
  • Combined with patterns/slo-aware-early-response, adaptive pagination bounds read amplification in the typical case and early-return bounds it in the worst case — both are needed.

Seen in
