
PATTERN

Zero-allocation cache payload

Problem

On a low-latency JVM serving tier, GC pauses directly become tail latency: a 50 ms young-generation pause on a Netty NIO thread freezes every channel that thread owns. The dominant source of young-gen pressure in cache-heavy serving tiers is the cached value's object graph, plus the transient allocations created while assembling a response — decoded JSON trees, map/list wrappers, per-request byte-buffer copies, gunzip-then-regzip pipelines.

On Zalando's PRAPI (1,000-line product JSON, single-digit-ms P99 target), these allocations are the delta between "blazingly fast" and "ordinary fast."

Pattern

Apply the principle the post extracts:

"The best way to eliminate GC pauses is to avoid object allocation altogether." (Source: sources/2025-03-06-zalando-from-event-driven-chaos-to-a-blazingly-fast-serving-api.)

Two concrete tactics, both from PRAPI:

1. Cache single items as ByteArray, not ObjectNode

Instead of deserialising into a parsed object graph (com.fasterxml.jackson.databind.node.ObjectNode), keep the value as a raw byte array matching the on-wire bytes of the response:

  • One allocation per cache entry instead of a tree of N nodes.
  • Zero deserialisation cost on cache read.
  • Zero re-serialisation cost on response (write the bytes straight into the response buffer).

Trade-off: you lose ergonomic in-process transformation of the value. The serving-tier code operates on opaque bytes, so format transforms have to happen at a different layer.
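A minimal Java sketch of this tactic, assuming a plain `ConcurrentHashMap` as the store (the post does not show PRAPI's code; the class and method names here are illustrative). The cached value is the exact on-wire `byte[]`, so a hit is one map lookup plus one stream write — no Jackson tree, no re-serialisation:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: cache values are the serialized response bytes, not a parsed
// ObjectNode tree. One byte[] per entry; nothing to decode or re-encode
// on the hot path.
public final class RawByteCache {
    private final ConcurrentMap<String, byte[]> entries = new ConcurrentHashMap<>();

    /** Caller serialises (and optionally compresses) once, before caching. */
    public void put(String key, byte[] onWireBytes) {
        entries.put(key, onWireBytes);
    }

    /**
     * Writes the cached payload straight into the response stream.
     * Returns false on a miss so the caller can fall back to origin.
     */
    public boolean writeTo(String key, OutputStream response) throws IOException {
        byte[] bytes = entries.get(key);
        if (bytes == null) return false;
        response.write(bytes); // no decode, no re-encode
        return true;
    }
}
```

Note the inversion of responsibility: all serialisation (and any format negotiation) happens once, at `put` time, which is exactly the trade-off described below.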

2. Store pre-gzipped chunks in Okio buffers, concatenate for batch

For batch responses, instead of:

  • Decode each chunk from its gzipped form;
  • Merge them into one JSON array;
  • Re-gzip the result;

…keep each item already gzipped in an Okio Buffer. Concatenating Okio buffers is segment-pointer assignment, not byte copy. Gzip is self-concatenating — gzip(A) + gzip(B) is a valid gzip stream decodable as A + B. So the batch response is:

  • Iterate the requested item IDs.
  • For each, look up the already-gzipped chunk.
  • Append its Okio buffer to the response's Okio buffer (pointer, not copy).
  • Write the combined buffer straight to the wire.

Zero gunzip, zero re-gzip, zero intermediate ObjectNode/ArrayList allocation. The per-batch work is linear in item count at pointer-assignment cost.
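The gzip property the batch path rests on is the multi-member rule of RFC 1952: concatenated gzip members decode to the concatenation of their contents, and the JDK's `GZIPInputStream` reads members back-to-back. The whole flow can be sketched with the JDK alone (this is not PRAPI's code: plain `byte[]` copies stand in for Okio's segment-pointer moves, and the ids and helper names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipBatch {
    /** Compress one item at cache-fill time; the hot path never does this. */
    public static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Batch response: append already-gzipped chunks. No gunzip, no re-gzip. */
    public static byte[] assemble(Map<String, byte[]> gzippedCache, List<String> ids) {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        for (String id : ids) {
            response.writeBytes(gzippedCache.get(id)); // each chunk is a gzip member
        }
        return response.toByteArray();
    }

    /** Client-side view: a multi-member gzip stream inflates to A + B + ... */
    public static byte[] gunzip(byte[] gzipped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

One detail the sketch leaves out: the decoded batch is item bodies back to back, not a JSON array, so the cached chunks have to carry their own framing (brackets/commas or newline delimiters) for the client to parse the result.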

Why this matters specifically on Netty

Netty's EventLoop threads serve many channels per thread. A GC pause on an EventLoop thread stalls every channel that thread owns. The two disciplines combine:

  • No blocking tasks on EventLoop — prescribed directly by the Netty model; PRAPI enforces it with JFR and JDK Mission Control inspection.
  • No allocations on EventLoop either, when avoidable — the zero-allocation cache payload is the discipline for the data the EventLoop touches most often.

Trade-offs

  • Transform cost moves elsewhere. If cache values are opaque bytes, any transformation (field projection, content-type negotiation) happens either before caching (pre-compute N formats) or after — not on read.
  • Memory bookkeeping is manual. Buffer pools and ByteArray lifetimes need explicit ownership. Leaked buffers are native-memory leaks (in Netty's direct buffer case).
  • Debug ergonomics are worse. ByteArray contents aren't visible in a debugger the way an ObjectNode tree is.

When it's overkill

  • Latency budget is tens of ms, not single-digit ms. For most serving tiers, JSON parsing/re-serialisation is acceptable; the engineering effort to go zero-allocation is not repaid.
  • Cache values mutate per-request before emission. If every response needs the payload transformed, zero-allocation caching just shifts allocation to the transform stage.
  • Cache is cold / rarely hit. The dominant allocation is the origin fetch, not the cache hit.
