
PATTERN Cited by 1 source

Offline train, online resolve (compression)

Problem

Format-aware compression requires picking the right transform sequence + parameters for each data shape. If that selection happens in the hot path (at encode time, per frame), the compressor pays the search cost on every frame, which is slow + fragile. If it happens only once at integration time, it can't adapt to data drift and can't emit Pareto-curve tradeoff points for different workloads.

Solution

Split the work into two phases with a config object (a Plan) flowing between them:

  1. Offline training. Trainer consumes shape description (SDDL, parser function, or preset) + sample data; runs a budgeted search over transform choices and parameters; emits a Plan (or a Pareto-set of Plans spanning speed × ratio tradeoffs).
  2. Online resolve + encode. Encoder consumes the Plan + the current frame; resolves the Plan into a concrete Resolved Graph (picking branches at any control points); records the Resolved Graph into the frame; emits compressed bytes.
  3. Decode anywhere. Any OpenZL decoder reads the Resolved Graph from the frame and executes the inverse transform sequence. No out-of-band state.
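The three phases above can be sketched end to end. Everything here is illustrative, not the OpenZL API: the transform names, the Plan layout (a dict with a `pipeline` list), and the use of zlib as a stand-in entropy stage are all assumptions made for the demo.

```python
import json
import zlib

# Two toy reversible transforms over lists of ints in 0..255 (hypothetical names).
def delta_enc(xs):
    return [xs[0]] + [(b - a) % 256 for a, b in zip(xs, xs[1:])]

def delta_dec(xs):
    out, prev = [], 0
    for x in xs:
        prev = (prev + x) % 256
        out.append(prev)
    return out

TRANSFORMS = {
    "identity": (lambda xs: xs, lambda xs: xs),
    "delta": (delta_enc, delta_dec),
}

def apply_pipeline(pipeline, xs):
    for name in pipeline:
        xs = TRANSFORMS[name][0](xs)
    return xs

def train(samples, candidates):
    """Phase 1 (offline): budgeted search over transform choices; emit a Plan."""
    def cost(pipeline):
        return sum(len(zlib.compress(bytes(apply_pipeline(pipeline, s))))
                   for s in samples)
    return {"pipeline": min(candidates, key=cost)}  # the Plan: a plain config object

def encode(plan, frame):
    """Phase 2 (online): resolve the Plan, record the Resolved Graph in-frame."""
    resolved = plan["pipeline"]  # a real Plan may pick branches per frame here
    header = json.dumps(resolved).encode()
    payload = zlib.compress(bytes(apply_pipeline(resolved, frame)))
    return len(header).to_bytes(2, "big") + header + payload

def decode(blob):
    """Phase 3 (anywhere): read the Resolved Graph, run inverses in reverse order."""
    n = int.from_bytes(blob[:2], "big")
    resolved = json.loads(blob[2:2 + n].decode())
    xs = list(zlib.decompress(blob[2 + n:]))
    for name in reversed(resolved):
        xs = TRANSFORMS[name][1](xs)
    return xs
```

On ramp-shaped samples the trainer picks `delta`, and the decoder needs only the frame itself:

```python
samples = [list(range(i, i + 64)) for i in (0, 40, 80)]
plan = train(samples, candidates=[["identity"], ["delta"]])
frame = list(range(100, 180))
assert decode(encode(plan, frame)) == frame  # no out-of-band state
```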

Loop closes via Managed Compression: periodically re-sample production data, re-run the trainer, roll out updated Plans "like any other config change" (Source: sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework).
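One cycle of that loop can be sketched as a function over injected hooks. The hook names (`sample_production`, `trainer`, `evaluate`, the plan store) are assumptions standing in for whatever sampling, training, and config-rollout machinery a real deployment uses:

```python
def retrain_cycle(sample_production, trainer, evaluate, plan_store, holdout):
    """One managed-compression cycle: re-sample, re-train, gated rollout.

    `evaluate(plan, frames)` returns total compressed size on held-out frames;
    the candidate Plan is rolled out only if it beats the current one, the
    same gate any other config change would pass through.
    """
    samples = sample_production()          # re-sample production data
    candidate = trainer(samples)           # re-run the offline trainer
    if evaluate(candidate, holdout) < evaluate(plan_store["current"], holdout):
        plan_store["current"] = candidate  # stands in for a config rollout
    return plan_store["current"]
```

The gate is a design choice, not something the source specifies; it makes the rollout safe against a trainer run that regresses on drifted data.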

Canonical instance

  • OpenZL (Meta, 2025) — "describe (SDDL) → train (produce a plan) → compress (emit frames with resolved graphs) → decode anywhere with the same binary." Precise statement of the pattern.

Why the split matters

Three architectural properties fall out of it:

  • Training cost is amortized over many frames; the hot path just resolves, doesn't search.
  • Plans are first-class config objects — stored, versioned, rolled out, compared, A/B tested.
  • The decoder only needs to know how to run a Resolved Graph — it doesn't need to know the Plan format or the trainer version. This is what makes the universal decoder property feasible.
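The last bullet's contract can be illustrated as a tiny interpreter: a decoder ships only a table of inverse transforms plus the loop below, and never sees Plans or the trainer. Names and the JSON header format are hypothetical, chosen for the sketch:

```python
import json

# Hypothetical inverse-transform registry, keyed by the names recorded in-frame.
INVERSE = {
    "identity": lambda xs: xs,
    "delta": lambda xs: [sum(xs[:i + 1]) % 256 for i in range(len(xs))],
}

def run_resolved_graph(frame_header, payload):
    """Execute the inverse of each recorded transform, last to first."""
    resolved = json.loads(frame_header)  # e.g. '["delta"]', read out of the frame
    for name in reversed(resolved):
        payload = INVERSE[name](payload)
    return payload
```

Any trainer version, any Plan format upstream: as long as the frame records names this table knows, the decoder runs it.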

Precedents + siblings

  • Zstandard dictionary compression (Meta, 2018) — the earlier form of the pattern at Meta. Dictionary was trained offline against samples, then distributed; the zstd binary consumed the dictionary at encode + decode time. The "Plan" was just a dictionary; there was no DAG of transforms. OpenZL generalizes this by making the config a graph instead of a flat dictionary.
  • Query planner / executor split in databases — the query planner runs expensive optimization once against a statistics snapshot, produces a plan, and the executor runs the plan against each row. OpenZL's trainer is the query planner; its encoder is the executor.
  • JIT compilation — hot-path traces get heavily optimized once, then executed many times.
