
PATTERN Cited by 1 source

Offline train, online resolve (compression)

Problem

Format-aware compression requires picking the right transform sequence + parameters for each data shape. If that selection happens in the hot path (at encode time, per frame), the compressor pays the search cost on every frame, which is slow + fragile. If it happens only once at integration time, it can't adapt to data drift and can't emit Pareto-curve tradeoff points for different workloads.

Solution

Split the work into two phases with a config object (a Plan) flowing between them:

  1. Offline training. Trainer consumes shape description (SDDL, parser function, or preset) + sample data; runs a budgeted search over transform choices and parameters; emits a Plan (or a Pareto-set of Plans spanning speed × ratio tradeoffs).
  2. Online resolve + encode. Encoder consumes the Plan + the current frame; resolves the Plan into a concrete Resolved Graph (picking branches at any control points); records the Resolved Graph into the frame; emits compressed bytes.
  3. Decode anywhere. Any OpenZL decoder reads the Resolved Graph from the frame and executes the inverse transform sequence. No out-of-band state.
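The three phases above can be sketched end to end. Everything here is illustrative, not the OpenZL API: the transform names, the Plan layout (a dict with a `pipeline` list), and the use of zlib as a stand-in entropy stage are all assumptions made for the demo.

```python
import json
import zlib

# Two toy reversible transforms over lists of ints in 0..255 (hypothetical names).
def delta_enc(xs):
    return [xs[0]] + [(b - a) % 256 for a, b in zip(xs, xs[1:])]

def delta_dec(xs):
    out, prev = [], 0
    for x in xs:
        prev = (prev + x) % 256
        out.append(prev)
    return out

TRANSFORMS = {
    "identity": (lambda xs: xs, lambda xs: xs),
    "delta": (delta_enc, delta_dec),
}

def apply_pipeline(pipeline, xs):
    for name in pipeline:
        xs = TRANSFORMS[name][0](xs)
    return xs

def train(samples, candidates):
    """Phase 1 (offline): budgeted search over transform choices; emit a Plan."""
    def cost(pipeline):
        return sum(len(zlib.compress(bytes(apply_pipeline(pipeline, s))))
                   for s in samples)
    return {"pipeline": min(candidates, key=cost)}  # the Plan: a plain config object

def encode(plan, frame):
    """Phase 2 (online): resolve the Plan, record the Resolved Graph in-frame."""
    resolved = plan["pipeline"]  # a real Plan may pick branches per frame here
    header = json.dumps(resolved).encode()
    payload = zlib.compress(bytes(apply_pipeline(resolved, frame)))
    return len(header).to_bytes(2, "big") + header + payload

def decode(blob):
    """Phase 3 (anywhere): read the Resolved Graph, run inverses in reverse order."""
    n = int.from_bytes(blob[:2], "big")
    resolved = json.loads(blob[2:2 + n].decode())
    xs = list(zlib.decompress(blob[2 + n:]))
    for name in reversed(resolved):
        xs = TRANSFORMS[name][1](xs)
    return xs
```

On ramp-shaped samples the trainer picks `delta`, and the decoder needs only the frame itself:

```python
samples = [list(range(i, i + 64)) for i in (0, 40, 80)]
plan = train(samples, candidates=[["identity"], ["delta"]])
frame = list(range(100, 180))
assert decode(encode(plan, frame)) == frame  # no out-of-band state
```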

Loop closes via Managed Compression: periodically re-sample production data, re-run the trainer, roll out updated Plans "like any other config change" (Source: sources/2025-10-06-meta-openzl-an-open-source-format-aware-compression-framework).
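One cycle of that loop can be sketched as a function over injected hooks. The hook names (`sample_production`, `trainer`, `evaluate`, the plan store) are assumptions standing in for whatever sampling, training, and config-rollout machinery a real deployment uses:

```python
def retrain_cycle(sample_production, trainer, evaluate, plan_store, holdout):
    """One managed-compression cycle: re-sample, re-train, gated rollout.

    `evaluate(plan, frames)` returns total compressed size on held-out frames;
    the candidate Plan is rolled out only if it beats the current one, the
    same gate any other config change would pass through.
    """
    samples = sample_production()          # re-sample production data
    candidate = trainer(samples)           # re-run the offline trainer
    if evaluate(candidate, holdout) < evaluate(plan_store["current"], holdout):
        plan_store["current"] = candidate  # stands in for a config rollout
    return plan_store["current"]
```

The gate is a design choice, not something the source specifies; it makes the rollout safe against a trainer run that regresses on drifted data.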

Canonical instance

  • OpenZL (Meta, 2025) — "describe (SDDL) → train (produce a plan) → compress (emit frames with resolved graphs) → decode anywhere with the same binary." Precise statement of the pattern.

Why the split matters

Three architectural properties fall out of it:

  • Training cost is amortized over many frames; the hot path just resolves, doesn't search.
  • Plans are first-class config objects — stored, versioned, rolled out, compared, A/B tested.
  • The decoder only needs to know how to run a Resolved Graph — it doesn't need to know the Plan format or the trainer version. This is what makes the universal decoder property feasible.
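The last bullet's contract can be illustrated as a tiny interpreter: a decoder ships only a table of inverse transforms plus the loop below, and never sees Plans or the trainer. Names and the JSON header format are hypothetical, chosen for the sketch:

```python
import json

# Hypothetical inverse-transform registry, keyed by the names recorded in-frame.
INVERSE = {
    "identity": lambda xs: xs,
    "delta": lambda xs: [sum(xs[:i + 1]) % 256 for i in range(len(xs))],
}

def run_resolved_graph(frame_header, payload):
    """Execute the inverse of each recorded transform, last to first."""
    resolved = json.loads(frame_header)  # e.g. '["delta"]', read out of the frame
    for name in reversed(resolved):
        payload = INVERSE[name](payload)
    return payload
```

Any trainer version, any Plan format upstream: as long as the frame records names this table knows, the decoder runs it.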

Precedents + siblings

  • Zstandard dictionary compression (Meta, 2018) — the earlier form of the pattern at Meta. Dictionary was trained offline against samples, then distributed; the zstd binary consumed the dictionary at encode + decode time. The "Plan" was just a dictionary; there was no DAG of transforms. OpenZL generalizes this by making the config a graph instead of a flat dictionary.
  • Query planner / executor split in databases — the query planner runs expensive optimization once against a statistics snapshot, produces a plan, and the executor runs the plan against each row. OpenZL's trainer is the query planner; its encoder is the executor.
  • JIT compilation — hot-path traces get heavily optimized once, then executed many times.
