
PATTERN Cited by 1 source

Offline compute / online lookup for learned config

Problem

You need the output of an expensive analysis in a latency-sensitive runtime hot path, but:

  • The analysis is too expensive to run on every request.
  • The underlying phenomenon changes slowly enough that periodic re-computation is acceptable.
  • You want a simple ops story — no streaming pipelines, no reactive invalidation graphs, no per-request fallbacks.

Solution

Split the system into three phases:

  1. Continuous ingest — runtime producers write observations to a durable store (object storage, append-only log) as a side effect of normal processing. No extra per-request compute.
  2. Offline compute — a periodically-triggered batch job consumes observations, runs the expensive analysis, performs anomaly detection (see patterns/conservative-anomaly-gated-config-update), and publishes the result as a small artefact to:
    • A config store (fast, for runtime consumption).
    • An archive (for debugging, audit, rollback).
  3. Runtime lookup — runtime consumers load the artefact at initialisation and perform in-memory lookups per request. The expensive analysis never runs on the hot path.
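The three phases can be sketched with file-backed stand-ins for the durable store and config store (the names `OBSERVATIONS`, `CONFIG_ARTEFACT`, and the toy counting "analysis" are illustrative assumptions, not part of the pattern's source):

```python
import json
from pathlib import Path

# Hypothetical file-backed stand-ins for the durable store and the config store.
OBSERVATIONS = Path("observations.jsonl")   # phase 1: continuous ingest target
CONFIG_ARTEFACT = Path("config.json")       # phase 2: small published artefact

def ingest(observation: dict) -> None:
    """Phase 1: append an observation as a side effect of normal processing."""
    with OBSERVATIONS.open("a") as f:
        f.write(json.dumps(observation) + "\n")

def offline_compute() -> None:
    """Phase 2: batch job consumes observations, runs the (here trivial)
    analysis, and publishes a small artefact. Anomaly gating omitted."""
    counts: dict[str, int] = {}
    if OBSERVATIONS.exists():
        for line in OBSERVATIONS.read_text().splitlines():
            key = json.loads(line)["key"]
            counts[key] = counts.get(key, 0) + 1  # placeholder for the expensive analysis
    CONFIG_ARTEFACT.write_text(json.dumps(counts))

class RuntimeConsumer:
    """Phase 3: load the artefact once at init; per-request work is a dict lookup."""
    def __init__(self) -> None:
        self.config = json.loads(CONFIG_ARTEFACT.read_text())

    def lookup(self, key: str) -> int:
        return self.config.get(key, 0)
```

The key property is visible in the shape of the code: the expensive work lives only in `offline_compute`, while `RuntimeConsumer.lookup` touches nothing but memory.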

Canonical instance — Pinterest MIQPS

MIQPS is the canonical wiki instance (Source: sources/2026-04-20-pinterest-smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication):

  1. Ingest — the content ingestion pipeline writes each observed URL to a per-domain corpus on S3.
  2. Compute — an offline job downloads the URL corpus, runs MIQPS (group by query parameter pattern, sample, render, compare content IDs, classify), runs anomaly detection, publishes the MIQPS map to the config store + archives to S3.
  3. Lookup — the URL Normalizer loads the MIQPS map at init; for each URL it processes, looks up the query parameter pattern, retrieves the non-neutral parameter set, strips the rest.
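The lookup step can be sketched as follows. The map contents and the keying scheme (domain plus query-parameter pattern) are illustrative assumptions; the real MIQPS map is produced by the offline job described above:

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical MIQPS-style map: (domain, query-parameter pattern) -> the set of
# non-neutral parameters to keep. Real entries come from the offline job.
MIQPS_MAP = {
    ("example.com", frozenset({"id", "utm_source"})): {"id"},
}

def normalize(url: str) -> str:
    """Look up the URL's query-parameter pattern; keep only non-neutral params."""
    parts = urlparse(url)
    params = parse_qsl(parts.query)
    pattern = (parts.netloc, frozenset(k for k, _ in params))
    keep = MIQPS_MAP.get(pattern)
    if keep is None:
        return url  # no entry yet: conservative default, leave the URL untouched
    kept = [(k, v) for k, v in params if k in keep]
    return urlunparse(parts._replace(query=urlencode(kept)))
```

Note the conservative default on a miss: a pattern with no entry is passed through unchanged rather than aggressively stripped, matching the cold-start caveat discussed later.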

Pinterest's framing: "This separation of concerns means the expensive content ID comparison happens offline and asynchronously, while runtime URL normalization is a fast, in-memory lookup."

Why not real-time

Pinterest articulates three load-bearing reasons:

  • Latency: Each content ID computation requires rendering a full page, which takes seconds. Testing every parameter in a URL would multiply this cost, adding unacceptable latency to the content processing pipeline.
  • Cost: Offline analysis scales with the number of domains, while realtime analysis would scale with the number of URLs — orders of magnitude more expensive.
  • Reliability: Transient rendering failures in an offline job are isolated and retryable. In a realtime path, they would directly block content processing.

The cost argument is the pattern's killer feature: shifting the compute-cost axis from per-request to per-entity (per-domain in Pinterest's case) yields orders-of-magnitude savings whenever the entity:request ratio is high.
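A back-of-the-envelope comparison makes the scaling argument concrete. The numbers below are illustrative assumptions, not Pinterest's figures:

```python
# Illustrative numbers only: cost scales with domains offline vs URLs online.
urls_per_day = 100_000_000   # assumed realtime request volume
domains = 50_000             # assumed number of entities the analysis keys on
render_cost_s = 3            # assumed seconds to render a page per comparison

online_cost = urls_per_day * render_cost_s   # per-URL: render on every request
offline_cost = domains * render_cost_s       # per-domain: render once per entity per cycle

# The savings factor is exactly the entity:request ratio.
savings = online_cost // offline_cost
print(savings)  # 2000
```

With these assumed numbers, moving the analysis offline is 2,000x cheaper per cycle, and the factor grows linearly as request volume grows relative to the entity count.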

Why offline staleness is tolerable

"URL parameter conventions change infrequently — on the order of weeks or months. The small amount of staleness between computation cycles is an acceptable tradeoff for the massive savings in cost, latency, and operational complexity."

The staleness SLO is derived from the underlying-phenomenon change rate, not from product requirements. This pattern is a good fit only when those two rates are well-separated.
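A rule-of-thumb check for that rate separation might look like the following. The function name and the default separation factor are assumptions for illustration, not part of the pattern's source:

```python
def staleness_ok(change_interval_days: float,
                 compute_cadence_days: float,
                 separation: float = 10.0) -> bool:
    """The pattern fits when the phenomenon's change interval is much longer
    (here: at least `separation` times longer) than the compute cadence."""
    return change_interval_days >= separation * compute_cadence_days
```

For example, a weekly compute cycle against conventions that shift on the order of months passes the check; a weekly cycle against a phenomenon that shifts within days does not.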

Architecture invariants

The pattern enforces several invariants:

  • Runtime never runs the expensive analysis. The hot path is pure in-memory lookup.
  • The config artefact is small. Fits in runtime memory. Loads fast at init.
  • Ingest and lookup are decoupled from compute. Runtime can keep serving against stale config if the compute job fails; the current artefact is always the last successfully-gated publish.
  • Compute job is restartable. Transient failures retryable; bad runs rejected by anomaly gating without affecting runtime.

When to apply

  • You have an expensive analysis (rendering, ML inference, large-scale graph computation) that produces a small artefact (config map, lookup table, threshold set).
  • The phenomenon changes slowly relative to your offline cadence.
  • You have object storage for the corpus + a config store for publication.
  • Runtime can load the artefact at init and do in-memory lookups.

When not to apply

  • Phenomenon changes too fast — if the artefact is stale within the compute cycle, you need streaming / reactive updates instead.
  • Artefact is too big — if it doesn't fit in runtime memory / config-store size limits, you need sharding or lazy fetch patterns.
  • Per-request decisions depend on request state — if the decision isn't pre-computable, you can't push it offline.
  • Staleness cost is unbounded — safety-critical systems may require guaranteed freshness; e.g. revocation lists for security credentials.

Caveats

  • Runtime reload cadence — when new config publishes, do runtime replicas reload immediately? Roll over one at a time? Pinterest doesn't disclose their strategy.
  • Cold-start coverage — new domains (or new patterns on existing domains) have no MIQPS entry until the next offline cycle. The conservative-default layer handles these, but the gap is real.
  • Compute-store version skew — what if a runtime replica running old config coexists with one running new config? Usually fine for URL normalisation (idempotent), but worth considering for other applications.
  • Archive vs live-config divergence — S3 archive retains history, but the config store holds only current. Rollback requires fetching from archive.

Seen in
