PATTERN
Conservative anomaly-gated config update¶
Problem¶
You have a learned or computed configuration artefact (classifier thresholds, routing tables, feature importances, rate-limit tiers, learned rules) that is re-computed periodically from upstream data. The upstream data can transiently misbehave (rendering outage, data pipeline delay, noisy batch) and produce a degenerate new config that — if published blindly — would cause runtime consumers to malfunction.
You need a safety gate at publish time that lets legitimate updates through while rejecting pathological ones.
Solution¶
Before publishing a newly-computed config, compare it against the previously-published version and classify each change:
- Safe changes — additions, disappearances, and movement in the tolerable failure direction. Allowed freely.
- Dangerous changes — changes toward the catastrophic-failure direction. Counted as anomalies.
If more than A% of existing entries exhibit dangerous changes, reject the update entirely and retain the previous version.
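The publish-time gate above can be sketched in a few lines. This is a minimal illustration, not Pinterest's implementation: the function name, the plain-`dict` config representation, and the `is_dangerous` callback are all assumptions; the domain supplies the asymmetric rules via that callback.

```python
from typing import Callable, Hashable, Mapping

def gate_config_update(
    old: Mapping[Hashable, object],
    new: Mapping[Hashable, object],
    is_dangerous: Callable[[object, object], bool],
    max_anomaly_fraction: float,
) -> Mapping[Hashable, object]:
    """Publish `new` only if dangerous changes stay under the threshold.

    Additions and disappearances are treated as safe; only entries present
    in both versions are examined, and is_dangerous(old_value, new_value)
    encodes the domain's asymmetric change rules.
    """
    shared = old.keys() & new.keys()
    if not shared:
        return new  # nothing to compare against: publish
    anomalies = sum(1 for k in shared if is_dangerous(old[k], new[k]))
    if anomalies / len(old) > max_anomaly_fraction:
        return old  # reject the update: retain the previous version
    return new
```

Example: with `is_dangerous=lambda o, n: n < o` (a value shrinking is the catastrophic direction), an update where half the existing entries shrink is rejected under a 25% threshold but accepted under a 75% one.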
Pinterest's MIQPS canonical rule set (Source: sources/2026-04-20-pinterest-smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication):
| Change | Classification |
|---|---|
| Parameter flipped non-neutral → neutral | Anomaly (the dangerous direction — would start stripping a parameter previously judged important). |
| Parameter flipped neutral → non-neutral | Not anomaly (new important parameter discovered; worst case keeps slightly more than needed). |
| Pattern disappeared entirely | Not anomaly (URL structures legitimately evolve). |
Pinterest's threshold behaviour: "If more than A% of existing patterns are flagged as anomalous, the entire MIQPS update is rejected and the previous version is retained. This ensures the system never regresses — it errs on the side of over-keeping parameters rather than accidentally dropping ones that affect content identity."
Why asymmetric rules¶
A symmetric change detector (flag all changes) would be wrong:
- Too sensitive — legitimate learning iterations (discovering new non-neutral parameters, observing URL-structure evolution) would constantly trigger rejections. The config would be frozen.
- Blind to direction — doesn't distinguish a tolerable change (over-kept parameter) from a catastrophic one (silently merging distinct items).
Asymmetric rules match asymmetric costs — see concepts/anomaly-gated-config-update for the underlying concept. The rules embody the decision-maker's preference ordering over failure modes.
Canonical instance — Pinterest MIQPS¶
Applied to the MIQPS map (per-(domain, query-parameter-pattern) → non-neutral-parameter set):
- Safe: new `(domain, pattern)` entries, new non-neutral parameters added to existing entries, `(domain, pattern)` entries disappearing.
- Dangerous: a `(domain, pattern, parameter)` that was previously classified non-neutral now showing as neutral in the new MIQPS.
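Those rules reduce to a one-directional set difference per entry. A sketch, assuming the MIQPS is represented as a dict from `(domain, pattern)` to a set of non-neutral parameter names (the function name and representation are assumptions, not Pinterest's code):

```python
def count_miqps_anomalies(old_miqps: dict, new_miqps: dict) -> int:
    """Count entries exhibiting a dangerous non-neutral -> neutral flip.

    Only entries present in both versions are examined: new entries,
    new non-neutral parameters, and disappeared entries are all safe.
    """
    anomalies = 0
    for key, old_params in old_miqps.items():
        if key not in new_miqps:
            continue  # pattern disappeared entirely: not an anomaly
        # Parameters non-neutral before but neutral now: the dangerous flip.
        if old_params - new_miqps[key]:
            anomalies += 1
    return anomalies
```

The count is then compared against A% of the existing entries, per the quoted threshold rule.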
Pinterest doesn't disclose A (the rejection threshold), but the post makes the invariant explicit: "ensures the system never regresses."
When to apply¶
- Learned config is load-bearing — if it's wrong, runtime consumers malfunction.
- Failure modes are asymmetric — one change direction is much worse than the other.
- The previous version is a viable fallback — slightly stale but correct beats fresh but broken.
- The change delta is auditable — you can enumerate each entry's change and classify it.
When not to apply¶
- All changes are equally risky — no asymmetric cost → gate would either reject too much or let dangerous changes through.
- No previous version available — the gate has nothing to compare against.
- Changes are ordered / interdependent — partial rejection is nonsensical (e.g. routing tables where entries depend on each other).
- The recompute is cheap enough to canary — if you can run the new config against a small fraction of runtime traffic and measure its effect, canary rollout is more powerful than publish-time anomaly gating.
Interaction with other safety mechanisms¶
- patterns/offline-compute-online-lookup-config — anomaly gating sits between the "compute" and "publish" steps of the offline-batch architecture.
- patterns/multi-layer-normalization-strategy — defence-in-depth. Even if a bad MIQPS sneaks past anomaly gating, other classifier layers (static rules, regex, conservative default) still preserve the parameter.
- Canary / gradual rollout — complementary. Anomaly gating catches degenerate config at publish; canary catches subtle effects at rollout.
- Retain previous version — every legitimate rejection still leaves the system running on stale-but-known-good config. No outage.
Cost model¶
- Gate compute — O(config size), typically small compared to the learning pipeline itself.
- False-reject cost — a legitimate large update gets rejected (e.g. a genuine platform-wide URL structure change on many domains would show up as many non-neutral → neutral flips). Requires human investigation + threshold override.
- False-accept cost — a degenerate config below the A% threshold gets through. Runtime misbehaves on the fraction of entries that flipped the dangerous direction. The ensemble layers (see above) should catch this in practice.
Generalisation¶
Applies to any deployment of learned / computed config with asymmetric failure cost:
- Feature-importance tables for anti-abuse — dropping a critical feature from the high-importance tier is dangerous.
- Routing / placement tables — losing all routes to a region is dangerous; adding a new region is fine.
- Rate-limit tiers — contracting a limit below current consumption is dangerous; expanding is fine.
- Access-control tables — removing allow-list entries is dangerous; adding them is (usually) fine.
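For instance, the rate-limit case reduces to a hypothetical per-entry danger predicate (the helper name and parameters below are illustrative assumptions, not from the source):

```python
def is_dangerous_rate_limit(old_limit: int, new_limit: int,
                            current_consumption: int) -> bool:
    """Asymmetric rule for rate-limit tiers.

    Contracting a limit below observed consumption would throttle live
    traffic (the catastrophic direction); any expansion, or a contraction
    that still leaves headroom, is tolerable.
    """
    return new_limit < old_limit and new_limit < current_consumption
```

Plugged into a publish-time gate, this counts only limit contractions that undercut live traffic as anomalies.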
Caveats¶
- A selection is hard — too strict rejects legitimate updates; too lax lets bad ones through. Requires domain knowledge + observation window.
- Rare legitimate large-delta events — platform migrations, domain decommissions, mass URL-structure refactors. May require manual override.
- Doesn't detect silent correctness bugs — if the new config passes anomaly detection but is subtly wrong (e.g. mis-classifies 10% of new entries), the gate doesn't catch it. Needs supplementary canary or A/B testing.
- Previous-version baseline assumes the previous version was good — if yesterday's MIQPS was itself degenerate, anomaly detection anchors off a degenerate baseline. Periodic clean-room recomputation is advisable.
Seen in¶
- sources/2026-04-20-pinterest-smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication — canonical Pinterest wiki instance.