
CONCEPT Cited by 1 source

Query parameter pattern

Definition

A query parameter pattern is the sorted set of parameter names present in a URL's query string, ignoring values. It is the grouping key that makes parameter classification tractable at scale: a parameter can only be judged neutral (removing it does not change the content served) or non-neutral in the context of the other parameters that sit alongside it.

Example — these URLs have the same query parameter pattern {color, id, utm_source}:

https://example.com/shoes?id=42&color=red&utm_source=facebook
https://example.com/shoes?id=99&color=blue&utm_source=twitter

While this URL has a different pattern {category, page, sort}:

https://example.com/shop?category=shoes&page=2&sort=price
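
A minimal sketch of how such a pattern could be computed with Python's standard library. The function name query_pattern is an illustrative assumption, not something named in the source; repeated parameter names are collapsed because the pattern is a set.

```python
from urllib.parse import parse_qsl, urlsplit


def query_pattern(url: str) -> tuple:
    """Return the sorted, de-duplicated parameter names of a URL's query string."""
    # keep_blank_values=True so that "?flag" and "?flag=" still contribute "flag".
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return tuple(sorted({name for name, _value in pairs}))


a = query_pattern("https://example.com/shoes?id=42&color=red&utm_source=facebook")
b = query_pattern("https://example.com/shoes?id=99&color=blue&utm_source=twitter")
c = query_pattern("https://example.com/shop?category=shoes&page=2&sort=price")

print(a)       # ('color', 'id', 'utm_source')
print(a == b)  # True
print(c)       # ('category', 'page', 'sort')
```

A tuple is used rather than a set so that the pattern is hashable and can serve directly as a dictionary key.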

Why pattern matters — the ref example

Pinterest's canonical example of why classifying parameters independently of their pattern is wrong (Source: sources/2026-04-20-pinterest-smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication):

"Moreover, the same parameter name can play different roles depending on its context. Consider the parameter ref: on a product page URL like example.com/product?id=42&ref=homepage, ref is purely a tracking parameter and is neutral — removing it doesn't change the product displayed. But on a comparison page URL like example.com/compare?ref=99, the same ref parameter identifies which items to compare and is non-neutral. By grouping URLs by their full parameter pattern, the algorithm evaluates each parameter within its specific context, correctly classifying it as neutral in one pattern and non-neutral in another."

This is why the classification key in MIQPS is (domain, pattern, parameter) — not (domain, parameter), and certainly not just parameter.
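
To make the key shape concrete, here is a small sketch of a lookup keyed by (domain, pattern, parameter). The table NEUTRAL and the helper is_neutral are hypothetical names for illustration; the source does not describe how MIQPS stores its classifications. The two entries simply mirror the ref example quoted above.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical lookup table: (domain, pattern, parameter) -> is the parameter neutral?
NEUTRAL = {
    ("example.com", ("id", "ref"), "ref"): True,   # product page: tracking only
    ("example.com", ("ref",), "ref"): False,       # comparison page: selects content
}


def is_neutral(url: str, param: str) -> bool:
    parts = urlsplit(url)
    names = {name for name, _value in parse_qsl(parts.query, keep_blank_values=True)}
    key = (parts.netloc, tuple(sorted(names)), param)
    # Unknown combinations default to non-neutral, so the parameter is kept.
    return NEUTRAL.get(key, False)


print(is_neutral("https://example.com/product?id=42&ref=homepage", "ref"))  # True
print(is_neutral("https://example.com/compare?ref=99", "ref"))              # False
```

A table keyed only by (domain, parameter) would have a single slot for ("example.com", "ref") and could not hold both answers.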

Role in MIQPS

MIQPS groups observed URLs by query parameter pattern, then picks the top K patterns by URL count for analysis, "focusing computational resources on the patterns that matter most." Patterns below the top-K cutoff are not analysed; parameters in those URLs fall back to the conservative default and are kept.
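
The grouping and top-K selection can be sketched as follows. This is an illustration under assumptions, not Pinterest's implementation: the names pattern_of and top_k_patterns are invented here, all input URLs are assumed to belong to one domain, and the value of K is arbitrary.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def pattern_of(url):
    """Sorted tuple of the parameter names in the URL's query string."""
    names = {n for n, _v in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    return tuple(sorted(names))


def top_k_patterns(urls, k):
    """Return the k most frequent patterns among urls, with their URL counts."""
    counts = Counter(pattern_of(u) for u in urls)
    return counts.most_common(k)


urls = [
    "https://example.com/shoes?id=42&color=red",
    "https://example.com/shoes?color=blue&id=99",
    "https://example.com/shoes?id=7&color=green",
    "https://example.com/shop?category=shoes&page=2",
    "https://example.com/shop?page=3&category=hats",
    "https://example.com/search?q=boots",
]

analysed = top_k_patterns(urls, k=2)
print(analysed)
# [(('color', 'id'), 3), (('category', 'page'), 2)]

# The ('q',) pattern misses the cutoff, so its parameters would simply be kept.
analysed_patterns = {pattern for pattern, _count in analysed}
print(("q",) in analysed_patterns)  # False
```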

Within each pattern, each parameter is tested independently by the removal-test.
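
The source does not spell out here how the removal-test compares results, so the sketch below covers only the mechanical part: building, for one URL, the variant with each parameter removed in turn. The function name removal_variants is an assumption for illustration. Note that parse_qsl decodes and urlencode re-encodes values, so percent-encoding in the output may differ cosmetically from the input.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def removal_variants(url):
    """Map each parameter name to the URL with that one parameter removed."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    variants = {}
    for name in sorted({n for n, _v in pairs}):
        # Drop every occurrence of this one name; leave all other pairs in order.
        kept = [(n, v) for n, v in pairs if n != name]
        variants[name] = urlunsplit(parts._replace(query=urlencode(kept)))
    return variants


url = "https://example.com/product?id=42&ref=homepage"
for name, variant in removal_variants(url).items():
    print(name, "->", variant)
# id -> https://example.com/product?ref=homepage
# ref -> https://example.com/product?id=42
```

Each variant would then be compared against the original URL by whatever content-equivalence check the removal-test uses; that comparison is outside this sketch.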

Why sorted + names-only

  • Sorted — so ?a=1&b=2 and ?b=2&a=1 (logically equivalent) produce the same pattern.
  • Names only, not values — because the pattern is a structural grouping, not a content grouping. ?id=42&color=red and ?id=99&color=blue belong to the same pattern; they test what happens when color is removed from URLs of this shape.

Practical considerations

  • Pattern count per domain: a domain with many page types (product, category, search, checkout) has many patterns. MIQPS caps analysis at the top K patterns by URL count.
  • Long-tail patterns: patterns that appear only a few times fall below the sample-size floor and receive the conservative default (parameters kept).
  • Pattern drift: when a domain changes its URL structure between MIQPS runs, previously seen patterns can vanish. The source notes that "patterns can naturally disappear as a domain's URL structure evolves", so this is explicitly not flagged as an anomaly (concepts/anomaly-gated-config-update).

Generalisation

The concept generalises to any classifier-over-composite-objects where context determines semantics:

  • Parameter-in-HTTP-header-set: what does an authorisation header mean, given which other headers sit alongside it?
  • Feature-in-feature-vector: does feature X predict label Y in the same way regardless of what other features are active?
  • Option-in-option-set: does a CLI flag mean the same thing regardless of which other flags are passed?

In all cases, the insight is the same: don't classify the component in isolation when the composite is the unit of meaning.

Seen in
