PATTERN Cited by 1 source

Secondary index for high-cardinality attributes over object storage

Pattern statement

A log (or log-shaped) system may deliberately keep its base index cheap and low-cardinality, indexing only labels such as service, cluster, and environment. Queries on high-cardinality attributes (UUIDs, request IDs, trace IDs, user IDs) are then slow because they force full chunk scans. When that happens, do not replace the base index with a full-text inverted index. Instead, layer a secondary index tuned specifically for unique-value lookups over the same chunked object-storage data.

The base index remains optimal for the majority of label-scoped queries; the secondary index handles the narrow case of needle-in-haystack queries. The base system's cost/simplicity envelope is preserved — no regression to full-text-indexed storage costs.

Motivation

The binary choice between label indexing (cheap, low-cardinality only) and full-text indexing (expensive, any cardinality) is a false dichotomy. Log systems like Loki picked label indexing explicitly to avoid Elasticsearch-class storage costs. But label indexing is structurally weak on the one query shape operators most often need during incidents: "find the one log line that contains this UUID / request ID."

A secondary index tuned for that specific query shape, layered on the same object-storage substrate, bridges the gap. The base system keeps its cheap, operationally-light label-scoped query path; only unique-value queries pay the (smaller) cost of the secondary index.

Canonical wiki instance

Logline in Loki (2026-04):

Base index: Loki's label-based index over service, cluster, environment, region, etc.
Secondary index: Logline — acquired 2026-04-22 by Grafana Labs, designed specifically for high-cardinality attributes over object storage. Sits on the same object-storage substrate as Loki's chunks, rather than requiring a separate indexed-storage tier.
Base-design preservation: Grafana Labs is explicit that Logline "makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."
Reported benefit: UUID lookup scan reduced from 3.5 TB to 8 GB (99.7 % reduction in data scanned). (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
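The reported figures can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted above; the use of decimal units (1 TB = 1000 GB) is an assumption, since the source does not say which convention it uses.

```python
# Figures as reported in the source; decimal units are an assumption.
scanned_before_gb = 3.5 * 1000  # 3.5 TB expressed in GB
scanned_after_gb = 8.0

# Fraction of the original scan that is no longer needed.
reduction = 1 - scanned_after_gb / scanned_before_gb
# How many times less data is scanned.
factor = scanned_before_gb / scanned_after_gb

print(f"reduction: {reduction:.2%}")  # reduction: 99.77%
print(f"factor: {factor:.1f}x")       # factor: 437.5x
```

Under that assumption the reduction comes out at about 99.77 %, which matches the reported 99.7 % when truncated to one decimal place, and corresponds to scanning roughly 440 times less data.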

Why it works

The base index stays cheap. Low-cardinality means the label index is small and cheap to maintain at scale. The pattern refuses to trade that away.
The secondary index is narrow. It only needs to accelerate one query shape (unique-value lookup), so it can use a specialised data structure (hash, Bloom filter, compact posting list) rather than a full-text inverted index. Grafana Labs characterises Logline as "much simpler indexing than anything else that's on the market today."
Object storage stays the data substrate. No separate indexed-storage tier means no storage-cost regression; the secondary index lives on the same object-storage substrate as the data chunks.
Query routing is transparent. Queries on labels use the base index path; queries on high-cardinality attributes trigger the secondary index. The user-facing query language (LogQL for Loki) doesn't change.
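The four points above can be illustrated with a toy model. The sketch below is not Logline's implementation, which the source does not describe; it assumes one Bloom filter per chunk as the secondary index, and the names `BloomFilter`, `Chunk`, `build_chunk`, `query`, and the `req=` token convention are all hypothetical. Label selection narrows the candidate chunks first, and the per-chunk filter then rules out chunks that cannot contain the needle.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Small Bloom filter: false positives possible, false negatives impossible."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as the bit array

    def _positions(self, value: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class Chunk:
    labels: dict
    lines: list
    ids: BloomFilter = field(default_factory=BloomFilter)

    @property
    def size_bytes(self) -> int:
        return sum(len(line) for line in self.lines)


def build_chunk(labels: dict, lines: list) -> Chunk:
    """Build a chunk and index its request IDs (hypothetical 'req=<id>' tokens)."""
    chunk = Chunk(labels=labels, lines=lines)
    for line in lines:
        for token in line.split():
            if token.startswith("req="):
                chunk.ids.add(token[len("req="):])
    return chunk


def query(chunks, selector: dict, needle=None):
    """Return (matching lines, bytes scanned) for a label selector and optional ID."""
    # Base path: the label index narrows the search to candidate chunks.
    candidates = [
        c for c in chunks
        if all(c.labels.get(k) == v for k, v in selector.items())
    ]
    # Secondary path: drop chunks whose filter proves the ID is absent.
    if needle is not None:
        candidates = [c for c in candidates if c.ids.might_contain(needle)]
    scanned = sum(c.size_bytes for c in candidates)
    hits = [
        line for c in candidates for line in c.lines
        if needle is None or f"req={needle}" in line.split()
    ]
    return hits, scanned


chunks = [
    build_chunk({"service": "api"}, ["GET /a req=1f0c", "GET /b req=77aa"]),
    build_chunk({"service": "api"}, ["GET /c req=9e3d"]),
    build_chunk({"service": "db"}, ["SELECT 1 req=1f0c"]),
]
label_only = query(chunks, {"service": "api"})                 # scans every api chunk
with_needle = query(chunks, {"service": "api"}, needle="9e3d")  # filter prunes chunks
print(label_only[1], with_needle[1])
```

Because a Bloom filter never produces false negatives, a chunk that the filter rejects can be skipped safely, including when the ID is absent from the whole corpus. A false positive only costs an unnecessary chunk scan. A hash index or compact posting list would fill the same slot with different size and accuracy trade-offs.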

Design envelope

When this pattern fits

Cheap base index optimised for a specific dimension class.
Query workload dominated by the base-index-friendly dimensions, but with a material minority of queries on high-cardinality attributes.
Cost sensitivity to storage — full-text indexing is a non-starter.
Data substrate is object storage, where any secondary index must respect high-latency / high-bandwidth access patterns.

When this pattern doesn't fit

Full-text search is actually the primary workload. If most queries are substring searches across log content (not unique-value lookups), a secondary unique-value index won't help; a full-text index (for example, Elasticsearch) is the better fit.
No object-storage substrate. If the bulk of the data lives on directly attached storage with low-latency random access, the index design space is different; object-storage-specific tricks (range reads, manifest pruning) don't apply.
Base index is already full-text. The pattern is specifically about augmenting cheap indexes that structurally can't handle high-cardinality attributes, not about augmenting systems that already can.

Anti-patterns this pattern rejects

"Add all high-cardinality attributes to the base index." Makes the base index explode in size and cost; negates the original design choice.
"Replace label indexing with full-text indexing." Regresses to Elasticsearch-class storage cost; defeats Loki's whole design thesis.
"Tell users to scope queries tighter with labels." Works sometimes, but in the missing-needle worst case (UUID not in the corpus) label scoping only narrows the set of chunks to scan; every chunk in that set must still be scanned in full to prove the needle absent.
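The first anti-pattern can be made concrete with a back-of-the-envelope count. A label index tracks distinct label combinations, so its size is bounded above by the product of the label cardinalities and below by the largest single cardinality. The numbers and the `max_streams` helper below are illustrative assumptions, not figures from the source.

```python
from math import prod

# Illustrative cardinalities; not taken from the source.
base_labels = {"service": 50, "cluster": 10, "environment": 3}


def max_streams(label_cardinalities: dict) -> int:
    """Upper bound on distinct label combinations the base index may track."""
    return prod(label_cardinalities.values())


baseline = max_streams(base_labels)
with_request_id = max_streams({**base_labels, "request_id": 1_000_000})

print(baseline)         # 1500
print(with_request_id)  # 1500000000
```

Even if real traffic never reaches the upper bound, adding a label with a million distinct values guarantees at least a million index entries, against at most 1,500 for the original label set.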

Seen in

sources/2026-04-22-grafana-grafana-labs-acquires-logline — canonical wiki instance. Grafana Labs explicitly articulates all three design commitments (base-index preservation, secondary-index narrowness, object-storage substrate continuity) and reports a 99.7 % data-scanned reduction for the target query class.

systems/loki — the base-indexed log system.
systems/logline — the secondary-index implementation.
concepts/label-based-log-indexing — the base-index scheme.
concepts/high-cardinality-attribute-indexing-over-object-storage — the implementing-technique concept.
concepts/needle-in-haystack-log-query — the query class this pattern targets.
concepts/compute-storage-separation — the architectural backdrop.
concepts/secondary-index — general parent concept.