PATTERN Cited by 1 source

Secondary index for high-cardinality attributes over object storage

Pattern statement

When a log (or log-shaped) system's base index is deliberately kept cheap and low-cardinality, indexing only labels like service, cluster, and environment, queries on high-cardinality attributes (UUIDs, request IDs, trace IDs, user IDs) are slow because they force full chunk scans. Do not respond by replacing the base index with a full-text inverted index. Instead, layer a secondary index specifically tuned for unique-value lookups over the same chunked object-storage data.

The base index remains optimal for the majority of label-scoped queries; the secondary index handles the narrow case of needle-in-haystack queries. The base system's cost/simplicity envelope is preserved — no regression to full-text-indexed storage costs.

Motivation

The binary choice between label indexing (cheap, low-cardinality only) and full-text indexing (expensive, any cardinality) is a false dichotomy. Log systems like Loki picked label indexing explicitly to avoid Elasticsearch-class storage costs. But label indexing is structurally weak on the one query shape operators most often need during incidents: "find the one log line that contains this UUID / request ID."

A secondary index tuned for that specific query shape, layered on the same object-storage substrate, bridges the gap. The base system keeps its cheap, operationally-light label-scoped query path; only unique-value queries pay the (smaller) cost of the secondary index.

Canonical wiki instance

Logline in Loki (2026-04):

  • Base index: Loki's label-based index over service, cluster, environment, region, etc.
  • Secondary index: Logline — acquired 2026-04-22 by Grafana Labs, designed specifically for high-cardinality attributes over object storage. Sits on the same object-storage substrate as Loki's chunks, rather than requiring a separate indexed-storage tier.
  • Base-design preservation: Grafana Labs is explicit that Logline "makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."
  • Reported benefit: UUID lookup scan reduced from 3.5 TB to 8 GB (99.7 % reduction in data scanned). (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
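
The reported percentage can be checked against the two scan sizes. The short calculation below is only a sanity check, assuming decimal units (1 TB = 1000 GB); the variable names are illustrative and not from the source.

```python
# Sanity check of the reported reduction, assuming 1 TB = 1000 GB.
scanned_before_gb = 3.5 * 1000   # 3.5 TB scanned without the secondary index
scanned_after_gb = 8             # 8 GB scanned with it

reduction = 1 - scanned_after_gb / scanned_before_gb
shrink_factor = scanned_before_gb / scanned_after_gb

print(f"{reduction:.2%} less data scanned")        # 99.77% less data scanned
print(f"scan is {shrink_factor:.1f}x smaller")     # scan is 437.5x smaller
```

The result is consistent with the reported figure of roughly 99.7 %: the query reads about 1/437 of the data it read before.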

Why it works

  • The base index stays cheap. Low-cardinality means the label index is small and cheap to maintain at scale. The pattern refuses to trade that away.
  • The secondary index is narrow. It only needs to accelerate one query shape (unique-value lookup), so it can use a specialised data structure (hash, Bloom filter, compact posting list) rather than a full-text inverted index. Grafana Labs characterises Logline as "much simpler indexing than anything else that's on the market today."
  • Object storage stays the data substrate. No separate indexed-storage tier means no storage-cost regression; the secondary index lives in the same object store as the chunks it indexes.
  • Query routing is transparent. Queries on labels use the base index path; queries on high-cardinality attributes trigger the secondary index. The user-facing query language (LogQL for Loki) doesn't change.
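
The points above can be illustrated with a minimal sketch. The source does not describe Logline's actual data structure, so this uses one of the candidate structures named above, a per-chunk Bloom filter, purely as an assumption. Every name here (BloomFilter, Chunk, build_chunk, candidate_chunks, the chunk keys and request IDs) is hypothetical, and the chunks are held in memory as a stand-in for objects in object storage.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting one hash with the probe number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class Chunk:
    """Hypothetical stand-in for one log chunk stored as an object."""
    key: str      # illustrative object-storage key
    labels: dict  # low-cardinality labels, as covered by the base index
    lines: list   # log lines (in a real system these stay in object storage)
    request_ids: BloomFilter = field(default_factory=BloomFilter)


def build_chunk(key, labels, entries):
    """Build a chunk from (request_id, line) pairs and index its request IDs."""
    chunk = Chunk(key=key, labels=labels, lines=[line for _, line in entries])
    for request_id, _ in entries:
        chunk.request_ids.add(request_id)
    return chunk


def candidate_chunks(chunks, label_matchers, request_id=None):
    """Select chunks by label first (base index path); if a high-cardinality
    value is given, prune further with the per-chunk filter (secondary path)."""
    selected = [c for c in chunks
                if all(c.labels.get(k) == v for k, v in label_matchers.items())]
    if request_id is not None:
        selected = [c for c in selected
                    if c.request_ids.might_contain(request_id)]
    return selected


chunks = [
    build_chunk("chunks/0001", {"service": "api"}, [("req-a1", "GET /users req-a1")]),
    build_chunk("chunks/0002", {"service": "api"}, [("req-b2", "POST /orders req-b2")]),
    build_chunk("chunks/0003", {"service": "web"}, [("req-c3", "GET / req-c3")]),
]

label_only = candidate_chunks(chunks, {"service": "api"})
hits = candidate_chunks(chunks, {"service": "api"}, request_id="req-b2")
```

Here label_only contains both "api" chunks, while hits is guaranteed to contain chunks/0002 and will almost always contain nothing else. Because a Bloom filter never yields false negatives, a chunk it rejects can be skipped without reading it, which is also what makes a lookup for a value that is absent from the corpus cheap. The filter is much smaller than a full-text inverted index, at the price of occasional false positives that cost one unnecessary chunk read.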

Design envelope

When this pattern fits

  • Cheap base index optimised for a specific dimension class.
  • Query workload dominated by the base-index-friendly dimensions, but with a material minority of queries on high-cardinality attributes.
  • Cost sensitivity to storage — full-text indexing is a non-starter.
  • Data substrate is object storage, where any secondary index must suit high-latency, high-bandwidth access: a few large reads rather than many small random ones.

When this pattern doesn't fit

  • Full-text search is actually the primary workload. If most queries are substring searches across log content (not unique-value lookups), a secondary unique-value index won't help; a full-text index (for example Elasticsearch) is the better fit.
  • No object-storage substrate. If the bulk of data lives on directly-attached storage with low-latency random access, the index design space is different; object-storage-specific tricks (range reads, manifest pruning) don't apply.
  • Base index is already full-text. The pattern is specifically about augmenting cheap indexes that structurally can't handle high cardinality, not about augmenting systems that already can.

Anti-patterns this pattern rejects

  • "Add all high-cardinality attributes to the base index." Makes the base index explode in size and cost; negates the original design choice.
  • "Replace label indexing with full-text indexing." Regresses to Elasticsearch-class storage cost; defeats Loki's whole design thesis.
  • "Tell users to scope queries tighter with labels." Works sometimes, but in the missing-needle worst case (the UUID is not in the corpus), label scoping only shrinks the set of chunks to search; every chunk in that set must still be scanned in full to prove the needle absent.

Seen in

  • sources/2026-04-22-grafana-grafana-labs-acquires-logline — canonical wiki instance. Grafana Labs explicitly articulates all three design commitments (base-index preservation, secondary-index narrowness, object-storage substrate continuity) and reports a 99.7 % data-scanned reduction for the target query class.