PATTERN Cited by 1 source
Secondary index for high-cardinality attributes over object storage¶
Pattern statement¶
When a log (or log-shaped) system's base index is deliberately kept cheap and low-cardinality — indexing only labels like service, cluster, and environment — and queries on high-cardinality attributes (UUIDs, request IDs, trace IDs, user IDs) are slow because they force full chunk scans, do not replace the base index with a full-text inverted one. Instead, layer a secondary index specifically tuned for unique-value lookups over the same chunked object-storage data.
The base index remains optimal for the majority of label-scoped queries; the secondary index handles the narrow case of needle-in-haystack queries. The base system's cost/simplicity envelope is preserved — no regression to full-text-indexed storage costs.
Motivation¶
The binary choice between label indexing (cheap, low-cardinality only) and full-text indexing (expensive, any cardinality) is a false dichotomy. Log systems like Loki picked label indexing explicitly to avoid Elasticsearch-class storage costs. But label indexing is structurally weak on the one query shape operators most often need during incidents: "find the one log line that contains this UUID / request ID."
A secondary index tuned for that specific query shape, layered on the same object-storage substrate, bridges the gap. The base system keeps its cheap, operationally-light label-scoped query path; only unique-value queries pay the (smaller) cost of the secondary index.
Canonical wiki instance¶
Logline in Loki (2026-04):
- Base index: Loki's label-based index over service, cluster, environment, region, etc.
- Secondary index: Logline — acquired 2026-04-22 by Grafana Labs, designed specifically for high-cardinality attributes over object storage. Sits on the same object-storage substrate as Loki's chunks, rather than requiring a separate indexed-storage tier.
- Base-design preservation: Grafana Labs is explicit that Logline "makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."
- Reported benefit: UUID lookup scan reduced from 3.5 TB to 8 GB (99.7 % reduction in data scanned). (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
Why it works¶
- The base index stays cheap. Because it covers only low-cardinality labels, the label index is small and cheap to maintain at scale. The pattern refuses to trade that away.
- The secondary index is narrow. It only needs to accelerate one query shape (unique-value lookup), so it can use a specialised data structure (hash, Bloom filter, compact posting list) rather than a full-text inverted index. Grafana Labs characterises Logline as "much simpler indexing than anything else that's on the market today."
- Object storage stays the data substrate. No separate indexed-storage tier means no storage-cost regression; the secondary index lives in the same object store as the chunks it indexes.
- Query routing is transparent. Queries on labels use the base index path; queries on high-cardinality attributes trigger the secondary index. The user-facing query language (LogQL for Loki) doesn't change.
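The four points above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not Logline's or Loki's actual design: the source does not say which data structure Logline uses, so a per-chunk Bloom filter is chosen here only because it is one of the options listed above. `CHUNKS`, `BloomFilter`, `build_secondary_index`, and `query` are hypothetical names, and an in-memory dict stands in for object storage.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical stand-in for object storage: chunk id -> (labels, log lines).
CHUNKS = {
    "chunk-0": ({"service": "api", "env": "prod"}, ["req=a1 ok", "req=a2 ok"]),
    "chunk-1": ({"service": "api", "env": "prod"}, ["req=b1 ok", "req=b2 error"]),
    "chunk-2": ({"service": "web", "env": "prod"}, ["req=c1 ok"]),
}


def build_secondary_index(chunks):
    """One Bloom filter per chunk, built over whitespace-separated tokens."""
    index = {}
    for chunk_id, (_labels, lines) in chunks.items():
        bloom = BloomFilter()
        for line in lines:
            for token in line.split():
                bloom.add(token)
        index[chunk_id] = bloom
    return index


def query(chunks, secondary, labels, needle=None):
    """Return (matching lines, ids of chunks actually scanned)."""
    # Base index path: select chunks whose labels match the selector.
    candidates = [
        cid for cid, (chunk_labels, _) in chunks.items()
        if all(chunk_labels.get(k) == v for k, v in labels.items())
    ]
    # Secondary index path: only consulted when the query carries a needle.
    if needle is not None:
        candidates = [c for c in candidates if secondary[c].might_contain(needle)]
    hits = [
        line for cid in candidates for line in chunks[cid][1]
        if needle is None or needle in line.split()
    ]
    return hits, candidates


secondary = build_secondary_index(CHUNKS)
hits, scanned = query(CHUNKS, secondary, {"service": "api"}, needle="req=b2")
print(hits, scanned)
```

The label selector alone would leave two chunks to scan; the filter step usually narrows that to the one chunk holding the needle, and a false positive costs an extra scan but never a missed line. A label-only query skips the secondary index entirely, which mirrors the claim that the base query path is unchanged.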
Design envelope¶
When this pattern fits¶
- Cheap base index optimised for a specific dimension class.
- Query workload dominated by the base-index-friendly dimensions, but with a material minority of queries on high-cardinality attributes.
- Cost sensitivity to storage — full-text indexing is a non-starter.
- Data substrate is object storage, where any secondary index must respect high-latency / high-bandwidth access patterns.
When this pattern doesn't fit¶
- Full-text search is actually the primary workload. If most queries are substring searches across log content (not unique-value lookups), a secondary unique-value index won't help; a full-text index (e.g. Elasticsearch) is the better fit.
- No object-storage substrate. If the bulk of data lives on directly-attached storage with low-latency random access, the index design space is different; object-storage-specific tricks (range reads, manifest pruning) don't apply.
- Base index is already full-text. The pattern is about augmenting cheap indexes that structurally cannot serve high-cardinality lookups, not about augmenting systems that already can.
Anti-patterns this pattern rejects¶
- "Add all high-cardinality attributes to the base index." Makes the base index explode in size and cost; negates the original design choice.
- "Replace label indexing with full-text indexing." Regresses to Elasticsearch-class storage cost; defeats Loki's whole design thesis.
- "Tell users to scope queries tighter with labels." Works sometimes, but in the missing-needle worst case (the UUID is not in the corpus) label scoping only shrinks the set of chunks to scan; every remaining chunk still has to be scanned in full to prove the needle is absent.
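The missing-needle case in the last item can be made concrete with a small sketch. It assumes an exact posting map (value to chunk ids) restricted to one hypothetical high-cardinality field, `id=`; this is an illustration of the general idea, not a description of how Logline stores its index. `CHUNKS`, `scan_for`, `build_postings`, and `lookup` are made-up names, and the chunks shown are assumed to be what remains after label scoping.

```python
from collections import defaultdict

# Hypothetical chunks left over after label scoping: chunk id -> log lines.
CHUNKS = {
    "chunk-0": ["id=111 start", "id=112 start"],
    "chunk-1": ["id=113 start", "id=111 done"],
}


def scan_for(chunks, needle):
    """Brute-force path: every chunk must be read, even when nothing matches."""
    hits, chunks_read = [], 0
    for lines in chunks.values():
        chunks_read += 1
        hits += [line for line in lines if needle in line.split()]
    return hits, chunks_read


def build_postings(chunks, field_prefix="id="):
    """Posting map over a single high-cardinality field: value -> chunk ids."""
    postings = defaultdict(set)
    for chunk_id, lines in chunks.items():
        for line in lines:
            for token in line.split():
                if token.startswith(field_prefix):
                    postings[token].add(chunk_id)
    return postings


def lookup(chunks, postings, needle):
    """Indexed path: read only the chunks the posting map names."""
    chunk_ids = sorted(postings.get(needle, ()))
    hits = [
        line for cid in chunk_ids for line in chunks[cid]
        if needle in line.split()
    ]
    return hits, len(chunk_ids)


postings = build_postings(CHUNKS)
print(scan_for(CHUNKS, "id=999"))          # reads both chunks, finds nothing
print(lookup(CHUNKS, postings, "id=999"))  # reads no chunks, finds nothing
```

For an absent value the scan still touches every label-scoped chunk, while the indexed lookup proves absence without reading any chunk data. Indexing only the `id=` field rather than every token is what keeps this structure narrower than a full-text inverted index.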
Seen in¶
- sources/2026-04-22-grafana-grafana-labs-acquires-logline — canonical wiki instance. Grafana Labs explicitly articulates all three design commitments (base-index preservation, secondary-index narrowness, object-storage substrate continuity) and reports a 99.7 % data-scanned reduction for the target query class.
Related¶
- systems/loki — the base-indexed log system.
- systems/logline — the secondary-index implementation.
- concepts/label-based-log-indexing — the base-index scheme.
- concepts/high-cardinality-attribute-indexing-over-object-storage — the implementing-technique concept.
- concepts/needle-in-haystack-log-query — the query class this pattern targets.
- concepts/compute-storage-separation — the architectural backdrop.
- concepts/secondary-index — general parent concept.