High-cardinality attribute indexing over object storage

Definition

A class of indexing techniques that make unique-value lookups fast over datasets whose bulk storage lives on object storage (S3, GCS, Azure Blob), without paying the cost of a full-text inverted index.

The problem space is defined by three simultaneous constraints:

  1. High cardinality. The values being indexed are unique or near-unique per record (UUIDs, request IDs, trace IDs, user IDs). A classical inverted index on all of them blows up the index size.
  2. Object storage substrate. The underlying chunks are on object storage, so index lookups have to fit the object-storage access model (high-latency, high-bandwidth, range-readable).
  3. Cost-per-query ceiling. The premise of using object storage is that storage cost is small; a high-cost index would regress the whole cost/performance envelope.
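The cardinality constraint can be made concrete with back-of-envelope arithmetic. The figures below are illustrative assumptions, not numbers from any source: if every record carries a unique value such as a UUID, an inverted index gains roughly one term entry per record, so the index grows linearly with the corpus rather than with the (small) set of repeated labels.

```python
# Illustrative arithmetic (assumed figures): an inverted index over a
# unique-per-record attribute stores ~one term entry plus one posting
# per record, so it scales with record count, not label cardinality.
records = 10_000_000_000            # 10B log lines (assumed)
uuid_bytes = 36                     # canonical UUID string length
posting_bytes = 8                   # one chunk pointer per term (assumed)
index_bytes = records * (uuid_bytes + posting_bytes)

# A label index over the same corpus only stores one entry per unique
# label combination, which stays small regardless of record count.
label_streams = 100_000             # unique label combinations (assumed)
label_index_bytes = label_streams * 64

print(f"inverted: ~{index_bytes / 1e12:.2f} TB of raw entries")
print(f"label:    ~{label_index_bytes / 1e6:.1f} MB of raw entries")
```

The point of the sketch is the scaling law, not the absolute numbers: the inverted-index figure tracks record count, while the label-index figure does not.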

Why it is hard

The classical log-indexing designs fail this problem space in opposite directions:

  • Label indexes (label-based log indexing, e.g., Loki) are cheap but can only index low-cardinality dimensions. High-cardinality attributes would explode the index size.
  • Full-text inverted indexes (Elasticsearch) handle high cardinality, but their cost is proportional to corpus size and unique-term count: storage, memory, and operational cost are high, and the index is typically co-located with compute nodes rather than living on object storage.

The gap: no scheme gives both cheap storage (object-storage-priced) and fast unique-value lookup.

The Logline framing

Grafana Labs positions Logline, which it acquired in April 2026, as occupying exactly this gap:

"Logline brings a new indexing approach to Loki that's designed specifically for high-cardinality attributes over object storage. Ultimately, this makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."

And:

"We want to drive down the time it takes to perform these searches without having to introduce techniques that are much more computationally expensive."

The implied architecture is a secondary index layered on top of the existing object-storage chunks: it maps high-cardinality values to the chunks that contain them, without re-indexing content the way a full-text engine would. The exact mechanism (hash-based? approximate? sharded? Bloom-filter-based?) is not disclosed in the announcement. (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
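Since the announcement leaves the mechanism open, the following is only one plausible realization of the genre, not Logline's disclosed design: a small probabilistic filter per chunk, stored alongside the chunks, that maps a looked-up value to the set of chunks that might contain it. All names here (`ChunkBloom`, `candidate_chunks`) are hypothetical.

```python
# Hedged sketch: one Bloom filter per object-storage chunk. A Bloom
# filter has no false negatives, so a "no" answer definitively rules a
# chunk out; only chunks that answer "maybe" are fetched and scanned.
import hashlib

class ChunkBloom:
    """Per-chunk Bloom filter over high-cardinality attribute values."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, value: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))

def candidate_chunks(blooms: dict, needle: str) -> list:
    """Prune the chunk set: only chunks whose filter matches get scanned."""
    return [cid for cid, b in blooms.items() if b.might_contain(needle)]
```

In this sketch the filters are tiny relative to the chunks they summarize, can live on the same object storage, and can be fetched with cheap range reads before any chunk data is touched, which lines up with the constraints above without claiming this is what Logline actually does.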

Operational signature of the technique

The reported benchmark (3.5 TB → 8 GB scanned for a UUID lookup that returns no match) implies several properties of the index:

  • Effective in the missing-needle worst case. The scan reduction holds even when the needle isn't present, which means the index can definitively rule out the vast majority of chunks rather than merely ranking candidates. This is consistent with probabilistic structures such as Bloom filters (which have no false negatives) or compact sparse indexes.
  • Small index footprint. The reduction from 3.5 TB to 8 GB implies the index prunes the candidate chunk set aggressively; the 8 GB is the residual data that still had to be scanned after the index lookup, not the full corpus.
  • Co-located with the data. The index lives on the same object storage substrate Loki already uses — no separate indexed-storage tier required.
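The headline numbers translate directly into a pruning ratio (decimal units assumed, as is typical in vendor reporting):

```python
# Reported benchmark: 3.5 TB corpus, 8 GB actually scanned for a
# no-match UUID lookup.
corpus_gb = 3.5 * 1000                      # 3.5 TB in GB (decimal)
scanned_gb = 8
reduction = corpus_gb / scanned_gb          # ~437x less data touched
pruned = 1 - scanned_gb / corpus_gb         # fraction of corpus skipped

print(f"{reduction:.0f}x scan reduction; {pruned:.2%} of the corpus pruned")
```

That is, the index lets the query skip roughly 99.8% of the corpus before any chunk is read.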

Design-space axes

| Axis | Label index | High-cardinality-over-object-storage index | Full-text inverted index |
|---|---|---|---|
| Cardinality handled | Low | High | Any |
| Storage location | Small dedicated tier | Object storage alongside data | Dedicated indexed tier |
| Storage cost | Low | Low | High |
| Query cost (unique lookup) | High (full chunk scan) | Low | Low |
| Query cost (label-scoped range) | Low | Low (when combined with labels) | Low |
| Operational complexity | Low | Depends on technique | High |
