High-cardinality attribute indexing over object storage

Definition

A class of indexing techniques that make unique-value lookups fast over datasets whose bulk storage lives on object storage (S3, GCS, Azure Blob), without paying the cost of a full-text inverted index.

The problem space is defined by three simultaneous constraints:

  1. High cardinality. The values being indexed are unique or near-unique per record (UUIDs, request IDs, trace IDs, user IDs). A classical inverted index on all of them blows up the index size.
  2. Object storage substrate. The underlying chunks are on object storage, so index lookups have to fit the object-storage access model (high-latency, high-bandwidth, range-readable).
  3. Cost-per-query ceiling. The premise of using object storage is that storage cost is small; a high-cost index would regress the whole cost/performance envelope.
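The cardinality constraint can be made concrete with back-of-envelope arithmetic. The figures below are illustrative assumptions, not numbers from any source: if every record carries a unique value such as a UUID, an inverted index gains roughly one term entry per record, so the index grows linearly with the corpus rather than with the (small) set of repeated labels.

```python
# Illustrative arithmetic (assumed figures): an inverted index over a
# unique-per-record attribute stores ~one term entry plus one posting
# per record, so it scales with record count, not label cardinality.
records = 10_000_000_000            # 10B log lines (assumed)
uuid_bytes = 36                     # canonical UUID string length
posting_bytes = 8                   # one chunk pointer per term (assumed)
index_bytes = records * (uuid_bytes + posting_bytes)

# A label index over the same corpus only stores one entry per unique
# label combination, which stays small regardless of record count.
label_streams = 100_000             # unique label combinations (assumed)
label_index_bytes = label_streams * 64

print(f"inverted: ~{index_bytes / 1e12:.2f} TB of raw entries")
print(f"label:    ~{label_index_bytes / 1e6:.1f} MB of raw entries")
```

The point of the sketch is the scaling law, not the absolute numbers: the inverted-index figure tracks record count, while the label-index figure does not.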

Why it is hard

The classical log-indexing designs fail this problem space in opposite directions:

  • Label indexes (label-based log indexing, e.g., Loki) are cheap but can only index low-cardinality dimensions. High-cardinality attributes would explode the index size.
  • Full-text inverted indexes (Elasticsearch) handle high cardinality, but their cost is proportional to corpus size and unique-term count: storage, memory, and operational cost are high, and the index is typically co-located with compute nodes rather than living on object storage.

The gap: no scheme gives both cheap storage (object-storage-priced) and fast unique-value lookup.

The Logline framing

Grafana Labs positions Logline, which it acquired in April 2026, as occupying exactly this gap:

"Logline brings a new indexing approach to Loki that's designed specifically for high-cardinality attributes over object storage. Ultimately, this makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."

And:

"We want to drive down the time it takes to perform these searches without having to introduce techniques that are much more computationally expensive."

The implied architecture is a secondary index layered on top of the existing object-storage chunks: it maps high-cardinality values to the chunks that contain them, without re-indexing content the way a full-text engine would. The exact mechanism (hash-based? approximate? sharded? Bloom-filter-based?) is not disclosed in the announcement. (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
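Since the announcement leaves the mechanism open, the following is only one plausible realization of the genre, not Logline's disclosed design: a small probabilistic filter per chunk, stored alongside the chunks, that maps a looked-up value to the set of chunks that might contain it. All names here (`ChunkBloom`, `candidate_chunks`) are hypothetical.

```python
# Hedged sketch: one Bloom filter per object-storage chunk. A Bloom
# filter has no false negatives, so a "no" answer definitively rules a
# chunk out; only chunks that answer "maybe" are fetched and scanned.
import hashlib

class ChunkBloom:
    """Per-chunk Bloom filter over high-cardinality attribute values."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, value: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, value: str) -> None:
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))

def candidate_chunks(blooms: dict, needle: str) -> list:
    """Prune the chunk set: only chunks whose filter matches get scanned."""
    return [cid for cid, b in blooms.items() if b.might_contain(needle)]
```

In this sketch the filters are tiny relative to the chunks they summarize, can live on the same object storage, and can be fetched with cheap range reads before any chunk data is touched, which lines up with the constraints above without claiming this is what Logline actually does.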

Operational signature of the technique

The reported benchmark (3.5 TB → 8 GB scanned for a UUID lookup that returns no match) implies several properties of the index:

  • Effective in the missing-needle worst case. The scan reduction holds even when the needle isn't present, which means the index can definitively rule out the vast majority of chunks rather than merely ranking candidates. This is consistent with probabilistic structures such as Bloom filters (which have no false negatives) or compact sparse indexes.
  • Small index footprint. The reduction from 3.5 TB to 8 GB implies the index prunes the candidate chunk set aggressively; the 8 GB is the residual data that still had to be scanned after the index lookup, not the full corpus.
  • Co-located with the data. The index lives on the same object storage substrate Loki already uses — no separate indexed-storage tier required.
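The headline numbers translate directly into a pruning ratio (decimal units assumed, as is typical in vendor reporting):

```python
# Reported benchmark: 3.5 TB corpus, 8 GB actually scanned for a
# no-match UUID lookup.
corpus_gb = 3.5 * 1000                      # 3.5 TB in GB (decimal)
scanned_gb = 8
reduction = corpus_gb / scanned_gb          # ~437x less data touched
pruned = 1 - scanned_gb / corpus_gb         # fraction of corpus skipped

print(f"{reduction:.0f}x scan reduction; {pruned:.2%} of the corpus pruned")
```

That is, the index lets the query skip roughly 99.8% of the corpus before any chunk is read.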

Design-space axes

| Axis | Label index | High-cardinality-over-object-storage index | Full-text inverted index |
|---|---|---|---|
| Cardinality handled | Low | High | Any |
| Storage location | Small dedicated tier | Object storage alongside data | Dedicated indexed tier |
| Storage cost | Low | Low | High |
| Query cost (unique lookup) | High (full chunk scan) | Low | Low |
| Query cost (label-scoped range) | Low | Low (when combined with labels) | Low |
| Operational complexity | Low | Depends on technique | High |
