CONCEPT Cited by 1 source
High-cardinality attribute indexing over object storage
Definition
A class of indexing techniques that make unique-value lookups fast over datasets whose bulk storage lives on object storage (S3, GCS, Azure Blob), without paying the cost of a full-text inverted index.
The problem space is defined by three simultaneous constraints:
- High cardinality. The values being indexed are unique or near-unique per record (UUIDs, request IDs, trace IDs, user IDs). A classical inverted index on all of them blows up the index size.
- Object storage substrate. The underlying chunks are on object storage, so index lookups have to fit the object-storage access model (high-latency, high-bandwidth, range-readable).
- Cost-per-query ceiling. The premise of using object storage is that storage is cheap; an index that is expensive to store or to query would undo the cost/performance advantage that motivated using object storage in the first place.
Why it is hard
The classical log-indexing designs fail this problem space in opposite directions:
- Label indexes (label-based log indexing, e.g., Loki) are cheap but can only index low-cardinality dimensions. High-cardinality attributes would explode the index size.
- Full-text inverted indexes (Elasticsearch) handle high cardinality, but their cost scales with corpus size and unique-term count: storage, memory, and operational costs are high, and the index typically lives on disks attached to compute nodes rather than on object storage.
The gap: no existing scheme offers both cheap, object-storage-priced storage and fast unique-value lookup.
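To make the cardinality contrast concrete, here is a toy sketch (not Loki's or Elasticsearch's actual data structures) that counts index keys for a label index versus a token-level inverted index over synthetic log lines. The label names, the `request_id` field, and the chunk size are hypothetical.

```python
import uuid
from collections import defaultdict


def build_label_index(records):
    """Label index: map each (label name, label value) pair to the chunk IDs holding it."""
    index = defaultdict(set)
    for chunk_id, labels, _line in records:
        for pair in labels.items():
            index[pair].add(chunk_id)
    return index


def build_term_index(records):
    """Full-text style index: map every whitespace-separated token to the chunk IDs holding it."""
    index = defaultdict(set)
    for chunk_id, _labels, line in records:
        for token in line.split():
            index[token].add(chunk_id)
    return index


def synthetic_records(n, records_per_chunk=1000):
    """Hypothetical log stream: low-cardinality labels plus one unique request ID per line."""
    for i in range(n):
        labels = {"service": f"svc-{i % 5}", "env": "prod" if i % 2 else "dev"}
        # uuid.UUID(int=i) gives a deterministic, unique UUID per record.
        line = f"request_id={uuid.UUID(int=i)} status=200"
        yield (i // records_per_chunk, labels, line)


if __name__ == "__main__":
    records = list(synthetic_records(10_000))
    # Bounded by label-value combinations: 5 services + 2 envs.
    print(len(build_label_index(records)))  # → 7
    # One key per unique request ID, plus the shared "status=200" token.
    print(len(build_term_index(records)))   # → 10001
```

The label index stays at 7 keys no matter how many records arrive, while the term index gains one key per unique ID, which is the growth that makes indexing high-cardinality values expensive.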
The Logline framing
Grafana Labs positions Logline, which it acquired in April 2026, as occupying exactly this gap:
"Logline brings a new indexing approach to Loki that's designed specifically for high-cardinality attributes over object storage. Ultimately, this makes it much faster to find specific, highly unique values in large datasets, without changing Loki's core design."
And:
"We want to drive down the time it takes to perform these searches without having to introduce techniques that are much more computationally expensive."
The implied architectural genre is a secondary index layered on top of existing object-storage chunks: it maps high-cardinality values to the chunks that contain them, without re-indexing content the way a full-text engine would. The exact mechanism (hash-based? approximate? sharded? Bloom-filter-based?) is not disclosed in the announcement. (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
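Because the announcement does not disclose Logline's mechanism, the following is only one plausible instance of the genre, assuming per-chunk Bloom filters (one of the candidates listed above): each chunk gets a filter over its high-cardinality values, and a lookup fetches only the chunks whose filter might contain the needle. The names `BloomFilter`, `build_chunk_index`, and `candidate_chunks` are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.num_bits = max(8, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(value.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # forced odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def build_chunk_index(chunks: dict[str, list[str]], fp_rate: float = 0.01) -> dict[str, BloomFilter]:
    """Hypothetical secondary index: one Bloom filter per chunk over its high-cardinality values."""
    index = {}
    for chunk_id, values in chunks.items():
        bf = BloomFilter(expected_items=max(1, len(values)), fp_rate=fp_rate)
        for v in values:
            bf.add(v)
        index[chunk_id] = bf
    return index


def candidate_chunks(index: dict[str, BloomFilter], needle: str) -> list[str]:
    """Chunks that must still be fetched and scanned; every other chunk provably lacks the needle."""
    return [cid for cid, bf in index.items() if bf.might_contain(needle)]


if __name__ == "__main__":
    chunks = {f"chunk-{c}": [f"trace-{c}-{i}" for i in range(1000)] for c in range(100)}
    index = build_chunk_index(chunks)
    # Always includes "chunk-42"; may also include a few false-positive chunks.
    print(candidate_chunks(index, "trace-42-7"))
    # Missing needle: roughly fp_rate * 100 chunks are expected to survive pruning.
    print(len(candidate_chunks(index, "no-such-trace")))
```

A Bloom filter never returns a false negative, so a "no" proves the chunk can be skipped; `fp_rate` trades filter size against how many chunks a lookup still has to scan.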
Operational signature of the technique
The reported benchmark (3.5 TB → 8 GB scanned for a UUID lookup that returns no match) implies several properties of the index:
- Effective on the missing-needle worst case. The scan reduction holds even when the needle isn't present, which means the index is sound enough to prove absence for most chunks (not just a lossy lookup). This is consistent with probabilistic structures like Bloom filters or compact sparse indexes.
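- Aggressive pruning and a small index footprint. The drop from 3.5 TB to 8 GB means the index prunes the candidate chunk set aggressively: the 8 GB is the residual data that still had to be scanned after the index lookup, not the full corpus. If the 8 GB also counts the index bytes read during the lookup, the index itself must be a fraction of a percent of the corpus size.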
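- Co-located with the data. Given the "over object storage" framing, the index presumably lives on the same object storage substrate Loki already uses, so no separate indexed-storage tier is required; the announcement does not describe the storage layout explicitly.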
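The headline numbers can be checked with simple arithmetic. This sketch assumes decimal units, since the source does not say whether "TB" and "GB" are decimal or binary.

```python
# Assumption: decimal units; the source does not specify decimal vs. binary.
TB = 10**12
GB = 10**9


def scan_reduction(corpus_bytes: float, scanned_bytes: float) -> float:
    """Fraction of the corpus that the index let the query skip."""
    return 1 - scanned_bytes / corpus_bytes


if __name__ == "__main__":
    corpus, scanned = 3.5 * TB, 8 * GB
    print(f"reduction: {scan_reduction(corpus, scanned):.2%}")  # → reduction: 99.77%
    print(f"residual:  {scanned / corpus:.2%}")                 # → residual:  0.23%
```

With binary units (TiB/GiB) the reduction comes out at about 99.78 %; both are consistent with the reported 99.7 % figure. If one further assumes equal-sized chunks (not stated in the source), the residual corresponds to the index leaving roughly one chunk in 440 to scan.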
Design-space axes
| Axis | Label index | High-cardinality-over-object-storage index | Full-text inverted index |
|---|---|---|---|
| Cardinality handled | Low | High | Any |
| Storage location | Small dedicated tier | Object storage alongside data | Dedicated indexed tier |
| Storage cost | Low | Low | High |
| Query cost (unique lookup) | High (full chunk scan) | Low | Low |
| Query cost (label-scoped range) | Low | Low (when combined with labels) | Low |
| Operational complexity | Low | Depends on technique | High |
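The "Low (when combined with labels)" cell suggests a two-stage plan: labels narrow the chunk set first, then the high-cardinality index prunes what remains. The toy sketch below assumes that ordering; the source does not describe Loki's query planner, an exact Python `set` stands in for whatever per-chunk structure a real index would use, and `Chunk`, `label_stage`, `needle_stage`, and `plan` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    chunk_id: str
    labels: dict[str, str]
    # Stand-in for a per-chunk high-cardinality index; a real system would
    # likely use a compact probabilistic or sparse structure instead.
    values: set[str] = field(default_factory=set)


def label_stage(chunks: list[Chunk], matchers: dict[str, str]) -> list[Chunk]:
    """Stage 1 (label index): keep chunks whose labels satisfy every equality matcher."""
    return [c for c in chunks if all(c.labels.get(k) == v for k, v in matchers.items())]


def needle_stage(chunks: list[Chunk], needle: str) -> list[Chunk]:
    """Stage 2 (high-cardinality index): drop chunks that cannot contain the needle."""
    return [c for c in chunks if needle in c.values]


def plan(chunks: list[Chunk], matchers: dict[str, str], needle: str) -> list[str]:
    """IDs of the chunks that would actually be fetched from object storage."""
    return [c.chunk_id for c in needle_stage(label_stage(chunks, matchers), needle)]


if __name__ == "__main__":
    chunks = [
        Chunk("a", {"service": "api"}, {"req-1", "req-2"}),
        Chunk("b", {"service": "api"}, {"req-3"}),
        Chunk("c", {"service": "db"}, {"req-1"}),
    ]
    print(plan(chunks, {"service": "api"}, "req-1"))  # → ['a']
    print(plan(chunks, {}, "req-1"))                  # → ['a', 'c']
```

Running the cheap label stage first keeps the high-cardinality lookup confined to chunks that are already in scope for the query.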
Seen in
- sources/2026-04-22-grafana-grafana-labs-acquires-logline — Grafana Labs names the design target explicitly and positions Logline as the implementation. Benchmark: 3.5 TB → 8 GB scanned (a 99.7 % reduction in data scanned).
Related
- concepts/label-based-log-indexing — the complementary base indexing scheme.
- concepts/needle-in-haystack-log-query — the query class this indexing technique targets.
- concepts/compute-storage-separation — the architectural backdrop that makes object-storage-based indexing attractive.
- concepts/secondary-index — general index-theory parent concept.
- systems/loki — the system Logline indexes.
- systems/logline — canonical implementation target.
- patterns/secondary-index-for-high-cardinality-over-object-storage — the architectural pattern.