CONCEPT Cited by 1 source

Label-based log indexing

Definition

Label-based log indexing is the architectural choice to index only a small set of low-cardinality dimensions — labels — attached to log lines, rather than indexing the log content itself. The canonical implementation is Grafana Loki, which patterns its data model on Prometheus: every log stream is identified by a set of labels (service, cluster, environment, region, namespace, level) and the raw log lines are stored in chunks on object storage without full-text indexing.

Why this design

The design deliberately gives up query flexibility in exchange for lower storage and operational cost:

  • Index size stays small. Labels are low-cardinality — perhaps thousands of unique combinations per tenant, not millions. The index stays cheap to maintain, back up, and scale.
  • Storage cost stays low. Log content lives in object-storage chunks (S3, GCS, Azure Blob) at object-storage pricing. No inverted full-text index to store or maintain.
  • Label-scoped queries are fast. A query like "all 5xx logs from service=payments, cluster=prod-eu, last 1h" is satisfied by looking up chunks matching those labels and scanning them within the time window.
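The label-scoped query path described in the last bullet can be sketched in a few lines. This is a minimal in-memory illustration, not Loki's implementation: the names `Chunk`, `LabelIndex`, and `is_5xx` are hypothetical, chunks are plain tuples rather than compressed objects in object storage, and timestamps are integer seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int    # inclusive start of the chunk's time range (seconds)
    end: int      # exclusive end of the chunk's time range (seconds)
    lines: tuple  # (timestamp, text) pairs; the text is NOT indexed


class LabelIndex:
    """Hypothetical label index: label set -> chunks of that stream."""

    def __init__(self):
        self._streams = {}  # frozenset of (label, value) pairs -> list of Chunk

    def add(self, labels, chunk):
        self._streams.setdefault(frozenset(labels.items()), []).append(chunk)

    def query(self, matchers, start, end, predicate):
        want = set(matchers.items())
        for stream_labels, chunks in self._streams.items():
            # Step 1: the index lookup only compares labels.
            if not want <= stream_labels:
                continue
            for chunk in chunks:
                # Step 2: skip chunks outside the requested time window.
                if chunk.end <= start or chunk.start >= end:
                    continue
                # Step 3: brute-force scan of the surviving chunks.
                for ts, text in chunk.lines:
                    if start <= ts < end and predicate(text):
                        yield ts, text


def is_5xx(text):
    # Assumes the status code is the last space-separated field of the line.
    return text.rsplit(" ", 1)[-1].startswith("5")


index = LabelIndex()
index.add({"service": "payments", "cluster": "prod-eu"},
          Chunk(0, 3600, ((10, "GET /pay 200"), (20, "POST /pay 503"))))
index.add({"service": "search", "cluster": "prod-eu"},
          Chunk(0, 3600, ((15, "GET /q 500"),)))

hits = list(index.query({"service": "payments", "cluster": "prod-eu"},
                        0, 3600, is_5xx))
print(hits)
```

Only the `payments` chunk is ever read. The `search` chunk is excluded by the label lookup before any log text is touched, which is where the speed of label-scoped queries comes from.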

"Loki is powered by a label-based indexing approach that keeps storage costs low and operations lightweight. For the majority of use cases, this design works extremely well." (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)

The structural limitation

Labels are low-cardinality by design — if the number of distinct label combinations grows without bound, the index approaches the size of the raw data and the cost advantage disappears. This means labels cannot include high-cardinality attributes — UUIDs, request IDs, trace IDs, job IDs, user IDs — which are precisely the attributes operators most often want to search by during incidents.
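A quick back-of-the-envelope calculation shows why a single high-cardinality label breaks the model. The per-label cardinalities below are invented for illustration and do not come from the source; the function computes an upper bound on distinct label combinations (streams) as the product of the per-label cardinalities.

```python
from math import prod


def stream_count_upper_bound(label_cardinalities):
    """Upper bound on distinct label combinations (one stream each)."""
    return prod(label_cardinalities.values())


# Hypothetical low-cardinality label set.
low = {"service": 50, "cluster": 4, "environment": 3, "level": 5}
print(stream_count_upper_bound(low))  # 3000

# Same labels plus one high-cardinality attribute (assumed 1M distinct IDs).
with_request_id = {**low, "request_id": 1_000_000}
print(stream_count_upper_bound(with_request_id))  # 3000000000
```

Adding one attribute with a million distinct values multiplies the bound by a million, so the index is no longer small relative to the data it describes.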

Consequence: needle-in-haystack queries (finding a single log line containing a specific UUID across a large corpus) degrade to scanning every chunk that matches the label selector and time range. The source (sources/2026-04-22-grafana-grafana-labs-acquires-logline) gives the headline number: a UUID lookup that scanned 3.5 TB without returning a result. This is the missing-needle worst case, a structural failure of label-only indexing: no chunk can be ruled out without reading it.

The remedy is not to replace the label index (that would regress to Elasticsearch-class storage cost) but to layer a secondary index — see concepts/high-cardinality-attribute-indexing-over-object-storage and patterns/secondary-index-for-high-cardinality-over-object-storage.
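The difference between scanning and a layered secondary index can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how Loki or Logline work: it assumes the secondary index maps each UUID-shaped token to the chunks that contain it, and every function name here is hypothetical.

```python
import re
from collections import defaultdict

# Lowercase UUID-shaped tokens; an assumption about what gets indexed.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def build_secondary_index(chunks):
    """chunks: chunk_id -> list of log lines. Returns token -> set of chunk ids."""
    index = defaultdict(set)
    for chunk_id, lines in chunks.items():
        for line in lines:
            for token in UUID_RE.findall(line):
                index[token].add(chunk_id)
    return dict(index)


def find_by_scan(chunks, needle):
    """Label-only path: every chunk must be read to find (or rule out) the needle."""
    hits, scanned_chars = [], 0
    for chunk_id, lines in chunks.items():
        for line in lines:
            scanned_chars += len(line)
            if needle in line:
                hits.append((chunk_id, line))
    return hits, scanned_chars


def find_with_index(chunks, index, needle):
    """Secondary-index path: read only the chunks the index points at."""
    hits, scanned_chars = [], 0
    for chunk_id in sorted(index.get(needle, ())):
        for line in chunks[chunk_id]:
            scanned_chars += len(line)
            if needle in line:
                hits.append((chunk_id, line))
    return hits, scanned_chars


chunks = {
    "c1": ["req 11111111-1111-1111-1111-111111111111 ok"],
    "c2": ["req 22222222-2222-2222-2222-222222222222 failed"],
}
index = build_secondary_index(chunks)
missing = "33333333-3333-3333-3333-333333333333"

print(find_by_scan(chunks, missing))            # reads every chunk, finds nothing
print(find_with_index(chunks, index, missing))  # reads nothing, finds nothing
```

In the missing-needle case the scan path reads the whole corpus to prove absence, while the indexed path answers from the index alone. The label index and the chunks themselves are unchanged, which is the point of layering rather than replacing.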

Contrast with full-text indexing

| Dimension | Label-based (Loki) | Full-text (Elasticsearch) |
| --- | --- | --- |
| What is indexed | Labels only | Every token in every log line |
| Index size | Small (labels × time × service) | Large (tokens × documents) |
| Storage cost | Low (object storage) | High (indexed, often on disk) |
| Label-scoped query | Fast | Fast |
| Unique-value lookup | Slow: scan chunks | Fast: inverted index lookup |
| Ingestion cost | Low | High (tokenisation + index update) |
| Operational complexity | Low | High (shard management, cluster state) |

Seen in

  • sources/2026-04-22-grafana-grafana-labs-acquires-logline: the canonical wiki articulation of the label-indexing thesis and its high-cardinality blind spot. Grafana Labs explicitly names both the design choice and its limitation, and cites the limitation as motivation for the Logline acquisition.
Last updated · 517 distilled / 1,221 read