CONCEPT Cited by 1 source

Label-based log indexing¶

Definition¶

Label-based log indexing is the architectural choice to index only a small set of low-cardinality dimensions — labels — attached to log lines, rather than indexing the log content itself. The canonical implementation is Grafana Loki, which patterns its data model on Prometheus: every log stream is identified by a set of labels (service, cluster, environment, region, namespace, level) and the raw log lines are stored in chunks on object storage without full-text indexing.
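A minimal sketch of this data model, assuming a toy in-memory store: the names `stream_key`, `index`, and `push` are hypothetical and are not Loki's API. It only illustrates that the label set alone identifies a stream, and that raw lines are appended to that stream's chunk without being tokenised.

```python
from collections import defaultdict

def stream_key(labels):
    """Order-independent identity of a stream: its sorted (label, value) pairs."""
    return tuple(sorted(labels.items()))

# Hypothetical index: stream key -> list of raw lines.
# The list stands in for a chunk that would live on object storage.
index = defaultdict(list)

def push(labels, line):
    """Append a raw line to the stream identified by its labels; content is never indexed."""
    index[stream_key(labels)].append(line)

push({"service": "payments", "cluster": "prod-eu"}, "GET /charge 200 req=1")
push({"cluster": "prod-eu", "service": "payments"}, "GET /charge 502 req=2")
push({"service": "search", "cluster": "prod-eu"}, "GET /q 200 req=3")

# Three lines were pushed, but only two distinct label sets exist,
# so the index holds two entries regardless of line content.
payments_lines = index[stream_key({"service": "payments", "cluster": "prod-eu"})]
```

The index grows with the number of distinct label sets, not with the number of lines, which is the property the rest of this page relies on.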

Why this design¶

The design deliberately gives up query flexibility in exchange for lower storage and operational cost:

Index size stays small. Labels are low-cardinality — perhaps thousands of unique combinations per tenant, not millions. The index stays cheap to maintain, back up, and scale.
Storage cost stays low. Log content lives in object-storage chunks (S3, GCS, Azure Blob) at object-storage pricing. No inverted full-text index to store or maintain.
Label-scoped queries are fast. A query like "all 5xx logs from service=payments, cluster=prod-eu, last 1h" is satisfied by looking up chunks matching those labels and scanning them within the time window.
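The label-scoped query path described above can be sketched as follows. This is an illustration under stated assumptions, not Loki's implementation: the chunk layout (`labels`, `min_ts`, `max_ts`, `lines`) and the function `label_scoped_query` are hypothetical, and a real index would look up matching chunks directly rather than iterating over every chunk descriptor.

```python
def label_scoped_query(chunks, matchers, start, end, line_filter):
    """Select chunks by labels and time range, then scan only those chunks."""
    hits = []
    for chunk in chunks:
        labels = chunk["labels"]
        # Step 1: label match, decided from metadata without reading log content.
        if any(labels.get(k) != v for k, v in matchers.items()):
            continue
        # Step 2: time-window pruning, also from metadata only.
        if chunk["max_ts"] < start or chunk["min_ts"] > end:
            continue
        # Step 3: brute-force scan of the surviving chunk's raw lines.
        for ts, text in chunk["lines"]:
            if start <= ts <= end and line_filter(text):
                hits.append(text)
    return hits

# Hypothetical sample data: three chunks, only the first survives pruning.
CHUNKS = [
    {"labels": {"service": "payments", "cluster": "prod-eu"}, "min_ts": 100, "max_ts": 200,
     "lines": [(110, "status=200 path=/charge"), (150, "status=502 path=/charge")]},
    {"labels": {"service": "payments", "cluster": "prod-us"}, "min_ts": 100, "max_ts": 200,
     "lines": [(120, "status=503 path=/charge")]},
    {"labels": {"service": "payments", "cluster": "prod-eu"}, "min_ts": 10, "max_ts": 50,
     "lines": [(20, "status=500 path=/refund")]},
]

result = label_scoped_query(
    CHUNKS,
    {"service": "payments", "cluster": "prod-eu"},
    start=100,
    end=200,
    line_filter=lambda text: "status=5" in text,  # crude stand-in for a 5xx filter
)
```

The query is fast because the label and time predicates discard most chunks before any log content is read; the content filter only runs over what remains.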

"Loki is powered by a label-based indexing approach that keeps storage costs low and operations lightweight. For the majority of use cases, this design works extremely well." (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)

The structural limitation¶

Labels are low-cardinality by design — if the number of distinct label combinations grows without bound, the index approaches the size of the raw data and the cost advantage disappears. This means labels cannot include high-cardinality attributes — UUIDs, request IDs, trace IDs, job IDs, user IDs — which are precisely the attributes operators most often want to search by during incidents.
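A small worked example makes the cardinality argument concrete. The records and the helper `count_streams` are hypothetical; the point is only that the number of index entries equals the number of distinct label combinations, so promoting a per-request identifier to a label makes the index grow one entry per log line.

```python
def count_streams(records, label_names):
    """Number of distinct label combinations, i.e. index entries, for a label scheme."""
    return len({tuple(r[name] for name in label_names) for r in records})

# 1000 synthetic log records, each carrying a unique request_id.
records = [
    {
        "service": "payments" if i % 2 == 0 else "search",
        "level": "error" if i % 3 == 0 else "info",
        "request_id": f"req-{i}",
    }
    for i in range(1000)
]

bounded = count_streams(records, ["service", "level"])                  # 2 x 2 = 4
unbounded = count_streams(records, ["service", "level", "request_id"])  # one per record
```

With `service` and `level` alone the index stays at 4 entries no matter how many lines arrive. Adding `request_id` yields 1000 entries for 1000 lines, at which point the index is tracking the raw data line for line.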

Consequence: needle-in-haystack queries (finding a single log line containing a specific UUID across a large corpus) degrade to scanning the full contents of every chunk the label selector and time range admit. See sources/2026-04-22-grafana-grafana-labs-acquires-logline for the headline number: a UUID lookup that scanned 3.5 TB without returning a result. This is the missing-needle worst case for label-only indexing: the index cannot say that the value is absent, so every candidate chunk has to be read in full.
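The missing-needle case can be sketched with a toy scan that counts the bytes it reads. The chunk contents and the function `grep_chunks` are hypothetical and far smaller than the 3.5 TB case cited above; the sketch only shows that a search for an absent value reads every byte of every candidate chunk.

```python
def grep_chunks(chunks, needle):
    """Substring-search every line of every chunk; return matches and bytes scanned."""
    scanned = 0
    matches = []
    for chunk in chunks:
        for line in chunk:
            scanned += len(line.encode("utf-8"))
            if needle in line:
                matches.append(line)
    return matches, scanned

# Ten synthetic chunks of 100 lines each, all admitted by the label selector.
chunks = [
    [f"ts={i} request_id=req-{c}-{i} status=200" for i in range(100)]
    for c in range(10)
]
total_bytes = sum(len(line.encode("utf-8")) for chunk in chunks for line in chunk)

# The needle does not exist, so the scan reads everything and returns nothing.
matches, scanned = grep_chunks(chunks, "request_id=req-missing")
```

Here `scanned` equals `total_bytes` and `matches` is empty: the cost of the query is the size of the candidate data, independent of whether the value exists.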

The remedy is not to replace the label index (that would regress to Elasticsearch-class storage cost) but to layer a secondary index — see concepts/high-cardinality-attribute-indexing-over-object-storage and patterns/secondary-index-for-high-cardinality-over-object-storage.

Contrast with full-text indexing¶

Dimension	Label-based (Loki)	Full-text (Elasticsearch)
What is indexed	Labels only	Every token in every log line
Index size	Small (labels × time × service)	Large (tokens × documents)
Storage cost	Low (object storage)	High (indexed, often on disk)
Label-scoped query	Fast	Fast
Unique-value lookup	Slow — scan chunks	Fast — inverted index lookup
Ingestion cost	Low	High (tokenisation + index update)
Operational complexity	Low	High (shard management, cluster state)
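The "Index size" and "Unique-value lookup" rows can be illustrated with a toy inverted index. This is a whitespace-tokenising sketch with hypothetical names (`build_inverted_index`, `inv`), not Elasticsearch's analyser or storage format; it shows why the full-text approach answers unique-value lookups directly, and why its index grows with the content.

```python
from collections import defaultdict

def build_inverted_index(lines):
    """Map every whitespace-separated token to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in line.split():
            index[token].add(line_no)
    return index

# 1000 synthetic lines that would all share a single label set (service=payments).
lines = [f"service=payments request_id=req-{i} status=200" for i in range(1000)]
inv = build_inverted_index(lines)

# Unique-value lookup is a single dictionary access, and absence is known immediately.
hit = inv.get("request_id=req-417", set())
miss = inv.get("request_id=req-missing", set())

# Index size: two shared tokens plus one token per unique request_id.
token_count = len(inv)
```

A label index over the same lines would hold one entry; the inverted index holds 1002 tokens, nearly all of them unique request IDs. That is the storage and ingestion cost the table attributes to full-text indexing, paid in exchange for the direct lookup.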

Seen in¶

sources/2026-04-22-grafana-grafana-labs-acquires-logline: the canonical wiki articulation of the label-indexing thesis and its high-cardinality blind spot. Grafana Labs names both the design choice and its limitation, and cites the limitation as the motivation for the Logline acquisition.

systems/loki — canonical implementation.
systems/logline — secondary index layered under label indexing to close the high-cardinality gap.
systems/elasticsearch — the full-text-indexed alternative.
concepts/needle-in-haystack-log-query — the query class where label indexing structurally breaks down.
concepts/high-cardinality-attribute-indexing-over-object-storage — the complementary indexing technique.
concepts/index-selectivity — general index theory behind why UUIDs don't fit label schemes.
concepts/compute-storage-separation — the architecture label indexing enables.