CONCEPT Cited by 1 source
Label-based log indexing
Definition
Label-based log indexing is the architectural choice to index only a small set of low-cardinality dimensions (labels) attached to log lines, rather than indexing the log content itself. The canonical implementation is Grafana Loki, which borrows its data model from Prometheus: every log stream is identified by its set of labels (for example service, cluster, environment, region, namespace, level), and the raw log lines are stored in chunks on object storage without full-text indexing.
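The data model can be sketched in a few lines. This is a minimal in-memory illustration, not Loki's implementation: `LabelIndex`, `Chunk`, `append`, `chunk_refs` and `max_lines_per_chunk` are hypothetical names, and Python lists stand in for chunks on object storage. The only thing keyed for lookup is the label set; log content is appended to chunks and never tokenised.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    start_ts: int
    end_ts: int
    lines: list[str] = field(default_factory=list)  # raw content, not indexed


@dataclass
class LabelIndex:
    # Hypothetical chunk-cut threshold; real systems cut by size and age.
    max_lines_per_chunk: int = 3
    # The only index: stream label set -> that stream's chunks.
    streams: dict[frozenset, list[Chunk]] = field(default_factory=dict)

    def append(self, labels: dict[str, str], ts: int, line: str) -> None:
        # Stream identity is the full label set; frozenset makes it hashable.
        key = frozenset(labels.items())
        chunks = self.streams.setdefault(key, [])
        # Cut a new chunk when the current one is full.
        if not chunks or len(chunks[-1].lines) >= self.max_lines_per_chunk:
            chunks.append(Chunk(start_ts=ts, end_ts=ts))
        chunks[-1].lines.append(line)  # stored as-is, never tokenised
        chunks[-1].end_ts = ts

    def chunk_refs(self, selector: dict[str, str]) -> list[Chunk]:
        # A stream matches if its label set contains every selector pair.
        wanted = set(selector.items())
        return [c for key, chunks in self.streams.items() if wanted <= key for c in chunks]


idx = LabelIndex()
idx.append({"service": "payments", "cluster": "prod-eu", "level": "error"}, 1, "charge failed req=9f1c")
idx.append({"service": "payments", "cluster": "prod-eu", "level": "error"}, 2, "charge failed req=77ab")
idx.append({"service": "search", "cluster": "prod-eu", "level": "info"}, 3, "query ok")
```

Three lines produce only two index keys, because two of them share a label set; the request IDs inside the lines never reach the index.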
Why this design
The design deliberately gives up query flexibility in exchange for low storage and operational cost:
- Index size stays small. Labels are low-cardinality — perhaps thousands of unique combinations per tenant, not millions. The index stays cheap to maintain, back up, and scale.
- Storage cost stays low. Log content lives in object-storage chunks (S3, GCS, Azure Blob) at object-storage pricing. No inverted full-text index to store or maintain.
- Label-scoped queries are fast. A query like "all 5xx logs from service=payments, cluster=prod-eu, last 1h" is answered by using the index to find the chunks whose labels match and whose time range overlaps the last hour, then scanning only those chunks for 5xx lines.
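The two-phase query in the last point (index lookup, then scan) can be sketched as follows. This is an illustrative assumption, not Loki's query engine: `label_scoped_query` and the `status=<code>` log format are hypothetical, and a substring match stands in for a real line filter.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    labels: dict[str, str]          # stream labels (indexed)
    start_ts: int                   # chunk time bounds (indexed)
    end_ts: int
    lines: list[tuple[int, str]]    # (timestamp, raw line), NOT indexed


def label_scoped_query(chunks, selector, t_from, t_to, contains):
    """Return (matching lines, number of chunks scanned)."""
    # Phase 1: consult only labels and chunk time bounds.
    candidates = [
        c for c in chunks
        if all(c.labels.get(k) == v for k, v in selector.items())
        and c.end_ts >= t_from and c.start_ts <= t_to
    ]
    # Phase 2: brute-force scan of the surviving chunks' contents.
    hits = [
        line for c in candidates for ts, line in c.lines
        if t_from <= ts <= t_to and contains in line
    ]
    return hits, len(candidates)


chunks = [
    Chunk({"service": "payments", "cluster": "prod-eu"}, 0, 10, [(1, "status=200"), (5, "status=503")]),
    Chunk({"service": "payments", "cluster": "prod-us"}, 0, 10, [(2, "status=500")]),
    Chunk({"service": "search", "cluster": "prod-eu"}, 0, 10, [(3, "status=502")]),
]
hits, scanned = label_scoped_query(
    chunks, {"service": "payments", "cluster": "prod-eu"}, 0, 10, "status=5"
)
```

Only one of three chunks is read: the labels prune the other two before any content is touched, which is why label-scoped queries stay cheap.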
"Loki is powered by a label-based indexing approach that keeps storage costs low and operations lightweight. For the majority of use cases, this design works extremely well." (Source: sources/2026-04-22-grafana-grafana-labs-acquires-logline)
The structural limitation
Labels are low-cardinality by design. Every distinct label combination defines a separate stream with its own index entries and chunks, so if the number of distinct combinations grows without bound, the index approaches the size of the raw data and the cost advantage disappears. Labels therefore cannot include high-cardinality attributes (UUIDs, request IDs, trace IDs, job IDs, user IDs), which are precisely the attributes operators most often want to search by during incidents.
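The arithmetic behind this limit is multiplicative: the number of possible streams is the product of each label's cardinality. The sketch below uses illustrative cardinalities, not measurements from any real deployment; `stream_count` is a hypothetical helper, and the product is an upper bound because not every combination occurs in practice.

```python
from math import prod


def stream_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct streams: product of per-label cardinalities."""
    return prod(label_cardinalities.values())


# Illustrative low-cardinality label set.
base = {"service": 40, "cluster": 4, "environment": 3, "level": 5}

# Same labels plus a hypothetical per-request ID with a million values.
with_request_id = {**base, "request_id": 1_000_000}

base_streams = stream_count(base)
exploded_streams = stream_count(with_request_id)
```

Four modest labels bound the index at a few thousand streams; one ID-like label multiplies that by the number of distinct IDs, at which point nearly every log line becomes its own stream.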
Consequence: needle-in-haystack queries (finding a single log line containing a specific UUID across a large corpus) degrade to scanning every chunk in the label and time scope for a text match. The worst case is a needle that is not there at all: no chunk can be ruled out without reading it, so the query must scan the entire scope before it can return an empty result. See sources/2026-04-22-grafana-grafana-labs-acquires-logline for the headline number: a UUID lookup that scanned 3.5 TB without returning a result.
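The asymmetry between a present and a missing needle can be shown with a toy scan. This sketch assumes the query may stop at the first match; `find_needle` is a hypothetical helper, and in-memory byte strings stand in for chunks fetched from object storage.

```python
from typing import Optional


def find_needle(chunks: list[bytes], needle: bytes) -> tuple[Optional[int], int]:
    """Scan chunks in order; return (index of first matching chunk, bytes scanned)."""
    scanned = 0
    for i, chunk in enumerate(chunks):
        scanned += len(chunk)  # every chunk must be read to be checked
        if needle in chunk:
            return i, scanned
    return None, scanned


# Ten 1000-byte chunks; only chunk 3 contains the ID.
chunks = [b"." * 1000 for _ in range(10)]
chunks[3] = b"." * 990 + b"id=abc1234"

present = find_needle(chunks, b"id=abc1234")
missing = find_needle(chunks, b"id=missing0")
```

A present needle costs a scan up to where it is found; a missing needle costs the whole corpus in scope, because nothing in a label-only index can prove a chunk does not contain it.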
The remedy is not to replace the label index (that would regress to Elasticsearch-class storage cost) but to layer a secondary index — see concepts/high-cardinality-attribute-indexing-over-object-storage and patterns/secondary-index-for-high-cardinality-over-object-storage.
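One common way to build such a secondary index is a per-chunk Bloom filter over extracted high-cardinality values, which lets a query skip chunks that definitely lack the value. The sketch below uses that technique purely as an illustration; it is not a description of Logline's design. `BloomFilter`, `extract_ids`, `build_filters`, `lookup`, the `req=<id>` log format and the filter sizes are all hypothetical.

```python
import hashlib
import re


class BloomFilter:
    """Membership sketch: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def extract_ids(line: str) -> list[str]:
    # Hypothetical log format: high-cardinality IDs appear as req=<id>.
    return re.findall(r"req=(\S+)", line)


def build_filters(chunks: list[list[str]]) -> list[BloomFilter]:
    # One small filter per chunk, built at ingestion time.
    filters = []
    for lines in chunks:
        bf = BloomFilter()
        for line in lines:
            for value in extract_ids(line):
                bf.add(value)
        filters.append(bf)
    return filters


def lookup(chunks, filters, value):
    """Return (matching lines, number of chunks actually read)."""
    hits, read = [], 0
    for lines, bf in zip(chunks, filters):
        if not bf.might_contain(value):
            continue  # definite miss: skip without fetching the chunk
        read += 1
        hits.extend(line for line in lines if value in extract_ids(line))
    return hits, read


chunks = [["req=a1 ok", "req=b2 ok"], ["req=c3 failed"], ["req=d4 ok"]]
filters = build_filters(chunks)
hits, read = lookup(chunks, filters, "c3")
```

The label index is untouched; the filters only decide which of the label-selected chunks need to be read, which is exactly where the missing-needle case benefits most.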
Contrast with full-text indexing
| Dimension | Label-based (Loki) | Full-text (Elasticsearch) |
|---|---|---|
| What is indexed | Labels only | Every token in every log line |
| Index size | Small (label combinations × time) | Large (tokens × documents) |
| Storage cost | Low (object storage) | High (indexed, often on disk) |
| Label-scoped query | Fast | Fast |
| Unique-value lookup | Slow — scan chunks | Fast — inverted index lookup |
| Ingestion cost | Low | High (tokenisation + index update) |
| Operational complexity | Low | High (shard management, cluster state) |
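The "What is indexed" and "Unique-value lookup" rows can be made concrete by indexing the same two streams both ways. This is a toy comparison under stated assumptions: `label_index`, `inverted_index`, the `\w+` tokeniser and the sample lines are hypothetical, and neither function reflects how Loki or Elasticsearch lay out their indexes on disk.

```python
import re
from collections import defaultdict


def label_index(streams):
    # Key: label set. Value: stream ids. Content is ignored entirely.
    idx = defaultdict(list)
    for sid, (labels, _lines) in enumerate(streams):
        idx[frozenset(labels.items())].append(sid)
    return idx


def inverted_index(streams):
    # Key: every token. Value: (stream id, line number) postings.
    idx = defaultdict(set)
    for sid, (_labels, lines) in enumerate(streams):
        for ln, line in enumerate(lines):
            for token in re.findall(r"\w+", line):
                idx[token].add((sid, ln))
    return idx


streams = [
    ({"service": "payments", "level": "error"}, ["charge failed req 9f1c", "charge failed req 77ab"]),
    ({"service": "payments", "level": "info"}, ["charge ok req 1d2e", "charge ok req 5a6b"]),
]
li = label_index(streams)
ii = inverted_index(streams)
```

The label index holds two keys regardless of how many requests are logged, while the inverted index gains a new key for every request ID; in exchange, only the inverted index can answer "where is 77ab?" with a single lookup.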
Seen in
- sources/2026-04-22-grafana-grafana-labs-acquires-logline — the canonical wiki articulation of the label-indexing thesis and its high-cardinality blind spot. Grafana Labs names both the design choice and its limitation, and cites the limitation as the motivation for the Logline acquisition.
Related
- systems/loki — canonical implementation.
- systems/logline — secondary index layered under label indexing to close the high-cardinality gap.
- systems/elasticsearch — the full-text-indexed alternative.
- concepts/needle-in-haystack-log-query — the query class where label indexing structurally breaks down.
- concepts/high-cardinality-attribute-indexing-over-object-storage — the complementary indexing technique.
- concepts/index-selectivity — general index theory behind why UUIDs don't fit label schemes.
- concepts/compute-storage-separation — the architecture label indexing enables.