
Service group

Definition

A service group is a clustered collection of workloads (pods, containers, processes, hosts) that share a common identity in observability data and are addressable as a single logical service. In the Kubernetes context, a service group typically corresponds to a Service + its backing Deployment/StatefulSet; in the non-K8s case, it corresponds to a coherent named service emitting metrics under a shared service label.

The service group is the granularity at which agent infrastructure memory is extracted and summarised — above individual pods (too narrow) and below entire namespaces or clusters (too broad).

Canonical framing (Grafana Assistant, 2026-05-01)

(Source: sources/2026-05-01-grafana-how-grafana-assistant-learns-your-infrastructure-before-you-even-ask)

"For each discovered service group, agents produce documentation covering five areas: what the service is, its key metrics and labels, how it's deployed, what it depends on, and how its logs are structured."

And:

"You can review what the assistant has learned by navigating to the Assistant settings and browsing the discovered service groups."

Why this granularity is load-bearing

The five-category service-knowledge schema (identity, metrics, topology, dependencies, log structure) is coherent at the service-group level but falls apart at adjacent granularities:

| Granularity | Assessment |
| --- | --- |
| Single pod | Too narrow — metrics, logs, and dependencies are all shared with peers; per-pod memory would be redundant and volatile (pods are ephemeral) |
| Deployment | Close, but a Deployment alone lacks the client-facing service name (the K8s Service) that users mention in queries |
| Namespace | Too broad — a namespace contains N services with N different dependency graphs, log formats, and metric schemas |
| Cluster | Far too broad — clusters contain dozens of namespaces |
| Service group | Matches how engineers think ("checkout-api"), matches metric label conventions (service=checkout-api), has a coherent lifecycle and well-defined dependencies |
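The five-category record the quoted passage describes can be sketched as a simple per-service-group data structure. Field names and example values here are illustrative assumptions — the actual Grafana Assistant schema is not disclosed:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceGroupKnowledge:
    """Sketch of the five-category service-knowledge schema.

    Field names are assumptions; Grafana Assistant's real schema
    is not public.
    """
    name: str                                              # identity: what the service is
    key_metrics: list[str] = field(default_factory=list)   # key metrics and labels
    deployment: str = ""                                   # how it's deployed
    dependencies: list[str] = field(default_factory=list)  # what it depends on
    log_structure: str = ""                                # how its logs are structured

# Hypothetical example entry
checkout = ServiceGroupKnowledge(
    name="checkout-api",
    key_metrics=['http_requests_total{service="checkout-api"}'],
    deployment="K8s Deployment behind Service checkout-api",
    dependencies=["payments-api", "postgres"],
    log_structure="JSON lines with level/msg/trace_id fields",
)
```

Keyed this way, one record per service group is exactly the granularity argued for above: one name, one metric schema, one dependency list.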

How it's detected

The exact detection heuristic used by Grafana Assistant is not disclosed. Plausible signals:

  • Prometheus label clustering. Workloads sharing service=X or app=X label values.
  • K8s API. Service objects and their selector-matched Deployments/StatefulSets.
  • Trace service names. OpenTelemetry trace service.name attribute groupings.
  • Name similarity. Workloads whose metric names share semantic prefixes (http_requests_total{handler=…} scoped by service label).

Most likely a combination of these: K8s API when available, falling back to Prometheus label conventions otherwise.
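As a minimal sketch of the K8s API signal (the real heuristic is undisclosed, and all names below are invented): a Service's selector matched against workload labels yields one candidate service group per Service.

```python
# Hypothetical sketch: pair each K8s Service with the Deployments whose
# pod-template labels satisfy its selector. Selector semantics here are
# simple equality matching, as in K8s label selectors.
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    return all(pod_labels.get(k) == v for k, v in selector.items())

# Invented example objects (name -> labels)
services = {"checkout-api": {"app": "checkout"}}
deployments = {"checkout-deploy": {"app": "checkout", "tier": "web"}}

groups = {
    svc: [d for d, labels in deployments.items() if selector_matches(sel, labels)]
    for svc, sel in services.items()
}
# groups == {"checkout-api": ["checkout-deploy"]}
```

In a real cluster this would read Service and Deployment objects from the K8s API; the fallback path would instead cluster Prometheus series by their service=/app= label values.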

Relationship to adjacent concepts

  • Service (as in K8s Service) — the K8s-native primitive closest to service group, but K8s-only. Service group generalises to non-K8s stacks.
  • Deployment — a specific K8s controller, not a service abstraction on its own. A service group usually corresponds to exactly one Deployment but isn't identical to it.
  • Microservice — overloaded marketing term. "Service group" is more precise because it's defined by the observability data, not by code-organisation rhetoric.
  • concepts/critical-business-operation — CBO is at a different altitude (customer-facing operation across multiple services); service group is a substrate below that.

Failure modes in detection

  1. Label hygiene drift. Teams that don't apply consistent service=/app= labels yield inferred service groups whose names don't match the ones humans use. "What does billing depend on?" fails if billing's workloads carry only team=finance-apis and no service label.
  2. Multi-service K8s Pods. A sidecar-heavy pod may contribute metrics to multiple service groups; grouping purely by pod or Deployment fails.
  3. Batch jobs / CronJobs. Short-lived workloads may appear and disappear between refresh cycles, producing transient service groups.
  4. Monorepo deployments. A single codebase deployed under multiple names with the same metric schema may be conflated into one service group.
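Failure mode 1 can be made concrete with a toy grouping pass (label data and the fallback order are invented for illustration): the billing workload falls through every conventional identity label and lands ungrouped.

```python
from collections import defaultdict

# Hypothetical identity-label precedence for the Prometheus fallback path.
IDENTITY_LABELS = ("service", "app")

def group_workloads(workloads: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group workloads by the first identity label present; the rest
    fall into an <ungrouped> bucket, i.e. detection has failed."""
    groups: dict[str, list[str]] = defaultdict(list)
    for labels in workloads:
        name = next((labels[k] for k in IDENTITY_LABELS if k in labels), None)
        groups[name or "<ungrouped>"].append(labels["pod"])
    return dict(groups)

workloads = [
    {"pod": "billing-x04", "team": "finance-apis"},   # label hygiene drift
    {"pod": "checkout-7f9", "service": "checkout-api"},
]
print(group_workloads(workloads))
# {'<ungrouped>': ['billing-x04'], 'checkout-api': ['checkout-7f9']}
```

The ungrouped bucket is exactly what makes "What does billing depend on?" unanswerable: no service group named billing exists in the extracted memory.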
