PATTERN Cited by 1 source
Central telemetry aggregation¶
Pattern¶
In a multi-account platform (especially account-per-tenant), forward logs, metrics, and traces from every source account into a single central aggregation tier, define multi-alerts once against the aggregated data, and present a single pane of glass to engineers, while the underlying telemetry still originates from isolated accounts. The pattern is designed to recover cross-fleet visibility without re-coupling the accounts that the isolation architecture was built to separate.
Canonical form¶
"Observability tooling should be centralized, but without reintroducing the very risks that accounts are meant to isolate." (Source: sources/2026-02-25-aws-6000-accounts-three-people-one-platform)
ProGlove's shape: forward logs + metrics to a central third-party observability application; multi-alerts are defined once and applied across tenant accounts individually. Engineers see the aggregated view; raw telemetry still lives per-account.
Load-bearing sub-prescriptions¶
- Don't replicate per-account alarms blindly. If you naively fan the same alarm out to every tenant account, alert volume grows in proportion to the number of accounts, and the resulting alert fatigue makes the fleet less observable, not more. Use streaming and aggregation at the central tier, and define threshold-breach logic once against the aggregated streams.
- Tag for context. Every metric and log must carry the source AWS account ID (and tenant-id / service / Region / environment tags) so that aggregated views can drill into single-tenant problems without losing the cross-fleet summary.
- Enforce tagging consistency. Consider AWS Organizations tag policies to enforce a consistent scheme — the aggregation layer is only as useful as the discipline of its inputs.
- Stay current with AWS primitives. Amazon CloudWatch Observability Access Manager (OAM), CloudWatch metric streams, and EventBridge integrations are all evolving and may reduce custom-pipeline surface area.
(All Source: sources/2026-02-25-aws-6000-accounts-three-people-one-platform)
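The first two prescriptions (define the rule once, tag every data point with its source account) can be sketched in a few lines of Python. This is a minimal illustration, not ProGlove's implementation: the `MetricPoint` and `AlertRule` types, the `evaluate` function, the metric names, and the account IDs are all hypothetical, and a real central tier would evaluate rules over a streaming store rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class MetricPoint:
    """One data point as it arrives at the central tier (hypothetical shape)."""
    account_id: str  # source AWS account ID, required for drill-down
    tenant_id: str
    region: str
    name: str
    value: float


@dataclass(frozen=True)
class AlertRule:
    """A single rule defined once, evaluated against the aggregated stream."""
    metric_name: str
    threshold: float
    min_breaching_accounts: int = 1


def evaluate(rule: AlertRule, points: Iterable[MetricPoint]) -> Dict[str, object]:
    """Evaluate one rule fleet-wide while keeping the per-account breakdown."""
    peak_by_account: Dict[str, float] = {}
    for p in points:
        if p.name != rule.metric_name:
            continue
        previous = peak_by_account.get(p.account_id, p.value)
        peak_by_account[p.account_id] = max(previous, p.value)

    breaching: List[str] = sorted(
        account for account, peak in peak_by_account.items()
        if peak > rule.threshold
    )
    return {
        "firing": len(breaching) >= rule.min_breaching_accounts,
        "accounts_seen": len(peak_by_account),
        "breaching_accounts": breaching,  # drill-down by source account
    }


# Illustrative data only: three made-up tenant accounts.
points = [
    MetricPoint("111111111111", "tenant-a", "eu-central-1", "error_rate", 0.02),
    MetricPoint("222222222222", "tenant-b", "eu-central-1", "error_rate", 0.11),
    MetricPoint("222222222222", "tenant-b", "eu-central-1", "error_rate", 0.04),
    MetricPoint("333333333333", "tenant-c", "us-east-1", "latency_ms", 900.0),
]
rule = AlertRule(metric_name="error_rate", threshold=0.05)
result = evaluate(rule, points)
print(result)
```

One rule produces one evaluation result for the whole fleet, and the `account_id` tag is what lets that result name the affected tenant accounts instead of requiring an alarm per account.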
What "without reintroducing the risks" means in practice¶
- Read-only aggregation access. The central observability role assumes cross-account read-only roles into source accounts; it cannot mutate the source accounts. Anything richer (e.g. admin-level CloudWatch access) re-opens the blast-radius boundary.
- Central-tier compromise is a fleet-wide incident. The single pane of glass is also a single point of compromise for read visibility; harden the aggregation account as you would a production-critical service, not as an internal tool.
- No back-channels. A legitimate "we need to act on this alert" path must go through the same per-account access controls (ChatOps / break-glass / scoped roles), not the aggregation layer's own credentials.
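One way to keep the read-only constraint honest is to lint the policy attached to the cross-account role before it is deployed. The sketch below is an assumption-laden heuristic, not an AWS feature: the function name and the prefix list are hypothetical, it only inspects `Action` and `NotAction` in `Allow` statements, and a `Get`/`List`/`Describe` prefix does not by itself guarantee an action is harmless.

```python
from typing import Dict, List

# Heuristic assumption: verbs with these prefixes are treated as read-only.
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def non_read_only_actions(policy: Dict) -> List[str]:
    """Return allowed actions in an IAM policy document that look mutating."""
    offending: List[str] = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if "NotAction" in stmt:
            # Allow + NotAction grants everything except a list: too broad.
            offending.append("NotAction")
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            _, _, verb = action.partition(":")
            # A bare "*" or "service:*" never starts with a read-only prefix.
            if not verb.startswith(READ_ONLY_PREFIXES):
                offending.append(action)
    return offending


# Illustrative policy: the second statement would re-open the boundary.
candidate_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:GetMetricData",
                "cloudwatch:ListMetrics",
                "logs:DescribeLogGroups",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricAlarm",
            "Resource": "*",
        },
    ],
}
violations = non_read_only_actions(candidate_policy)
print(violations)
```

A check like this fits in a deployment pipeline as a guardrail; it complements, rather than replaces, reviewing the role's trust policy and any permission boundaries.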
Scale signal¶
Per-account cost of telemetry is the enemy at high account count: "the volume of collected data can make per-account costs economically unsustainable. Instead, focus on understanding which metrics you need to monitor and select an observability approach that allows you to implement that." (Source: sources/2026-02-25-aws-6000-accounts-three-people-one-platform)
Canonical downstream heuristics:
- Sample high-volume low-signal metrics per-account before forwarding.
- Aggregate at the edge (per-account) so only rolled-up streams leave the account.
- Tier storage at the central side (hot → cold) with short retention for raw per-account streams and longer retention only for aggregated derived metrics.
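The edge-aggregation heuristic can be illustrated with a small roll-up function that runs inside each source account and forwards one summary record per metric per time bucket instead of every raw sample. This is a sketch under stated assumptions: the `roll_up` name, the `(timestamp, metric, value)` tuple layout, and the 60-second bucket are hypothetical choices, and sampling and storage tiering are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[int, str, float]  # (unix_timestamp_seconds, metric_name, value)


def roll_up(account_id: str, samples: Iterable[Sample],
            bucket_seconds: int = 60) -> List[Dict[str, object]]:
    """Collapse raw samples into one summary record per (bucket, metric)."""
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for timestamp, name, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[(bucket_start, name)].append(value)

    return [
        {
            "account_id": account_id,  # keep the source tag on every record
            "metric": name,
            "bucket_start": bucket_start,
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for (bucket_start, name), values in sorted(buckets.items())
    ]


# Illustrative data: four raw samples become two forwarded records.
raw = [
    (0, "latency_ms", 100.0),
    (15, "latency_ms", 300.0),
    (59, "latency_ms", 200.0),
    (60, "latency_ms", 50.0),
]
rolled = roll_up("111111111111", raw)
for record in rolled:
    print(record)
```

Forwarding `count`, `sum`, `min`, and `max` rather than a precomputed average lets the central tier combine buckets across accounts correctly, since averages of averages are not generally meaningful.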
Where AWS-native OAM fits¶
When ProGlove built its platform, it used a third-party tool. OAM has since shipped and offers the same architectural shape (cross-account read-only visibility, no telemetry copying) as an AWS-native alternative. For new platforms on AWS, OAM is a reasonable starting point.
Seen in¶
- sources/2026-02-25-aws-6000-accounts-three-people-one-platform — canonical instance: third-party observability application aggregating telemetry from ~6,000 ProGlove tenant accounts, single-view engineer experience.
Related¶
- systems/aws-observability-access-manager — AWS-native implementation of the same shape.
- concepts/account-per-tenant-isolation, concepts/blast-radius — the architectural pressures that make this pattern necessary.
- patterns/platform-engineering-investment — the broader platform-investment context.