PATTERN Cited by 1 source
Event-driven config refresh¶
Shape¶
A reactive cache-invalidation pipeline for configuration data that eliminates the TTL-vs-staleness dilemma by pushing updates from the config source to live service instances the moment a change commits — no polling, no restart, no downtime.
Canonical AWS realization (from the multi-tenant-config post):
1. Config write to Parameter Store (or any EventBridge-emitting source)
2. EventBridge rule matches a content filter (e.g. path prefix)
3. Rule fires a Lambda, passing the full change event
4. Lambda extracts the affected tenant/scope from the event payload
5. Lambda queries AWS Cloud Map for healthy service instances for that scope
6. Lambda makes a gRPC refresh call to each instance in parallel
7. Each instance updates its in-memory cache synchronously on the refresh RPC
8. Updated configuration is live across the fleet (within seconds)
(Source: sources/2026-04-08-aws-build-a-multi-tenant-configuration-system-with-tagged-storage-patterns §D)
The problem being solved¶
Traditional configuration-refresh strategies force an unacceptable either/or as tenant counts grow:
- Polling — services re-read the config source on a schedule. This generates API calls (and cost) even when nothing has changed, and introduces a staleness window equal to the poll interval (seconds to minutes).
- TTL-based caches — same staleness window as the polling variant, plus tighter coupling between read latency and refresh freshness.
- Service restart on change — no staleness but drops active connections, disrupts user sessions; unacceptable for 24/7 SaaS.
The event-driven path resolves this: updates are reactive, bounded in latency by EventBridge delivery + Lambda cold start + fleet-wide gRPC fan-out (typically single-digit seconds), and the cache is invalidated in place on a live service without dropping in-flight requests.
Key mechanisms¶
Content-based routing via EventBridge rule. The rule filters on event path / source, so a single bus can carry changes for many config scopes and the Lambda only fires for the relevant subset. Content-based routing is what makes this pattern shine over raw SNS fan-out: consumers match on event shape, not on topic.
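A sketch of what that content filter might look like for the Parameter Store case, expressed as a Python dict (the `/config-service/` prefix comes from the source post; `aws.ssm` / "Parameter Store Change" are the real source and detail-type Parameter Store emits). The `matches` helper is a simplified local approximation of EventBridge's matching semantics, just enough to illustrate routing on event shape:

```python
# Approximate event pattern for the EventBridge rule. The prefix filter
# is what scopes the rule to this config service's parameters.
EVENT_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["Parameter Store Change"],
    "detail": {
        "name": [{"prefix": "/config-service/"}],
        "operation": ["Create", "Update"],
    },
}

def matches(pattern: dict, event: dict) -> bool:
    """Simplified local check of the pattern above -- not EventBridge's
    full matching semantics, just the prefix filter on parameter name."""
    detail = event.get("detail", {})
    return (
        event.get("source") in pattern["source"]
        and event.get("detail-type") in pattern["detail-type"]
        and any(detail.get("name", "").startswith(f["prefix"])
                for f in pattern["detail"]["name"])
        and detail.get("operation") in pattern["detail"]["operation"]
    )
```

With a catch-all rule every Lambda would fire on every write; the prefix filter is what keeps invocations proportional to relevant changes only.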
Service discovery at refresh time. The Lambda doesn't have a static list of service instances — it queries AWS Cloud Map at refresh time so auto-scaled / replaced / draining instances are handled automatically. The refresh fan-out is always to the current healthy set.
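The discovery step maps onto Cloud Map's `DiscoverInstances` API (a real boto3 call on the `servicediscovery` client); a minimal sketch, with the client injected so the namespace/service names stay out of the function:

```python
def healthy_endpoints(sd_client, namespace: str, service: str) -> list[tuple[str, int]]:
    """Query Cloud Map for the *current* healthy instance set at refresh
    time, so auto-scaled / replaced / draining instances are handled
    automatically. sd_client is a boto3 'servicediscovery' client."""
    resp = sd_client.discover_instances(
        NamespaceName=namespace,
        ServiceName=service,
        HealthStatus="HEALTHY",   # only instances passing health checks
    )
    return [
        (inst["Attributes"]["AWS_INSTANCE_IPV4"],
         int(inst["Attributes"]["AWS_INSTANCE_PORT"]))
        for inst in resp["Instances"]
    ]
```

Because the lookup happens inside the Lambda on every change event, there is no instance list to keep in sync anywhere.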
Direct gRPC call, not broadcast. The refresh is a point-to-point RPC per instance, not a pub/sub broadcast. This gives synchronous acknowledgment per instance (did the refresh succeed?) and explicit failure handling per instance (retry, circuit-break, log).
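The per-instance acknowledgment is the point, so a sketch of the fan-out collects a result per endpoint rather than fire-and-forgetting. `refresh_one` stands in for the actual gRPC refresh stub (an assumption; the source doesn't show the client code):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out_refresh(endpoints, refresh_one, max_workers=16):
    """Call refresh_one(host, port) against every instance in parallel
    and record a per-instance outcome, so each failure can be retried,
    circuit-broken, or logged individually -- unlike a pub/sub broadcast,
    where delivery is anonymous."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(refresh_one, host, port): (host, port)
                   for host, port in endpoints}
        for fut, endpoint in futures.items():
            try:
                fut.result(timeout=5)          # synchronous per-instance ack
                results[endpoint] = "ok"
            except Exception as exc:           # retry / circuit-break / log here
                results[endpoint] = f"failed: {exc}"
    return results
```

The returned map is exactly the "partial-fleet refresh" state discussed under failure modes: some endpoints `ok`, some `failed`.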
In-memory cache update, not restart. The refresh RPC's handler mutates a per-process in-memory map (or equivalent) under a lock, so in-flight requests continue serving the old value until the swap commits, then new reads see the new value. No connection drops; no restart.
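The in-process side can be as small as a lock-guarded map; a minimal sketch of the cache the refresh RPC handler would mutate:

```python
import threading

class ConfigCache:
    """In-process config cache updated in place by the refresh RPC
    handler. Readers keep serving the old value until the swap commits;
    no restart, no dropped connections."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values: dict[str, str] = {}

    def get(self, key: str, default=None):
        with self._lock:
            return self._values.get(key, default)

    def refresh(self, key: str, new_value: str) -> None:
        # Called from the refresh RPC: mutate under the lock so a
        # concurrent reader sees either the old or the new value,
        # never a partial write.
        with self._lock:
            self._values[key] = new_value
```

In-flight requests that already read the old value complete with it; the next `get` after `refresh` returns sees the new one.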
Comparison to polling-based refresh¶
| Axis | Polling | Event-driven |
|---|---|---|
| Staleness window | = poll interval | Bounded by delivery + fan-out latency (seconds) |
| API calls when idle | N services × poll rate | 0 |
| API calls on change | N services × poll rate (amortized) | 1 event + N gRPC refreshes (per change) |
| Active-connection impact | None | None (in-place cache update) |
| Operational complexity | Low (just a schedule) | Higher (EventBridge rule + Lambda + Cloud Map + gRPC endpoint) |
| Observability | Poll success rate | Event-delivery metrics + Lambda invocations + refresh-RPC status |
Event-driven wins whenever writes are rare relative to reads (true of most config data): every poll that returns an unchanged value is pure wasted cost.
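The idle-cost row of the table is worth making concrete with back-of-envelope arithmetic (the fleet size and poll interval here are illustrative assumptions, not numbers from the source):

```python
def idle_calls_per_day(n_services: int, poll_interval_s: int) -> int:
    """API calls per day that a polling fleet makes against the config
    source even when nothing changes. The event-driven path makes zero."""
    return n_services * (24 * 3600 // poll_interval_s)

# e.g. 50 instances polling every 30 s: 50 * 2880 = 144,000 calls/day
# for zero information, versus 0 idle calls on the event-driven path.
```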
Comparison to related wiki patterns¶
- patterns/stateless-invalidator (Figma LiveGraph) — same structural idea at a very different substrate. LiveGraph tails Postgres WAL per physical shard, emits invalidations over Kafka into cache replicas. The Config Service variant uses EventBridge events from Parameter Store and gRPC point-to-point invalidation. Both: a separate component observes the source of truth, pushes invalidations, and the cache layer is stateless. The difference is granularity — LiveGraph does per-row / per-query invalidation, the Config Service does per-config-key refresh.
- concepts/push-based-invalidation — the concept-level framing this pattern implements on AWS managed services.
- concepts/invalidation-based-cache — the target cache model.
Failure-mode surface¶
- EventBridge delivery failures — rare but non-zero; failed target invocations can be routed to a dead-letter queue configured on the rule. A caller still holding stale cache sees old data until the next natural refresh (an application-level TTL on the cache entry acts as a safety net).
- Lambda cold start on rare-event path — refresh latency spikes when the trigger fires after a quiescent period. Provisioned concurrency mitigates at cost.
- Cloud Map stale discovery — instances that are shutting down may still appear in the healthy set; the refresh RPC against them fails and is swallowed. Acceptable (the instance is draining).
- Per-instance refresh failures — partial-fleet refresh is a real state (some instances see new config, others still see old). The pattern doesn't make this atomic; application-level tolerance for brief per-instance divergence is required.
- Event ordering — EventBridge delivery is at-least-once, not FIFO. Rapid successive writes can fire refreshes out of order; the Lambda should fetch the current value on refresh (not rely on the event payload) to collapse bursts.
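The ordering caveat above shapes the Lambda body: trust the event only for *which* parameter changed, then read the current value from the source of truth, so out-of-order or duplicate deliveries all converge on the latest value. A sketch, with the boto3 `ssm` client and the fan-out step injected (`refresh_fleet` is a hypothetical name for the fan-out; `get_parameter` is the real SSM call):

```python
def handle_change_event(event, ssm_client, refresh_fleet):
    """Lambda body sketch: the event payload identifies the parameter,
    but the value pushed to the fleet is always re-fetched from the
    config source, collapsing bursts of rapid successive writes."""
    name = event["detail"]["name"]
    current = ssm_client.get_parameter(Name=name)["Parameter"]["Value"]
    refresh_fleet(name, current)   # fan out the *current* value, not the event's
    return current
```

Because the handler is a pure read-then-push, a retried (at-least-once) delivery is idempotent: it re-pushes the same current value.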
Implementation checklist¶
- Config source emits typed change events. Parameter Store ships this natively; other sources (DynamoDB Streams → EventBridge Pipes, custom applications calling PutEvents) need a glue step.
- EventBridge rule with content-based filter scoped to the config-service's paths / keys. Avoid catch-all rules — per-event-shape rules localize Lambda firing.
- Lambda is stateless and idempotent. Fetches current value from the config source, fans out to Cloud Map instances, refreshes each. Idempotency handles retried events.
- Service exposes a refresh RPC endpoint (gRPC / HTTP). Handler authenticates the caller (Lambda's IAM role), looks up the new value, mutates the in-memory cache under a lock.
- Cloud Map service registration by every service instance at boot; deregistration on shutdown.
- Monitor end-to-end latency: change commit → refresh applied on last instance. This is the SLO for "how stale can my cache be".
- Keep an application-level TTL as belt-and-suspenders. If the push path fails silently, the cache eventually refreshes on its own — staleness bounded by TTL, not infinite.
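The last checklist item composes the two mechanisms: push refresh as the normal path, TTL re-fetch as the fallback. A minimal sketch (the class and parameter names are illustrative; the clock is injectable for testing):

```python
import time

class TtlBackedCache:
    """Push-invalidated cache with an application-level TTL safety net:
    if the push path fails silently, a read past the TTL falls back to
    the loader, bounding staleness at ttl_s instead of infinity."""

    def __init__(self, loader, ttl_s: float, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}   # key -> (value, loaded_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[1] > self._ttl:
            value = self._loader(key)           # safety-net re-fetch
            self._entries[key] = (value, now)
            return value
        return entry[0]

    def push_refresh(self, key, value):
        # Normal path: the event-driven refresh also resets the TTL clock.
        self._entries[key] = (value, self._clock())
```

While pushes keep arriving, the loader never runs; if they stop, staleness is bounded by `ttl_s`.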
Seen in¶
- sources/2026-04-08-aws-build-a-multi-tenant-configuration-system-with-tagged-storage-patterns — canonical shape: Parameter Store writes → EventBridge rule on /config-service/* → Lambda extracts tenantId from path → Cloud Map lookup for Config Service instances → gRPC refresh RPC per instance → in-memory cache swap under lock. Explicit framing: "configuration updates within seconds while users experience no interruption."
Related¶
- concepts/cache-ttl-staleness-dilemma — the forcing function.
- concepts/push-based-invalidation — concept-level framing.
- concepts/invalidation-based-cache — the cache-model target.
- patterns/stateless-invalidator — the Figma LiveGraph instantiation; sibling pattern at a different substrate.
- systems/amazon-eventbridge — the change-event bus.
- systems/aws-lambda — the invalidator compute.
- systems/aws-cloud-map — service discovery for the fan-out.
- systems/grpc — the per-instance refresh RPC transport.
- systems/aws-parameter-store — the canonical EventBridge-emitting config source for this pattern.