PATTERN Cited by 1 source
S3 as policy-bundle source for availability¶
Problem¶
Your policy engine (OPA or equivalent) in the data plane needs to fetch policy bundles from somewhere. The natural answer is "from the policy control plane" (Styra DAS, a vendor SaaS, an internal bundle server). But if the control plane goes down, a policy engine that depends on it for fresh bundles is at risk — best case it serves the last cached bundle, worst case it fails closed and takes the data plane with it.
Solution¶
Publish bundles to object storage (AWS S3) and point the data plane at S3. The control plane (Styra DAS, build pipeline, etc.) writes to S3; the data plane reads from S3. The only hard runtime dependency of enforcement is S3 — not the control plane.
Rego authors (Git)
→ control plane (Styra DAS / build pipeline) ← failure OK
→ writes bundle → S3 bucket ← hard dep
← polls bundle ← embedded OPA in data plane
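On the data-plane side, the diagram comes down to a small piece of engine configuration. The following is a minimal sketch, assuming OPA's documented `services` / `bundles` configuration keys with S3 request signing from environment credentials. The bucket URL, the bundle name `authz`, the object key, and the polling delays are all hypothetical. The snippet emits JSON, which is also valid YAML, so the output could be saved as the engine's config file.

```python
import json


def opa_s3_config(bucket_url: str, bundle_key: str,
                  min_delay: int = 60, max_delay: int = 120) -> dict:
    """Build an OPA-style config that polls one bundle from S3.

    All concrete values passed in are illustrative, not taken from the source.
    """
    return {
        "services": {
            "s3": {
                "url": bucket_url,
                # Sign requests with AWS credentials from the environment.
                "credentials": {"s3_signing": {"environment_credentials": {}}},
            }
        },
        "bundles": {
            "authz": {
                "service": "s3",          # must match a key under "services"
                "resource": bundle_key,   # object key inside the bucket
                "persist": True,          # keep the last activated bundle on disk
                "polling": {
                    "min_delay_seconds": min_delay,
                    "max_delay_seconds": max_delay,
                },
            }
        },
    }


config = opa_s3_config(
    "https://example-policy-bundles.s3.eu-central-1.amazonaws.com",  # hypothetical bucket
    "bundles/authz/bundle.tar.gz",                                    # hypothetical key
)
print(json.dumps(config, indent=2))
```

Nothing in this configuration refers to the control plane: the only endpoint the engine knows about is the bucket.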
Why this works¶
- S3's availability SLA is stronger than that of most policy control planes. The pattern swaps a SaaS dependency for AWS's object-storage availability promise.
- Control-plane outages become non-events for enforcement. The data plane continues reading its most recent bundle from S3; publication is paused, but enforcement is not.
- Bundles are naturally immutable versioned artifacts. Perfect match for S3's data model.
- Scale properties are right. N data-plane pollers against a few bundle objects is an embarrassingly cache-friendly workload for S3.
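The cache-friendliness claim can be illustrated with a toy simulation, assuming pollers send conditional requests (an `If-None-Match` header carrying the last seen ETag) so that an unchanged bundle costs no body transfer. `FakeBucket` and `Poller` are hypothetical in-memory stand-ins, not an S3 client.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeBucket:
    """In-memory stand-in for a single S3 object; the ETag changes on each write."""
    body: bytes = b""
    version: int = 0
    full_reads: int = 0

    def put(self, body: bytes) -> None:
        self.body = body
        self.version += 1

    def get(self, if_none_match: Optional[str]) -> Tuple[int, Optional[bytes], str]:
        etag = f"v{self.version}"
        if if_none_match == etag:
            return 304, None, etag  # unchanged: no body is transferred
        self.full_reads += 1
        return 200, self.body, etag


@dataclass
class Poller:
    bucket: FakeBucket
    etag: Optional[str] = None
    bundle: Optional[bytes] = None

    def poll(self) -> bool:
        """Return True when a new bundle version was downloaded."""
        status, body, etag = self.bucket.get(self.etag)
        if status == 304:
            return False
        self.bundle, self.etag = body, etag
        return True


bucket = FakeBucket()
bucket.put(b"policy-v1")
pollers = [Poller(bucket) for _ in range(100)]
for _ in range(10):          # ten polling rounds, no new publish in between
    for p in pollers:
        p.poll()
print(bucket.full_reads)     # 100: one full download per poller, the rest are 304s
```

Out of 1,000 polls in this run, only the first poll of each of the 100 pollers transfers the bundle body; a new publish would trigger one more download per poller.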
Static-stability framing¶
This is a direct instance of concepts/static-stability applied to authorization: the data plane keeps running without talking to the control plane, either on the request path or on the polling path. Compare AWS's own internal "static stability" principle, under which control planes are never in the data path of compute launches.
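The static-stability property can be sketched as a toy model: the control plane is the only writer to storage, the data plane is the only reader, and a control-plane outage only stops new writes. All names here (`publish`, `DataPlane`, the comma-separated allow-list "bundle") are hypothetical simplifications, not OPA or Styra DAS APIs.

```python
from typing import Dict, List, Optional


class ControlPlaneDown(Exception):
    """Raised when the control plane cannot build or publish a bundle."""


def publish(storage: Dict[str, bytes], allow_list: List[str],
            control_plane_up: bool = True) -> None:
    """Control-plane side: write a new bundle into storage."""
    if not control_plane_up:
        raise ControlPlaneDown("cannot build or publish a new bundle")
    storage["bundle"] = ",".join(allow_list).encode()


class DataPlane:
    """Data-plane side: talks to storage only, never to the control plane."""

    def __init__(self, storage: Dict[str, bytes]) -> None:
        self._storage = storage
        self._bundle: Optional[bytes] = None

    def refresh(self) -> None:
        latest = self._storage.get("bundle")
        if latest is not None:
            self._bundle = latest

    def allow(self, user: str) -> bool:
        if self._bundle is None:
            return False  # no bundle ever loaded: fail closed
        return user in self._bundle.decode().split(",")


storage: Dict[str, bytes] = {}          # stands in for the S3 bucket
dp = DataPlane(storage)

publish(storage, ["alice", "bob"])
dp.refresh()

try:
    # Control plane outage: the attempt to revoke bob never reaches storage.
    publish(storage, ["alice"], control_plane_up=False)
except ControlPlaneDown:
    pass

dp.refresh()                             # polling still succeeds
print(dp.allow("alice"), dp.allow("bob"), dp.allow("mallory"))
```

The run shows both sides of the trade: enforcement continues through the outage, but it continues on the last published policy, so the pending revocation of `bob` does not take effect until publication resumes.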
Trade-offs¶
- Publish latency goes up. Each bundle update is now two hops (build → S3, then S3 → data plane poll interval) rather than one. Usually fine for authz — policies don't change that fast.
- S3 becomes a shared blast radius. S3-region outage = bundle-fetch outage. Typically still less risky than a SaaS control plane, and concepts/runtime-dependency-on-saas-provider is the alternative axis of risk.
- Authentication + authorization for bundle fetch must be wired up (IAM roles, presigned URLs, or bucket policies); the policy engine needs both network access and credentials for S3.
- Signing + integrity. Bundles should be signed so a compromise of the S3 write path cannot inject policy. Not an availability concern but an integrity one.
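The integrity point can be sketched as a check the data plane runs before activating a downloaded bundle. This is an illustration only, assuming a shared HMAC key distributed out of band; it is not OPA's own bundle-signing format, and the function names and key are hypothetical.

```python
import hashlib
import hmac


def sign_bundle(bundle: bytes, key: bytes) -> str:
    """Publisher side: compute an HMAC-SHA256 tag over the bundle bytes."""
    return hmac.new(key, bundle, hashlib.sha256).hexdigest()


def verify_bundle(bundle: bytes, signature: str, key: bytes) -> bool:
    """Data-plane side: accept the bundle only if the tag matches."""
    expected = sign_bundle(bundle, key)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, signature)


key = b"example-key-distributed-out-of-band"   # hypothetical; never stored in the bucket
bundle = b"package authz\ndefault allow = false\n"
signature = sign_bundle(bundle, key)

tampered = bundle.replace(b"false", b"true")    # what a compromised write path might upload
print(verify_bundle(bundle, signature, key))    # True
print(verify_bundle(tampered, signature, key))  # False
```

A symmetric key means every verifier could also sign, so an asymmetric scheme (private key with the publisher, public key in the data plane) is usually the better fit; the HMAC version is used here only to keep the sketch within the standard library.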
Generalises beyond OPA¶
The pattern is "use highly-available object storage as the data-plane substrate for control-plane artifacts". Applies to feature flags (publish to S3, SDKs poll S3), config distribution (see concepts/s3-signal-bucket-as-config-fanout), ML model artifacts, and any other control-plane-authored / data-plane-consumed blob.
Seen in¶
- sources/2024-12-05-zalando-open-policy-agent-in-skipper-ingress — canonical wiki instance. "To reduce the likelihood of outages due to an authorization infrastructure failure, we use AWS S3 and its availability promises as the source for policy bundles. Styra DAS, a commercial control plane for Open Policy Agent is used to source the bundles and publish them to S3. … This approach allows us to scale and fail-over despite failures of our OPA control plane and only depends on S3 being available." Zalando still pairs this with status + decision-log streams back to Styra DAS (which are lossy-OK channels), but the bundle fetch path is S3-only.