PATTERN Cited by 1 source
Scan planning as policy enforcement point
What it is
The architectural pattern of using the catalog's scan-planning request as the chokepoint at which governance policies are evaluated and a filtered scan plan is returned — instead of relying on each compute engine to apply policies client-side after reading data.
The pattern is the structural answer to "how do you enforce row-level / column-level governance uniformly across many compute engines that all read the same tables?" Engines speak a common scan-planning protocol to the catalog. The catalog is the policy decision point (PDP): it evaluates the policies, applies row filters, applies column masks, prunes columns, and returns a scan plan that already reflects the policy. Engines are policy enforcement points (PEPs) only insofar as they honour the plan they received.
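The engine's half of this contract is small. Below is a minimal Python sketch of a PEP that only honours a plan. It assumes a hypothetical `ScanPlan` shape holding the planned file list, the permitted projection, and a residual row predicate. The predicate is written as a Python callable rather than an Iceberg expression. `ScanPlan` and `execute` are illustrative names, not any engine's API:

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class ScanPlan:
    """Hypothetical filtered plan as returned by the catalog (the PDP)."""
    files: tuple[str, ...]                     # data files the caller is allowed to read
    projection: tuple[str, ...]                # columns the caller is allowed to see
    residual: Callable[[Row], bool] = lambda row: True  # row filter still to apply


def execute(plan: ScanPlan, storage: dict[str, list[Row]]) -> list[Row]:
    """Engine side (the PEP): honour the plan and evaluate no policy of its own."""
    out: list[Row] = []
    for path in plan.files:                    # never open files outside the plan
        for row in storage[path]:
            if plan.residual(row):             # apply the residual predicate
                out.append({c: row[c] for c in plan.projection})  # only projected columns
    return out


if __name__ == "__main__":
    storage = {
        "f1.parquet": [{"id": 1, "region": "EU", "ssn": "123"},
                       {"id": 2, "region": "US", "ssn": "456"}],
        "f2.parquet": [{"id": 3, "region": "US", "ssn": "789"}],
    }
    plan = ScanPlan(files=("f1.parquet",), projection=("id", "region"),
                    residual=lambda row: row["region"] == "EU")
    print(execute(plan, storage))              # [{'id': 1, 'region': 'EU'}]
```

The engine never evaluates a policy. It only refuses to step outside the plan. Nothing here stops a modified engine from skipping `plan.residual`, which is the trust-model gap discussed under Constraints.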
Canonical instance: Unity Catalog cross-engine ABAC (2026-05-28)
The 2026-05-28 announcement of cross-engine ABAC in Unity Catalog (in Beta) is the canonical wiki instance:
"When an external Iceberg engine requests access, UC evaluates the applicable policies during server-side scan planning. UC then returns a filtered scan plan so the engine only reads authorized data when processing the query."
The wire-protocol surface is the Iceberg REST Catalog Scan Planning API (added in Iceberg 1.11). Any engine that implements the scan-planning client gets ABAC enforcement on its queries; engines below the version floor either don't speak the API at all or fall back to a non-policy-aware scan path.
Architectural shape
```
                  ┌──────────────────────────────────┐
                  │ Catalog server (PDP)             │
                  │                                  │
plan-scan(...)    │ for each table referenced:       │
────────────────► │   1. resolve identity +          │
                  │      request context             │
                  │   2. fetch table tags +          │
                  │      inherited tags              │
                  │   3. evaluate ABAC policies      │
                  │      → row filter predicates     │
                  │      → column mask UDFs          │
                  │      → projection pruning        │
                  │   4. apply to scan plan          │
                  │                                  │
◄──────────────── │ Returns: filtered scan plan      │
filtered plan     │   - file list (subset)           │
                  │   - column projections           │
                  │   - residual predicates          │
                  └──────────────────────────────────┘
                                   │
                                   │ engine reads only files
                                   │ in the plan; applies
                                   │ residual predicates;
                                   │ produces result
                                   ▼
                        result set (already
                        reflects policy)
```
Why it works
- One PDP, many PEPs. Authoring policy once in the catalog is the only sustainable shape when a growing number N of engines all read the same tables: adding engines doesn't multiply the policy-authoring or policy-consistency burden.
- No engine-side policy library required. Engines only need to honour a filtered scan plan — a much smaller and more stable contract than implementing the catalog's policy schema.
- Audit substrate. Every scan-plan request becomes an auditable event recording the principal, table, predicate, and projected columns, so the catalog sees the intent of every query, whichever engine issues it.
- Policy can use catalog-side state. The policy can reference table tags, classification metadata, inherited tags from parent catalog/schema — all of which the catalog already holds. Engines wouldn't easily have access to this context.
- Policy evaluation is centralised, so cache-friendly. The catalog can cache policy evaluations per (principal, table, query-fingerprint) to amortise the cost of repeated identical queries.
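Two of the points above, the per-request audit event and the evaluation cache, combine naturally in one place. The Python sketch below shows how. `CachingPDP` and its methods are hypothetical names. The fingerprint is taken over the structured plan request rather than raw SQL text. That choice is an assumption: it lets requests that differ only in key order share an entry, and it never conflates two different literals the way naive SQL normalisation could.

```python
import hashlib
import json
from typing import Callable


class CachingPDP:
    """Hypothetical memoising wrapper around catalog-side policy evaluation."""

    def __init__(self, evaluate: Callable[[str, str, dict], dict]):
        self._evaluate = evaluate                       # the expensive policy evaluation
        self._cache: dict[tuple[str, str, str], dict] = {}
        self.audit_log: list[dict] = []                 # one event per request, hit or miss
        self.evaluations = 0

    @staticmethod
    def fingerprint(request: dict) -> str:
        # Canonical JSON (sorted keys at every level) so equivalent requests share an entry.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def plan(self, principal: str, table: str, request: dict) -> dict:
        key = (principal, table, self.fingerprint(request))
        hit = key in self._cache
        if not hit:
            self.evaluations += 1
            self._cache[key] = self._evaluate(principal, table, request)
        self.audit_log.append({"principal": principal, "table": table,
                               "request": request, "cache_hit": hit})
        return self._cache[key]

    def invalidate(self) -> None:
        """Drop every cached decision, e.g. after a policy, tag or group change."""
        self._cache.clear()
```

Every request is audited even on a cache hit, so caching never hides a query from the catalog. The cache has to be invalidated whenever policies, tags, or group memberships change. Otherwise a revoked grant keeps being served, and the coarse `invalidate()` is the simplest correct choice.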
Constraints / trade-offs
- Compatibility floor. The pattern requires the catalog and the engine to speak a scan-planning protocol. In the canonical instance, the Iceberg 1.11 floor draws a hard line: engines below it can't participate.
- Catalog-side latency on the query path. Server-side scan planning adds a catalog-side compute step before each query. The catalog must be sized for query-rate scale, not just metadata-fetch scale.
- Catalog-side complexity. The catalog must implement query planning to a useful depth (predicate pushdown, partition pruning, column projection, statistics) — otherwise the scan plan it returns is naive and engines lose optimisation opportunities they previously did client-side.
- Trust model. Engines must honour the residual parts of the plan (predicates, masks). A misbehaving engine that ignores residuals can read more than authorised. Mitigations: (1) push as much of the policy as possible into the file/column list directly so residual evasion has no effect; (2) audit deltas between scan-plan output and engine-reported reads; (3) restrict storage credentials to the file list the plan returned (credential vending scoped to the planned file list).
- Some policy types may be hard to express at scan-plan time. Column masks that require stateful evaluation (cross-row computations) or randomised UDFs may not lower cleanly into a scan plan; partial enforcement falls back to the engine.
Comparison to alternatives
| Approach | Where policy lives | Trust model | Engine fan-out cost |
|---|---|---|---|
| Policy in the engine | Each engine implements policies | Trust every engine | Linear in engine count |
| Policy in a sidecar / proxy | A query-rewriter / proxy in front of the engine | Trust the proxy | Per-engine deployment of proxy |
| Single-engine governance | Only the catalog vendor's compute is allowed | Implicit (one engine) | Doesn't scale to multi-engine |
| Scan planning as PEP (this pattern) | Catalog PDP; engine honours the plan | Trust the catalog, plus engines honouring residuals | Constant; engines just speak the API |
The pattern's trust model is better than per-engine policies because there's only one PDP, better than single-engine governance because it preserves engine choice, and structurally simpler than per-engine sidecars.
When this is the right shape
- Multi-engine open lakehouse with a central catalog.
- Customers / regulators require uniform governance regardless of engine.
- Engines are willing to update to a recent enough Iceberg version to speak the scan-planning client.
When it isn't
- Single-engine deployments — overhead of catalog-side scan planning isn't justified.
- Engines that can't update to the scan-planning client floor.
- Workloads where the catalog isn't the trust root (catalog-as-PDP isn't viable).
- Policies that fundamentally require per-row UDF evaluation that can't be expressed at plan time.
Caveats
- First canonical instance is Beta. Cross-engine ABAC on UC is in Beta as of 2026-05-28. Production reliability characteristics, latency profiles, and failure-mode behaviour are not yet disclosed.
- Iceberg-specific surface. The pattern as-canonicalised rides on the Iceberg REST scan-planning API. Equivalent patterns for Delta or Hudi would need their own scan-planning protocol additions.
- Policy expression-level constraints undisclosed. Which subset of UC ABAC policies is fully evaluable at scan-plan time vs partially evaluable is not specified.
- No latency / throughput numbers. The 2026-05-28 source provides no benchmarks for scan-plan latency overhead.
Seen in
- sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance — Canonical wiki instance. Cross-engine ABAC in Unity Catalog Beta on the Iceberg REST Catalog Scan Planning API (Iceberg 1.11). The catalog-evaluates-policies-during-scan-planning shape canonicalised with verbatim mechanism description; Apache Spark and DuckDB named as compatible engines.
Related
- concepts/cross-engine-abac — the governance shape this pattern enables.
- concepts/attribute-based-access-control — base ABAC concept.
- systems/iceberg-rest-catalog-scan-api — the wire-protocol surface.
- systems/unity-catalog — first canonical catalog implementing the pattern.
- systems/unity-catalog-abac — UC's ABAC engine; cross-engine extension uses this pattern.
- systems/apache-iceberg — table format whose REST API enables this.
- concepts/governed-tag — attribute substrate the catalog policies reference.
- patterns/tag-driven-attribute-based-access-control — the Databricks-compute-only ABAC pattern this generalises across engines.
- patterns/central-proxy-choke-point — sibling pattern; scan-planning is a protocol-level chokepoint rather than a network-level one.