PATTERN Cited by 1 source

Scan planning as policy enforcement point¶

What it is¶

The architectural pattern of using the catalog's scan-planning request as the chokepoint at which governance policies are evaluated and a filtered scan plan is returned — instead of relying on each compute engine to apply policies client-side after reading data.

The pattern is the structural answer to "how do you enforce row-level / column-level governance uniformly across many compute engines that all read the same tables?" Engines speak a common scan-planning protocol to the catalog. The catalog is the policy decision point (PDP): it evaluates the policies, applies row filters, applies column masks, prunes columns, and returns a scan plan that already reflects the policy. Engines are policy enforcement points (PEPs) only insofar as they honour the plan they received.
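The engine's side of this contract can be sketched in a few lines. This is a minimal illustration, not any engine's actual implementation: `FilteredScanPlan`, `execute_plan`, and the in-memory `storage` dict are hypothetical names invented here, and the residual predicate is modelled as a plain Python callable.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class FilteredScanPlan:
    """Hypothetical shape of what the catalog returns; not a real API type."""
    files: tuple[str, ...]            # only the files the principal may read
    columns: tuple[str, ...]          # projection after policy-driven pruning
    residual: Callable[[Row], bool]   # row filter the engine must still apply


def execute_plan(plan: FilteredScanPlan, storage: dict[str, list[Row]]) -> list[Row]:
    """Engine-side PEP: read planned files only, apply the residual, project columns."""
    result = []
    for path in plan.files:           # files outside the plan are never opened
        for row in storage[path]:
            if plan.residual(row):
                result.append({c: row[c] for c in plan.columns})
    return result


# Toy object store standing in for table data files (assumption for the demo).
storage = {
    "eu/part-0.parquet": [
        {"id": 1, "region": "EU", "ssn": "111"},
        {"id": 2, "region": "EU", "ssn": "222"},
    ],
    "us/part-0.parquet": [{"id": 3, "region": "US", "ssn": "333"}],
}
plan = FilteredScanPlan(
    files=("eu/part-0.parquet",),
    columns=("id", "region"),
    residual=lambda row: row["id"] != 2,
)
rows = execute_plan(plan, storage)
```

The engine never sees the policy itself. It sees only a narrower file list, a narrower projection, and a predicate to apply, which is why honouring the plan is the whole of its enforcement duty.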

Canonical instance: Unity Catalog cross-engine ABAC (2026-05-28)¶

The 2026-05-28 announcement of cross-engine ABAC in Unity Catalog Beta is the canonical wiki instance:

"When an external Iceberg engine requests access, UC evaluates the applicable policies during server-side scan planning. UC then returns a filtered scan plan so the engine only reads authorized data when processing the query."

(Source: sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance)

The wire-protocol surface is the Iceberg REST Catalog Scan Planning API (added in Iceberg 1.11). Any engine that implements the scan-planning client gets ABAC enforcement on its queries; engines below the version floor either don't speak the API at all or fall back to a non-policy-aware scan path.

Architectural shape¶

                   ┌──────────────────────────────────┐
                   │  Catalog server (PDP)            │
                   │                                  │
   plan-scan(...)  │  for each table referenced:      │
   ──────────────► │    1. resolve identity +         │
                   │       request context            │
                   │    2. fetch table tags +         │
                   │       inherited tags             │
                   │    3. evaluate ABAC policies     │
                   │       → row filter predicates    │
                   │       → column mask UDFs         │
                   │       → projection pruning       │
                   │    4. apply to scan plan         │
                   │                                  │
   ◄────────────── │  Returns: filtered scan plan     │
   filtered plan   │   - file list (subset)           │
                   │   - column projections           │
                   │   - residual predicates          │
                   └──────────────────────────────────┘
                              │
                              │  engine reads only files
                              │  in the plan; applies
                              │  residual predicates;
                              │  produces result
                              ▼
                       result set (already
                       reflects policy)
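
The four catalog-side steps in the diagram can be sketched as a single function. This is a simplified model under stated assumptions, not Unity Catalog's or Iceberg's implementation: the types, the tag names (`region_restricted`, `pii`, `secret`), and the attribute names (`region`, `pii_cleared`) are all hypothetical, and file-level pruning is reduced to a single partition value per file.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    region: str              # partition value, used for file-level pruning


@dataclass
class Table:
    files: list
    tags: set                # tags set on the table itself
    inherited_tags: set      # tags inherited from the parent schema/catalog
    column_tags: dict        # column name -> set of tags


@dataclass
class Principal:
    name: str
    attributes: dict


@dataclass
class ScanPlan:
    files: list
    columns: list
    masked_columns: list
    residuals: list


def plan_scan(principal: Principal, table: Table, requested_columns: list) -> ScanPlan:
    # 1. Resolve identity + request context.
    region = principal.attributes.get("region")
    cleared = principal.attributes.get("pii_cleared", False)

    # 2. Fetch table tags + inherited tags.
    table_tags = table.tags | table.inherited_tags

    # 3. Evaluate policies into row filters, column masks, and pruned projections.
    files, residuals = list(table.files), []
    if "region_restricted" in table_tags:
        files = [f for f in files if f.region == region]   # file-level row filter
        residuals.append(f"region = '{region}'")           # engine re-checks per row
    columns, masked = [], []
    for col in requested_columns:
        col_tags = table.column_tags.get(col, set())
        if "secret" in col_tags and not cleared:
            continue                                        # projection pruning
        if "pii" in col_tags and not cleared:
            masked.append(col)                              # column must be masked
        columns.append(col)

    # 4. Apply the outcome to the scan plan that is returned.
    return ScanPlan([f.path for f in files], columns, masked, residuals)


orders = Table(
    files=[DataFile("eu/a.parquet", "EU"), DataFile("us/b.parquet", "US")],
    tags=set(),
    inherited_tags={"region_restricted"},
    column_tags={"email": {"pii"}, "card_number": {"secret"}},
)
analyst = Principal("analyst", {"region": "EU"})
plan = plan_scan(analyst, orders, ["id", "email", "card_number"])
```

A principal with no `region` attribute matches no files in this sketch, so the model denies by default. Whether a real catalog behaves that way is not stated in the source.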

Why it works¶

One PDP, many PEPs. Authoring policy once in the catalog is the only sustainable shape when N engines (and growing) all read the same tables. The N-engine fan-out doesn't multiply the policy-authoring or policy-consistency burden.
No engine-side policy library required. Engines only need to honour a filtered scan plan — a much smaller and more stable contract than implementing the catalog's policy schema.
Audit substrate. Every scan-plan request becomes an auditable event recording the principal, table, predicate, and projected columns, so the catalog sees the intent of every query.
Policy can use catalog-side state. The policy can reference table tags, classification metadata, inherited tags from parent catalog/schema — all of which the catalog already holds. Engines wouldn't easily have access to this context.
Policy evaluation is centralised, so cache-friendly. The catalog can cache policy evaluations per (principal, table, query-fingerprint) to amortise the cost of repeated identical queries.
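
The caching point above can be illustrated with a small sketch. Everything here is hypothetical: `PlanCache` and its methods are invented for illustration, the fingerprint is a simple hash over sorted columns and a whitespace-normalised predicate, and the `policy_version` component of the key is an added assumption (not in the source) so that a policy change cannot serve a stale plan.

```python
import hashlib


class PlanCache:
    """Caches policy evaluations per (principal, table, query fingerprint)."""

    def __init__(self, evaluate):
        self._evaluate = evaluate   # the expensive policy evaluation function
        self._entries = {}
        self.misses = 0

    @staticmethod
    def fingerprint(columns, predicate):
        # Column order and whitespace do not change the query's meaning, so
        # they are normalised away. Literal case is kept: 'EU' != 'eu'.
        canonical = ",".join(sorted(columns)) + "|" + " ".join(predicate.split())
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def plan(self, principal, table, columns, predicate, policy_version):
        # policy_version is an assumption added here for invalidation.
        key = (principal, table, self.fingerprint(columns, predicate), policy_version)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._evaluate(principal, table, columns, predicate)
        return self._entries[key]


# Stand-in for real policy evaluation; returns a trivially shaped "plan".
cache = PlanCache(lambda p, t, cols, pred: {"principal": p, "columns": sorted(cols)})

first = cache.plan("alice", "orders", ["id", "email"], "region = 'EU'", 1)
second = cache.plan("alice", "orders", ["email", "id"], "region  =  'EU'", 1)
misses_after_repeat = cache.misses

cache.plan("bob", "orders", ["id", "email"], "region = 'EU'", 1)
cache.plan("alice", "orders", ["id", "email"], "region = 'EU'", 2)
```

The principal is part of the key because two principals issuing the same query may be entitled to different plans. Sharing an entry across principals would turn the cache into a policy bypass.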

Constraints / trade-offs¶

Compatibility floor. The pattern requires the catalog and the engine to speak a common scan-planning protocol. The Iceberg 1.11 floor in the canonical instance draws a hard line: engines below it can't participate.
Catalog-side latency on the query path. Server-side scan planning adds a catalog-side compute step before each query. The catalog must be sized for query-rate scale, not just metadata-fetch scale.
Catalog-side complexity. The catalog must implement query planning to a useful depth (predicate pushdown, partition pruning, column projection, statistics) — otherwise the scan plan it returns is naive and engines lose optimisation opportunities they previously did client-side.
Trust model. Engines must honour the residual parts of the plan (predicates, masks). A misbehaving engine that ignores residuals can read more than it is authorised to. Mitigations: (1) push as much of the policy as possible into the file and column list itself, so that ignoring residuals gains nothing; (2) audit deltas between scan-plan output and engine-reported reads; (3) vend storage credentials scoped to the file list the plan returned.
Some policy types may be hard to express at scan-plan time. Column masks that require stateful evaluation (cross-row computations) or randomised UDFs may not lower cleanly into a scan plan; partial enforcement falls back to the engine.
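
Two of the trust-model mitigations above, auditing deltas and scoping credentials to the planned file list, can be sketched together. This is a toy model under assumptions: `ScopedCredential`, `vend_credential`, and `audit_delta` are hypothetical names, and a real deployment would enforce the scope in the storage layer rather than in a Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    """Hypothetical credential whose scope is exactly the planned file list."""
    principal: str
    allowed_files: frozenset

    def allows(self, path: str) -> bool:
        return path in self.allowed_files


def vend_credential(principal: str, planned_files: list) -> ScopedCredential:
    # Scope the credential to the plan, not to the whole table location.
    return ScopedCredential(principal, frozenset(planned_files))


def audit_delta(planned_files: list, reported_reads: list) -> dict:
    """Compare what the plan allowed with what the engine says it read."""
    planned, reported = set(planned_files), set(reported_reads)
    return {
        "outside_plan": sorted(reported - planned),       # should always be empty
        "planned_not_read": sorted(planned - reported),   # harmless, but notable
    }


planned = ["eu/a.parquet", "eu/b.parquet"]
cred = vend_credential("analyst", planned)
delta = audit_delta(planned, ["eu/a.parquet", "us/c.parquet"])
```

Scoped credentials stop reads of unplanned files even when the engine misbehaves. They do not help with residual predicates or masks inside a permitted file, which is why the first mitigation (pushing policy into the file and column list) still matters.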

Comparison to alternatives¶

Approach	Where policy lives	Trust model	Engine fan-out cost
Policy in the engine	Each engine implements policies	Trust every engine	Linear in engine count
Policy in a sidecar / proxy	A query-rewriter / proxy in front of the engine	Trust the proxy	Per-engine deployment of proxy
Single-engine governance	Only the catalog vendor's compute is allowed	Implicit (one engine)	Doesn't scale to multi-engine
Scan planning as PEP (this pattern)	Catalog PDP + engine honours plan	Trust the catalog, plus each engine to honour residuals	Constant; engines just speak the API

The pattern improves on per-engine policies because there is only one PDP to author against and to trust, on single-engine governance because it preserves engine choice, and on sidecars or proxies because nothing extra has to be deployed alongside each engine.

When this is the right shape¶

Multi-engine open lakehouse with a central catalog.
Customers / regulators require uniform governance regardless of engine.
Engines are willing to update to a recent enough Iceberg version to speak the scan-planning client.

When it isn't¶

Single-engine deployments — overhead of catalog-side scan planning isn't justified.
Engines that can't update to the scan-planning client floor.
Workloads where the catalog isn't the trust root (catalog-as-PDP isn't viable).
Policies that fundamentally require per-row UDF evaluation that can't be expressed at plan time.

Caveats¶

First canonical instance is Beta. Cross-engine ABAC on UC is in Beta as of 2026-05-28. Production reliability characteristics, latency profiles, and failure-mode behaviour are not yet disclosed.
Iceberg-specific surface. The pattern as-canonicalised rides on the Iceberg REST scan-planning API. Equivalent patterns for Delta or Hudi would need their own scan-planning protocol additions.
Policy expression-level constraints undisclosed. Which subset of UC ABAC policies is fully evaluable at scan-plan time vs partially evaluable is not specified.
No latency / throughput numbers. The 2026-05-28 source provides no benchmarks for scan-plan latency overhead.

Seen in¶

sources/2026-05-28-databricks-advancing-apache-iceberg-on-databricks-iceberg-v3-ga-open-sharing-and-unified-governance — Canonical wiki instance. Cross-engine ABAC in Unity Catalog Beta on the Iceberg REST Catalog Scan Planning API (Iceberg 1.11). The catalog-evaluates-policies-during-scan-planning shape canonicalised with verbatim mechanism description; Apache Spark and DuckDB named as compatible engines.

concepts/cross-engine-abac — the governance shape this pattern enables.
concepts/attribute-based-access-control — base ABAC concept.
systems/iceberg-rest-catalog-scan-api — the wire-protocol surface.
systems/unity-catalog — first canonical catalog implementing the pattern.
systems/unity-catalog-abac — UC's ABAC engine; cross-engine extension uses this pattern.
systems/apache-iceberg — table format whose REST API enables this.
concepts/governed-tag — attribute substrate the catalog policies reference.
patterns/tag-driven-attribute-based-access-control — the Databricks-compute-only ABAC pattern this generalises across engines.
patterns/central-proxy-choke-point — sibling pattern; scan-planning is a protocol-level chokepoint rather than a network-level one.