PATTERN Cited by 1 source
Single VARIANT UDF for multi-type masking¶
Pattern summary¶
Write one column-masking UDF that accepts and returns VARIANT,
not type-specific signatures, so that a single UDF can mask INT,
DOUBLE, DECIMAL, STRUCT, and other column types at once. The
ABAC policy that invokes this UDF can apply uniformly to many columns
of different types — "cuts down on the number of policies
organizations need to maintain" (Source:
sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags).
The architectural lever is type erasure via a dynamically-typed
wrapper: VARIANT carries a runtime type tag plus a value, so a
single function signature can dispatch on the runtime type internally
and return a value of the same type without per-type definitions.
What it replaces¶
-- WITHOUT this pattern: one UDF per column type
CREATE FUNCTION mask_int(v INT) RETURNS INT AS ...;
CREATE FUNCTION mask_double(v DOUBLE) RETURNS DOUBLE AS ...;
CREATE FUNCTION mask_decimal(v DECIMAL(...)) RETURNS DECIMAL(...) AS ...;
CREATE FUNCTION mask_struct(v STRUCT<...>) RETURNS STRUCT<...> AS ...;
-- … and one ABAC policy per type, each invoking the matching UDF.
-- WITH this pattern: one UDF over VARIANT
CREATE FUNCTION mask_any(v VARIANT) RETURNS VARIANT AS
CASE
WHEN typeof(v) = 'INT' THEN cast(0 AS VARIANT)
WHEN typeof(v) = 'DOUBLE' THEN cast(0.0 AS VARIANT)
WHEN typeof(v) = 'DECIMAL' THEN cast(0 AS VARIANT)
WHEN typeof(v) = 'STRUCT' THEN ...
END;
-- ONE ABAC policy invokes mask_any() over all matching columns.
The pattern collapses N UDFs and N ABAC policies into 1 UDF and 1 policy. Per-policy ceiling pressure (the GA-disclosed [[systems/unity-catalog-abac|10K-per-metastore / 100-per-catalog policy ceiling]]) eases proportionally.
Why VARIANT¶
VARIANT is the right type-erasure carrier because:
- It carries the runtime type alongside the value — the UDF can
dispatch on
typeof(v)at runtime. - The query engine preserves the type through the UDF call, so the masked return value flows back to the surrounding plan as if it were the original column type.
- It composes with
STRUCTcolumns transparently — nested fields inside aSTRUCT<VARIANT>keep their typing.
Other erasure shapes (STRING, BINARY) lose type information; the
mask would need to roundtrip through string serialisation and the
query plan would have to cast back to the original type, which breaks
optimisation.
Operational properties¶
- One masking convention per category, not per type. A team that
wants to "replace numeric values with zero" writes one UDF; ABAC
policies referencing that UDF cover every numeric column tagged
with the relevant governed tag — regardless of whether the column
is
INT,DOUBLE, orDECIMAL. - Lower policy count, which matters under the policy-ceiling constraints — see systems/unity-catalog-abac for the 10K/100 limits.
- Easier evolution: when the masking convention changes, one UDF is updated rather than N type-specific UDFs.
- Type checking deferred to runtime: the dispatch on
typeof(v)happens at query time. The cost is per-row dispatch; the savings come from policy + UDF count reduction.
When it fits¶
- Many columns of different types share the same masking semantics (replace with zero, replace with empty, replace with hash, etc.).
- The masking output is type-stable — the function returns a value of the same type as the input.
- The query engine has good
VARIANTsupport (Unity Catalog SQL, Databricks SQL, Snowflake VARIANT — but not all SQL dialects).
When it doesn't fit¶
- Type-specific masking semantics. "Mask SSN as
XXX-XX-+ last four" requires string-typed input/output; mixing it with numeric masking via VARIANT obscures the semantics. - Cost-sensitive UDF evaluation. The dispatch + VARIANT roundtrip cost matters at very high query volumes; per-type UDFs can be inlined or pre-compiled more aggressively.
- Legacy SQL dialects without VARIANT. The pattern is not portable to engines that don't carry runtime type tags.
Sibling patterns¶
- One mask UDF per data type — what this pattern replaces; the pre-VARIANT shape.
- Cast-and-roundtrip via STRING — a poor erasure proxy that loses type information.
- Type-class polymorphism in language UDFs (Scala, Python) — same idea but the dispatch happens in the host language; the UDF catalog still needs separate signatures per type unless the engine compiles a generic catalog entry.
Seen in¶
- sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags — Unity Catalog ABAC GA names this as one of three GA enhancements: "One masking function for many column types. A single UDF that accepts and returns VARIANT can mask INT, DOUBLE, DECIMAL, and other numeric types at once, and the same approach extends to STRUCT columns. This cuts down on the number of policies organizations need to maintain."
Related¶
- concepts/attribute-based-access-control — the policy primitive that invokes the UDF.
- concepts/governed-tag — the attribute substrate; one tag can drive many columns through one VARIANT UDF.
- systems/unity-catalog-abac — host system.
- patterns/tag-driven-attribute-based-access-control — composes with this pattern for further policy-count reduction.