PATTERN Cited by 1 source

Tag-driven attribute-based access control

Pattern summary

Author one access-control policy that names a tag, not the tables it applies to. Tags come from a managed taxonomy and are applied by a mix of human stewards and automated classifiers; the policy is evaluated at query time against whichever objects carry the tag. New tables and columns that receive the tag pick up the existing policy with no per-object configuration step.

The pattern is the operational composition of three primitives:

  1. Governed tag taxonomy — managed vocabulary of tags (concepts/governed-tag).
  2. ABAC policies that reference tags — declarative row filter + column mask rules whose match condition is "objects carrying tag X" (concepts/attribute-based-access-control).
  3. Tag application — humans + automated classifiers (concepts/agentic-data-classification) attach tags to data.

The result is organize → detect → protect as one continuous pipeline (Source: sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags).
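
The three primitives can be modelled in a few lines. The sketch below is a toy in-memory model, not Unity Catalog's API: `TAXONOMY`, `MaskPolicy`, `apply_tag`, `columns_in_scope`, and the column `loans.applicant_ssn` are hypothetical names chosen for illustration. It shows the property the pattern relies on: the policy's scope is resolved from tags when asked, so tagging a new column widens the scope without touching the policy.

```python
from dataclasses import dataclass

# Primitive 1: governed taxonomy, the only tag names anyone may use.
TAXONOMY = frozenset({"pii:ssn", "pii:email", "compliance:hipaa"})


# Primitive 2: a policy names a tag, never a table or column.
@dataclass(frozen=True)
class MaskPolicy:
    name: str
    tag: str
    exempt_group: str


# Primitive 3: tag assignments, written by stewards or classifiers.
column_tags: dict[str, set[str]] = {}


def apply_tag(column: str, tag: str) -> None:
    """Attach a tag to a column, rejecting names outside the taxonomy."""
    if tag not in TAXONOMY:
        raise ValueError(f"unknown tag: {tag}")
    column_tags.setdefault(column, set()).add(tag)


def columns_in_scope(policy: MaskPolicy) -> list[str]:
    """Resolve, at call time, which columns currently carry the policy's tag."""
    return sorted(c for c, tags in column_tags.items() if policy.tag in tags)


policy = MaskPolicy("mask_ssn_unless_compliance", "pii:ssn", "compliance_role")
apply_tag("customers.ssn", "pii:ssn")
before = columns_in_scope(policy)            # only customers.ssn
apply_tag("loans.applicant_ssn", "pii:ssn")  # new column, policy untouched
after = columns_in_scope(policy)             # both columns
```

Rejecting unknown tag names inside `apply_tag` is one way to keep the taxonomy authoritative; a real catalog would enforce this with permissions on the tag vocabulary rather than a runtime check.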

When it fits

| Property | Why it matters |
| --- | --- |
| Many tables share the same sensitivity | A single tag-driven policy covers all of them; per-table policies multiply. |
| New tables arrive continuously | New tables auto-inherit catalog tags; classifiers tag new sensitive columns; policies pick up the new objects automatically. |
| Policy must persist across schema changes | Tag-driven policies don't break when a table is renamed or a column is added. |
| Multiple roles need to participate in governance | The pattern naturally splits across taxonomy authors, stewards, and data producers (see concepts/separation-of-duties-data-governance). |
| Compliance vocabularies are stable | GDPR / HIPAA / PCI categories rarely change; a tag-driven policy referencing the compliance tag is stable. |

When it doesn't fit

| Property | Why it breaks |
| --- | --- |
| Per-object exception is the norm | If most tables need bespoke filters / masks not derivable from a tag, the per-object configuration that the pattern was meant to replace is what you actually want. |
| Policies depend on table-specific business rules | "Mask SSN if the customer's parent account has had > 3 chargebacks in the last 90 days" is not a tag; it is a join. Hybrid solutions are needed. |
| Tag substrate is unreliable | If tags drift, are missing on new data, or are applied inconsistently, ABAC policies have nothing to evaluate against. The pattern depends on the organize step working. |
| Legacy per-table row filters | Coexistence with prior ROW FILTER / MASK clauses on the same table can produce undefined precedence; migration is required. |
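
For the legacy-filter case, a migration inventory is a reasonable first step. The sketch below assumes you can export three things from your catalog: the tables that still carry a per-table filter, the tags on each table, and the set of tags that tag-driven policies reference. The function name `migration_candidates` and the sample data are hypothetical.

```python
def migration_candidates(legacy_filters, table_tags, policy_tags):
    """Tables with a legacy per-table filter that also fall under a tag-driven policy."""
    return sorted(
        table
        for table in legacy_filters
        if table_tags.get(table, set()) & policy_tags
    )


# Hypothetical export: table -> legacy filter expression.
legacy_filters = {
    "eu_users": "region = 'EU'",
    "orders": "tenant_id = 42",
}
# Hypothetical export: table -> tags currently attached.
table_tags = {
    "eu_users": {"sensitivity:gdpr-eu-only"},
    "orders": set(),
}
# Tags referenced by at least one tag-driven policy.
policy_tags = {"sensitivity:gdpr-eu-only", "pii:ssn"}

overlap = migration_candidates(legacy_filters, table_tags, policy_tags)
```

Here only `eu_users` is both filtered the old way and in scope of a tag-driven policy, so it is the table whose precedence needs resolving before the two mechanisms coexist.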

Composition shape

1. Governance team defines tag taxonomy
   ┌───────────────────────────────────────────┐
   │  pii:ssn                                  │
   │  pii:email                                │
   │  sensitivity:public                       │
   │  sensitivity:confidential                 │
   │  sensitivity:gdpr-eu-only                 │
   │  compliance:hipaa                         │
   │  compliance:gdpr                          │
   └───────────────────────────────────────────┘

2. Governance team writes ABAC policies that name the tags
   ┌───────────────────────────────────────────┐
   │  POLICY mask_ssn_unless_compliance:       │
   │    columns matching tag `pii:ssn`         │
   │    apply mask(ssn) UNLESS                 │
   │      current_user() IN compliance_role    │
   │                                           │
   │  POLICY filter_eu_data_for_non_eu:        │
   │    rows in tables tagged                  │
   │      `sensitivity:gdpr-eu-only`           │
   │    visible only to users with             │
   │      attribute country IN ('FR','DE',...) │
   └───────────────────────────────────────────┘

3. Tags get applied (human + automated)
   ┌───────────────────────────────────────────┐
   │  steward:    customers.ssn  → pii:ssn     │
   │  classifier: orders.email   → pii:email   │
   │  steward:    eu_users       →             │
   │                  sensitivity:gdpr-eu-only │
   │  inheritance: catalog `prod_eu` →         │
   │     all child tables get gdpr-eu-only     │
   └───────────────────────────────────────────┘

4. ABAC evaluates at query time
   ┌───────────────────────────────────────────┐
   │  user → query → planner sees tags on      │
   │     each referenced object →              │
   │     applies matching ABAC policies →      │
   │     rewrites with row filter + column     │
   │     mask                                  │
   └───────────────────────────────────────────┘

5. New data arrives → step 3 (classifier auto-tags) → step 4 (ABAC
   matches automatically); no step 2 update needed.
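
Steps 3 and 4 can be sketched as a small evaluator. This is a toy model under stated assumptions, not how a real query planner rewrites SQL: tags are keyed by dotted object path, inheritance is modelled as the union of tags on the object and its ancestors, the two policies from step 2 are hard-coded, and `User`, `TAGS`, `EU_COUNTRIES`, `effective_tags`, and `run_query` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)
    country: str = ""


# Step 3 output: tags keyed by object path (catalog, catalog.table, catalog.table.column).
TAGS = {
    "prod_eu": {"sensitivity:gdpr-eu-only"},
    "prod_eu.customers.ssn": {"pii:ssn"},
}

EU_COUNTRIES = {"FR", "DE"}  # illustrative subset only


def effective_tags(path: str) -> set:
    """Union of the tags on an object and on every ancestor in its path."""
    parts = path.split(".")
    tags = set()
    for i in range(1, len(parts) + 1):
        tags |= TAGS.get(".".join(parts[:i]), set())
    return tags


def run_query(user: User, table: str, rows: list) -> list:
    """Step 4: apply the row filter and the column mask that match the tags."""
    # Row filter: gdpr-eu-only tables return rows only to users in an EU country.
    if "sensitivity:gdpr-eu-only" in effective_tags(table) and user.country not in EU_COUNTRIES:
        return []
    result = []
    for row in rows:
        visible = {}
        for col, value in row.items():
            col_tags = effective_tags(f"{table}.{col}")
            # Column mask: pii:ssn is masked unless the user is in compliance_role.
            if "pii:ssn" in col_tags and "compliance_role" not in user.groups:
                visible[col] = "***-**-****"
            else:
                visible[col] = value
        result.append(visible)
    return result


rows = [{"id": 1, "ssn": "123-45-6789"}]
analyst = User("analyst", country="FR")
auditor = User("auditor", groups={"compliance_role"}, country="DE")
outsider = User("outsider", country="US")
```

With this data, `run_query(analyst, "prod_eu.customers", rows)` returns the row with the SSN masked, the same call for `auditor` returns it in the clear, and for `outsider` it returns no rows. The table `prod_eu.customers` carries no tag of its own; it is filtered only because it inherits `sensitivity:gdpr-eu-only` from the catalog.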

Operational properties

  • Coverage scales with tag coverage, not policy count. The leading metric is "fraction of sensitive columns that carry a governance tag"; ABAC enforcement follows from it automatically.
  • Audit clarity: ABAC enforcement decisions can be traced back through the (column → tag → policy) chain. Compliance auditors can ask "which policy is masking this column?" and get a deterministic answer.
  • Per-query overhead: ABAC adds policy evaluation at SQL planning time. The overhead scales with policies-in-scope, not policies-in-metastore. Per-catalog / per-schema policy ceilings bound the per-query cost.
  • Fragility on tag drift: a tag missing from a sensitive column silently disables ABAC enforcement on that column. The pattern depends on the detect step producing high-recall tags; the organize → detect → protect chain is only as strong as its weakest link.
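
The coverage metric and the audit chain are both simple lookups once tags and policies are available as data. The sketch below assumes an independently maintained inventory of known-sensitive columns (for example from a periodic manual audit); that inventory, the function names `tag_coverage` and `explain`, and the sample data are all hypothetical.

```python
def tag_coverage(sensitive_columns, column_tags):
    """Fraction of known-sensitive columns carrying at least one governance tag."""
    if not sensitive_columns:
        return 1.0
    tagged = [c for c in sensitive_columns if column_tags.get(c)]
    return len(tagged) / len(sensitive_columns)


def explain(column, column_tags, policies_by_tag):
    """Return every (column, tag, policy) chain that applies to a column."""
    return [
        (column, tag, policy)
        for tag in sorted(column_tags.get(column, ()))
        for policy in policies_by_tag.get(tag, ())
    ]


sensitive_columns = ["customers.ssn", "orders.email", "leads.ssn", "users.email"]
column_tags = {
    "customers.ssn": {"pii:ssn"},
    "orders.email": {"pii:email"},
    "users.email": {"pii:email"},
}
policies_by_tag = {"pii:ssn": ["mask_ssn_unless_compliance"]}

coverage = tag_coverage(sensitive_columns, column_tags)
chain = explain("customers.ssn", column_tags, policies_by_tag)
gap = explain("leads.ssn", column_tags, policies_by_tag)
```

In this sample, three of the four sensitive columns are tagged. `leads.ssn` has no tag, so `explain` returns an empty list for it: the column is unprotected and nothing in the policy layer reports it, which is the tag-drift failure mode described above.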

Sibling / contrast patterns

Anti-patterns

  • Tag-as-string (no governed vocabulary). Stewards type free-form tag names; one team uses pii_ssn, another uses pii.ssn, a third uses social_security_number. A policy that matches one spelling silently misses the others.
  • Inline policy authoring per table. Authors paste a tag-aware policy template into a per-table configuration, which defeats the centralisation that made the pattern valuable.
  • Policy ↔ tag drift. Renaming a tag without updating the policies that reference it, or deleting a tag without an explicit policy-cleanup step.
  • Classifier confidence opaque to policy. Classifier-applied tags reach policy evaluation with no confidence threshold, so low-confidence detections drive low-precision masking.
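
Three of these anti-patterns can be caught by a periodic lint over tag assignments and policies. The sketch below is illustrative only: the tuple layout `(column, tag, source, confidence)`, the `MIN_CONFIDENCE` threshold of 0.9, and the policy name `mask_phone` are assumptions, not values from the source.

```python
TAXONOMY = {"pii:ssn", "pii:email"}
MIN_CONFIDENCE = 0.9  # hypothetical threshold for classifier-applied tags


def lint(assignments, policy_tags):
    """Return findings for ungoverned tags, low-confidence tags, and dangling policies."""
    findings = []
    for column, tag, source, confidence in assignments:
        if tag not in TAXONOMY:
            findings.append(f"ungoverned tag {tag!r} on {column}")
        elif source == "classifier" and confidence < MIN_CONFIDENCE:
            findings.append(f"low-confidence tag {tag!r} on {column} ({confidence:.2f})")
    for policy, tag in sorted(policy_tags.items()):
        if tag not in TAXONOMY:
            findings.append(f"policy {policy} references unknown tag {tag!r}")
    return findings


assignments = [
    ("customers.ssn", "pii:ssn", "steward", 1.0),
    ("leads.ssn", "pii_ssn", "steward", 1.0),          # free-form spelling
    ("orders.note", "pii:email", "classifier", 0.41),  # weak detection
]
policy_tags = {"mask_ssn": "pii:ssn", "mask_phone": "pii:phone"}

findings = lint(assignments, policy_tags)
```

The lint reports the free-form `pii_ssn`, the 0.41-confidence classifier tag, and the `mask_phone` policy whose tag is not in the taxonomy. It does not cover inline per-table authoring, which is a process problem rather than a data-consistency one.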

Seen in

  • sources/2026-05-13-databricks-abac-row-filtering-and-column-masking-policies-governed-tags — Unity Catalog GA. Canonical first wiki instance of the organize → detect → protect pipeline as a single coherent governance shape spanning the three primitives. "There is no handoff between systems, and no manual step between discovery and protection." Customer testimonials emphasise the shape payoff: Atlassian ("significantly reducing the operational overhead of managing permissions at scale. What used to require extensive manual permission management now happens dynamically"), Udemy ("Fewer policies, lower costs, surgical precision").