CONCEPT Cited by 1 source
Governed SHAP attribution table¶
A governed SHAP attribution table is a Delta Lake table — registered in Unity Catalog with the same RBAC + lineage + audit posture as any other production table — that stores per-prediction Shapley-value attributions alongside the prediction and its model version.
The substrate properties:
- Versioned in systems/mlflow — every row has a model-version identifier; rows from a retrained model live alongside rows from the prior model and the prior-prior model.
- Lineaged through UC — the lineage graph traces the prediction from the training-data source tables through the model artifact to the attribution row.
- SQL-queryable: a regulator's question ("why was this recommendation made?") becomes a SELECT against the table, not a vendor-support ticket against an opaque scoring service.
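A minimal sketch of what such a table could look like, using Python's built-in sqlite3 as a local stand-in for a Delta table. The table name site_feasibility_attributions and the columns recommendation_id, feature, contribution and model_version appear in the queries on this page; the one-row-per-feature layout, the remaining columns and all sample values are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for a governed Delta table; this is NOT the Databricks API.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE site_feasibility_attributions (
        recommendation_id TEXT NOT NULL,  -- one prediction
        model_version     TEXT NOT NULL,  -- registry version that produced it
        predicted_at      TEXT NOT NULL,  -- hypothetical: ISO-8601 prediction time
        prediction        REAL NOT NULL,  -- hypothetical: the score itself
        feature           TEXT NOT NULL,  -- attributed input feature
        contribution      REAL NOT NULL,  -- Shapley value for that feature
        PRIMARY KEY (recommendation_id, feature)
    )
    """
)

# Made-up rows: prediction, model version and attribution are written together.
rows = [
    ("rec-001", "3", "2026-01-10T09:00:00", 0.72, "investigator_history", 0.15),
    ("rec-001", "3", "2026-01-10T09:00:00", 0.72, "enrollment_rate", 0.07),
]
conn.executemany(
    "INSERT INTO site_feasibility_attributions VALUES (?, ?, ?, ?, ?, ?)", rows
)

# "Why was this recommendation made?" as a plain SELECT.
why = conn.execute(
    "SELECT feature, contribution FROM site_feasibility_attributions "
    "WHERE recommendation_id = ? ORDER BY ABS(contribution) DESC",
    ("rec-001",),
).fetchall()
print(why)
```

Keeping the model version on every row is what lets a single query answer both "what drove this score" and "which model produced it".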
Canonical wiki source: sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse — "every prediction carries a SHAP attribution stored as a governed Unity Catalog Delta table — versioned in MLflow, lineaged through Unity Catalog, queryable — the rationale behind a site selection is as auditable as the score itself."
The decoupled alternative this is contrasted against¶
Conventional regulated-ML stacks treat explainability as a runtime artifact — generated on demand by a model-explainer service when a regulator asks. Two failure modes:
- The attribution and the prediction don't ride together. A prediction was made on day N; the regulator asks on day N+90; the explainer is re-run on the current model state, which is no longer the model that produced the prediction. The audit chain has a gap.
- Population-level audit is not possible. A regulator's question "are community sites systematically under-weighted across the recommendations you made last quarter?" cannot be answered by an explainer service that runs per-request. The query is structurally a population query (GROUP BY site_type over the attribution table), and the substrate has to be a queryable population, not a per-request re-evaluation.
The 2026-05-13 source frames the storage shift as the architectural move that closes both gaps: the attribution is stored at the moment of prediction, alongside the prediction, with the model version that produced it, in a queryable Delta table.
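The first gap can be illustrated with a toy linear model, for which Shapley values have a closed form under feature independence: phi_i = w_i * (x_i - E[x_i]). The sketch below is not the source's implementation; the weights, baseline and feature values are invented, and the baseline is assumed unchanged by retraining to keep the comparison simple.

```python
def linear_shap(weights, baseline, x):
    """Exact Shapley values for a linear model with independent features:
    phi_i = w_i * (x_i - E[x_i])."""
    return {f: weights[f] * (x[f] - baseline[f]) for f in weights}

# Hypothetical feature means and one site's feature values.
baseline = {"investigator_history": 0.5, "enrollment_rate": 0.4}
x = {"investigator_history": 0.9, "enrollment_rate": 0.2}

# Hypothetical weights before and after retraining.
model_v1 = {"investigator_history": 0.5, "enrollment_rate": 1.0}
model_v2 = {"investigator_history": -0.25, "enrollment_rate": 1.5}

# Stored at prediction time: attribution plus the version that produced it.
stored = {"model_version": "1", "attributions": linear_shap(model_v1, baseline, x)}

# Re-run months later against whatever model is current.
rerun = linear_shap(model_v2, baseline, x)

for feature in stored["attributions"]:
    print(feature, stored["attributions"][feature], rerun[feature])
```

In this made-up case the re-run explainer reports a negative contribution for investigator_history while the model that made the prediction gave it a positive one, so only the stored row explains the original decision.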
Why the three properties matter¶
1. Versioned via MLflow¶
Every attribution row carries the model-version identifier. When a regulator asks about a recommendation made nine months ago, the audit trail leads to the exact model version that produced it — not the current production version. MLflow is the registry that makes the version reference durable.
This is the temporal-correctness property: "the rationale behind a site selection is as auditable as the score itself". The attribution has the same storage lifetime and the same versioning regime as the prediction it explains.
2. Lineaged through Unity Catalog¶
The lineage graph in UC traces:
training data tables
│
▼
model artifact (versioned in MLflow)
│
▼
prediction + SHAP attribution row
│
▼
audit / fairness / regulatory query
This is the upstream-completeness property: a regulator can walk the lineage backwards from the recommendation through the model to the training data, all in one governance system. No cross-system correlation.
3. SQL-queryable¶
The substrate is a Delta table, so:
- "Why was site X recommended?" → SELECT feature, contribution FROM site_feasibility_attributions WHERE recommendation_id = ...
- "Are community sites systematically under-weighted?" → SELECT site_type, AVG(community_score_attribution) FROM ... GROUP BY site_type
- "Did this feature's contribution flip sign across model versions?" → SELECT model_version, AVG(feature_contribution) FROM ... WHERE feature = 'investigator_history' GROUP BY model_version
This is the population-queryability property — the architectural enabler for "explainability as a fairness control" (concepts/explainable-ai-decision).
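The cross-version question can be run end to end on toy data. This is a sketch under stated assumptions: sqlite3 stands in for the Delta table, the table name attributions and every row are hypothetical, and the column is called contribution rather than feature_contribution.

```python
import sqlite3

# Local stand-in for the attribution table; names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attributions ("
    "recommendation_id TEXT, model_version TEXT, feature TEXT, contribution REAL)"
)
conn.executemany(
    "INSERT INTO attributions VALUES (?, ?, ?, ?)",
    [
        ("rec-001", "1", "investigator_history", 0.20),
        ("rec-002", "1", "investigator_history", 0.10),
        ("rec-003", "2", "investigator_history", -0.05),
        ("rec-004", "2", "investigator_history", -0.15),
        ("rec-001", "1", "enrollment_rate", 0.30),
    ],
)

# Mean contribution of one feature, grouped by the model version on each row.
by_version = conn.execute(
    "SELECT model_version, AVG(contribution) FROM attributions "
    "WHERE feature = 'investigator_history' "
    "GROUP BY model_version ORDER BY model_version"
).fetchall()

# Reduce each mean to its sign (+1, 0 or -1) and check whether the signs differ.
signs = {version: (avg > 0) - (avg < 0) for version, avg in by_version}
flipped = len(set(signs.values())) > 1
print(signs, flipped)
```

The query only works because rows from every model version sit in the same table; a per-request explainer has no prior-version population to aggregate over.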
Fairness as a population query¶
The 2026-05-13 source canonicalises the fairness-via-SQL framing: "sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators — turning explainability into a fairness control." The key word is systematic: per-prediction inspection cannot detect a systematic bias; only a population aggregation can. The governed-SHAP-attribution-table substrate is the queryable population.
The regulatory backdrop: FDA's Diversity Action Plan expectations under FDORA 2022 require sponsors to show that site-selection processes don't systematically exclude under-represented populations. The SHAP-attribution-Delta-table substrate makes that demonstration a SQL query.
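A hedged sketch of the fairness audit as a population query follows. It assumes a hypothetical sites dimension table carrying site_type, joined to a hypothetical attributions table on site_id; sqlite3 again stands in for the governed Delta tables, and all values are invented.

```python
import sqlite3

# Hypothetical tables and data; a local stand-in, not the reference implementation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (site_id TEXT PRIMARY KEY, site_type TEXT);
    CREATE TABLE attributions (
        recommendation_id TEXT, site_id TEXT, feature TEXT, contribution REAL
    );
    """
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [("s1", "academic"), ("s2", "academic"), ("s3", "community"), ("s4", "community")],
)
conn.executemany(
    "INSERT INTO attributions VALUES (?, ?, ?, ?)",
    [
        ("rec-001", "s1", "investigator_history", 0.25),
        ("rec-002", "s2", "investigator_history", 0.15),
        ("rec-003", "s3", "investigator_history", -0.10),
        ("rec-004", "s4", "investigator_history", -0.20),
    ],
)

# Population view: mean contribution of one feature per site type, lowest first.
gap = conn.execute(
    """
    SELECT s.site_type, COUNT(*) AS n, AVG(a.contribution) AS mean_contribution
    FROM attributions AS a
    JOIN sites AS s ON s.site_id = a.site_id
    WHERE a.feature = 'investigator_history'
    GROUP BY s.site_type
    ORDER BY mean_contribution
    """
).fetchall()

for site_type, n, mean_contribution in gap:
    print(site_type, n, round(mean_contribution, 3))
```

A difference in mean contribution between site types is a prompt for investigation rather than proof of bias, but it is a signal that no single-prediction explanation can surface.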
What this concept depends on¶
The governed-SHAP-attribution-table only works as an audit substrate when the surrounding stack provides:
- A governance system that registers ML artifacts as first-class data assets. Unity Catalog provides this — predictions and attributions are governed the same way as any other table.
- A model registry that issues durable, versioned identifiers. MLflow provides this; the model-version identifier in each row is what makes the audit chain temporally correct.
- A storage format with ACID + time-travel + schema evolution. Delta Lake provides this; population queries that filter by model_version + time range work without ETL.
- An application architecture where the audit chain isn't broken by sync boundaries. concepts/single-platform-application-architecture is the architectural enabler — when the app reads from UC and the model writes attributions to UC, the chain is unbroken.
Seen in¶
- sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse — Canonical wiki source for this concept. "Every prediction carries a SHAP attribution stored as a governed Unity Catalog Delta table — versioned in MLflow, lineaged through Unity Catalog, queryable — the rationale behind a site selection is as auditable as the score itself." Three load-bearing implications: (1) a clinical affairs team can "answer a question from a data monitoring committee with a SQL query, not a black-box vendor report", (2) "sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators — turning explainability into a fairness control", (3) the regulatory drivers (21 CFR Part 11, ICH E6(R3), FDA GMLP) make this "material considerations, not optional features." Reference implementation: systems/site-feasibility-workbench. Canonical instance of the patterns/shap-attribution-as-governed-delta-table pattern.
Related¶
- concepts/explainable-ai-decision — the explainability primitive this concept stores.
- concepts/single-platform-application-architecture — the architectural shape that makes this audit substrate coherent.
- patterns/shap-attribution-as-governed-delta-table — the implementation pattern.
- systems/delta-lake — the storage substrate.
- systems/unity-catalog — the governance + lineage substrate.
- systems/mlflow — the model-version registry.
- systems/databricks-apps — the application runtime that reads the attribution table for end-user-facing audit explanations.
- systems/site-feasibility-workbench — reference implementation.