
PATTERN Cited by 1 source

SHAP attribution as governed Delta table

Pattern: when a regulated ML decision-support system makes a prediction, write the prediction's Shapley-value attribution to a governed Delta table in Unity Catalog (UC) alongside the prediction, lineaged through UC to the training data and versioned in MLflow (systems/mlflow) by the model that produced it. The attribution becomes a first-class queryable artifact with the same governance posture as any other production table.

The architectural payoff: "the rationale behind a site selection is as auditable as the score itself" — every regulator's question becomes a SQL query, every fairness audit becomes a population-level aggregate, every model-version trace becomes a one-line MLflow lookup.

Canonical wiki instance: sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse, which states: "Every prediction carries a SHAP attribution stored as a governed Unity Catalog Delta table — versioned in MLflow, lineaged through Unity Catalog, queryable."

Implementation shape

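# At inference time, inside the prediction service.
# Assumed in scope: `spark`, `model`, `features` (dict of name -> float), and
# MODEL_VERSION, the MLflow registry version the service loaded at startup.
from datetime import datetime, timezone
from uuid import uuid4

from pyspark.sql import Row

# predict_with_explanation is a placeholder for the service's own wrapper; it is
# assumed to return the score and a list of (feature, contribution) pairs.
prediction, shap_values = model.predict_with_explanation(features)

# Write prediction + attribution together in one atomic Delta append.
spark.createDataFrame([Row(
    recommendation_id=str(uuid4()),
    model_version=MODEL_VERSION,          # the version that served this prediction
    prediction=float(prediction),
    feature_values=features,              # map<string, double>
    shap_attributions=[Row(feature=f, contribution=float(c)) for f, c in shap_values],
    predicted_at=datetime.now(timezone.utc),
)]).write.format("delta") \
   .mode("append") \
   .saveAsTable("clinops.audit.site_feasibility_attributions")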

The table is registered in UC with the same ABAC policies, governed tags, and data classifiers as any other production table. PHI handling rides on the catalog's HIPAA Safe Harbor / Expert Determination posture configured at the catalog or schema level.
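Because the attribution row becomes audit evidence, it can be worth validating it before the append. The sketch below checks SHAP's additivity property: the explainer's base value plus the per-feature contributions should reproduce the prediction (in whatever output space the explainer works in). It is plain Python with no Spark dependency; `build_attribution_row`, `base_value`, the tolerance, and the feature names are illustrative assumptions, not part of the source's implementation.

```python
import math
import uuid
from datetime import datetime, timezone


def build_attribution_row(model_version, prediction, base_value, features,
                          contributions, tol=1e-6):
    """Assemble one audit row, refusing attributions that do not add up.

    Hypothetical helper: `base_value` is the explainer's expected model
    output, and `contributions` maps feature name -> Shapley contribution.
    """
    if set(features) != set(contributions):
        raise ValueError("features and contributions must cover the same names")
    reconstructed = base_value + math.fsum(contributions.values())
    if abs(reconstructed - prediction) > tol:
        raise ValueError(
            f"attributions sum to {reconstructed}, prediction is {prediction}")
    return {
        "recommendation_id": str(uuid.uuid4()),
        "model_version": model_version,
        "prediction": prediction,
        "feature_values": dict(features),
        "shap_attributions": [
            {"feature": name, "contribution": value}
            for name, value in sorted(contributions.items())
        ],
        "predicted_at": datetime.now(timezone.utc).isoformat(),
    }


# Made-up values: 0.5 + 0.375 - 0.125 reproduces the 0.75 prediction.
row = build_attribution_row(
    model_version="7",
    prediction=0.75,
    base_value=0.5,
    features={"enrollment_rate": 1.5, "site_type_community": 1.0},
    contributions={"enrollment_rate": 0.375, "site_type_community": -0.125},
)
```

A row that fails the check raises before anything is written, so an inconsistent explanation never enters the audit table.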

Three property guarantees

The pattern rests on three guarantees:

  1. Temporal correctness via MLflow versioning. "Versioned in MLflow" — every row carries the model-version identifier; the audit chain leads to the exact model version that produced the prediction, not the current production version.

  2. Upstream completeness via UC lineage. "Lineaged through Unity Catalog" — the lineage graph traces the prediction backwards to the training-data tables, the feature-engineering pipelines, and the data sources. A regulator can walk the chain end-to-end inside one governance system.

  3. Population queryability via SQL. "Queryable" — Delta is the substrate, so per-prediction inspection (SELECT WHERE recommendation_id) and population aggregation (GROUP BY site_type) both work directly without ETL.
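The two query shapes in the third guarantee can be sketched with Python's built-in `sqlite3` standing in for Spark SQL over the Delta table. The table layout here is an assumption: one row per (prediction, feature) pair, which is roughly what the attribution array looks like once exploded, and the `site_type` column and all values are invented for illustration.

```python
import sqlite3

# sqlite3 is only a stand-in for SQL over the governed Delta table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attributions (
        recommendation_id TEXT,
        model_version     TEXT,
        site_type         TEXT,
        feature           TEXT,
        contribution      REAL
    )
""")
conn.executemany(
    "INSERT INTO attributions VALUES (?, ?, ?, ?, ?)",
    [
        ("r1", "7", "academic",  "enrollment_rate", 0.25),
        ("r1", "7", "academic",  "prior_trials",    0.5),
        ("r2", "7", "community", "enrollment_rate", 0.125),
        ("r2", "7", "community", "prior_trials",    -0.25),
        ("r3", "8", "community", "enrollment_rate", 0.375),
        ("r3", "8", "community", "prior_trials",    -0.75),
    ],
)

# Per-prediction inspection: which model version produced r2, and which
# features drove its score?
per_prediction = conn.execute(
    "SELECT model_version, feature, contribution FROM attributions "
    "WHERE recommendation_id = ? ORDER BY feature",
    ("r2",),
).fetchall()

# Population aggregation: mean contribution of one feature per site type.
by_site_type = conn.execute(
    "SELECT site_type, AVG(contribution) FROM attributions "
    "WHERE feature = 'prior_trials' GROUP BY site_type ORDER BY site_type"
).fetchall()
```

Both questions are answered from the same stored rows, and the `model_version` column travels with every result, so the answer is tied to the version that made the prediction.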

When the pattern applies

  • Regulated ML decision-support systems where explainability is required, not optional. The 2026-05-13 source frames this through three regulatory drivers: 21 CFR Part 11 (electronic records and signatures), ICH E6(R3) (good clinical practice), and FDA GMLP (good machine learning practice).
  • Fairness controls require population-level audit. Per the source: "Sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators — turning explainability into a fairness control." Systematic bias can only be detected by aggregation over a queryable population.
  • The substrate already has UC + MLflow + Delta. The pattern composes onto an existing Databricks-style governed Lakehouse; on other substrates the equivalent shape (registry + lineage + ACID storage with population queries) is needed before this pattern can be applied directly.
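The fairness control described above amounts to grouping stored attributions by site category and comparing the group means. A minimal sketch, assuming the rows have already been read out of the audit table; the category names, the contribution values, and the `-0.25` cut-off are all made up for illustration and carry no regulatory meaning.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical stored rows: (site_category, contribution of the site-category
# feature to the feasibility score).
rows = [
    ("academic", 0.25), ("academic", 0.5),
    ("community", -0.5), ("community", -0.25),
    ("minority_serving", -0.125), ("minority_serving", 0.125),
]


def flag_underweighted(rows, threshold=-0.25):
    """Return per-category mean contributions and the categories whose mean
    is at or below `threshold` (an arbitrary illustrative cut-off)."""
    grouped = defaultdict(list)
    for category, contribution in rows:
        grouped[category].append(contribution)
    means = {category: fmean(values) for category, values in grouped.items()}
    flagged = sorted(c for c, m in means.items() if m <= threshold)
    return means, flagged


means, flagged = flag_underweighted(rows)
```

With these invented numbers only `community` is flagged. A real audit would also need sample sizes and a defensible threshold, which the source does not specify.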

When it doesn't fit

  • Pure-text generative models where a Shapley-value-style attribution doesn't map to a feature vector. (LLM-as-judge audit shapes belong elsewhere — see concepts/llm-as-judge and patterns/llm-judge-as-inline-pipeline-stage.)
  • High-throughput recommendation systems where attribution generation cost is prohibitive. SHAP attribution at every inference is expensive; the pattern fits regulated low-volume high-stakes decision-support, not consumer-scale recommendation ranking.
  • Substrates that don't support time travel or schema evolution. The pattern assumes you can store attribution rows from old model versions alongside attribution rows from the current model. Delta time travel and schema evolution make this cheap; non-ACID columnar substrates make it painful.

Trade-offs

| Axis | Cost | Benefit |
|---|---|---|
| Storage | Each prediction stores N feature-contribution values. At ~50 features × ~1M predictions, that is ~50M attribution values per training cycle. Delta compresses well, but it's not free. | The attribution population is the substrate for fairness audit, regulatory inquiry, model debugging, and post-hoc analysis. |
| Compute | SHAP at inference is 5-50× the cost of the prediction itself for tree-based models, more for deep networks. | The cost is paid once at inference; querying the attribution table later is just a SELECT. |
| Lineage authoring | Someone has to wire the model-version and UC-lineage references into the write path. | One-time cost; pays off on every audit query. |
| Schema evolution | When the feature set changes, the attribution schema changes, so schema evolution is required. | Delta supports this natively; the audit chain spans schema versions. |
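The storage figure can be turned into a rough byte count. This is back-of-envelope arithmetic only: the 8 bytes per value is an assumption (one float64 per contribution), and it ignores feature-name strings, row metadata, and Delta/Parquet compression.

```python
N_FEATURES = 50            # from the trade-off figures above
N_PREDICTIONS = 1_000_000  # from the trade-off figures above
BYTES_PER_VALUE = 8        # assumption: one float64 per contribution

n_values = N_FEATURES * N_PREDICTIONS
raw_bytes = n_values * BYTES_PER_VALUE

print(f"{n_values:,} contribution values")
print(f"{raw_bytes / 1e6:,.0f} MB before compression")
```

That is 50,000,000 contribution values and about 400 MB of raw floats per cycle under these assumptions, which is small next to most training datasets but grows linearly with both feature count and prediction volume.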

Adjacent patterns

  • patterns/governed-delta-tables-per-modality — same Delta-as-governance-substrate framing but for raw multimodal training data. The two patterns pair: training data lands in modality-tagged Delta tables under UC, predictions plus SHAP attributions land in audit-tagged Delta tables under the same UC.
  • patterns/in-workspace-app-as-decision-support: pairs with this pattern. The audit table is read by an in-workspace app to render per-prediction explanations to end users via the same SQL Statement API that serves the rest of the app's data path.
  • patterns/llm-judge-as-inline-pipeline-stage — LLM-judge scores stored as governed Delta tables for pipeline-quality audit; same architectural shape applied at a different ML altitude.
  • concepts/explainability-log-shaped patterns — the generalised shape of "decision + explanation + version stored together" appears across regulated-ML literature; SHAP-as-governed-Delta is the Lakehouse-native instantiation.

Why the substrate matters

The naive alternative — "generate the explanation on demand when a regulator asks" — has two structural failure modes:

  • The model has changed. A prediction made on day N gets re-explained on day N+90 against the current production model, which is not the model that produced the prediction. Audit chain broken.
  • Population audit is impossible. The question "are community sites systematically under-weighted?" requires aggregation over thousands of past predictions. An on-demand explainer service can't produce that population — only the storage-as-population substrate can.

The pattern eliminates both failure modes by making the attribution the same kind of artifact as the prediction: stored, versioned, governed, queryable.
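The first failure mode is easy to reproduce with a toy model. The sketch below uses a linear model, for which the Shapley contribution of feature i has a closed form under the feature-independence assumption: w_i × (x_i − mean_i). The weights, background means, and feature names are hypothetical.

```python
def linear_shap(weights, background_means, x):
    """Shapley values for a linear model with independent features:
    contribution_i = w_i * (x_i - mean_i)."""
    return {name: weights[name] * (x[name] - background_means[name])
            for name in weights}


# Hypothetical weights for two versions of the same model.
weights_v1 = {"enrollment_rate": 0.5, "prior_trials": 0.25}
weights_v2 = {"enrollment_rate": 0.25, "prior_trials": 0.5}   # after retraining
background = {"enrollment_rate": 1.0, "prior_trials": 2.0}
site = {"enrollment_rate": 3.0, "prior_trials": 4.0}

# Day N: the attribution is computed by v1 and stored with the prediction.
stored = {"model_version": "1",
          "attributions": linear_shap(weights_v1, background, site)}

# Day N+90: an on-demand explainer only has the current model, v2.
re_explained = linear_shap(weights_v2, background, site)

top_stored = max(stored["attributions"], key=stored["attributions"].get)
top_re_explained = max(re_explained, key=re_explained.get)
```

Here the stored record says `enrollment_rate` drove the decision, while the re-explanation against v2 credits `prior_trials`. Only the stored row describes the model that actually made the prediction.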

Seen in
