Site Feasibility Workbench
The Site Feasibility Workbench is an open-source clinical-trial site-selection application released by Databricks in May 2026. The release is pitched as the first public reference implementation of a Databricks App built on systems/lakebase, systems/unity-catalog, AI/BI Genie, and systems/mlflow.
Stub page. The wiki tracks this system primarily as a reference implementation of the single-platform application architecture thesis, not for the clinical-trials domain content. Source repository: databricks-industry-solutions/site-feasibility-workbench-open.
What it does
Six-step guided workflow for clinical-trial site selection:
- Protocol selection — therapeutic area / phase / inclusion-and-exclusion criteria.
- Score constraints — diversity weighting, minimum-enrollment thresholds, geographic preferences.
- Geographic overview — site distribution map.
- Site ranking — composite feasibility scores from TA-segmented LightGBM models.
- SHAP-driven site deep dive — per-prediction feature attribution for each candidate site.
- Final shortlist — saved shortlists persisted to Lakebase for team sharing.
Cross-cutting: AI/BI Genie answers cross-domain natural-language questions against the same Unity Catalog tables the ML models trained on.
Stack
| Layer | Component | Notes |
|---|---|---|
| Frontend | React | Workflow UI + map / charts / shortlist tables. |
| Backend | FastAPI (Python) | Routes through the Databricks Apps runtime. |
| Auth | Workspace service principal | App identity is provisioned by the workspace identity system. |
| Analytical data | Unity Catalog via SQL Statement API | Site features, historical performance, predictions, SHAP attributions, audit log. |
| Operational state | systems/lakebase (Postgres) | Saved shortlists, team-sharing state. |
| NL query | AI/BI Genie via workspace REST API | Embedded in the workflow, not a separate UI. |
| ML | TA-segmented LightGBM models | Trained on sponsor's CTMS / EDC / IRT history. |
| Lineage | systems/mlflow + Unity Catalog | Lineage is tracked for every model version and every SHAP attribution. |
| Deployment | Databricks workspace with Unity Catalog | "Approximately 30 minutes of technical deployment time, before sponsor-specific security review and validation." |
All connections internal. No external API calls. No separate operational-DB infrastructure outside the workspace.
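As a rough illustration of the "Unity Catalog via SQL Statement API" row, the sketch below builds (but does not send) a request to the Databricks SQL Statement Execution endpoint, `POST /api/2.0/sql/statements`, using a named parameter marker. The host, token, warehouse ID, and the table and column names (`clinical.feasibility.site_scores`, `feasibility_score`, `therapeutic_area`) are placeholders assumed for illustration; they are not taken from the Workbench repository, and the actual backend may issue its queries differently.

```python
import json
from urllib.request import Request


def build_statement_request(host, token, warehouse_id, statement, params):
    """Build (without sending) a SQL Statement Execution API request."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        # Wait up to 30 seconds for a synchronous result.
        "wait_timeout": "30s",
        # Named parameters are bound to :name markers in the statement.
        "parameters": [{"name": k, "value": str(v)} for k, v in params.items()],
    }
    return Request(
        url=f"{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical table and columns, for illustration only.
req = build_statement_request(
    host="https://example.cloud.databricks.com",
    token="<token>",
    warehouse_id="<warehouse-id>",
    statement=(
        "SELECT site_id, feasibility_score "
        "FROM clinical.feasibility.site_scores "
        "WHERE therapeutic_area = :ta "
        "ORDER BY feasibility_score DESC LIMIT 20"
    ),
    params={"ta": "oncology"},
)
```

Passing the request to `urllib.request.urlopen` would execute it. Inside a Databricks App the credentials would presumably come from the app's service principal rather than a hand-supplied token.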
Composite feasibility score inputs
From the post: "Composite feasibility scores combine real-world evidence, patient access data, historical site performance, site qualification history, Open Payments KOL signal, and protocol execution factors — all driven by TA-segmented LightGBM models trained on the organization's own CTMS, EDC, and IRT history."
- CTMS — Clinical Trial Management System (sponsor-owned trial metadata + site relationships).
- EDC — Electronic Data Capture (per-trial subject-level data).
- IRT — Interactive Response Technology (randomization + drug supply per trial).
- CMS Open Payments — public dataset; "when used appropriately, correlates with research engagement and infrastructure and it is freely available."
- Real-world evidence + patient access data — the post does not detail data-source specifics for this leg.
The post emphasises the architecture-level claim: institutional sponsor data is the training data, not industry-aggregate data from a CRO scoring product. "Every new study makes the prediction better, and every new site relationship is reflected in the next training run."
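To make "TA-segmented" concrete: one model is kept per therapeutic area (TA), and the protocol's TA selects which model scores the candidate sites. The sketch below shows only that routing idea. The scorers are simple weighted sums standing in for trained models (they are not LightGBM), and the feature names, weights, and site values are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

SiteFeatures = Dict[str, float]
Scorer = Callable[[SiteFeatures], float]


def make_linear_scorer(weights: Dict[str, float]) -> Scorer:
    """Stand-in for a trained model: a weighted sum clipped to [0, 1]."""
    def score(features: SiteFeatures) -> float:
        raw = sum(weights.get(name, 0.0) * value for name, value in features.items())
        return max(0.0, min(1.0, raw))
    return score


# One scorer per therapeutic area (hypothetical segments and weights).
MODELS_BY_TA: Dict[str, Scorer] = {
    "oncology": make_linear_scorer(
        {"historical_enrollment_rate": 0.6, "patient_access": 0.4}
    ),
    "cardiology": make_linear_scorer(
        {"historical_enrollment_rate": 0.3, "patient_access": 0.7}
    ),
}


def rank_sites(
    therapeutic_area: str, sites: Dict[str, SiteFeatures]
) -> List[Tuple[str, float]]:
    """Score every site with the model for this TA, highest score first."""
    scorer = MODELS_BY_TA[therapeutic_area]
    scored = [(site_id, scorer(features)) for site_id, features in sites.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up feature values for two candidate sites.
sites = {
    "S1": {"historical_enrollment_rate": 0.9, "patient_access": 0.2},
    "S2": {"historical_enrollment_rate": 0.4, "patient_access": 0.9},
}
oncology_ranking = rank_sites("oncology", sites)
cardiology_ranking = rank_sites("cardiology", sites)
```

With these toy inputs the same two sites rank in opposite orders under the two segments, which is the practical consequence of segmenting by therapeutic area.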
Diversity as a first-class scoring dimension
Per the post: "Diversity considerations are a first-class scoring dimension, aligned with FDA's Diversity Action Plan expectations under FDORA 2022."
The architectural enabler is the governed SHAP attribution Delta table: "Sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators — turning explainability into a fairness control." Because every attribution is stored as a queryable row, the fairness audit is a SQL query over the whole population of attributions, not a prediction-by-prediction inspection.
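A minimal sketch of such a population-level audit follows. It uses an in-memory SQLite table as a stand-in for the governed Delta table; the schema (`site_id`, `site_type`, `feature`, `shap_value`) and all values are invented for illustration and are not the Workbench's actual table layout. The query aggregates attributions by site type and feature, so a consistently negative mean for one group of sites would surface as a single row.

```python
import sqlite3

# In-memory stand-in for the governed SHAP attribution table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shap_attributions (
        site_id    TEXT,
        site_type  TEXT,
        feature    TEXT,
        shap_value REAL
    )
    """
)

# Synthetic rows: one attribution per (site, feature).
rows = [
    ("S1", "academic", "historical_enrollment", 0.30),
    ("S1", "academic", "open_payments_signal", 0.10),
    ("S2", "community", "historical_enrollment", -0.20),
    ("S2", "community", "open_payments_signal", -0.10),
    ("S3", "community", "historical_enrollment", 0.05),
    ("S3", "community", "open_payments_signal", -0.15),
]
conn.executemany("INSERT INTO shap_attributions VALUES (?, ?, ?, ?)", rows)

# Population-level audit: mean attribution per site type and feature.
AUDIT_SQL = """
SELECT site_type,
       feature,
       COUNT(DISTINCT site_id) AS n_sites,
       AVG(shap_value)         AS mean_shap
FROM shap_attributions
GROUP BY site_type, feature
ORDER BY site_type, feature
"""
audit = conn.execute(AUDIT_SQL).fetchall()
by_group = {
    (site_type, feature): (n_sites, mean_shap)
    for site_type, feature, n_sites, mean_shap in audit
}
```

In this toy data the `open_payments_signal` feature pulls community-site scores down on average, the kind of pattern an auditor would then investigate. A real audit would run the equivalent query against the Unity Catalog table and would need a threshold or statistical test chosen by the sponsor.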
What it is not
Per the post: "This is a decision-support layer, not a source-of-record system. The CTMS/EDC/IRT remain authoritative. The workbench produces predictions whose lineage is governed in Unity Catalog and MLflow."
Position in the broader Clinical Operations Intelligence Hub
The post names the Site Feasibility Workbench as "the first public release of a broader architecture — the Databricks Clinical Operations Intelligence Hub — covering the full trial lifecycle":
- Site Feasibility and Selection — this Workbench.
- Patient Cohort and Recruitment — protocol-aligned cohort building from EHR + RWE at Lakehouse scale.
- Enrollment Velocity Optimizer — ML stall prediction per site per month with a 1–3 month forward horizon.
- Risk-Based Monitoring and Compliance — continuous monitoring for enrollment anomalies, data lags, and protocol deviations.
"All four deploy as Databricks Apps. All four query Unity Catalog directly. None make external API calls."
Why this matters for system design
The Site Feasibility Workbench is the wiki's first canonical instance of a fully open-source Databricks App that can be inspected as a reference for the single-platform architecture pattern. It's the existence proof that the "Architecture Argument" in the source post can be implemented end-to-end in a deployable artifact, not just described as a thesis.
For practitioners it answers a concrete question: "if I want to build a regulated decision-support app inside a Databricks workspace, what does the actual code shape look like?" — FastAPI + React, SQL Statement API for data, Lakebase for app state, Genie REST API for NL query, MLflow for model versioning, governed Delta tables for SHAP attributions.
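For the Genie leg of that code shape, a backend route might forward a user's question to a Genie space over the workspace REST API. The sketch below only constructs the request. The endpoint path follows the publicly documented Genie Conversation API (`start-conversation`) and should be checked against current Databricks documentation; the host, token, space ID, and example question are placeholders, and nothing here is taken from the Workbench source.

```python
import json
from urllib.request import Request


def build_genie_question(host: str, token: str, space_id: str, question: str) -> Request:
    """Build (without sending) a request that starts a Genie conversation."""
    body = {"content": question}
    return Request(
        url=f"{host}/api/2.0/genie/spaces/{space_id}/start-conversation",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder identifiers and question, for illustration only.
genie_req = build_genie_question(
    host="https://example.cloud.databricks.com",
    token="<token>",
    space_id="<space-id>",
    question="Which shortlisted sites enrolled fastest in prior oncology studies?",
)
```

The response to such a call identifies a conversation and message that the caller then polls for the generated answer, so a production route would need a polling loop and timeout handling around this.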
Seen in
- sources/2026-05-13-databricks-clinical-operations-intelligence-belongs-on-the-lakehouse — First wiki disclosure, and to date the only one. Open-source release announcement; reference implementation framing for concepts/single-platform-application-architecture and patterns/shap-attribution-as-governed-delta-table. "Deploying into an existing Databricks workspace with Unity Catalog takes approximately 30 minutes of technical deployment time, before sponsor-specific security review and validation." GitHub: databricks-industry-solutions/site-feasibility-workbench-open.
Related
- systems/databricks-apps — the deployment model the Workbench exemplifies.
- systems/lakebase — operational-DB layer for saved shortlists.
- systems/unity-catalog — governance + access-control substrate for site features, predictions, SHAP attributions.
- systems/databricks-genie — embedded NL-query layer for cross-domain questions in the workflow.
- systems/mlflow — model versioning + lineage for TA-segmented LightGBM models and their SHAP attributions.
- systems/delta-lake — storage substrate for the governed attribution tables.
- concepts/single-platform-application-architecture — the architectural thesis the Workbench implements.
- patterns/shap-attribution-as-governed-delta-table — the audit pattern the Workbench canonicalises.
- patterns/in-workspace-app-as-decision-support — the deployment pattern.