DATABRICKS 2026-05-15

Databricks — Backstage with Lakebase (Part 2: Governance)

Summary

Part 2 of a three-part Thoughtworks guest series (Part 1: Deployment Cycles, Part 2: Governance, Part 3: FinOps) on running Backstage (Spotify's state-heavy Internal Developer Portal) on Databricks Lakebase. Where Part 1 quantified the developer-cycle payoff of compute–storage separation (1.09 s branch / 3.78 s PITR), Part 2 makes the governance payoff mechanically precise: because Lakebase is natively embedded inside Databricks, Unity Catalog extends directly over the operational Postgres database, so the operational tier and the analytical tier no longer live in two security paradigms. The POC exposes the Backstage catalog as a foreign catalog (lakebase_bs) in Unity Catalog via Lakehouse Federation; standard UC grants then control who sees what, with no Postgres-level role management. Three coordinated governance properties follow: a unified audit trail (every control-plane branch action lands in system.access.audit in Unity Catalog, queryable by one SQL statement); automated branch-level cost attribution (compute costs break down by project_id, branch_id, and endpoint_id against UC's billing tables, with the production branch billed 31.6130 DBU and the dropped test branch 0.0107 DBU); and branch-propagated masking policies (UC's attribute-level masks inherit automatically when a Lakebase branch is created, so the CI / QA / feature branches that Part 1 made cheap can't leak unmasked production PII).
Two Thoughtworks-built open-source companions show what the DBA does with the time freed up: LakebaseOps (a Databricks App with three agents, Provisioning / Performance / Health, replacing 51 tasks a DBA used to file tickets for, seven of which run as scheduled Databricks Jobs in place of pg_cron, plus a 9-KPI adoption dashboard and a migration wizard scoring ten source engines including Aurora / RDS / Cloud SQL / AlloyDB / Cosmos DB) and Lakebase MCP (a Model Context Protocol server exposing 46 tools to MCP-capable AI agents such as Claude / Copilot / GPT, with dual-layer governance: a SQL-statement guard and a per-tool access guard, four pre-built profiles read_only / analyst / developer / admin, and every query attributable through tool tags on the originating statement). The thesis: when the operational DB lives inside the same governance substrate as the warehouse, the "compliance side-channel" problem disappears: "one SQL query instead of three services."

Key takeaways

Unity Catalog absorbs the operational database — the single most load-bearing claim in the post. Verbatim: "Because Lakebase is natively embedded inside Databricks, Unity Catalog extends directly over the operational Postgres database… We didn't just change where the data lived; we changed where the access policy lived." The mechanism: in this POC, Lakehouse Federation exposes the Backstage catalog as a foreign catalog (lakebase_bs) in Unity Catalog. Once federated, "standard UC grants control who can see what, no Postgres-level role management required." Canonical instance of concepts/operational-analytical-governance-unification + patterns/foreign-catalog-federation-for-operational-db-governance. (Source: sources/2026-05-15-databricks-backstage-with-lakebase-part-2)
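The "standard UC grants" step above can be sketched as a small statement generator. This is a minimal illustration, not the POC's actual setup: the post names only the catalog `lakebase_bs`, so the schema (`public`), table (`final_entities`), and group (`platform-readers`) below are hypothetical, and the post does not say which privileges were granted. The statement shape follows Unity Catalog's `GRANT <privilege> ON <securable> <name> TO <principal>` SQL form.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Grant:
    privilege: str   # e.g. "USE CATALOG", "USE SCHEMA", "SELECT"
    securable: str   # e.g. "CATALOG", "SCHEMA", "TABLE"
    name: str        # fully qualified object name
    principal: str   # UC user, group, or service principal

    def to_sql(self) -> str:
        # Backticks quote principals that contain dashes or spaces.
        return f"GRANT {self.privilege} ON {self.securable} {self.name} TO `{self.principal}`;"


def read_only_grants(catalog: str, schema: str, tables: List[str], principal: str) -> List[Grant]:
    """Grants a principal needs to read specific tables: catalog, schema, then tables."""
    grants = [
        Grant("USE CATALOG", "CATALOG", catalog, principal),
        Grant("USE SCHEMA", "SCHEMA", f"{catalog}.{schema}", principal),
    ]
    grants += [Grant("SELECT", "TABLE", f"{catalog}.{schema}.{t}", principal) for t in tables]
    return grants


# Only `lakebase_bs` comes from the post; every other name here is hypothetical.
statements = [
    g.to_sql()
    for g in read_only_grants("lakebase_bs", "public", ["final_entities"], "platform-readers")
]
print("\n".join(statements))
```

The point of the sketch is where the policy lives: the access decision is expressed entirely in UC objects and principals, with no Postgres role appearing anywhere.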
The traditional alternative the post explicitly contrasts with is a three-service compliance forensic exercise. Verbatim: "To audit a single table drop on RDS, you'd need to cross-reference CloudTrail for the IAM principal, pg_stat_activity or pgaudit logs for the SQL statement, and CloudWatch for the timestamp, three services, three query languages, three access policies. The operational database becomes a compliance side-channel." The post-migration claim: one SQL query against system.access.audit replaces all three.
Every Lakebase control-plane action is automatically recorded in system.access.audit — Unity Catalog's audit substrate. Verbatim: "every control-plane action against the operational database is automatically recorded in system.access.audit." The Part 1 disaster-recovery experiment is the worked example: "Every branch creation and deletion from our Part 1 experiments is logged. Each event is tied to a specific OAuth user identity and source IP, captured automatically, and governed by the exact same Row-Level Security controls as every other audit table in Unity Catalog." The single-substrate property — audit logs sitting in the same place, governed the same way, queried with the same SQL — is what makes "one SQL query instead of three services" mechanically true.
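The "one SQL query" shape for the branch-lifecycle audit can be illustrated with a local stand-in. The sketch below uses an in-memory SQLite table named `audit` in place of `system.access.audit`; the flat column layout, the action names (`createBranch`, `deleteBranch`, `getBranch`), and all sample rows are invented for illustration. The real system table has a richer schema, and the post does not list the action names Lakebase emits.

```python
import sqlite3

# In-memory stand-in for system.access.audit (assumed, simplified columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit ("
    "event_time TEXT, user_email TEXT, source_ip TEXT, action_name TEXT, branch_id TEXT)"
)
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?, ?, ?)",
    [
        # Invented sample events; not data from the post.
        ("2026-01-10T09:00:00Z", "dev@example.com", "203.0.113.7", "createBranch", "dr-test"),
        ("2026-01-10T09:05:00Z", "dev@example.com", "203.0.113.7", "getBranch", "dr-test"),
        ("2026-01-10T10:00:00Z", "dev@example.com", "203.0.113.7", "deleteBranch", "dr-test"),
    ],
)

# One statement answers who created or deleted which branch, from where, and when.
BRANCH_LIFECYCLE_SQL = """
    SELECT event_time, user_email, source_ip, action_name, branch_id
    FROM audit
    WHERE action_name IN ('createBranch', 'deleteBranch')
    ORDER BY event_time
"""
lifecycle = conn.execute(BRANCH_LIFECYCLE_SQL).fetchall()
for row in lifecycle:
    print(row)
```

Identity, source IP, timestamp, and action sit in one row of one table, which is what removes the CloudTrail / pgaudit / CloudWatch cross-reference described in the contrast case.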
Branch-level cost attribution is automatic via UC's system billing tables. Verbatim: "Because Lakebase integrates natively with Unity Catalog's system billing tables, compute costs break down automatically by project_id, branch_id, and endpoint_id." Concrete numbers from the POC: "the production branch was billed at 31.6130 DBU, while the dropped test branch was independently attributed 0.0107 DBU. The audit trail and the cost trail are governed in the exact same place." The structural property: cost-attribution and audit-attribution share the same identity dimensions (project_id / branch_id / endpoint_id) and the same substrate (UC system tables), eliminating the "custom CloudWatch tagging strategies that often miss short-lived workloads" problem with ephemeral RDS instances. Canonical instance of concepts/branch-level-cost-attribution.
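A rough sketch of the roll-up this implies: usage records keyed by the three identity dimensions, summed per branch. Only the two totals (31.6130 and 0.0107 DBU) and the dimension names come from the post. The split of the production total into two records, the identifier values, and the flat record layout are assumptions made so the aggregation has something to do.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical usage records. The per-record split and the ID values are invented;
# only the resulting per-branch totals match the figures quoted in the post.
usage_records = [
    {"project_id": "backstage", "branch_id": "production", "endpoint_id": "ep-1", "dbu": Decimal("20.0000")},
    {"project_id": "backstage", "branch_id": "production", "endpoint_id": "ep-1", "dbu": Decimal("11.6130")},
    {"project_id": "backstage", "branch_id": "dr-test", "endpoint_id": "ep-2", "dbu": Decimal("0.0107")},
]


def dbu_by_branch(records):
    """Sum DBUs per (project_id, branch_id, endpoint_id) key."""
    totals = defaultdict(Decimal)  # Decimal() starts each key at 0
    for record in records:
        key = (record["project_id"], record["branch_id"], record["endpoint_id"])
        totals[key] += record["dbu"]
    return dict(totals)


totals = dbu_by_branch(usage_records)
for key, dbu in sorted(totals.items()):
    print(key, dbu)
```

`Decimal` is used instead of `float` so that small charges such as the 0.0107 DBU test branch are summed exactly rather than rounded away, which is the short-lived-workload case the post says tag-based strategies tend to miss.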
Branch-propagated masking policies — the load-bearing composition with Part 1's branching primitive. Verbatim: "When a Lakebase branch is created, Unity Catalog's attribute-level masking policies propagate automatically to the new branch. A developer working on their feature branch never sees unmasked production data — not because someone remembered to configure it, but because the governance layer enforces it at creation time. The CI branch that runs your PR tests is governed identically to production. The QA branch where a tester runs destructive scenarios is governed identically to production." The structural claim: the team-of-six / 30–40-branches-per-sprint workflow Part 1 enabled would create 30–40 PII-leak surfaces under traditional manual-masking provisioning; branch-creation-time policy propagation makes the sprint-rate-of-branches independent of the manual-masking bottleneck. Canonical instance of concepts/branch-level-governance-propagation. The economic anchor cited: "According to Perforce's 2025 State of Data Compliance report, 60% of organizations have experienced breaches or theft in non-production environments where sensitive data was inadequately anonymized."
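The creation-time propagation property can be modelled in a few lines. This is a toy model of the behaviour the post describes, not Lakebase's or Unity Catalog's implementation: the `Branch` class, the `mask_email` function, and the sample row are all hypothetical. The one idea it captures is that a child branch receives its parent's masking policies as part of being created, so no separate configuration step exists to forget.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Mask = Callable[[str], str]


def mask_email(value: str) -> str:
    """Hide the local part of an email address and keep the domain."""
    _, _, domain = value.partition("@")
    return f"***@{domain}"


@dataclass
class Branch:
    name: str
    rows: List[Dict[str, str]]
    masks: Dict[str, Mask] = field(default_factory=dict)  # column name -> mask function
    parent: Optional["Branch"] = None

    def create_branch(self, name: str) -> "Branch":
        # Policies are copied during creation, so the child never exists unmasked.
        return Branch(
            name=name,
            rows=[dict(row) for row in self.rows],
            masks=dict(self.masks),
            parent=self,
        )

    def read(self) -> List[Dict[str, str]]:
        # Every read path applies the branch's masks; unmasked columns pass through.
        return [
            {col: self.masks[col](val) if col in self.masks else val for col, val in row.items()}
            for row in self.rows
        ]


# Hypothetical data: one catalog row with an owner email treated as PII.
production = Branch(
    "production",
    rows=[{"name": "svc-a", "owner_email": "alice@example.com"}],
    masks={"owner_email": mask_email},
)
ci_branch = production.create_branch("ci-pr-123")
print(ci_branch.read())
```

In this model the number of branches created per sprint has no effect on masking effort, which mirrors the claim that 30–40 branches no longer mean 30–40 manual masking jobs.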
The DBA role evolution thesis: from 30+ operational tickets per sprint to <5 high-value policy reviews. Verbatim: "What opens up is the work that actually leverages the DBA's expertise: defining branching policies, designing governance rules, architecting promotion workflows, tuning performance, and establishing the guardrails that make self-service safe. The DBA shifts from doing the work to designing how the work gets done — from 30+ operational tickets per sprint to fewer than 5 high-value policy reviews. The audit trail demonstrated above isn't just a compliance artifact — it's the DBA's new strategic dashboard, a real-time view of how the platform is being used and where to invest next." Caveat — this is the aspirational organizational-shift framing, not a measured organizational outcome from the POC; the architectural prereq (self-service branching + automatic governance) is delivered, the role-shift is asserted, not measured.
LakebaseOps — the platform-side leg of "what opens up." Verbatim: "LakebaseOps is what the platform does on its own. Three agents — Provisioning, Performance, and Health — replace 51 of the tasks a DBA used to file tickets for. Seven of them run as scheduled Databricks Jobs and replace the pg_cron crontab a DBA would otherwise hand-maintain. A monitoring UI surfaces live pg_stat metrics, slow-query regressions, branch TTL enforcement, and a 9-KPI adoption dashboard. A migration wizard scores ten source engines (Aurora, RDS, Cloud SQL, AlloyDB, Cosmos DB, and more) against Lakebase, with live pricing from the AWS and Azure APIs." Deployed as a Databricks App; governed by "the same Unity Catalog grants and audit trail described above." Open source: github.com/suryasai87/lakebase-ops-platform.
Lakebase MCP: the DBA-on-top-of-the-platform leg, with two safety design choices canonicalised. Verbatim: "Lakebase MCP is what the DBA does on top of the platform. A Model Context Protocol server exposing 46 tools to any MCP-capable AI agent (Claude, Copilot, GPT). The DBA stops opening pgAdmin and starts describing intent." Two design choices keep this safe: "First, dual-layer governance: a SQL-statement guard and a per-tool access guard, with four pre-built profiles (read_only, analyst, developer, admin) that map onto the same UC access patterns shown above. A coding assistant runs as read_only and physically cannot drop a table. Second, every query is attributable — the server tags every statement with the originating tool." Combined with the branch-level cost attribution: "you can answer 'which agent on which branch generated the 4 AM CPU spike?' in one SQL query." Open source: github.com/suryasai87/lakebase-mcp. Canonical instances of patterns/dual-layer-governance-sql-and-tool-guards + patterns/tool-tagged-query-attribution.
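A minimal sketch of the dual-layer idea, assuming a request must clear a per-tool check and then a SQL-statement check. It is not taken from the lakebase-mcp repository: the profile names `read_only` and `admin` come from the post, while the tool names, the allowed statement sets, and the first-keyword classification are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import FrozenSet


class GuardError(PermissionError):
    """Raised when either guard layer rejects a request."""


@dataclass(frozen=True)
class Profile:
    allowed_tools: FrozenSet[str]        # layer 1: which MCP tools may be called
    allowed_statements: FrozenSet[str]   # layer 2: which leading SQL keywords may run


READ_STATEMENTS = frozenset({"SELECT", "EXPLAIN", "SHOW"})
WRITE_STATEMENTS = frozenset({"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"})

# Profile names come from the post; tool names and statement sets are hypothetical.
PROFILES = {
    "read_only": Profile(frozenset({"run_query", "list_tables"}), READ_STATEMENTS),
    "admin": Profile(
        frozenset({"run_query", "list_tables", "create_branch"}),
        READ_STATEMENTS | WRITE_STATEMENTS,
    ),
}


def statement_kind(sql: str) -> str:
    """Return the upper-cased first keyword of a statement."""
    words = sql.split()
    if not words:
        raise GuardError("empty statement")
    return words[0].upper()


def authorize(profile_name: str, tool: str, sql: str) -> None:
    """Raise GuardError unless both the tool guard and the SQL guard pass."""
    profile = PROFILES[profile_name]
    if tool not in profile.allowed_tools:
        raise GuardError(f"profile {profile_name!r} may not call tool {tool!r}")
    kind = statement_kind(sql)
    if kind not in profile.allowed_statements:
        raise GuardError(f"profile {profile_name!r} may not run {kind} statements")


authorize("read_only", "run_query", "SELECT * FROM final_entities")  # passes silently
try:
    authorize("read_only", "run_query", "DROP TABLE final_entities")
except GuardError as exc:
    print(exc)
```

Classifying by first keyword is deliberately naive: it would misjudge statements that begin with `WITH`, leading comments, or several statements in one string, so a production guard would need a real SQL parser. The two layers are independent, so a tool that is permitted still cannot carry a statement type that the profile forbids.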
The tool-tag attribution mechanism is the missing leg of AI-agent-DB-access governance. The post's framing: "Combined with the branch-level cost attribution shown earlier, you can answer 'which agent on which branch generated the 4 AM CPU spike?' in one SQL query." The structural property: agent identity propagates to every SQL statement as a server-side tag, joining the same system.access.audit + system billing substrate that humans + branches already populate. Without this leg, AI-agent DB access is opaque from the audit/billing side — you see traffic from a single MCP service principal but can't distinguish which agent ran which tool against which branch.
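One way such tagging could work is a structured comment prefixed to each statement, parsed back out on the analysis side. The post does not disclose where the tag actually lives, so the comment format, the `mcp:` prefix, and the agent, tool, and branch values below are all assumptions, and this should be read as one plausible mechanism rather than Lakebase MCP's own.

```python
import json
import re
from collections import Counter
from typing import Dict, Optional

# Assumed tag format: /* mcp:{...json...} */ placed before the statement.
TAG_PATTERN = re.compile(r"^/\* mcp:(\{.*?\}) \*/\s*")


def tag_statement(sql: str, *, agent: str, tool: str, branch: str) -> str:
    """Prefix a statement with a JSON tag naming the agent, tool, and branch."""
    tag = json.dumps({"agent": agent, "tool": tool, "branch": branch}, sort_keys=True)
    return f"/* mcp:{tag} */ {sql}"


def parse_tag(tagged_sql: str) -> Optional[Dict[str, str]]:
    """Recover the tag from a statement, or None if the statement is untagged."""
    match = TAG_PATTERN.match(tagged_sql)
    return json.loads(match.group(1)) if match else None


# Hypothetical statement log: two tagged agent queries and one untagged human query.
statement_log = [
    tag_statement("SELECT count(*) FROM final_entities", agent="claude", tool="run_query", branch="production"),
    tag_statement("SELECT * FROM final_entities", agent="copilot", tool="run_query", branch="ci-pr-123"),
    "SELECT 1",
]

tags = [parse_tag(statement) for statement in statement_log]
per_agent_branch = Counter((t["agent"], t["branch"]) for t in tags if t is not None)
print(per_agent_branch)
```

With the tag attached, statements that would otherwise all appear under a single MCP service principal can be grouped by agent and branch. The sketch assumes tag values never contain the sequence `*/`; a real implementation would have to escape or reject such values.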
LakebaseOps + Lakebase MCP are explicitly framed as the "tooling shifts with the role" pair. Verbatim: "The platform has to do the routine work on its own, and the DBA needs a place to design how that work gets done. Two open-source tools, both deployed as Databricks Apps and both governed by the same Unity Catalog grants and audit trail described above, close that loop. … LakebaseOps runs for the team. Lakebase MCP runs with the team. Both inherit the governance posture you just saw." The two tools are the practical realisation of the role-shift; the governance inheritance is what makes them safe to compose with the platform.

Architectural primitives table

| Primitive | What it absorbs | UC substrate |
| --- | --- | --- |
| Foreign catalog `lakebase_bs` (systems/lakehouse-federation) | Postgres-level RBAC complexity | UC GRANT model |
| `system.access.audit` | CloudTrail + pgaudit + CloudWatch cross-reference | UC governed Delta table |
| System billing tables | Custom CloudWatch tags for ephemeral RDS | UC governed Delta table with `project_id` / `branch_id` / `endpoint_id` columns |
| Attribute-level masking policies | Manual data-masking on dev/test provisioning | UC policy + branch-creation-time propagation |
| Lakebase MCP profiles (`read_only`/`analyst`/`developer`/`admin`) | Per-MCP-tool ACL maintenance | Maps onto same UC GRANT patterns |
| Per-statement tool tag (Lakebase MCP) | Opaque agent identity in audit/billing | Joins `system.access.audit` + system billing |

Operational numbers

| Number | Context |
| --- | --- |
| `31.6130 DBU` | Production branch cost attribution (one specific entry from UC system billing tables) |
| `0.0107 DBU` | Dropped test branch cost attribution (independent attribution to a transient branch: the 1-hour-then-destroyed disaster-recovery test from Part 1) |
| 30–40 branches | Sprint-volume of database copies from a 6-developer team running 2-week sprints (the workflow that motivates branch-propagated masking) |
| 60% | Organisations breached in non-production environments due to inadequate anonymisation (Perforce 2025 State of Data Compliance) |
| 51 | Tasks LakebaseOps' three agents (Provisioning / Performance / Health) replace |
| 7 | Of those 51 that run as scheduled Databricks Jobs, replacing the pg_cron crontab |
| 9 | KPIs in the LakebaseOps adoption dashboard |
| 10 | Source engines the LakebaseOps migration wizard scores against Lakebase (Aurora / RDS / Cloud SQL / AlloyDB / Cosmos DB / others) |
| 46 | Tools exposed by Lakebase MCP to MCP-capable AI agents |
| 4 | Lakebase MCP pre-built access profiles (`read_only` / `analyst` / `developer` / `admin`) |
| 30+ → <5 | Aspirational DBA ticket volume per sprint (operational tickets → high-value policy reviews) |

Caveats

Tier-3 borderline-include source. Databricks Blog is Tier-3 per AGENTS.md. This post passes scope as the second of a three-part Thoughtworks guest series with substantive architecture-content (~70%): named primitives (Lakehouse Federation, foreign catalog lakebase_bs, system.access.audit, system billing tables, branch-propagated masking, dual-layer MCP governance, tool-tag attribution) + a worked example (the Part 1 disaster-recovery experiment audit-trail query) + concrete cost attribution numbers + two open-source code references. The PR framing ("a quieter shift … the DBA's new strategic dashboard") is decoration on substantive engineering content. Sibling ingest precedent: the [[sources/2026-04-30-databricks-backstage-with-lakebase|2026-04-30 Part 1 ingest]] passed scope on the same Tier-3-Thoughtworks-guest-architecture-density profile.
Lakehouse Federation foreign-catalog mechanism is asserted at the capability level, not at the wire-protocol level. The post says the catalog was "exposed as a foreign catalog (lakebase_bs)" but does not disclose: how the federation layer translates UC GRANT semantics to Postgres-level enforcement decisions, what query path UC ABAC policies take through the federation engine, whether row filters / column masks are pushed down to Lakebase or evaluated above the federation boundary, what the latency envelope of a UC-governed Lakebase query is vs a direct Postgres query, and whether Postgres-side write-path enforcement uses the same path as read-path.
system.access.audit granularity is shown for control-plane events (branch create / delete) but not for data-plane events (DML or DDL inside the Postgres branch). The post's worked example is control-plane: "Every branch creation and deletion from our Part 1 experiments is logged." It is silent on whether arbitrary Postgres DML or DDL statements inside Lakebase reach system.access.audit. That gap matters because "to audit a single table drop on RDS" (the explicit pain-point baseline) is a data-plane DDL operation. The post's "three services" before-state was about a DDL audit; the after-state worked example is about control-plane branch-lifecycle audit; the two are not strictly the same shape.
Branch-propagated masking is asserted, not benchmarked. "A developer working on their feature branch never sees unmasked production data — not because someone remembered to configure it, but because the governance layer enforces it at creation time." The mechanism (UC attribute-level masking policies with branch as scope-extension) is named at the capability level; not disclosed: the policy-evaluation path on the new branch, whether the policies cover Postgres-side reads or only UC-federated reads, what the policy-propagation latency is at branch creation, and how the scope is expressed (parent-of-the-branch? the original schema? some explicit branch-policy binding?).
The DBA-role-evolution claim is aspirational organisational framing, not measured. "30+ operational tickets per sprint to fewer than 5 high-value policy reviews" is a recommended target shape, not a before/after measurement of the Thoughtworks POC team. The architectural prerequisites (self-service branching + automatic governance + LakebaseOps + Lakebase MCP) are delivered by the POC; the role-volume claim is the "if you adopt this fully" framing, not a measured outcome.
LakebaseOps and Lakebase MCP are open-source author-projects, not Databricks-shipped products. Both repositories are under github.com/suryasai87/ (the post's Thoughtworks author). The Databricks-Blog distribution is the platform-vendor surface for the projects, but the projects are author-built reference implementations. The "deployed as Databricks Apps" and "both governed by the same Unity Catalog grants and audit trail" framing is about composition with first-party Databricks primitives (Apps + UC), not about the projects being first-party Databricks features. The 51-tasks / 9-KPIs / 10-engines / 46-tools numbers describe the open-source project scope, not a Databricks-shipped product scope.
Tool-tag attribution is asserted at the "every query" granularity but not detailed at the SQL level. The post says "the server tags every statement with the originating tool" but does not show: where the tag lives in the SQL stream (SET application_name? a structured comment? pgsetapp-equivalent? a UC-level tag separate from the SQL?), how it propagates from the agent (per-tool-invocation? per-session?), how it interacts with connection pooling, and whether it survives the SQL-statement-guard rewriting layer. The composability claim ("which agent on which branch generated the 4 AM CPU spike") is mechanically plausible if tags reach system.access.audit and the system billing tables, but the source path is not disclosed.
Cross-part series — this is Part 2. Part 3 (FinOps) is forthcoming and explicitly previewed: "In Part 3 of this series, we will look at the ultimate payoff: taking the infrastructure ownership data inside Backstage and joining it directly to cloud billing data in a single SQL query." That ingest will likely cover the full operational↔billing join topology that this post hints at via the 31.6130 / 0.0107 DBU numbers. No Part 2 disclosure of the cross-system join shape yet.
No row-level security (RLS) policies were actually built for Backstage in this POC. Verbatim: "While we didn't build end-to-end Row-Level Security policies for Backstage in this POC, architecturally, the exact same RLS rules that protect sensitive billing tables can be applied directly to these operational tables." The architectural symmetry is asserted, not demonstrated; in place of an RLS deployment, the post offers the framing that "the wall between 'operational' and 'analytical' stops being a physical boundary, and simply becomes an access pattern."

Source

Original: https://www.databricks.com/blog/backstage-lakebase-part-2
Raw markdown: raw/databricks/2026-05-15-backstage-with-lakebase-part-2-7489c477.md
Part 1 of the series: sources/2026-04-30-databricks-backstage-with-lakebase (Deployment Cycles)
LakebaseOps repository: github.com/suryasai87/lakebase-ops-platform
Lakebase MCP repository: github.com/suryasai87/lakebase-mcp
Unity Catalog product page: databricks.com/product/unity-catalog
Lakebase product page: databricks.com/product/lakebase