Databricks — Backstage with Lakebase (Part 1: Deployment Cycles)¶
Summary¶
Thoughtworks ran a proof-of-concept
ripping Backstage (Spotify's state-heavy Internal
Developer Portal) off its standard Postgres database and pointing it
at Databricks Lakebase (Neon-lineage serverless
Postgres). The post is Part 1 of a three-part series (Part 1: Deployment
Cycles, Part 2: Governance, Part 3: FinOps) and focuses on what happens
to database development cycles when creating a copy of the database
becomes functionally free. Two operational datapoints anchor the
architecture discussion: a 63 MB Backstage catalog branch lands in
1.09 seconds (data plane), and a Point-in-Time Recovery from
deleted state completes end-to-end in 3.78 seconds. The thesis is
that this collapses two separate engineering practices into the same
primitive — branching is PITR with source_branch_time = now — and
rearranges the development cycle enough to deprecate 20-30% of test
code (mock objects for database interfaces).
Key takeaways¶
- Wire-protocol-Postgres compatibility is the first-order integration property. "Because it speaks wire-protocol Postgres, Backstage doesn't know or care that it isn't talking to RDS." Backstage's application logic, Knex migrations, and PgSearchEngine swap all ran cleanly after pointing app-config.yaml at Lakebase. The only integration friction was at the auth tier, not the protocol tier. This is the operational payoff of Lakebase's Neon-lineage choice to keep upstream Postgres semantics while rewriting the storage layer. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Lakebase rejects classic Databricks PATs and expects OAuth JWTs. Verbatim: "Lakebase rejects classic Databricks Personal Access Tokens, expecting an OAuth JWT instead." The Databricks CLI provides databricks postgres generate-database-credential, which mints a scoped, short-lived JWT for a specific endpoint — "the intended approach for apps and CI." For the Backstage POC, Thoughtworks wrapped the command in a lightweight cron script that rewrote DATABRICKS_TOKEN in the .env file every 50 minutes to handle token expiration. Canonical patterns/credential-refresh-cron-as-auth-compat-shim — the gap between the short-lived-JWT model Lakebase prefers and the long-lived-credential model a legacy integration assumes. See also concepts/oauth-jwt-short-lived-credential. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Branching is instant because it's a pointer, not a copy. The post names the mechanism explicitly: "Because Lakebase separates storage from compute using a copy-on-write architecture, creating a branch doesn't copy any data, it creates a pointer to the same underlying pages, and only diverges on write." This is the concepts/copy-on-write-storage-fork primitive; the Neon-lineage systems/pageserver-safekeeper is the substrate that makes it possible. What the CMK-era Lakebase page disclosed as a storage architecture is named here as a developer-cycle primitive. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Branch API requires a spec-nested body with an explicit lifetime. Undocumented gotcha: "the request body must nest everything inside a spec object, and you must specify ttl, expire_time, or no_expiry. Without that, the API returns 'Expiration must be specified.'" This is the first wiki-ingested concrete detail of Lakebase's branch-creation API surface — the lifetime declaration is mandatory, not optional. It canonicalises a design choice: branches are short-lived by default, and long-lived-ness requires explicit opt-in. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Disclosed branching throughput: ~63 MB Backstage catalog → 1.09-second data-plane clone. "The control plane acknowledged it instantly. The actual data-plane clone of the ~63 MB Backstage catalog landed in 1.09 seconds." First wiki operational datapoint on Lakebase/Neon branch-creation time at MB-scale dataset granularity — prior ingests (LangGuard 2026-04-27, Stripe Projects 2026-04-29) disclosed branching latency only as "seconds" or "sub-350 ms" for cold Postgres provisioning. This post separates the control-plane acknowledgement (instant) from the data-plane clone (1.09 s for 63 MB). At this size the branching cost is dominated by fixed setup, not data volume — predictable from the copy-on-write architecture. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Point-in-Time Recovery (PITR) completes end-to-end in 3.78 seconds. The POC wiped final_entities (32 rows → 0), then created a recovery branch from a timestamp captured seconds before the delete. "The elapsed time end-to-end was 3.78 seconds. Verifying the data confirmed the recovered branch had all 32 entities back; production was still at zero, confirming the delete was real and the branches are fully isolated." Canonical concepts/point-in-time-recovery at Lakebase/Neon altitude; it completes orders of magnitude faster than the traditional snapshot-restore shape (minutes to hours for RDS). (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- WAL-record granularity: recovery snapped backward 12 seconds to the nearest record. "Notably, we asked for 22:56:02Z, but Lakebase snapped to 22:55:50Z, 12 seconds earlier, snapping backward to the nearest WAL record." Canonicalises concepts/wal-record-granularity as a first-class property: PITR granularity is bounded by WAL-record cadence, not by the caller's timestamp precision. PITR always rounds backward to the nearest known durable state — a structural property, not a bug — but it is load-bearing for time-sensitive recovery workflows because the user's chosen target time is best-effort. The incident cycle still ran in under a minute. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Branching is PITR-with-time-now. Architectural unification disclosed in one sentence: "Branching and Point-in-Time Recovery (PITR) are essentially the same primitive: branching is just PITR with source_branch_time = now." Canonical patterns/branching-is-pitr-with-time-now. The two operations are the same control-plane call with a different time parameter; the storage substrate is the same concepts/copy-on-write-storage-fork. This unification is architecturally load-bearing because it means every risky operation gets a dry run and every incident gets an undo — "When database state becomes a cheap, forkable artifact instead of a 2 TB EBS volume, every risky operation gets a dry run, and every incident gets an undo." (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Cheap branching deprecates mock objects. The post's sprint-cycle comparison makes the claim concrete: "In our experience across multiple partner teams evaluating this workflow, mock objects account for 20-30% of test code. That's not test coverage — it's test infrastructure. Infrastructure that diverges from production behavior over time, creating false confidence. When branching a production-equivalent database costs nothing, mocking becomes the expensive choice." Canonical concepts/mock-object-maintenance-cost + patterns/database-branch-per-test-over-mocking. The structural insight is that mock objects carry a maintenance cost (divergence from production behavior) and a correctness cost (false confidence from passing tests that don't reflect production) that was previously justified only by the unavailability of cheap real-database environments. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
- Developer workflow rearrangement, not just feature addition. The post's before/after comparison names seven steps in the traditional cycle that collapse or disappear with branching: staging collisions, "works on my machine but breaks in staging", schema-migration-against-real-data-surprises, mock maintenance. Replaced with: per-branch IDE database, per-PR CI branch + schema diff, per-QA-member destructive-test branch, post-merge clean-up. See concepts/integration-tests-against-real-database for the workflow pivot point and patterns/database-branch-per-test-over-mocking for the formalised shape. (Source: sources/2026-04-30-databricks-backstage-with-lakebase)
Systems / concepts / patterns extracted¶
Systems:
- systems/lakebase — canonical system this POC deploys; third full-capability demonstration after CMK (2026-04-20) and LangGuard (2026-04-27).
- systems/backstage — Spotify's open-source Internal Developer Portal; the canonical state-heavy application used as a migration stress-test here.
- systems/pageserver-safekeeper — the Neon-lineage durable storage tier that makes instant branching + PITR work.
- systems/databricks-postgres-cli — the databricks postgres generate-database-credential command, Lakebase's intended short-lived-JWT auth path.
- systems/thoughtworks-technology-radar — the consulting-firm Technology Radar that endorsed Backstage as an IDP foundation, motivating this POC.
- systems/postgresql — the wire protocol + semantics Lakebase preserves.
Concepts:
- concepts/compute-storage-separation — the architectural forcing function behind both instant branching and PITR.
- concepts/database-branching — reaffirmed as the developer-cycle primitive, with a fourth canonical use-case axis (IDP / IDE integration) beyond PlanetScale-dev-sandbox / LangGuard-policy-testing / Stripe-Projects-agent-operation.
- concepts/copy-on-write-storage-fork — explicitly named as the mechanism ("a pointer to the same underlying pages, and only diverges on write").
- concepts/point-in-time-recovery — canonical first wiki instance at Lakebase/Neon altitude; 3.78 s end-to-end.
- concepts/wal-record-granularity — canonical first wiki concept page on the property that PITR target-times snap backward to the nearest WAL record (12-second snap-back demonstrated).
- concepts/mock-object-maintenance-cost — canonical first wiki page on the structural cost of mock-as-test-infrastructure (20-30% of test code, divergent from production, source of false confidence).
- concepts/oauth-jwt-short-lived-credential — the credential shape Lakebase expects, distinct from long-lived PATs.
- concepts/integration-tests-against-real-database — the workflow primitive that replaces mocks when branches are cheap.
Patterns:
- patterns/branching-is-pitr-with-time-now — architectural unification of two operations that were previously distinct.
- patterns/database-branch-per-test-over-mocking — CI / QA / IDE workflow pattern where each test environment gets its own branch.
- patterns/credential-refresh-cron-as-auth-compat-shim — the workaround Thoughtworks used to bridge Lakebase's short-lived JWT to Backstage's expectation of long-lived credentials.
Operational numbers¶
| Metric | Value | Notes |
|---|---|---|
| Backstage catalog size | ~63 MB | Full Backstage metadata graph in the POC |
| Branch creation (data plane) | 1.09 s | Control plane ack was instant |
| PITR end-to-end recovery | 3.78 s | 32 rows deleted then recovered via branch |
| WAL-record snap-back | 12 s | Requested 22:56:02Z, got 22:55:50Z |
| Credential refresh cadence | every 50 min | Cron-based workaround for short-lived JWT |
| Test-code savings claim | 20-30% of test code | Mock objects across evaluated teams |
No latency / throughput / concurrency numbers beyond these; the post is a workflow-transformation showcase, not a capacity benchmark.
Caveats¶
- Tier-3 single-vendor POC with Thoughtworks as guest author. The 1.09-second and 3.78-second numbers are single-shot measurements in a development environment, not production-scale benchmarks. The post does not disclose variance across repeated runs, concurrent-branch-creation load, or geographic latency. The 20-30% mock-code claim is attributed to "our experience across multiple partner teams evaluating this workflow" with no count, methodology, or comparison group.
- 63 MB is a developer-IDP-scale dataset. Whether the 1.09-second branching time scales linearly, sub-linearly, or cliff-edges at GB / TB dataset sizes is not disclosed. The copy-on-write architecture predicts near-constant time (fixed control-plane + metadata pointer work) but the POC does not verify this at larger scales.
- PITR granularity is WAL-record-bounded and WAL-cadence-dependent. The 12-second snap-back is a function of Lakebase's WAL-write cadence in the POC's configuration; different workload intensities and configurations will produce different granularities. The post explicitly flags this as "an important caveat for time-sensitive recovery workflows."
- Auth workaround (50-minute cron refresh) is a POC hack, not a production pattern. Databricks would likely guide production integrations to use the CLI directly or embed credential refresh into the application's connection layer. The .env-rewrite approach is a Thoughtworks-POC choice driven by Backstage's configuration-file expectations.
- Cross-part series — this is Part 1. Part 2 (Governance) and Part 3 (FinOps) are referenced as forthcoming; their content is not in scope here. This ingest captures only the deployment-cycles + branching + PITR surface.
- "Instant branches for performance tests, disposable branches for functional tests, running branch for UAT" workflow description is aspirational — the POC demonstrated branching and PITR but not the full multi-branch CI/CD topology. Later posts in the series may cover this.
Source¶
- Original: https://www.databricks.com/blog/backstage-lakebase
- Raw markdown:
raw/databricks/2026-04-30-backstage-with-lakebase-5fa20593.md
Related¶
- systems/lakebase · systems/backstage · systems/pageserver-safekeeper · systems/databricks-postgres-cli · systems/thoughtworks-technology-radar · systems/postgresql
- concepts/compute-storage-separation · concepts/database-branching · concepts/copy-on-write-storage-fork · concepts/point-in-time-recovery · concepts/wal-record-granularity · concepts/mock-object-maintenance-cost · concepts/oauth-jwt-short-lived-credential · concepts/integration-tests-against-real-database
- patterns/branching-is-pitr-with-time-now · patterns/database-branch-per-test-over-mocking · patterns/credential-refresh-cron-as-auth-compat-shim · patterns/policy-testing-via-database-branching · patterns/branch-based-schema-change-workflow
- companies/databricks