PATTERN Cited by 1 source
Asset Bundle Single-Command Deployment¶
Asset Bundle Single-Command Deployment is the pattern of packaging an entire data + AI pipeline (orchestration, inference calls, schemas, storage references, configuration) as a single declarative bundle deployable + runnable with one command — so a non-platform-expert operator can run, update, and re-deploy the pipeline without touching the underlying multi-service architecture.
Problem¶
Production data + AI pipelines compose multiple services: object storage, governance catalog, table format, orchestrator, model endpoints, compute. Standing one up traditionally requires:
- Wiring permissions across services.
- Coordinating deploy artifacts across CI / IaC / notebook tools.
- Maintaining version compatibility across service boundaries.
- Handing over a runbook to whoever operates it after.
This is a high bar for partner organisations (nonprofits, academic collaborators, internal teams without platform engineers) who just want to run the pipeline against their archive without learning the underlying platform.
Solution¶
Package the full pipeline as a declarative bundle with:
- Pipeline logic (notebooks / SQL / DataFrame jobs).
- Orchestration topology (Lakeflow Job DAG with task dependencies, schedule, compute targets).
- Storage references (Volume + Delta table names; content is a config parameter, not part of the bundle).
- Schema definitions (input + output schemas, including LLM output schemas).
- Model endpoint references (which Foundation Model API endpoints, which judge model).
- Configuration (sampling thresholds, judge confidence cutoff).
The bundle is the unit of deployment. `bundle deploy` provisions the job + tables + volumes; `bundle run` executes the job; re-running `bundle deploy` after config or code changes updates the deployment in place (the CLI has no separate `bundle update` command).
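A minimal sketch of what the operator-facing commands look like when assembled programmatically. It only builds argument lists and does not call the CLI. The target name `prod`, the variable `archive_volume`, and the job key `archive_pipeline` are hypothetical placeholders, not names taken from the MapAid bundle.

```python
import shlex


def bundle_command(action, target, job_key=None, variables=None):
    """Build a `databricks bundle <action>` argument list without running it."""
    if action not in {"validate", "deploy", "run"}:
        raise ValueError(f"unsupported bundle action: {action}")
    cmd = ["databricks", "bundle", action, "--target", target]
    # Each bundle variable override is passed as --var=name=value.
    for name, value in sorted((variables or {}).items()):
        cmd.append(f"--var={name}={value}")
    if action == "run":
        if job_key is None:
            raise ValueError("run needs the resource key of the job to execute")
        cmd.append(job_key)
    return cmd


# Hypothetical names, for illustration only.
deploy = bundle_command("deploy", "prod", variables={"archive_volume": "sudan_archive"})
run = bundle_command("run", "prod", job_key="archive_pipeline")

print(shlex.join(deploy))
print(shlex.join(run))
```

Either list could be handed to `subprocess.run(cmd, check=True)` on a machine where the Databricks CLI is installed and authenticated.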
In the MapAid groundwater pipeline¶
"The entire system is packaged as a Databricks Asset Bundle, meaning it can be deployed, updated, and run with a single command. MapAid received a self-contained solution that can be maintained without expertise across multiple cloud services. Because the pipeline logic is decoupled from the specific archive it processes, the same system could be adapted to other water archives, other regions, or other domains where large collections of scanned documents need to be classified and made searchable." (Source: sources/2026-05-11-databricks-unlocking-the-archives)
Two architectural points:
- Operating handoff. A pro-bono partner (MapAid) operates the pipeline. They don't need to know how Lakeflow Jobs, AI Functions, or Unity Catalog compose. They run a command.
- Pipeline-vs-archive decoupling. The bundle bakes in the pipeline shape (classify → judge → filter → extract). The archive is a parameter. SUDAAK's Sudanese groundwater archive is one input; an Ethiopian or Malawian water archive is a different input to the same bundle.
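The decoupling can be illustrated with a small sketch: the pipeline shape is a fixed dependency graph, and the archive is just an argument passed alongside it. The task keys mirror the four stages named above; the archive identifiers and the `plan_run` helper are hypothetical, and the real bundle expresses this topology as job task dependencies rather than Python.

```python
from graphlib import TopologicalSorter

# Fixed pipeline shape: each task maps to the set of tasks it depends on.
PIPELINE_SHAPE = {
    "classify": set(),
    "judge": {"classify"},
    "filter": {"judge"},
    "extract": {"filter"},
}


def execution_order(shape):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(shape).static_order())


def plan_run(shape, archive):
    """Pair each task with the archive it will process; the shape never changes."""
    return [(task, archive) for task in execution_order(shape)]


# Hypothetical archive identifiers.
sudan = plan_run(PIPELINE_SHAPE, "sudan_groundwater")
malawi = plan_run(PIPELINE_SHAPE, "malawi_water")

for task, archive in sudan:
    print(f"{task} <- {archive}")
```

Both plans contain the same task sequence; only the archive value differs.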
Mechanics¶
- Bundle = code + topology + schemas + config; not data. Data references (Volume names, Delta table names) are parameters resolved at deploy time. Pointing the bundle at a different archive is a parameter change, not a code change.
- Single deploy primitive. All pipeline assets land in one command. No coordinating IaC tools across services.
- Re-runnable. New batches of input data flow through the same bundle on the same schedule.
- Version-controllable. The bundle is a Git artifact. Pipeline versions are commit hashes.
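To make "parameters resolved at deploy time" concrete, here is a toy model of variable substitution. Asset Bundles reference variables as `${var.<name>}` and the CLI performs the real resolution; the function below only imitates that behaviour for illustration. The catalog, schema, volume, and table names, and the cutoff value, are all hypothetical.

```python
import re

VAR_PATTERN = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(node, variables):
    """Recursively replace ${var.name} references in a nested config."""
    if isinstance(node, dict):
        return {key: resolve(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, variables) for value in node]
    if isinstance(node, str):
        # A reference to an undefined variable raises KeyError.
        return VAR_PATTERN.sub(lambda m: str(variables[m.group(1)]), node)
    return node


# The template carries pipeline shape and config, but no archive-specific names.
TEMPLATE = {
    "input_path": "/Volumes/${var.catalog}/${var.schema}/${var.archive_volume}",
    "output_table": "${var.catalog}.${var.schema}.extracted_records",
    "judge_confidence_cutoff": 0.8,  # placeholder value, not from the source
}

sudan = resolve(TEMPLATE, {"catalog": "main", "schema": "groundwater",
                           "archive_volume": "sudan_scans"})
malawi = resolve(TEMPLATE, {"catalog": "main", "schema": "groundwater",
                            "archive_volume": "malawi_scans"})
print(sudan["input_path"])
print(malawi["input_path"])
```

Switching archives changes only the variable values; `TEMPLATE` is untouched, which is the property the first bullet describes.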
What this pattern enables¶
- Pro-bono / partner-operated pipelines. The whole reason to do this on top of a managed platform is so the platform owner ships the pipeline and the partner organisation runs it. The bundle is the unit of handoff.
- Cross-domain transfer. Once the bundle's archive parameter is pulled out, the same pipeline shape applies to other domains (legal-discovery, healthcare-records, scientific-paper archives) — "the same system could be adapted to other water archives, other regions, or other domains."
- Fleet operations. A platform team can deploy the same bundle to N customer workspaces by parameterizing the archive + configuration per deployment.
- Reproducibility. A historical bundle version run against a preserved snapshot of the historical input Volume can be re-run as it was, to the extent the model endpoints behave deterministically. Audit / re-run / debug all hit the same pipeline shape.
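As a rough sketch of the fleet case, shared defaults can be layered with per-deployment overrides to yield one deploy command per workspace target. The target names, variable names, and values are hypothetical, and the sketch assumes each target is already defined in the bundle's configuration.

```python
# Defaults shared by every deployment (hypothetical variable and value).
DEFAULTS = {"judge_confidence_cutoff": "0.8"}

# Per-deployment overrides, keyed by bundle target name (all hypothetical).
FLEET = {
    "partner_a": {"archive_volume": "archive_a"},
    "partner_b": {"archive_volume": "archive_b", "judge_confidence_cutoff": "0.9"},
}


def fleet_plan(defaults, fleet):
    """Return one `databricks bundle deploy` argument list per target."""
    plan = {}
    for target, overrides in fleet.items():
        variables = {**defaults, **overrides}  # overrides win over defaults
        var_args = [f"--var={k}={v}" for k, v in sorted(variables.items())]
        plan[target] = ["databricks", "bundle", "deploy", "--target", target, *var_args]
    return plan


plan = fleet_plan(DEFAULTS, FLEET)
for target, cmd in plan.items():
    print(target, "->", " ".join(cmd))
```

The bundle itself is identical across the fleet; everything deployment-specific lives in the small `FLEET` mapping.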
What this pattern requires¶
- A managed platform that exposes pipeline orchestration, model inference, and storage as composable declarative resources. Without this you're back to gluing services together — at which point the bundle is just a shell script.
- Pipeline parametrisation discipline. The bundle author must cleanly separate pipeline logic from archive-specific configuration. If pipeline logic embeds path strings or schema fields specific to one archive, the bundle isn't really portable.
- A command-line surface for deploy + run. The partner operator needs `bundle deploy` to be one command, not ten.
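The parametrisation discipline above can be checked mechanically. The sketch below is one possible lint, not part of any Databricks tooling: it scans pipeline source for string literals that hard-code a Unity Catalog volume path (`/Volumes/...`) instead of taking it from a parameter. The sample source and the `input_path` widget name are hypothetical.

```python
import re

# A quoted literal starting with /Volumes/ and containing no ${...} or {...}
# interpolation is treated as a hard-coded archive path.
HARD_CODED_VOLUME = re.compile(r"""["'](/Volumes/[^"'{}$]+)["']""")


def find_hard_coded_paths(source):
    """Return (line_number, path) for every hard-coded volume path literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARD_CODED_VOLUME.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical pipeline source: line 1 is not portable, line 2 is.
SAMPLE = '''\
raw = spark.read.format("binaryFile").load("/Volumes/main/groundwater/sudan_scans")
raw = spark.read.format("binaryFile").load(dbutils.widgets.get("input_path"))
'''

for lineno, path in find_hard_coded_paths(SAMPLE):
    print(f"line {lineno}: hard-coded path {path}")
```

A check like this could run in CI before deployment, so a bundle that quietly depends on one archive is caught before handoff.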
When to use¶
- Pipelines you intend to hand off to a non-platform-expert operator.
- Pipelines you'll deploy multiple times with different input data (per-customer, per-region, per-archive).
- Pipelines that need reproducibility over time (re-run historical inputs through historical pipeline versions).
When not to use¶
- One-off pipelines run once by their authors. The bundle scaffolding isn't worth the setup cost.
- Pipelines with deeply custom infrastructure requirements (custom model serving, bespoke storage layers) the bundle abstraction doesn't cover.
- Multi-platform deployments (deploying the same logic to Databricks + Snowflake + AWS managed services). Asset Bundles are Databricks-specific.
Tradeoffs¶
- Vendor coupling. Databricks Asset Bundles are Databricks-only. Comparable packaging abstractions exist on other platforms (e.g. dbt projects, Airflow DAG packages), but they don't cover model inference + storage governance + orchestration in one bundle.
- Abstraction ceiling. Bundles work well for pipelines composed of platform-native primitives. Anything custom (third-party model endpoints, external storage) escapes the bundle.
Seen in¶
- sources/2026-05-11-databricks-unlocking-the-archives — canonical wiki instance. The MapAid groundwater pipeline (classify → judge → filter → extract over scanned PDFs) ships as a single Asset Bundle; one-command deploy + re-run; archive decoupled from pipeline logic for cross-region adaptability.