SYSTEM Cited by 1 source

Zeppelin to Databricks Notebook Converter

The Zeppelin to Databricks Notebook Converter is a customer-built Databricks App developed by Deutsche Börse Group to migrate the StatistiX platform's Zeppelin notebooks (running on Cloudera, EOL 2027) to Databricks-native .ipynb notebooks. The converter is the canonical wiki instance of the structural-deterministic-logical-LLM-split pattern, in which notebook structure is converted by deterministic rules and logic reconstruction is left to an LLM.

Disclosed in the 2026-05-19 Deutsche Börse / Databricks customer-blog post.

What it is

A web app — running inside the Deutsche Börse Databricks workspace as a Databricks App — that takes a single uploaded Zeppelin notebook JSON export and emits two outputs:

  1. A converted .ipynb Databricks notebook with structure translated and all logic preserved verbatim.
  2. A context-augmented prompt — populated with Deutsche Börse's specific Zeppelin environment (custom interpreters, data sources, configuration patterns) — that the user pastes into Genie inside Databricks to drive logic reconstruction.

The app does not rewrite any logic, references, visualisations, widgets, or scheduling. This deliberate decision about what to leave untouched is one of the architectural choices the post stresses most.

Architecture

Frontend

  • shadcn UI (production). The team initially built a Streamlit prototype but evolved to shadcn for "a more professional and scalable interface". (Streamlit and shadcn UI are both viable on Databricks Apps; the choice is a UX-quality decision, not an architectural one.)

Backend

  • A Python backend (implied by the Databricks Apps deployment shape) performs the structural conversion and prompt generation. The backend is stateless per request — one notebook in, one .ipynb + one prompt out.

Deployment substrate

  • Databricks Apps. The app authenticates as a workspace service principal, runs inside the customer's Databricks workspace, and relies on the platform's built-in deployment instead of standing up "separate infrastructure". This is one of the first publicly disclosed customer Databricks App deployments where the app itself is a migration utility rather than an analytics or decision-support workload.

Two-stage conversion pipeline

Zeppelin .json export
        |
        v
[ Stage 1: Structural Converter ] -- deterministic, rule-based
        |
        +--> .ipynb (Databricks notebook)         -- logic preserved verbatim
        |
        +--> Context-augmented prompt template    -- environment-encoded
                |
                v
        [ User pastes into Genie ]
                |
                v
[ Stage 2: Genie -- LLM logic reconstruction ] -- heterogeneous, per-notebook
                |
                v
        Reconstructed Databricks notebook (interactive Q&A loop)

The structural converter and Genie are decoupled by a prompt string that the user copy-pastes between the two systems. The handoff is intentionally manual at this seam — it is the user's chance to inspect the converted notebook before logic reconstruction. The seam is the architectural feature, not a UX inconvenience.

What Stage 1 does (deterministic, rule-based)

  • Paragraph → cell. Each Zeppelin paragraph becomes a Databricks cell. Unit-of-execution mapping with no semantic change.
  • Interpreter prefix mapping. Zeppelin magics (%python, %sql, %pyspark, …) are translated to their Databricks-native equivalents. Mapping is finite and deterministic.
  • Metadata → .ipynb JSON. Notebook-level metadata (kernel info, layout) is reformatted into the Jupyter .ipynb schema Databricks consumes.
  • Logic content preserved exactly. SQL strings, Python code, visualisation specs, widget definitions, scheduling fragments, HDFS/Oracle references — all copied verbatim.
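
The four rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Deutsche Börse implementation, which is not public. It assumes the Zeppelin export stores paragraphs under a "paragraphs" key with the source in "text"; the MAGIC_MAP table and the function names are hypothetical, and the real mapping is not listed in the post.

```python
import json
import re

# Hypothetical mapping table; the post does not publish the real one.
MAGIC_MAP = {
    "%pyspark": "%python",
    "%python": "%python",
    "%sql": "%sql",
    "%md": "%md",
    "%sh": "%sh",
}

# A leading interpreter magic: "%name", optional trailing blanks, one newline.
_MAGIC = re.compile(r"\s*(%\S+)[ \t]*\n?")


def split_magic(text):
    """Return (magic or None, body) without touching the body text."""
    m = _MAGIC.match(text)
    if m is None:
        return None, text
    return m.group(1), text[m.end():]


def to_cell(paragraph):
    """Map one Zeppelin paragraph to one notebook code cell."""
    text = paragraph.get("text") or ""
    magic, body = split_magic(text)
    target = MAGIC_MAP.get(magic)
    # Unknown or missing magics are passed through unchanged.
    source = f"{target}\n{body}" if target else text
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source.splitlines(keepends=True),
    }


def convert(zeppelin_json):
    """Convert a Zeppelin JSON export (str) into an .ipynb-shaped dict."""
    note = json.loads(zeppelin_json)
    return {
        "cells": [to_cell(p) for p in note.get("paragraphs", [])],
        "metadata": {"name": note.get("name", "")},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
```

Only the first token of each paragraph is ever inspected. A magic that is not in the table is passed through rather than guessed at, which keeps the step deterministic and leaves anything unfamiliar visible to the user.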

What Stage 1 explicitly does NOT do

The post is unusually explicit about negative space:

The converter does not rewrite SQL logic, Python logic, visualizations, widgets, Oracle and HDFS references, scheduling logic or business-specific custom code. All of that content is preserved in the converted notebook, untouched, because rewriting it automatically would introduce errors and undermine trust in the output.

This is the wiki's clearest statement of the decision not to rewrite logic, treated as a first-class design choice rather than a missing feature.
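
Because nothing but the interpreter prefix is allowed to change, the "untouched" guarantee is mechanically checkable. The sketch below is one hypothetical way to do that and is not described in the post: it strips the leading magic from each original paragraph and each converted cell, then reports any index where the remaining text differs. The function names are illustrative.

```python
import re

# A leading interpreter magic: "%name", optional trailing blanks, one newline.
_MAGIC = re.compile(r"\s*%\S+[ \t]*\n?")


def body_of(text):
    """Return the text with a leading interpreter magic (if any) removed."""
    m = _MAGIC.match(text)
    return text[m.end():] if m else text


def changed_cells(paragraph_texts, cell_sources):
    """Return the indices where the converted body differs from the original."""
    if len(paragraph_texts) != len(cell_sources):
        raise ValueError("paragraph/cell count mismatch")
    return [
        i
        for i, (before, after) in enumerate(zip(paragraph_texts, cell_sources))
        if body_of(before) != body_of(after)
    ]
```

An empty result means every SQL string, path, and code fragment survived the conversion byte for byte; a non-empty one points at the exact cells to inspect.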

What Stage 1 hands to Stage 2 (the prompt)

For every uploaded notebook, the app automatically generates a prompt that includes specific details about Deutsche Börse's Zeppelin environment:

  • Custom interpreters (e.g. the StatistiX Spark interpreter with Oracle credentials pre-bound).
  • Data sources (HDFS path conventions, Oracle schemas).
  • Configuration patterns (workspace conventions Genie cannot infer from the notebook content alone).

Without this context block, generic Genie prompting "produces generic results" — the LLM has no way to know that a particular HDFS path corresponds to a particular logical table, or that a particular custom-interpreter magic implies pre-bound credentials. The team's lessons-learned section pins this as the load-bearing factor: "Context is the difference between a good prompt and a great one." See concepts/context-encoded-llm-prompt and patterns/context-encoded-prompt-handoff.
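
A context-encoded prompt of this kind can be sketched as a fixed template filled from a small environment description. Everything in the example is an assumption made for illustration: the EnvironmentContext fields mirror the three categories listed above, but the template wording, the field names, and the sample values are hypothetical and are not taken from the Deutsche Börse prompt.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class EnvironmentContext:
    """Site-specific facts an LLM cannot infer from the notebook alone."""
    interpreters: dict   # magic -> what it implies in this environment
    data_sources: list   # path conventions, schemas
    conventions: list    # configuration patterns


# Hypothetical template wording; the real prompt is not published.
PROMPT = Template(
    "You are helping migrate the Zeppelin notebook '$name' to Databricks.\n"
    "Its structure is already converted; the logic is unchanged.\n\n"
    "Custom interpreters:\n$interpreters\n\n"
    "Data sources:\n$data_sources\n\n"
    "Configuration patterns:\n$conventions\n\n"
    "Ask clarifying questions before rewriting any cell."
)


def bullets(items):
    """Render an iterable of strings as a plain-text bullet list."""
    return "\n".join(f"- {item}" for item in items)


def build_prompt(name, ctx):
    """Fill the template with the notebook name and environment context."""
    return PROMPT.substitute(
        name=name,
        interpreters=bullets(f"{k}: {v}" for k, v in ctx.interpreters.items()),
        data_sources=bullets(ctx.data_sources),
        conventions=bullets(ctx.conventions),
    )
```

Keeping the environment description in data rather than in the template text means the same converter could, in principle, be pointed at a different business entity by swapping the context object.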

User workflow

  1. Export a Zeppelin notebook as JSON (in Cloudera).
  2. Upload the JSON into the converter app (Databricks workspace).
  3. Click Convert.
  4. Download the converted .ipynb.
  5. Open Databricks, upload the notebook, launch Genie, paste the generated prompt.
  6. Genie asks clarifying questions and rebuilds the notebook.

The workflow is designed for business users, not engineers. Migration does not require a dedicated engineering team for each notebook.

Operational disclosures

  • Per-notebook redevelopment time: hours of manual effort → 15–20 minutes (depends on complexity).
  • Migration scope: 2,000+ users at Deutsche Börse using StatistiX.
  • Status as of 2026-05-19: development complete; "large-scale, real-world testing" phase. Open: prompt finalisation across business entities, validation across the full notebook corpus, end-user onboarding.

Rejected design: agentic architecture

The team's first attempt was "a more complex agentic architecture that added overhead without solving the core problem." They discarded it for a simple UI + clean backend. This is a notable counter-cyclical signal against the 2026 reflex of "reach for an autonomous agent loop first" — the migration task turned out to be well-bounded enough that a linear app pipeline was sufficient and superior. (Source: sources/2026-05-19-databricks-deutsche-borse-zeppelin-to-databricks-notebook-migration.)

Open questions

  • Genie reconstruction accuracy on Zeppelin → Databricks logic translation is not quantified. The economics of the structural-vs-logical split depend on Stage 2 producing usable output most of the time.
  • Generalisability beyond Deutsche Börse. The prompt is hand-tuned for DBG's specific Zeppelin environment. Whether the converter productises into a Databricks platform feature for other Cloudera Zeppelin tenants is not stated.
  • No public reference implementation. Unlike the AWS / Synthesia G7e companion piece (which links aws-samples/sample-asynchronous-video-decoding), this writeup does not open-source the converter code.
  • Tail of difficult notebooks. The 15–20 minute number is for typical notebooks; the post does not characterise the long tail of notebooks that may require deeper engineering intervention.

Seen in

  • 2026-05-19: canonical first-wiki appearance. (Source: sources/2026-05-19-databricks-deutsche-borse-zeppelin-to-databricks-notebook-migration.) Architectural primitives canonicalised: structural-vs-logical split as the load-bearing design decision; context-encoded prompt as the handoff between deterministic and LLM stages; deliberate decision not to rewrite logic as the negative space that preserves trust; rejection of agentic architecture for a well-bounded migration task; Databricks Apps as the substrate for customer-built migration tooling. Operational headlines: hours reduced to minutes per notebook, self-service for business users, and a migration scope of 2,000+ users.