SYSTEM Cited by 1 source
Apache Zeppelin¶
Apache Zeppelin is an open-source web-based notebook for interactive data analytics. It supports multiple language interpreters in a single notebook (Spark, SQL, Python, PySpark, Markdown, shell, R, JDBC, and many others) and was widely deployed in Hadoop-era big-data stacks — particularly bundled with Cloudera Data Platform as the canonical interactive-analytics surface alongside HDFS and YARN-orchestrated Spark.
Stub page. First wiki ingest is the 2026-05-19 Deutsche Börse / Databricks customer-blog post disclosing the Cloudera Zeppelin 2027 decommissioning as the forcing function for a major large-enterprise migration motion to Databricks .ipynb notebooks.
Architectural shape¶
- Paragraph as execution unit. A Zeppelin notebook is a sequence of paragraphs. Each paragraph carries a per-paragraph interpreter directive (the leading magic, e.g.
%python,%sql,%pyspark,%md,%spark) that selects which interpreter executes the paragraph body. This per-paragraph polyglot model differs from Jupyter's per-notebook-kernel model. - Pluggable interpreters. Zeppelin's interpreter abstraction is a public extension point. Operators commonly install custom interpreters that pre-bind credentials, configurations, or data-source clients to a particular magic. Deutsche Börse's StatistiX deployment, for example, embeds "custom interpreters" tied to its Cloudera + HDFS + Oracle environment — the migration source's central reason that rule-based notebook conversion is impractical.
- In-notebook visualisations and widgets. Zeppelin paragraphs can declare visualisations and dynamic widgets bound to paragraph output, plus scheduling logic for periodic re-execution. These are part of the notebook content and survive serialisation.
- JSON serialisation. Zeppelin notebooks export to a JSON representation (the format used by migration tools as input).
Why Zeppelin notebooks are hard to migrate automatically¶
The Deutsche Börse source frames the problem cleanly:
Our Zeppelin notebooks weren't simple scripts. They contained complex SQL and Python logic, custom interpreters, Oracle and HDFS references, visualizations, widgets and scheduling logic built up over years. Each one reflected institutional knowledge from the business teams who relied on it. The diversity across the entire notebook landscape made a rule-based rewriting engine impractical, since the logic was simply too heterogeneous and too business-specific for automated rules to handle reliably.
Two distinct migration sub-problems coexist in any Zeppelin → Databricks (or Zeppelin → Jupyter) move:
- Structural — paragraph-to-cell, interpreter-prefix mapping, metadata reformat into
.ipynbJSON. Deterministic. Rule-driven. - Logical — the SQL/Python bodies, custom-interpreter semantics, environment-specific data-source references, visualisations, widgets, scheduling logic. Heterogeneous. Per-notebook. Resists rules.
The architectural insight pinned by the Deutsche Börse migration is that these two sub-problems must be handled by two different mechanisms — rules for structure, an LLM for logic — with a clean handoff between them. See patterns/structural-deterministic-logical-llm-split.
Cloudera Zeppelin EOL: 2027¶
Cloudera has announced full decommissioning of Zeppelin in 2027. Customers running Zeppelin under Cloudera Data Platform face a migration cliff. (Source: sources/2026-05-19-databricks-deutsche-borse-zeppelin-to-databricks-notebook-migration.) This forcing function applies to any Cloudera-CDP tenant that adopted Zeppelin as their interactive-analytics surface — Deutsche Börse is one disclosed instance, presumably one of many.
Seen in¶
- 2026-05-19 — Deutsche Börse: hours-to-minutes Zeppelin → Databricks migration. (Source: sources/2026-05-19-databricks-deutsche-borse-zeppelin-to-databricks-notebook-migration.) Canonical first-wiki appearance. 2,000+ users, Cloudera + Zeppelin + HDFS + Oracle legacy stack, Zeppelin EOL 2027 forcing function. The migration motion uses Zeppelin's JSON export as the transfer format — DBG users export-to-JSON in Cloudera, upload to the converter app inside Databricks, and the app emits both a converted
.ipynband an environment-encoded prompt for downstream Genie reconstruction. The architectural insight that the per-paragraph interpreter directive plus heterogeneous custom-interpreter ecosystem is the load-bearing reason rule-based migration fails on Zeppelin specifically, not on notebooks generically — pinned here as the canonical wiki characterisation of Zeppelin's structural mismatch with rule-based migration.
Related¶
- Databricks Apps — destination substrate hosting the migration tool.
- Databricks Genie — LLM agent that handles the logic-reconstruction stage of the migration.
- Zeppelin to Databricks Notebook Converter — Deutsche Börse-built migration tool.
- concepts/notebook-format-migration — the broader notebook-portability concept.
- concepts/heterogeneous-code-migration — the failure mode that defeats rule-based Zeppelin conversion.
- patterns/structural-deterministic-logical-llm-split — the architectural pattern the migration uses.
- companies/deutsche-borse — disclosed Cloudera Zeppelin tenant migrating to Databricks.