SYSTEM Cited by 1 source
Datalab (Zalando)¶
Definition¶
Datalab is Zalando's internal name for a hosted multi-tool notebook environment that exposes JupyterHub, R Studio, "and other tools" behind a single web-browser URL, with web-based shell access and pre-configured access to Zalando's internal data sources (S3, BigQuery, MicroStrategy, and others). It is the first-contact experimentation surface for Zalando's applied scientists and ML engineers.
Canonical disclosure¶
From the 2022-04-18 ML Platform overview (sources/2022-04-18-zalando-zalandos-machine-learning-platform):
"Zalando provides its ML practitioners with access to a hosted version of JupyterHub, an experimentation platform where they can use Jupyter notebooks, R Studio, and other tools they may need to query available data, visualize results, and validate hypotheses. Internally we call this environment Datalab."
"Because Datalab provides pre-configured access to various data sources within Zalando, such as S3, BigQuery, MicroStrategy, and others, its users don't have to worry about setting up the necessary tools and clients on their own laptops. Instead, they're ready to start experimenting in less than a minute."
Role in the ML platform¶
- Sub-minute entry point. The explicit value proposition ("ready to start experimenting in less than a minute") positions Datalab as the way Zalando amortises the usual Python + data-client + credentials + S3 / BigQuery setup friction across the whole org.
- Complements, rather than replaces, Databricks and HPC. The post names three experimentation substrates for three workload shapes: Datalab for prototyping / quick feedback; systems/databricks for big-data Spark; the GPU HPC cluster for computer-vision or large-model training. Datalab is not a distributed-compute environment; in the post's words, "Datalab is well suited for prototyping and getting quick feedback, it's not always enough, especially when big data is involved."
- Not the pipeline authoring surface. Production pipelines are authored in systems/zflow Python scripts committed to git; Datalab is the experimentation-only environment.
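The three-way split described in the list above can be summarised as a small decision helper. This is an illustrative sketch only: the `Workload` fields, the `choose_substrate` function, and the precedence of GPU training over Spark are assumptions made for this page. The source names the three substrates and their workload shapes but publishes no thresholds or routing logic.

```python
from dataclasses import dataclass
from enum import Enum


class Substrate(Enum):
    """The three experimentation substrates named in the source post."""
    DATALAB = "Datalab"
    DATABRICKS = "Databricks"
    HPC = "GPU HPC cluster"


@dataclass(frozen=True)
class Workload:
    # Hypothetical descriptors; the source gives no concrete criteria.
    big_data_spark: bool = False  # needs distributed Spark over large data
    gpu_training: bool = False    # computer vision or large-model training


def choose_substrate(workload: Workload) -> Substrate:
    """Map a workload shape to a substrate (assumed precedence: GPU first)."""
    if workload.gpu_training:
        return Substrate.HPC
    if workload.big_data_spark:
        return Substrate.DATABRICKS
    # Default: prototyping and quick feedback happen in Datalab.
    return Substrate.DATALAB
```

With these assumptions, `choose_substrate(Workload())` falls through to Datalab, which matches the page's framing of Datalab as the default first stop before a workload outgrows it.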
Wiki positioning¶
- First named wiki instance of a company-branded hosted JupyterHub experimentation platform ("Datalab") as distinct from vanilla JupyterHub. Pairs with the general concept concepts/notebook-experimentation-platform.
- Datalab is the realization of the notebook-experimentation-platform concept at Zalando scale: hosted + multi-tool + pre-wired data-source access.
- Internals (host substrate, authentication, multi-tenancy, quota model) are not disclosed. Stub page — expand when Zalando publishes more.
Seen in¶
- sources/2022-04-18-zalando-zalandos-machine-learning-platform: canonical disclosure. Sub-minute entry point into Zalando's ML platform; pre-wired S3, BigQuery, MicroStrategy access; hosted JupyterHub + R Studio.
Related¶
- systems/jupyterhub — the multi-user notebook hub substrate underneath.
- systems/databricks — companion substrate for big-data Spark work.
- systems/zalando-hpc-cluster — companion GPU substrate for CV and large-model training.
- systems/aws-s3 · systems/google-bigquery — two of the pre-wired data sources.
- companies/zalando · concepts/notebook-experimentation-platform