SYSTEM Cited by 3 sources
Lakeflow Jobs¶
Lakeflow Jobs is Databricks' job-orchestration product within the Lakeflow family — the workflow engine that runs notebook / SQL / Python / ML tasks on a defined schedule or trigger, on serverless or dedicated compute.
Three sources have been ingested so far. Across them, Lakeflow Jobs is the default Databricks-stack orchestrator for multi-step LLM-driven pipelines, and the shape is converging: each customer composes Lakeflow Jobs + Delta Lake + AI Functions + MLflow into a different vertical workload.
Architectural role¶
In the MapAid groundwater pipeline, Lakeflow Jobs is the orchestrator that strings the pipeline stages together (image rendering → sampled-page classification → quality judge → water-relevant filter → full-page OCR → entity-anchored merge → JSON record extraction) and runs them on serverless compute, "so MapAid pays only for what each run consumes."
Lakeflow Jobs is the orchestration layer and the Asset Bundle is the packaging layer; together they make the pipeline a deployable, schedulable artifact. Newly digitised SUDAAK batches can be processed by re-running the same job against the new input.
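The stage chain described above can be sketched as a linear task graph. This is a minimal illustration, not MapAid's actual job definition: the task keys, the job name, and the notebook paths are hypothetical, and the dictionary only mirrors the general shape of a Databricks Jobs task list (`task_key`, `depends_on`, `notebook_task`). The `execution_order` helper is plain Python that checks the dependency order locally; it does not call any Databricks API.

```python
# Hypothetical task keys for the seven stages named in the source.
STAGES = [
    "image_rendering",
    "sampled_page_classification",
    "quality_judge",
    "water_relevant_filter",
    "full_page_ocr",
    "entity_anchored_merge",
    "json_record_extraction",
]


def build_job_spec(stages):
    """Build a Jobs-style payload where each task depends on the previous one."""
    tasks = []
    for i, key in enumerate(stages):
        task = {
            "task_key": key,
            # Assumed notebook location; the source does not name one.
            "notebook_task": {"notebook_path": f"/Workspace/pipeline/{key}"},
        }
        if i > 0:
            task["depends_on"] = [{"task_key": stages[i - 1]}]
        tasks.append(task)
    return {"name": "groundwater-pipeline", "tasks": tasks}


def execution_order(job_spec):
    """Return task keys in an order that respects every depends_on edge."""
    remaining = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in job_spec["tasks"]
    }
    order, done = [], set()
    while remaining:
        # A task is ready once all of its upstream tasks have finished.
        ready = sorted(k for k, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle in job spec")
        for key in ready:
            order.append(key)
            done.add(key)
            del remaining[key]
    return order
```

Because the job is just a task graph over an input location, re-running it against a new batch needs no structural change, which matches the re-run behaviour described for newly digitised batches.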
Relationship to other Lakeflow products¶
The Lakeflow family also includes systems/lakeflow-spark-declarative-pipelines (declarative ETL pipelines with the AutoCDC primitive). Lakeflow Jobs is the generic orchestrator; Lakeflow Spark Declarative Pipelines is a more opinionated declarative-pipeline product. They compose: a Lakeflow Job can run a Lakeflow Spark Declarative Pipeline as a task.
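As a rough sketch of that composition, a job's task list can contain a pipeline task followed by an ordinary notebook task. The task keys, job name, pipeline ID, and notebook path below are placeholders, and the dictionary is only assumed to mirror the general shape of a Jobs task list (`pipeline_task` with a `pipeline_id`, plus `depends_on`); it is not taken from any of the ingested sources.

```python
def pipeline_then_notebook(pipeline_id, notebook_path):
    """Return a Jobs-style payload: run a declarative pipeline, then a notebook."""
    return {
        "name": "pipeline-then-notebook",  # hypothetical job name
        "tasks": [
            {
                # The declarative pipeline runs as one task inside the job.
                "task_key": "refresh_tables",
                "pipeline_task": {"pipeline_id": pipeline_id},
            },
            {
                # A generic task that starts only after the pipeline task succeeds.
                "task_key": "post_process",
                "depends_on": [{"task_key": "refresh_tables"}],
                "notebook_task": {"notebook_path": notebook_path},
            },
        ],
    }


# Placeholder identifiers, for illustration only.
spec = pipeline_then_notebook("<pipeline-id>", "/Workspace/jobs/post_process")
```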
Seen in¶
- sources/2026-05-20-databricks-virtue-foundation-medical-volunteers-72-countries: VF Match Foundational Data Refresh face. Lakeflow Jobs as the orchestrator for a 15+-task interdependent pipeline that processes 25M+ web pages through OpenAI GPT models for global healthcare-facility / NGO catalog construction. Verbatim: "These guarantees are enforced through Lakeflow Jobs, which orchestrate more than 15 interdependent tasks with conditional branching, parallel execution, and intelligent retry policies." This is the third independent customer using Lakeflow Jobs to compose multi-step LLM-driven extraction with conditional branching + retry policies; the shape is converging. Composes with patterns/multi-step-llm-extraction-pipeline (the pipeline-shape this orchestrator runs) and concepts/status-based-llm-pipeline-checkpointing (per-record state tracking enables resumable re-runs without re-paying LLM cost on already-processed rows). Lakeflow Jobs' retry policies + per-record status combine to make 25M+-record LLM pipelines economically resumable.
- sources/2026-05-13-databricks-the-rosetta-stone-of-cps-clarotys-ai-powered-library: CSAF→Delta security-advisory ETL face. Lakeflow Jobs as the orchestrator for a CSAF (Common Security Advisory Framework JSON) parsing pipeline that lands each step in a dedicated Delta table and uses AI Functions (ai_query) to call Serving endpoints inline for LLM-driven enrichment. "To handle the vast amount of information from various sources, Claroty uses Lakeflow Jobs to orchestrate the full process — from raw data to a well structured table. One of our pipelines orchestrates an ETL process that parses CSAF, a JSON formatted security advisory, into a tabular structure. In this process, each step reads and writes entries into a dedicated delta table. In this ETL, and in many more use cases, we use LLMs to enrich the data — from classification tasks and AI Functions like ai_query, using various Serving endpoints and MLflow to evaluate the answers we get from the LLM, using statistic metrics and LLM-as-a-judge, and monitor the cost." Composes with patterns/llm-judge-as-inline-pipeline-stage (the judge-inline-the-pipeline shape) and patterns/hybrid-classical-er-plus-genai (CSAF advisories feed the entity-resolution pipeline that links CVEs back to CPS-IDs).
- sources/2026-05-11-databricks-unlocking-the-archives: canonical wiki instance. Orchestrates the multi-stage document-classification + extraction + judge pipeline on serverless compute.
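The retry-plus-per-record-status combination noted in the Virtue Foundation entry can be illustrated with a small in-memory sketch. Everything here is an assumption for illustration: the `Record` type, the status values, and the `run_batch` helper are hypothetical, the in-process retry loop merely stands in for a task-level retry policy, and a real pipeline would presumably keep the status column in a Delta table rather than in Python objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Record:
    record_id: str
    status: str = "pending"  # assumed states: pending, done, failed
    output: Optional[str] = None


def run_batch(records: List[Record],
              extract: Callable[[str], str],
              max_retries: int = 2) -> int:
    """Process records not yet marked done; return the number of extract calls made."""
    calls = 0
    for rec in records:
        if rec.status == "done":
            continue  # already processed on an earlier run, so no new LLM cost
        for _attempt in range(max_retries + 1):
            calls += 1
            try:
                rec.output = extract(rec.record_id)
                rec.status = "done"
                break
            except RuntimeError:
                # Stand-in for a transient LLM or endpoint failure.
                rec.status = "failed"
    return calls
```

On a re-run, only records still marked `pending` or `failed` reach `extract`, so the cost of a resumed run scales with the unfinished rows rather than with the full input.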