PATTERN Cited by 1 source
SQL-Native Multimodal LLM Inference
SQL-Native Multimodal LLM Inference is the pattern of exposing LLM (including multimodal) inference as a callable function inside SQL / DataFrame / streaming queries so model calls compose with the rest of the data pipeline like any other column expression — no separate model-serving service, no separate ETL job to fan out inference, no glue code to bridge "the data warehouse" and "the model endpoint."
Problem
Conventional LLM-in-pipeline architectures bolt model serving onto the side of the data platform:
    data warehouse ──> ETL job ──> HTTP request to model service ──> write back
                          ▲                            │
                          │                            ▼
                          └── retry logic ─────────────┘
                          └── batch sizing ────────────┘
                          └── auth ────────────────────┘
                          └── model versioning ────────┘
                          └── format-conversion glue ──┘
Every pipeline that wants LLM inference re-invents this scaffolding. Iteration on prompts means redeploying the ETL job. Schema changes mean coordinating across the model service and the warehouse. Multimodal input (images) requires custom encoding/serialization. A team that just wants "classify this column with an LLM" writes a service to do it.
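To make that scaffolding concrete, here is a minimal Python sketch of the fan-out loop such a pipeline typically hand-rolls: batching plus retry around a remote model call. Every name in it (`batched`, `call_with_retry`, `run_pipeline`, and the `FlakyModel` stand-in for the model service) is hypothetical and not taken from any real pipeline. Auth, model versioning, and format conversion are left out entirely.

```python
import time
from typing import Callable, Iterator


def batched(rows: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def call_with_retry(
    call_model: Callable[[list[str]], list[str]],
    batch: list[str],
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> list[str]:
    """Call the model, retrying on ConnectionError up to `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(batch)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")


def run_pipeline(
    rows: list[str],
    call_model: Callable[[list[str]], list[str]],
    batch_size: int = 2,
) -> list[str]:
    """Fan rows out to the model in batches and collect results in order."""
    results: list[str] = []
    for batch in batched(rows, batch_size):
        results.extend(call_with_retry(call_model, batch))
    return results


class FlakyModel:
    """Hypothetical stand-in for a remote model service; its first call fails."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: list[str]) -> list[str]:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated transient failure")
        return [f"label:{text}" for text in batch]
```

Even this toy version has to decide on batch size, retry count, and backoff, which is the kind of decision the pattern hands to the platform.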
Solution
Expose model inference as a first-class function in the query language. The function takes a model endpoint reference, a prompt, an input column (text or image), and an output schema, and returns the model's response as a typed column.
    SELECT
      doc_id,
      page_number,
      ai_query(
        'multimodal-classifier-endpoint',
        'Classify this scanned page. Return Dewey codes, geographies, and a water-relevance flag.',
        page_image,
        responseFormat => '{"type":"json_schema", "schema": {...}}'
      ) AS classification
    FROM unity_catalog.archive.rendered_pages
    WHERE document_length > 5
That's the pipeline. The model call is a column expression. There is
no separate service to deploy, no separate ETL to schedule, no
serialization glue. Multimodal input is just a column. Structured
output is just a responseFormat parameter.
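The schema in the query above is elided, so the following is only a sketch of how such a responseFormat string might be assembled. The field names (`dewey_codes`, `geographies`, `water_relevant`) are assumptions inferred from the prompt text, and the envelope simply mirrors the `{"type":"json_schema", "schema": ...}` shape shown in the query; the exact envelope a given platform expects may differ, so check its documentation.

```python
import json

# Hypothetical field names: the source query elides the real schema.
CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "dewey_codes": {"type": "array", "items": {"type": "string"}},
        "geographies": {"type": "array", "items": {"type": "string"}},
        "water_relevant": {"type": "boolean"},
    },
    "required": ["dewey_codes", "geographies", "water_relevant"],
}


def response_format(schema: dict) -> str:
    """Wrap a JSON Schema in the envelope shape used in the query above."""
    return json.dumps({"type": "json_schema", "schema": schema})


if __name__ == "__main__":
    # The printed string is what would be supplied as the responseFormat value.
    print(response_format(CLASSIFICATION_SCHEMA))
```

Keeping the schema as a dictionary and serialising it once avoids hand-editing JSON inside a SQL string each time the output shape changes.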
In the MapAid groundwater pipeline
The MapAid pipeline uses ai_query in three load-bearing stages:
- Classification pass: ai_query over sampled page images produces Dewey codes, geographies, and a water-relevance flag.
- Extraction pass: ai_query over per-page OCR'd text, with schema-constrained output, emits JSON well/borehole records.
- Judge pass: ai_query against a different judge model scores each classification. See patterns/llm-judge-as-inline-pipeline-stage.
"Because AI Functions run directly within SQL, the team could iterate on prompts and output schemas without building separate model-serving infrastructure." (Source: sources/2026-05-11-databricks-unlocking-the-archives)
The architectural punch line is in that last clause: iteration cost is a query refactor, not an infrastructure deploy.
Mechanics
- Inputs are columns. Text columns, image columns (e.g. Volume-stored page images), structured columns. The function doesn't care about input modality — the platform handles serialisation.
- Outputs are typed columns. Schema-constrained responses (per concepts/schema-constrained-llm-output) deserialize directly into typed columns or structs.
- Endpoints are referenced by name. Switch models by changing the endpoint string. Ship A/B tests with a CASE expression.
- Composes with all SQL/DataFrame primitives. Filter on model output. Join model outputs to source tables. Window over them. Aggregate them. Stream them.
- Storage substrate. Inputs live in Unity Catalog Volumes (raw files / images) or Delta Lake (intermediate tables); outputs land in Delta with ACID + lineage.
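As an illustration of what "outputs are typed columns" amounts to, the sketch below parses one schema-constrained JSON response into a typed record. On the platform this deserialisation is done by the engine, not by user code; the `PageClassification` type, its field names, and `parse_classification` are all hypothetical and exist only for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PageClassification:
    """Hypothetical typed view of one classification response."""
    dewey_codes: list[str]
    geographies: list[str]
    water_relevant: bool


def _require_str_list(value: object, field: str) -> list[str]:
    """Return value unchanged if it is a list of strings, else raise TypeError."""
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise TypeError(f"{field} must be a list of strings")
    return value


def parse_classification(raw: str) -> PageClassification:
    """Parse a JSON response; raises KeyError or TypeError if it is off-schema."""
    data = json.loads(raw)
    flag = data["water_relevant"]
    if not isinstance(flag, bool):
        raise TypeError("water_relevant must be a boolean")
    return PageClassification(
        dewey_codes=_require_str_list(data["dewey_codes"], "dewey_codes"),
        geographies=_require_str_list(data["geographies"], "geographies"),
        water_relevant=flag,
    )


if __name__ == "__main__":
    # Illustrative values only, not taken from the MapAid data.
    raw = '{"dewey_codes": ["551.49"], "geographies": ["Example Region"], "water_relevant": true}'
    print(parse_classification(raw))
```

Because the response is constrained to the schema at generation time, a failure here would indicate a schema mismatch and not routine model noise.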
What this pattern collapses
A traditional LLM pipeline has all of these as separate pieces:
- Model-serving cluster
- ETL/batch job that fans out requests
- Retry / rate-limit logic
- Batch-size tuning
- Auth between ETL and model service
- Model versioning + endpoint routing
- Schema definition for inputs + outputs
- Glue code to (de)serialise multimodal inputs
- Glue code to write outputs back to warehouse
SQL-native multimodal inference collapses all of those into:
- An ai_query call.
That's the pattern's value. Everything that used to be Python + Kubernetes is now a SQL clause.
When to use
- LLM inference whose inputs and outputs naturally live in tables — text columns, image columns, structured records.
- Pipelines whose iteration speed matters more than custom inference orchestration.
- Teams without dedicated ML-platform engineers (analysts, scientists, partner organizations).
- Multimodal workloads where input format would otherwise require custom serialisation glue.
When not to use
- Real-time interactive inference within user-facing latency budgets: ai_query runs as part of a query plan, not a request handler. Use Databricks Model Serving / Foundation Model API direct endpoints for that.
- Workloads with custom request shaping (streaming token output, custom retry semantics, fine-grained rate-limit handling) that exceed what the SQL surface exposes.
- Stateful multi-turn agents: ai_query is per-row inference, not a conversation.
Tradeoffs
- Vendor coupling. AI Functions are Databricks-specific. Equivalent surfaces exist in some other warehouses (Snowflake Cortex, BigQuery ML), but the pattern as a whole is bound to whichever data platform you're on.
- Limited control over inference internals. Batching, retries, rate limiting, and prompt-cache behaviour are platform-managed — good for ergonomics, bad if you need fine control.
- Cost transparency. Per-row inference cost shows up in compute bills, not as a separate model-serving line item, so it can be harder to attribute to a specific stage or prompt.
Seen in
- sources/2026-05-11-databricks-unlocking-the-archives: canonical wiki instance. ai_query used in three stages (classify / extract / judge) of the MapAid groundwater pipeline; multimodal page-image input; schema-constrained JSON output; iteration-without-separate-infra explicitly framed as the architectural value.
Related
- systems/databricks-ai-functions
- systems/databricks-foundation-model-api
- systems/delta-lake
- systems/unity-catalog
- systems/unity-catalog-volumes
- concepts/schema-constrained-llm-output
- concepts/multimodal-document-understanding
- patterns/visual-first-document-extraction
- patterns/llm-judge-as-inline-pipeline-stage