PATTERN Cited by 1 source

SQL-Native Multimodal LLM Inference

SQL-Native Multimodal LLM Inference is the pattern of exposing LLM (including multimodal) inference as a callable function inside SQL / DataFrame / streaming queries so model calls compose with the rest of the data pipeline like any other column expression — no separate model-serving service, no separate ETL job to fan out inference, no glue code to bridge "the data warehouse" and "the model endpoint."

Problem

Conventional LLM-in-pipeline architectures bolt model serving onto the side of the data platform:

data warehouse ──> ETL job ──> HTTP request to model service ──> write back
                       ▲                  │
                       │                  ▼
                       └── retry logic ──┘
                       └── batch sizing ──┘
                       └── auth ──┘
                       └── model versioning ──┘
                       └── format-conversion glue ──┘

Every pipeline that wants LLM inference re-invents this scaffolding. Iteration on prompts means redeploying the ETL job. Schema changes mean coordinating across the model service and the warehouse. Multimodal input (images) requires custom encoding/serialization. A team that just wants "classify this column with an LLM" writes a service to do it.
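To make that scaffolding concrete, here is a minimal sketch of the fan-out glue a team typically ends up hand-writing. Everything in it is hypothetical: FlakyStubModel stands in for an HTTP model client, and batched, call_with_retry, and run_etl are illustrative names rather than functions from any library.

```python
import time
from typing import Callable, Iterator


def batched(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield fixed-size slices of rows (the last slice may be shorter)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def call_with_retry(call: Callable[[list[dict]], list[str]],
                    batch: list[dict],
                    attempts: int = 3,
                    backoff_s: float = 0.0) -> list[str]:
    """Retry a flaky model call, waiting a little longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call(batch)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff_s * attempt)
    raise ValueError("attempts must be at least 1")


def run_etl(rows: list[dict],
            call: Callable[[list[dict]], list[str]],
            batch_size: int = 2) -> list[dict]:
    """Fan rows out to the model in batches and collect rows to write back."""
    out = []
    for batch in batched(rows, batch_size):
        labels = call_with_retry(call, batch)
        out.extend({"doc_id": r["doc_id"], "label": lab}
                   for r, lab in zip(batch, labels))
    return out


class FlakyStubModel:
    """Hypothetical model client: fails on its first call, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: list[dict]) -> list[str]:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated transient failure")
        return ["water" if "well" in r["text"] else "other" for r in batch]


rows = [
    {"doc_id": 1, "text": "borehole and well survey"},
    {"doc_id": 2, "text": "railway timetable"},
    {"doc_id": 3, "text": "well depth log"},
]
model = FlakyStubModel()
results = run_etl(rows, model)
print(results)
```

Even this toy version carries batching, retries, and write-back shaping, and it still leaves out auth, model versioning, and image encoding.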

Solution

Expose model inference as a first-class function in the query language. The function takes a model endpoint reference, a prompt, an input column (text or image), and an output schema, and returns the model's response as a typed column.

SELECT
  doc_id,
  page_number,
  ai_query(
    'multimodal-classifier-endpoint',
    'Classify this scanned page. Return Dewey codes, geographies, and a water-relevance flag.',
    page_image,
    responseFormat => '{"type":"json_schema", "schema": {...}}'
  ) AS classification
FROM unity_catalog.archive.rendered_pages
WHERE document_length > 5

That's the pipeline. The model call is a column expression. There is no separate service to deploy, no separate ETL to schedule, no serialization glue. Multimodal input is just a column. Structured output is just a responseFormat parameter.
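The "model call is a column expression" idea can be tried locally without any Databricks infrastructure. The sketch below uses Python's built-in sqlite3 module to register a stub function under the name ai_query and then calls it from plain SQL. It is an analogy under stated assumptions, not the Databricks implementation: stub_ai_query is a hypothetical keyword matcher, it takes page text rather than an image, and its three positional arguments are chosen for illustration only.

```python
import json
import sqlite3


def stub_ai_query(endpoint: str, prompt: str, page_text: str) -> str:
    """Hypothetical stand-in for a model call: returns one JSON string per row."""
    lowered = page_text.lower()
    water = any(word in lowered for word in ("well", "borehole", "aquifer"))
    return json.dumps({"endpoint": endpoint, "water_relevant": water})


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in.
conn.create_function("ai_query", 3, stub_ai_query)

conn.execute(
    "CREATE TABLE rendered_pages (doc_id INTEGER, page_number INTEGER, page_text TEXT)"
)
conn.executemany(
    "INSERT INTO rendered_pages VALUES (?, ?, ?)",
    [
        (1, 1, "Borehole log, depth 40 m"),
        (1, 2, "Index of plates"),
        (2, 1, "Aquifer recharge estimates"),
    ],
)

# The "model" call sits in the SELECT list next to ordinary columns.
rows = conn.execute(
    """
    SELECT doc_id,
           page_number,
           ai_query('stub-classifier', 'Is this page about groundwater?', page_text)
               AS classification
    FROM rendered_pages
    ORDER BY doc_id, page_number
    """
).fetchall()

# Decode the JSON the stub returned into plain Python values.
flags = [(doc_id, page, json.loads(c)["water_relevant"]) for doc_id, page, c in rows]
print(flags)
```

Swapping the stub for a real endpoint is the platform's job in this pattern; the query text is the only thing the pipeline author maintains.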

In the MapAid groundwater pipeline

The MapAid pipeline uses ai_query in three load-bearing stages:

  1. Classification pass: ai_query over sampled page-images produces Dewey codes + geographies + water flag.
  2. Extraction pass: ai_query over per-page OCR'd text + schema-constrained output emits JSON well/borehole records.
  3. Judge pass: ai_query against a different judge model scores each classification. See patterns/llm-judge-as-inline-pipeline-stage.
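The three stages can be sketched as one chained query. This is a local illustration only: it uses sqlite3 with a hypothetical stub_ai_query that dispatches on made-up endpoint names (stub-classifier, stub-extractor, stub-judge), and the keyword rules inside it are invented stand-ins for real models. The table layout and prompts are likewise assumptions, not the MapAid pipeline's actual schema.

```python
import json
import re
import sqlite3


def stub_ai_query(endpoint: str, prompt: str, text: str) -> object:
    """Hypothetical dispatcher: each endpoint name maps to a toy 'model'."""
    lowered = text.lower()
    if endpoint == "stub-classifier":
        # Toy classifier: only knows about wells and boreholes.
        return "water" if ("well" in lowered or "borehole" in lowered) else "other"
    if endpoint == "stub-extractor":
        # Toy extractor: pulls a depth value into a JSON record.
        match = re.search(r"depth\s+(\d+)", lowered)
        return json.dumps({"depth_m": int(match.group(1)) if match else None})
    if endpoint == "stub-judge":
        # Toy judge with a broader notion of "water": input is "label|page text".
        label, _, page = lowered.partition("|")
        judge_says_water = any(k in page for k in ("well", "borehole", "spring"))
        return 1.0 if (label == "water") == judge_says_water else 0.0
    raise ValueError(f"unknown endpoint: {endpoint}")


conn = sqlite3.connect(":memory:")
conn.create_function("ai_query", 3, stub_ai_query)
conn.execute("CREATE TABLE pages (page_id INTEGER, page_text TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        (1, "Borehole 12, depth 40 m"),
        (2, "Railway timetable"),
        (3, "Spring discharge notes"),
    ],
)

rows = conn.execute(
    """
    WITH classified AS (          -- stage 1: classify every page
        SELECT page_id, page_text,
               ai_query('stub-classifier', 'Classify this page.', page_text) AS label
        FROM pages
    ),
    judged AS (                   -- stage 3: a second model scores the label
        SELECT page_id, page_text, label,
               ai_query('stub-judge', 'Score this label.',
                        label || '|' || page_text) AS judge_score
        FROM classified
    )
    SELECT page_id, label, judge_score,
           CASE WHEN label = 'water'  -- stage 2: extract only from water pages
                THEN ai_query('stub-extractor', 'Extract well records.', page_text)
           END AS record
    FROM judged
    ORDER BY page_id
    """
).fetchall()
print(rows)
```

In this toy run the judge disagrees with the classifier on page 3, which is the kind of row a judge stage exists to surface.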

"Because AI Functions run directly within SQL, the team could iterate on prompts and output schemas without building separate model-serving infrastructure." (Source: sources/2026-05-11-databricks-unlocking-the-archives)

The architectural punch line is in that last clause: iteration cost is a query refactor, not an infrastructure deploy.

Mechanics

  • Inputs are columns. Text columns, image columns (e.g. Volume-stored page images), structured columns. The function doesn't care about input modality — the platform handles serialisation.
  • Outputs are typed columns. Schema-constrained responses (per concepts/schema-constrained-llm-output) deserialise directly into typed columns or structs.
  • Endpoints are referenced by name. Switch models by changing the endpoint string. Ship A/B tests with a CASE expression.
  • Composes with all SQL/DataFrame primitives. Filter on model output. Join model outputs to source tables. Window over them. Aggregate them. Stream them.
  • Storage substrate. Inputs live in Unity Catalog Volumes (raw files / images) or Delta Lake (intermediate tables); outputs land in Delta with ACID + lineage.
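Two of the mechanics above, A/B routing with a CASE expression and composing model output with ordinary SQL, can be sketched locally. As before, this is a hedged illustration using sqlite3: stub_ai_query, the endpoint names stub-model-a and stub-model-b, and the parity-based split on doc_id are all assumptions made for the example, not platform behaviour.

```python
import sqlite3


def stub_ai_query(endpoint: str, page_text: str) -> str:
    """Hypothetical models: 'a' only knows 'well'; 'b' also knows 'borehole'."""
    lowered = page_text.lower()
    keywords = ("well",) if endpoint == "stub-model-a" else ("well", "borehole")
    return "water" if any(k in lowered for k in keywords) else "other"


conn = sqlite3.connect(":memory:")
conn.create_function("ai_query", 2, stub_ai_query)
conn.execute("CREATE TABLE pages (doc_id INTEGER, page_text TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        (1, "Borehole log"),
        (2, "Borehole survey"),
        (3, "Railway timetable"),
        (4, "Well depth record"),
    ],
)

# A/B test: even doc_ids go to model a, odd doc_ids to model b.
SCORED = """
    SELECT doc_id,
           CASE WHEN doc_id % 2 = 0 THEN 'a' ELSE 'b' END AS arm,
           CASE WHEN doc_id % 2 = 0
                THEN ai_query('stub-model-a', page_text)
                ELSE ai_query('stub-model-b', page_text)
           END AS label
    FROM pages
"""

# Aggregate over model output: pages and water hits per experiment arm.
by_arm = conn.execute(
    f"""
    WITH scored AS ({SCORED})
    SELECT arm, COUNT(*) AS pages, SUM(label = 'water') AS water_pages
    FROM scored
    GROUP BY arm
    ORDER BY arm
    """
).fetchall()

# Filter on model output like any other column.
water_docs = [
    row[0]
    for row in conn.execute(
        f"WITH scored AS ({SCORED}) "
        "SELECT doc_id FROM scored WHERE label = 'water' ORDER BY doc_id"
    )
]
print(by_arm, water_docs)
```

The experiment lives entirely in the query text, so retiring an arm means deleting a CASE branch rather than redeploying a service.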

What this pattern collapses

A traditional LLM pipeline has all of these as separate pieces:

  • Model-serving cluster
  • ETL/batch job that fans out requests
  • Retry / rate-limit logic
  • Batch-size tuning
  • Auth between ETL and model service
  • Model versioning + endpoint routing
  • Schema definition for inputs + outputs
  • Glue code to (de)serialise multimodal inputs
  • Glue code to write outputs back to warehouse

SQL-native multimodal inference collapses all of those into:

  • An ai_query call.

That's the pattern's value. Everything that used to be Python + Kubernetes is now a SQL clause.

When to use

  • LLM inference whose inputs and outputs naturally live in tables — text columns, image columns, structured records.
  • Pipelines whose iteration speed matters more than custom inference orchestration.
  • Teams without dedicated ML-platform engineers (analysts, scientists, partner organizations).
  • Multimodal workloads where input format would otherwise require custom serialisation glue.

When not to use

  • Real-time interactive inference within user-facing latency budgets. ai_query runs as part of a query plan, not a request handler; for that, call Databricks Model Serving / Foundation Model API endpoints directly.
  • Workloads with custom request shaping (streaming token output, custom retry semantics, fine-grained rate-limit handling) that exceed what the SQL surface exposes.
  • Stateful multi-turn agents: ai_query is per-row inference, not a conversation.

Tradeoffs

  • Vendor coupling. AI Functions are Databricks-specific. Equivalent surfaces exist in some other warehouses (Snowflake Cortex, BigQuery ML), but the pattern as a whole is bound to whichever data platform you're on.
  • Limited control over inference internals. Batching, retries, rate limiting, and prompt-cache behaviour are platform-managed — good for ergonomics, bad if you need fine control.
  • Cost transparency. Per-row inference cost shows up in compute bills, not as a separate model-serving line item; can be harder to attribute.

Seen in

  • sources/2026-05-11-databricks-unlocking-the-archives: canonical wiki instance. ai_query used in three stages (classify / extract / judge) of the MapAid groundwater pipeline; multimodal page-image input; schema-constrained JSON output; iteration-without-separate-infra explicitly framed as the architectural value.