PATTERN Cited by 1 source

Torchvision over PIL image processing¶

Pattern¶

For multimodal LLM serving, explicitly select the Torchvision-based image processor at model-load time when both PIL- and Torchvision-backed processors are available. The Torchvision path is substantially faster on resize+normalise: the source measures image processing at about 10× the cost of other per-image CPU operations such as base64 decoding, and the PIL default for some Hugging Face models is slow enough to block the inference engine's event loop.

Canonical wiki disclosure (Source: sources/2026-05-27-databricks-reliable-llm-inference-at-scale):

"Among all CPU operations for images, image processing (resizing and normalization) is 10x slower than other operations like base64 decoding. Some Hugging Face models default to the PIL-based image processor, while others use the faster Torchvision-based processor."

"By switching to Torchvision-based image processors and properly configuring OMP_NUM_THREADS, we sustained much higher QPS and fully leveraged the GPUs. After the fix shipped, requests completed per second jumped > 3x with the same replicas and load."

When to use it¶

Serving Hugging Face Transformers models with image inputs (vision-language models, multimodal LLMs).
Production inference engines (vLLM and similar) where the front end runs on Python's async event loop and a slow per-request preprocessing step blocks the loop.
Workloads where image preprocessing happens on CPU per-request before the GPU forward pass — the standard configuration today.

When NOT to use it¶

Models where a custom GPU-side preprocessing pipeline (e.g. NVIDIA DALI, custom CUDA preprocessing) replaces the Hugging Face image processor entirely.
Workloads where image inputs are pre-processed offline and cached — the per-request preprocessing cost is amortised away.
Models that don't expose a Torchvision-backed alternative; some niche models only have PIL implementations.

Mechanics¶

Many image processors in the Hugging Face Transformers library ship in two implementations, a "slow" PIL-backed one and a "fast" Torchvision-backed one, and the loader picks between them:

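from transformers import AutoImageProcessor

# model_name: a Hugging Face model id or local path, defined by the caller

# Default: may resolve to the slower PIL-backed processor for some models
processor = AutoImageProcessor.from_pretrained(model_name)

# Explicitly request the Torchvision-backed ("fast") processor
processor = AutoImageProcessor.from_pretrained(
    model_name, use_fast=True
)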

The use_fast=True flag (or its equivalent on the specific image-processor class) selects the Torchvision-backed implementation. As of 2026, this defaults to False for some models, so the override must be set explicitly, per model, per deployment.

The Torchvision-backed processor uses torchvision.transforms.v2, which operates on torch tensors and dispatches to optimised native code for resize, normalise, crop, and similar operations. The PIL-backed path wraps each image in a PIL Image and converts between PIL and array representations, with more per-call Python overhead. The speed-up comes from avoiding that overhead and from operating on torch tensors directly without the PIL Image bridge.
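To check the cost difference on a specific model rather than rely on a headline figure, a small timing harness is usually enough. The sketch below uses only the Python standard library. `median_latency_ms` is a hypothetical helper name, and the commented usage assumes that `model_name` and a decoded PIL `image` already exist in the caller's scope.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 20, warmup: int = 3) -> float:
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up calls absorb one-off costs (lazy imports, thread-pool start-up).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical usage, assuming transformers is installed and that
# `model_name` and `image` are supplied by the caller:
#
#   from transformers import AutoImageProcessor
#   slow = AutoImageProcessor.from_pretrained(model_name, use_fast=False)
#   fast = AutoImageProcessor.from_pretrained(model_name, use_fast=True)
#   print(median_latency_ms(lambda: slow(images=image, return_tensors="pt")))
#   print(median_latency_ms(lambda: fast(images=image, return_tensors="pt")))
```

The median is used instead of the mean so that a single stalled call does not dominate the result. Run the comparison inside the same container limits as production, since thread settings affect the Torchvision path.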

Composition with OMP_NUM_THREADS fix¶

The 2026-05-27 disclosure pairs two independent fixes that together delivered >3× RPS:

This pattern: switch to Torchvision processor.
Fix OMP_NUM_THREADS to match container CPU quota.

Both are required because:

Torchvision dispatches to native Torch operations that respect OMP_NUM_THREADS for parallelism.
If OMP_NUM_THREADS is wrong, even a fast Torchvision path runs oversubscribed threads inside the container's CPU quota, triggering CPU throttling.
If Torchvision isn't selected, the PIL path bypasses the OMP thread pool entirely but is slow per-request.

The combined fix delivers fast preprocessing and correct thread sizing, leaving CPU-side preprocessing fast enough to feed the GPU without blocking.
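One way to size OMP_NUM_THREADS to the container quota is to read the quota from the cgroup filesystem at start-up. The following is a minimal sketch, not the fix the source shipped: it assumes a cgroup v2 container where the quota is exposed at `/sys/fs/cgroup/cpu.max`, it rounds fractional quotas down to avoid oversubscription, and the function names are hypothetical.

```python
import os
from pathlib import Path

# Assumption: cgroup v2, where cpu.max holds "<quota> <period>" or "max <period>".
CGROUP_V2_CPU_MAX = Path("/sys/fs/cgroup/cpu.max")


def threads_from_cpu_max(cpu_max_text: str, fallback: int) -> int:
    """Translate the contents of cpu.max into a thread count."""
    fields = cpu_max_text.split()
    if len(fields) != 2 or fields[0] == "max":
        return fallback  # no quota set, or unexpected format
    try:
        quota, period = int(fields[0]), int(fields[1])
    except ValueError:
        return fallback
    if quota <= 0 or period <= 0:
        return fallback
    # Round down so threads never exceed the quota, but keep at least one.
    return max(1, quota // period)


def configure_omp_threads() -> int:
    """Set OMP_NUM_THREADS from the container quota and return the value used."""
    fallback = os.cpu_count() or 1
    try:
        text = CGROUP_V2_CPU_MAX.read_text()
    except OSError:
        text = ""  # file missing (for example cgroup v1 or a non-Linux host)
    threads = threads_from_cpu_max(text, fallback)
    os.environ["OMP_NUM_THREADS"] = str(threads)
    return threads
```

For the environment variable to take effect, `configure_omp_threads()` has to run before `torch` is imported. In a process where `torch` is already loaded, calling `torch.set_num_threads` with the same value is the usual alternative.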

Why the default matters operationally¶

The structural lesson: library defaults can hide order-of-magnitude performance bugs. The Hugging Face library provides a fast processor, but it's opt-in for some models. A team deploying the model without a profiling pass would never know the slow path is running until traffic load surfaces it.

Operational discipline:

Profile every multimodal model on a representative request pattern before production rollout.
Check explicitly which image-processor backend is selected via the model's processor configuration (processor.__class__ or similar).
Set use_fast=True defensively at every model-load site, even if the default is already fast — the default may change in a model update.
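The backend check can be turned into a start-up guard so a slow processor fails loudly instead of silently shipping. The sketch below rests on two assumptions that should be verified against the installed Transformers version: that the processor exposes an `is_fast` attribute, and, failing that, that Torchvision-backed classes follow the `...Fast` naming convention. `describe_backend` and `require_fast` are hypothetical helper names.

```python
def describe_backend(processor: object) -> str:
    """Return "fast" or "slow" for a loaded image processor."""
    # Assumption: recent Transformers releases expose `is_fast`; older ones may not.
    is_fast = getattr(processor, "is_fast", None)
    if is_fast is None:
        # Fallback assumption: fast classes are named like `SomeImageProcessorFast`.
        is_fast = type(processor).__name__.endswith("Fast")
    return "fast" if is_fast else "slow"


def require_fast(processor: object) -> None:
    """Raise at model-load time if the slow backend was selected."""
    if describe_backend(processor) != "fast":
        raise RuntimeError(
            f"{type(processor).__name__} is not a fast image processor; "
            "load it with use_fast=True or confirm that no fast variant exists"
        )
```

Calling `require_fast(processor)` right after `from_pretrained` keeps the check next to the load site, which is also where a future change of default would show up.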

This is one instance of a broader class:

Library / Default	Issue	Fix
HF Transformers PIL processor	Slower than the Torchvision path on resize + normalise	`use_fast=True`
HF tokenizers (slow Python path)	10-100× slower than Rust path	`use_fast=True` on tokeniser load
Pandas read_csv engine	Python engine is slower than the C engine	`engine="c"` (the default, but pandas falls back to the Python engine for options the C engine does not support, so worth verifying)
NumPy default integer type	Platform-dependent: int32 on Windows before NumPy 2.0, int64 on Linux	Explicit `dtype=`

A robust deployment audit checks all of these explicitly.

Risks and mitigations¶

The Torchvision processor can produce subtly different output from PIL because of different rounding and interpolation defaults. Mitigation: validate model accuracy on a held-out eval set before production rollout.
use_fast=True may not be supported on a specific model; newer or bespoke models may lack the Torchvision path. Mitigation: profile to confirm; fall back to PIL if necessary and either accept the cost or move preprocessing to a separate process pool.
Future Torchvision or Transformers API changes may break the override. Mitigation: pin both library versions in the deployment manifest.
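For the fallback case where only the PIL path exists, moving preprocessing off the event loop can be sketched with the standard library alone. This is an illustration under stated assumptions, not the engine's actual mechanism: `preprocess` is a hypothetical stand-in for decode, resize, and normalise, and `preprocess_off_loop` is a hypothetical helper name.

```python
import asyncio
from concurrent.futures import Executor


def preprocess(image_bytes: bytes) -> int:
    """Hypothetical stand-in for decode + resize + normalise.

    A real implementation would return pixel tensors; this one returns the
    payload size so the sketch stays dependency-free.
    """
    return len(image_bytes)


async def preprocess_off_loop(executor: Executor, image_bytes: bytes) -> int:
    """Run preprocessing in the given executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable; other requests keep being served
    # by the loop while the worker handles this image.
    return await loop.run_in_executor(executor, preprocess, image_bytes)
```

Passing a `concurrent.futures.ProcessPoolExecutor` runs the work in separate worker processes, which matches the process-pool mitigation above. The function and its arguments must then be picklable, and the cost of shipping image bytes between processes should be included when profiling.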

Open questions¶

Which specific Hugging Face models default to PIL vs Torchvision — the post does not enumerate.
Eval-quality impact of the processor swap — Databricks does not disclose whether the swap was verified to be model-quality neutral.
GPU-side preprocessing as a future direction (NVIDIA DALI, custom CUDA kernels) — not addressed in the post.

Seen in¶

sources/2026-05-27-databricks-reliable-llm-inference-at-scale — canonical wiki disclosure of Torchvision-over-PIL as a load-bearing performance fix for multimodal LLM serving on Databricks Model Serving. >3× RPS jump on same hardware when paired with the OMP_NUM_THREADS fix.

concepts/multimodal-cpu-bottleneck — the failure mode this pattern fixes.
concepts/omp-num-threads-container-misconfiguration — the companion fix shipped together.
concepts/cpu-bound-serving-small-fast-model — adjacent regime.
patterns/multiprocessing-runtime-for-cpu-bound-serving — the adjacent CPU-bound-fix pattern that does not solve this case alone.
systems/pytorch / Torchvision — the substrate.
systems/databricks-model-serving — the deployment context.
systems/vllm — the open-source engine class this applies to.