

Uniform buffer batching

A uniform buffer batching pattern amortizes GPU-memory allocation and upload costs across many draw calls by collecting per-draw uniform data into a single buffer upload, then issuing all the draws with per-draw offsets into the shared buffer.

When to reach for it

  • Running on WebGPU (or any modern native graphics API — Vulkan, Metal, D3D12) where uniforms must be supplied via a uniform buffer rather than set individually.
  • Rendering many draw calls per frame with varying per-draw uniforms (typical of any non-trivial renderer).
  • Migrating from WebGL, where uniforms were set with individual per-uniform calls (gl.uniform*), without a performance regression.

The naïve WebGPU mapping (don't do this)

Port WebGL's setUniform directly to WebGPU:

for (const draw of draws) {
  // One buffer allocation and one upload per draw; both are expensive.
  const buf = device.createBuffer({
    size: draw.uniformData.byteLength,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
  });
  device.queue.writeBuffer(buf, 0, draw.uniformData);
  pass.setBindGroup(0, bindGroupFor(buf));
  pass.draw(draw.vertexCount);
}

Every draw allocates and uploads. Both operations are expensive (see concepts/uniform-buffer). Performance tanks compared to the WebGL baseline.

The batched shape

Split the draw API into an encode phase and a submit phase:

// encode phase — no GPU calls yet
context->encodeDraw(uniformStructData1, material1, ...);
context->encodeDraw(uniformStructData2, material2, ...);
// ... many encodes ...
context->submit();
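
As a sketch, the encode phase can be nothing more than a CPU-side recorder. BatchContext, encodeDraw, and the field names here are illustrative, not Figma's actual API:

```javascript
// Hypothetical encode-phase recorder: encodeDraw() only copies the
// uniform bytes into a CPU-side list; no GPU work happens yet.
class BatchContext {
  constructor() {
    this.pending = [];
  }
  encodeDraw(uniformData, material) {
    // Copy the bytes so the caller can reuse its scratch array.
    this.pending.push({ uniformData: uniformData.slice(), material });
  }
  // submit() is where a backend would allocate, upload, and draw.
  pendingCount() {
    return this.pending.length;
  }
}
```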

submit() on WebGPU:

  1. Sum the sizes of all encoded uniform structs.
  2. Allocate one uniform buffer that fits all of them.
  3. Pack all the uniform data contiguously and issue one writeBuffer() upload.
  4. Issue every encoded draw, each pointing at the shared buffer with its per-draw byte offset.
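
Steps 1–3 can be sketched as a pure packing pass. The 256-byte stride and the packUniforms helper are assumptions for illustration; the real minimum is the device's minUniformBufferOffsetAlignment limit:

```javascript
const ALIGN = 256; // common WebGPU minUniformBufferOffsetAlignment
const alignUp = (n, a) => Math.ceil(n / a) * a;

// Steps 1-3: size the batch, allocate (here: a CPU staging array), pack.
function packUniforms(draws) {
  let cursor = 0;
  const offsets = draws.map((d) => {
    const offset = cursor; // this draw's byte offset in the shared buffer
    cursor = alignUp(cursor + d.uniformData.byteLength, ALIGN);
    return offset;
  });
  const staging = new Uint8Array(cursor);
  draws.forEach((d, i) => {
    const bytes = new Uint8Array(
      d.uniformData.buffer, d.uniformData.byteOffset, d.uniformData.byteLength);
    staging.set(bytes, offsets[i]);
  });
  return { staging, offsets };
}
// On a real device, step 3 finishes with a single
//   device.queue.writeBuffer(sharedBuf, 0, staging);
// and step 4 is pass.setBindGroup(0, group, [offsets[i]]) per draw.
```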

submit() on WebGL:

  1. For each encoded draw, issue the corresponding per-uniform WebGL calls (uniform1f, uniformMatrix3fv, …), then draw().
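
A sketch of that replay loop, assuming for illustration that each encoded draw carries a single mat3 uniform (submitWebGL and the field names are hypothetical):

```javascript
// Replay each encoded draw as individual per-uniform WebGL calls.
function submitWebGL(gl, encoded) {
  for (const draw of encoded) {
    gl.useProgram(draw.program);
    gl.uniformMatrix3fv(draw.matrixLocation, false, draw.uniformData);
    gl.drawArrays(gl.TRIANGLES, 0, draw.vertexCount);
  }
}
```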

Both backends accept the same encode calls; the expensive work is deferred to submit() where the batching window is widest.

Generalization

The same shape applies beyond uniforms to any substrate where a per-operation op is expensive but the same op can be batched:

  • Vertex / index buffer uploads — pack many objects into one geometry buffer, draw with per-object offsets.
  • Texture uploads — texture atlases + index offsets rather than per-texture allocations.
  • GPU command encoding — pre-record an entire frame's commands into a single GPU-visible command buffer (this is what WebGPU's RenderBundles do at a higher level).
  • Network writes — coalesce many small messages into one frame.
  • Disk writes — coalesce into WAL append batches.
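
The network case follows the same encode/submit shape. This sketch buffers small messages and emits one frame on flush; MessageCoalescer and the length-prefix framing are illustrative choices:

```javascript
// Buffer many small messages, then emit one length-prefixed frame
// in a single write, mirroring encodeDraw()/submit().
class MessageCoalescer {
  constructor() {
    this.chunks = [];
  }
  encode(msg) {
    this.chunks.push(msg); // cheap, CPU-only
  }
  flush(write) {
    const frame = Buffer.concat(
      this.chunks.map((c) => {
        const len = Buffer.alloc(4);
        len.writeUInt32BE(c.length, 0);
        return Buffer.concat([len, c]);
      })
    );
    this.chunks = [];
    write(frame); // one syscall instead of one per message
    return frame;
  }
}
```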

Implementation notes

  • Alignment. Uniform buffers have per-binding alignment requirements (on WebGPU, dynamic offsets must be multiples of minUniformBufferOffsetAlignment, commonly 256 bytes); the packer must round each draw's offset up.
  • Maximum buffer size — set a cap; flush the batch early if a frame's uniform total would exceed it.
  • Re-use vs re-allocate — ring buffer of uniform buffers per frame, recycled once the GPU has consumed them.
  • Don't batch across different bindGroup layouts if the grouping would force an expensive rebind per draw.
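
The alignment and size-cap notes combine into a small offset allocator. A sketch, with the cap, the 256-byte stride, and the flush callback as illustrative assumptions:

```javascript
const STRIDE = 256; // assumed per-binding alignment
const roundUp = (n, a) => Math.ceil(n / a) * a;

class UniformBatcher {
  constructor(maxBytes, flush) {
    this.maxBytes = maxBytes;
    this.flush = flush; // invoked with the finished batch's byte size
    this.used = 0;
  }
  // Reserve space for one draw's uniform struct; returns its byte offset
  // within the current batch, flushing early if the cap would be exceeded.
  add(byteLength) {
    const padded = roundUp(byteLength, STRIDE);
    if (this.used > 0 && this.used + padded > this.maxBytes) {
      this.flush(this.used); // submit the batch so far
      this.used = 0;
    }
    const offset = this.used;
    this.used += padded;
    return offset;
  }
}
```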

Canonical instance

Figma's C++ renderer — the encode/submit split is central to the WebGPU migration. Figma explicitly chose this shape before writing the WebGPU backend:

"we decided we would need to batch uploads together, by setting up the uniforms for multiple draw calls, uploading all of the data at once, and then 'submitting' all the draw calls in the right order."

The encode/submit API lives in the abstraction layer (patterns/graphics-api-interface-layer), so both backends participate — WebGPU gets the batching win, WebGL degrades gracefully to its existing per-uniform path.

(Source: sources/2026-04-21-figma-rendering-powered-by-webgpu)
