
CONCEPT Cited by 1 source

KPU — Kinesis / Managed Flink Processing Unit

A KPU is AWS Managed Flink's unit of compute provisioning: a bundle of 1 vCPU + 4 GB of memory + 50 GB of running application storage. The bundle cannot be decomposed: to get more of any one resource you must buy whole KPUs, and with them more of the other two.
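As a rough illustration of the bundle described above, the sketch below computes how many KPUs a workload would need and which resource forces that count. The function name `kpus_required` and the example workload are hypothetical. The model is deliberately simplified: it only takes the per-dimension ceiling and ignores how parallelism maps to KPUs in a real application.

```python
import math

# Per-KPU bundle: 1 vCPU, 4 GB memory, 50 GB running application storage.
KPU_BUNDLE = {"vcpu": 1, "memory_gb": 4, "storage_gb": 50}


def kpus_required(vcpu, memory_gb, storage_gb):
    """Return (kpu_count, binding_dimension) for a workload.

    Hypothetical helper: each dimension is rounded up to whole KPUs,
    and the largest of the three determines the count.
    """
    needs = {"vcpu": vcpu, "memory_gb": memory_gb, "storage_gb": storage_gb}
    per_dim = {d: math.ceil(needs[d] / KPU_BUNDLE[d]) for d in KPU_BUNDLE}
    binding = max(per_dim, key=per_dim.get)
    return max(per_dim[binding], 1), binding


# Invented workload: modest CPU and memory needs, 1 TB of local state.
kpus, binding = kpus_required(vcpu=4, memory_gb=16, storage_gb=1000)
print(kpus, binding)  # 20 storage_gb
```

In this invented case the job needs only 4 vCPU, yet storage pushes it to 20 KPUs, so 16 vCPU and 64 GB of memory are paid for without being needed.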

Quoted directly from AWS, per the Zalando 2026-03 post (sources/2026-03-03-zalando-why-we-ditched-flink-table-api-joins-cutting-state-by-75-with-datastream-unions):

"Managed Service for Apache Flink provisions capacity as KPUs. A single KPU provides you with 1 vCPU and 4GB of memory. For every KPU allocated, 50GB of running application storage is also provided. This means that the application resources are always configured in terms of KPUs, there's no way to allocate more storage without also allocating more CPU and memory, or more memory without also allocating more CPU and storage."

  • State size drives KPU count when storage is the binding resource (RocksDB on disk). A state-heavy job over-provisions vCPU and memory just to get enough local storage.
  • Every stop triggers a savepoint (a configurable setting that AWS Managed Flink enables by default); see concepts/flink-snapshot-savepoint. Scaling up or down therefore costs a full snapshot of the current state, which at large state sizes takes 11–20 minutes.
  • Teams carry an overscale margin to avoid restart cycles. Zalando ran parallelism 10–20 % higher than necessary; that margin compounds the KPU bill.
  • State-reduction savings are sub-proportional. When Zalando cut state 76 %, AWS cost dropped only 13 %. vCPU and memory needs didn't fall proportionally with state, and the KPU bundle means you cannot drop one dimension in isolation. The 13 % came mostly from eliminating the overscale margin.
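The sub-proportional effect in the last bullet can be sketched with a small calculation. All workload numbers below are invented for illustration and are not Zalando's figures. `kpus_for` is a hypothetical helper that assumes the 1 vCPU / 4 GB / 50 GB bundle and whole-number inputs.

```python
def ceil_div(a, b):
    """Integer ceiling division (avoids floating-point rounding)."""
    return -(-a // b)


def kpus_for(vcpu, memory_gb, state_gb, margin_pct=0):
    """KPUs for a workload, plus an optional overscale margin in percent.

    Hypothetical model: the largest per-dimension requirement sets the
    base count; the margin is then applied on top and rounded up.
    """
    base = max(ceil_div(vcpu, 1), ceil_div(memory_gb, 4), ceil_div(state_gb, 50), 1)
    return ceil_div(base * (100 + margin_pct), 100)


# Scenario A (invented): vCPU is binding, 15 % overscale margin.
before_a = kpus_for(vcpu=20, memory_gb=48, state_gb=800, margin_pct=15)
# State cut by 76 % (800 GB -> 192 GB) and the margin dropped.
after_a = kpus_for(vcpu=20, memory_gb=48, state_gb=192)
print(before_a, after_a)  # 23 20

# Scenario B (invented): storage is binding, no margin.
before_b = kpus_for(vcpu=20, memory_gb=48, state_gb=2000)
after_b = kpus_for(vcpu=20, memory_gb=48, state_gb=480)  # same 76 % cut
print(before_b, after_b)  # 40 20
```

In scenario A the state cut alone changes nothing, because vCPU already set the KPU count; the only saving is the removed margin. In scenario B the bill falls, but only until vCPU becomes the binding dimension, so a 76 % state cut yields a 50 % KPU reduction.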

Why KPU bundling exists

The bundled shape makes the service's capacity planning predictable and prevents pathological asymmetric sizing (e.g., a job asking for 500 GB of storage and 1 vCPU). The cost is that it is hostile to workloads where one resource dimension dominates, particularly large-state stream processors where local disk is the real constraint and CPU is slack.

Seen in
