
Active CPU pricing

Active CPU pricing is a serverless-function billing model in which customers pay only for the time the function is actively executing code on-CPU, not for wall-clock duration including I/O wait. If the function is blocked on a database query, an external HTTP call, or any other I/O, the wait time is not billed.

Contrast with peer billing models

| Model | Billed time | Example |
| --- | --- | --- |
| Wall-clock (GB-seconds) | Invocation start → response finish, including I/O wait | Classical AWS Lambda |
| Active CPU | On-CPU execution time only | Vercel Functions on Fluid compute |
| CPU-time per isolate | V8-isolate CPU ticks | Cloudflare Workers |

Wall-clock billing over-charges workloads that spend most of their time waiting on external calls; active-CPU billing aligns cost with actual compute resource consumption.
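The difference is easiest to see in a toy calculation. The sketch below uses made-up request timings and no real rate card; the function names are illustrative, not any provider's API.

```python
def wall_clock_billed_ms(cpu_ms: int, io_wait_ms: int) -> int:
    """Wall-clock billing charges the full invocation duration, I/O wait included."""
    return cpu_ms + io_wait_ms

def active_cpu_billed_ms(cpu_ms: int, io_wait_ms: int) -> int:
    """Active-CPU billing charges only on-CPU execution time."""
    return cpu_ms

# An I/O-heavy request: 20 ms of compute, 480 ms waiting on a database.
print(wall_clock_billed_ms(20, 480))   # 500 -- the wait is billed
print(active_cpu_billed_ms(20, 480))   # 20  -- 25x less billed time
```

For a request that is 96 % I/O wait, the two models differ by a factor of 25 in billed time, which is the over-charging the paragraph above describes.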

Structural coupling to concurrent-request substrates

Active CPU pricing is most naturally expressed on multi-request-per-instance substrates like Fluid compute. If one instance handles N concurrent requests, wall-clock billing would double-count time that parallelises across concurrency (request A's 500 ms DB wait is also request B's potentially-productive on-CPU time). Active-CPU billing makes the incentive direct: faster code means less billed time, rather than savings arriving indirectly through higher throughput per dollar.

On per-request-isolation substrates (one container per request, or one isolate per request in older Workers), there is no parallelised I/O wait to avoid charging for, so the choice of billing model is largely notational.
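The double-counting can be sketched directly. Assuming two identical concurrent requests on one shared instance (invented timings, illustrative variable names):

```python
# Two concurrent requests served by the same instance; each spends most of
# its duration waiting on a database while the other can use the CPU.
requests = [
    {"cpu_ms": 50, "io_wait_ms": 500},  # request A
    {"cpu_ms": 50, "io_wait_ms": 500},  # request B, overlapping A
]

# Per-request wall-clock billing sums each request's full duration, even
# though A's wait overlaps B's execution on the shared instance.
wall_clock_total = sum(r["cpu_ms"] + r["io_wait_ms"] for r in requests)

# Active-CPU billing sums only the on-CPU slices.
active_cpu_total = sum(r["cpu_ms"] for r in requests)

print(wall_clock_total)  # 1100 ms billed
print(active_cpu_total)  # 100 ms billed
```

The instance was occupied for at most ~600 ms of wall time, yet per-request wall-clock billing charges 1100 ms; active-CPU billing charges the 100 ms of compute actually consumed.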

Why it shows up on this wiki

Canonical pricing-model counterpoint to classical Lambda GB-seconds. In the 2026-04-21 Vercel announcement, Active CPU pricing is presented as the structural fit for Bun on Fluid compute: a faster runtime translates directly into lower spend (billed active-CPU time drops), not indirectly via throughput per dollar.

Worked example (from 2026-04-21 Vercel)

The Bun runtime runs on Fluid compute, which handles multiple concurrent requests on the same instance. Active CPU pricing means you pay for time spent executing code, not wall-clock time waiting on responses. If your function is waiting for a database query or an API call, you're not being charged for that wait time.

Implications for runtime choice

When moving from a slower to a faster runtime (e.g. Node.js → Bun for CPU-bound workloads), Active CPU billing converts the latency delta into a proportional cost delta: a 28 % latency reduction on CPU-bound Next.js rendering translates into a ~28 % reduction in billed active-CPU time, not just a throughput improvement. Wall-clock billing would under-reward the same runtime swap on I/O-heavy endpoints, because the unchanged I/O wait dominates the billed duration.
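A back-of-envelope comparison makes the asymmetry concrete. The 28 % figure comes from the section above; the endpoint profile (100 ms CPU, 400 ms I/O wait) is invented for the sketch:

```python
cpu_ms, io_wait_ms = 100.0, 400.0   # hypothetical I/O-heavy endpoint
speedup = 0.28                      # fractional CPU-time reduction from the runtime swap

faster_cpu_ms = cpu_ms * (1 - speedup)

# Active CPU: billed time is CPU time, so the saving equals the speedup.
active_saving = 1 - faster_cpu_ms / cpu_ms

# Wall clock: the unchanged I/O wait dilutes the saving.
wall_saving = 1 - (faster_cpu_ms + io_wait_ms) / (cpu_ms + io_wait_ms)

print(f"{active_saving:.0%}")  # 28%
print(f"{wall_saving:.0%}")    # 6%
```

Under active-CPU billing the full 28 % shows up on the bill; under wall-clock billing the same swap saves only ~6 % on this endpoint, because 400 ms of I/O wait is billed either way.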

Open questions

  • Specific rate cards not disclosed in the 2026-04-21 post.
  • Threshold behaviour unspecified (is there a minimum billing tick? how are cold starts billed? is there a concurrency cap per priced unit?).
  • Comparison with AWS Lambda's concurrency-aware billing model (post-2024 updates) not engaged in the Vercel post.
