
CONCEPT Cited by 1 source

PlanetScale hard transaction/query timeouts

Definition

PlanetScale enforces two hard timeouts at the vttablet layer that bound every client interaction with the database — 20 seconds for any open transaction and 900 seconds (15 minutes) for any single query. They are not configurable at the cluster level; they are "deliberately set timeouts that exist for performance reasons and to encourage good application design."

Both timeouts surface as specific RPC-level error messages from vttablet:

Transaction timeout (20 s):

vttablet: rpc error: code = Aborted desc = transaction <id>:
in use: in use: for tx killer rollback (CallerID:
planetscale-admin)

Query timeout (900 s):

target: example-db.-.primary: vttablet: rpc error:
code = Canceled desc = (errno 2013) due to context deadline
exceeded, elapsed time: 15m0.002989349s, killing query ID
65535 (CallerID: <id>)

The first is vttablet's tx killer reaping the long-lived transaction; the second is Go's context deadline exceeded error from the query-scoped context, surfaced with MySQL error code 2013 (CR_SERVER_LOST).
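Because the two failures call for different fixes, an application may want to tell them apart when they surface. The following is a minimal sketch that classifies an error string by the fragments quoted above; matching on message text is an assumption (the wording may change between vttablet versions), and `classify_timeout` is a hypothetical helper name, not part of any PlanetScale or Vitess client library.

```python
import re
from typing import Optional

# Fragments copied from the two error messages quoted above.
# Assumption: the wording is stable enough to match on; treat this as a
# diagnostic aid, not a contract.
_TX_KILLER = re.compile(r"code = Aborted.*for\s+tx\s+killer\s+rollback", re.DOTALL)
_QUERY_DEADLINE = re.compile(
    r"code = Canceled.*context\s+deadline\s+exceeded", re.DOTALL
)


def classify_timeout(message: str) -> Optional[str]:
    """Return 'transaction', 'query', or None for an unrelated error."""
    if _TX_KILLER.search(message):
        return "transaction"  # 20 s transaction ceiling
    if _QUERY_DEADLINE.search(message):
        return "query"  # 900 s query ceiling
    return None
```

A caller could log the classification next to the failing statement so that transaction-boundary problems and slow-query problems are triaged separately.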

Why they exist

The timeouts are a platform-level admission-control mechanism: PlanetScale runs as a multi-tenant managed service, and an uncapped long transaction or long query from one tenant consumes vttablet connection-pool slots and InnoDB resources that every other tenant on the same instance depends on. Hard timeouts ensure that:

  1. No single client can pin a row-lock or connection indefinitely — transactions past 20 s are rolled back unconditionally, which prevents deadlock storms and lock-contention cascades from leaking across tenants.
  2. No single client can monopolise a query-planner or execution slot — queries past 15 min are killed, which prevents hot-table full-scans from starving short-running OLTP traffic.
  3. The platform's operational guarantees stay achievable — PlanetScale's managed-service SLAs assume query latencies in the millisecond band; long-tail queries break the tail-latency assumption they're built on.

The canonical PlanetScale framing: the timeouts "exist for performance reasons and to encourage good application design."

The root cause is usually application shape, not query length

PlanetScale Support's observation: "most of the time, it's rather the user's application keeping the transaction open while handling other tasks such as data manipulation or sorting instead of closing the transaction first. Loops such as while <expression> or until <expression>, or for loops are particularly susceptible to that."

Canonical anti-pattern: BEGIN; SELECT …; <application does some work in-process>; INSERT …; COMMIT; Here the transaction stays open for the duration of the application's in-process work, not just the database queries. The fix is to shorten the transaction boundary, not to raise the timeout: "there is almost always a better solution."
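The reshaped version of that pattern can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the table names (`staging`, `results`) and the `expensive_transform` helper are hypothetical. The same shape applies to a MySQL driver pointed at PlanetScale.

```python
import sqlite3


def expensive_transform(rows):
    # Hypothetical stand-in for slow in-process work (sorting, reshaping).
    return sorted((name.upper(), qty * 2) for name, qty in rows)


def copy_transformed(conn: sqlite3.Connection) -> int:
    # 1. Read first; no explicit transaction is held open afterwards.
    rows = conn.execute("SELECT name, qty FROM staging").fetchall()

    # 2. Do the slow application-side work with no transaction open.
    prepared = expensive_transform(rows)

    # 3. Open a transaction only around the writes and commit right away.
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO results (name, qty) VALUES (?, ?)", prepared
        )
    return len(prepared)
```

The trade-off is that the read and the write are no longer atomic with respect to each other; where that matters, a conditional write or a saga is the usual next step.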

The OLAP escape hatch

PlanetScale documents one supported per-session bypass: change the vttablet workload mode from OLTP (the default) to OLAP:

set workload='olap';

This must be issued per session (it cannot be changed globally) and resets to OLTP when the session closes. With workload='olap', vttablet skips its query deadline enforcement — which is what lifts the 900 s ceiling.

PlanetScale's own documentation is explicitly unfavourable toward this escape hatch: "we generally recommend against using it … it can cause rather drastic side effects such as a workload consuming all available resources or blocking other important, short-lived queries or transactions from completing, or overloading a database up to a point where it goes into an unrecoverable state and where manual intervention is needed. It can also block planned failovers or critical updates and will make it easier to hit other intentional limits or timeouts dictated by MySQL."

Framing: OLAP mode is a deliberate-friction escape hatch, not a cluster-wide tuning knob.
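One way to keep the bypass narrowly scoped is to wrap it in a context manager so the session returns to OLTP as soon as the long-running work finishes. This is a hedged sketch: it assumes a DB-API style cursor with an `execute` method, `olap_workload` is a hypothetical helper name, and the explicit reset with `workload='oltp'` is an assumption, since the source only states that the mode resets when the session closes.

```python
from contextlib import contextmanager


@contextmanager
def olap_workload(cursor):
    """Run the enclosed block with workload='olap' on this session only."""
    cursor.execute("set workload='olap'")
    try:
        yield cursor
    finally:
        # Assumption: switching back explicitly is accepted by the server.
        # Closing the session would also restore the OLTP default.
        cursor.execute("set workload='oltp'")
```

Usage would look like `with olap_workload(cur): cur.execute(long_report_sql)`, where `long_report_sql` is whatever single statement genuinely needs the longer ceiling.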

The recommendation ladder

PlanetScale's documented remediation ordering, from cheapest to most architecturally invasive:

  1. Shorten the transaction. Close the transaction before application-side data manipulation; narrow BEGIN/COMMIT to the set of statements that actually require atomicity.
  2. Use optimistic locking instead of a long-lived transaction — read current row-version, compute the update, and UPDATE … WHERE version = <expected> as a conditional write. If the version moved, retry. Eliminates the transaction-span-vs-application-work problem entirely for single-row update cases.
  3. Use Sagas for complex multi-step workflows — break the transaction into a sequence of local transactions connected by compensating actions, so no single transaction exceeds the 20 s ceiling.
  4. Move ETL workloads to data-integration engines: Airbyte, Stitch, Datadog integration. These are designed for long-running scan + aggregate workloads and sit outside PlanetScale's OLTP envelope.
  5. Last resort: set workload='olap'; per session, explicitly scoped to the one transaction or query that genuinely requires it.
  6. Beyond PlanetScale: route OLAP workloads to a separate OLAP system downstream of CDC — the architectural-level fix, reinforced in Matt Lord's VStream post.
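Step 2 of the ladder, optimistic locking, can be sketched as a read, an application-side computation, and a conditional write that succeeds only if the row version is unchanged. The example below is a minimal illustration using sqlite3 as a runnable stand-in; the `accounts` table, its `version` column, and the `add_credit` function are hypothetical names chosen for the example.

```python
import sqlite3


def add_credit(conn: sqlite3.Connection, account_id: int, amount: int,
               max_retries: int = 3) -> bool:
    """Apply a balance change with a version check instead of a long transaction."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"account {account_id} not found")
        balance, version = row

        # Application-side work happens here with no transaction open.
        new_balance = balance + amount

        # Conditional write: matches only if nobody bumped the version.
        with conn:
            cur = conn.execute(
                "UPDATE accounts SET balance = ?, version = version + 1 "
                "WHERE id = ? AND version = ?",
                (new_balance, account_id, version),
            )
        if cur.rowcount == 1:
            return True  # write landed on the version we read
        # Version moved underneath us: loop to re-read and retry.
    return False
```

Each attempt holds a transaction only for the single UPDATE, so the time spent computing `new_balance` never counts against the 20 s ceiling.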

Tooling for detection

PlanetScale Insights is PlanetScale's documented diagnostic surface for finding queries that approach the 20 s / 900 s ceilings: per-pattern aggregates with sketch-based percentile buckets. "To help you with optimizing your queries and transactions, PlanetScale provides you with additional tools such as Insights." Canonical pairing of hard-timeout-as-constraint with Insights-as-lens.

Contrast: Postgres statement_timeout

Postgres's native statement_timeout is user-settable per session via SET and has no cluster-level hard-cap — if the customer sets it to 0 (no limit) they get what they asked for. PlanetScale's design is the opposite: platform-imposed ceiling with a deliberate-friction per-session lift. This is the multi-tenant-vs-single-tenant trade-off surfacing at the timeout-configuration layer.

Seen in

  • sources/2026-04-21-planetscale-supports-notes-from-the-field — canonical wiki disclosure of the 20 s / 900 s numbers, their verbatim vttablet error messages, the set workload='olap'; per-session escape hatch, and the operator recommendation ladder (shorten-transaction → optimistic-locking → sagas → Airbyte/Stitch/Datadog → OLAP-mode as last resort). Mike Stojan, PlanetScale Support team, 2023-01-11. The post frames the timeouts as "deliberately set … for performance reasons and to encourage good application design" and is explicit that they "need to be considered hard timeouts" at time of writing ("We are looking into ways to lift or at least extend these, but for the time being …").