CONCEPT Cited by 1 source

Query lifecycle manager

Definition

A query lifecycle manager is the component of a distributed SQL engine "responsible for the lifecycle of currently-running queries": scheduling them onto executors, tracking their progress, cancelling them on request, handling their completion, and releasing their resources (Source: sources/2026-01-27-redpanda-engineering-den-query-manager-implementation-demo).

Distinct from:

Query planner / optimiser: produces the plan from SQL text. Planner work completes before manager work starts; the manager then drives the plan's execution.
Query executor: the per-node or per-thread worker that runs a piece of the plan. The manager orchestrates executors; it does not process data itself.

The manager sits between the planner (upstream) and the executors (downstream), owning the query's identity and lifecycle:

SQL → parser → planner → [manager: schedule → execute → finish] ← executors
                                    ↑ cancel event

Canonical wiki instance

Oxla's 2026-01-27 query-manager rewrite. The Oxla team's own post is the first canonical wiki disclosure of a query lifecycle manager as a named component; prior wiki coverage of distributed query engines (Presto, Trino, Spark SQL, Snowflake warehouses, Vitess evalengine) describes planners and executors but names their lifecycle-management components only implicitly.

The Oxla rewrite framed the manager as the component where the previous implementation's reliability bugs concentrated:

Scheduling bugs: queries stuck in the scheduling phase without progress.
Cancellation bugs: "spawned async work per thread, retried cancellation from a different thread entirely". The cancellation code path was the worst affected. See concepts/async-cancellation-thread-spawn-antipattern.
State-reporting bugs: "a query might show as scheduled in one place and finished in another"; that is, two lookups of the same query's state could return different answers.
Resource-leak bugs: "queries could get stuck in 'finished' or 'executing' while still holding onto resources" — the manager's teardown path was not reliably triggered.

Primary responsibilities

A production-grade query lifecycle manager must handle:

Schedule: place a new query into the lifecycle (assign executors, allocate resources).
Execute: track the query's progress through its plan (stages completing, intermediate results flowing).
Cancel: respond to an external cancellation request (timeout, user abort, admin kill) by terminating execution and reclaiming resources — a non-trivial concurrency problem (see concepts/request-cancellation).
Complete: handle normal termination — collect final result, mark query done, tear down executors (see concepts/explicit-teardown-on-completion).
Report: expose query state to external observers (admin tools, health checks, observability pipelines).
Restart (optional): handle partial-failure recovery — some engines retry failed stages, some retry the whole query, some fail fast.
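
The responsibilities above can be sketched as a small in-memory manager. This is an illustration under stated assumptions, not Oxla's implementation: the class `QueryManager`, its method names, the state names, and the model of executors as a counted pool of slots are all hypothetical. Restart is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"
    EXECUTING = "executing"
    FINISHED = "finished"
    CANCELLED = "cancelled"


@dataclass
class Query:
    query_id: int
    state: State
    slots: int  # executor slots held by this query (hypothetical resource model)


class QueryManager:
    """Hypothetical manager: one record per query, owned in one place."""

    def __init__(self, total_slots: int) -> None:
        self.free_slots = total_slots
        self.queries: dict[int, Query] = {}
        self._next_id = 1

    def schedule(self, slots: int) -> int:
        # Schedule: admit the query and reserve its resources.
        if slots > self.free_slots:
            raise RuntimeError("not enough free executor slots")
        self.free_slots -= slots
        qid = self._next_id
        self._next_id += 1
        self.queries[qid] = Query(qid, State.SCHEDULED, slots)
        return qid

    def start(self, qid: int) -> None:
        # Execute: the query begins running on its executors.
        q = self.queries[qid]
        if q.state is not State.SCHEDULED:
            raise ValueError(f"cannot start query in state {q.state.value}")
        q.state = State.EXECUTING

    def cancel(self, qid: int) -> None:
        # Cancel: cancelling an already-terminal query is a no-op.
        self._terminate(qid, State.CANCELLED)

    def complete(self, qid: int) -> None:
        # Complete: normal termination; a query that never started cannot finish.
        if self.queries[qid].state is State.SCHEDULED:
            raise ValueError("cannot complete a query that never started")
        self._terminate(qid, State.FINISHED)

    def report(self, qid: int) -> str:
        # Report: every observer reads the same record.
        return self.queries[qid].state.value

    def _terminate(self, qid: int, final: State) -> None:
        q = self.queries[qid]
        if q.state in (State.FINISHED, State.CANCELLED):
            return  # already terminal: nothing left to release
        self.free_slots += q.slots  # release resources exactly once
        q.slots = 0
        q.state = final


manager = QueryManager(total_slots=4)
qid = manager.schedule(slots=3)
manager.start(qid)
manager.cancel(qid)
manager.cancel(qid)  # repeated cancel does not release slots twice
print(manager.report(qid), manager.free_slots)
```

Routing both `cancel` and `complete` through one `_terminate` path is the design choice worth noting: resources are released in exactly one place, so a query cannot reach a terminal state while still holding slots.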

The Oxla architectural claim

The 2026-01-27 post's load-bearing claim: the manager should be built as a deterministic state machine, with every transition logged (see concepts/state-transition-logging) and explicit teardown at terminal states (see concepts/explicit-teardown-on-completion). This composition forms the patterns/state-machine-as-query-lifecycle-manager pattern.

The reliability argument: ad-hoc shared-state management produces the pathologies Oxla named (stuck queries, state disagreement, resource leaks, cancellation failures). An explicit state machine is meant to rule these out by construction, because only declared transitions can occur and every terminal state triggers teardown.
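
A minimal sketch of that composition follows: a transition table, a log entry per transition, and teardown on entering a terminal state. The state names, event names, and the `QueryStateMachine` class are assumptions made for illustration; the source does not disclose Oxla's actual states or API.

```python
from enum import Enum
from typing import Callable


class QState(Enum):
    SCHEDULED = "scheduled"
    EXECUTING = "executing"
    FINISHED = "finished"
    CANCELLED = "cancelled"
    FAILED = "failed"


# Allowed transitions keyed by (current state, event); anything absent is illegal.
TRANSITIONS: dict[tuple[QState, str], QState] = {
    (QState.SCHEDULED, "start"): QState.EXECUTING,
    (QState.SCHEDULED, "cancel"): QState.CANCELLED,
    (QState.EXECUTING, "finish"): QState.FINISHED,
    (QState.EXECUTING, "cancel"): QState.CANCELLED,
    (QState.EXECUTING, "error"): QState.FAILED,
}
TERMINAL = {QState.FINISHED, QState.CANCELLED, QState.FAILED}


class QueryStateMachine:
    def __init__(self, query_id: str, teardown: Callable[[], None]) -> None:
        self.query_id = query_id
        self.state = QState.SCHEDULED
        self.log: list[tuple[str, str, str]] = []  # (from, event, to)
        self._teardown = teardown

    def fire(self, event: str) -> QState:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # Illegal events are logged and rejected, never silently applied.
            self.log.append((self.state.value, event, "REJECTED"))
            raise ValueError(f"{self.query_id}: {event!r} not allowed in {self.state.value}")
        new_state = TRANSITIONS[key]
        self.log.append((self.state.value, event, new_state.value))
        self.state = new_state
        if new_state in TERMINAL:
            # Terminal states have no outgoing transitions, so this runs once.
            self._teardown()
        return new_state


released = []
sm = QueryStateMachine("q-1", teardown=lambda: released.append("q-1"))
sm.fire("start")
sm.fire("cancel")
try:
    sm.fire("finish")  # a late completion after cancel is rejected
except ValueError:
    pass
print(sm.state.value, released, len(sm.log))
```

Because the next state is a pure function of the current state and the event, a late `finish` arriving after `cancel` cannot flip the query back out of its terminal state, and the teardown callback cannot run a second time.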

The debuggability argument: when bugs do happen, trajectory reconstruction from the transition log makes them "fixed in days instead of weeks".
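
Trajectory reconstruction can be sketched as a replay over the transition log. The record layout below (sequence number, query id, from-state, to-state) and the sample data are hypothetical; the source does not describe Oxla's log format.

```python
from collections import defaultdict

# Hypothetical log records: (sequence number, query id, from-state, to-state).
LOG = [
    (1, "q-1", "scheduled", "executing"),
    (2, "q-2", "scheduled", "executing"),
    (3, "q-1", "executing", "finished"),
    (4, "q-3", "scheduled", "cancelled"),
]
TERMINAL = {"finished", "cancelled", "failed"}


def trajectories(log):
    """Group records by query and rebuild each query's ordered state path."""
    paths = defaultdict(list)
    for _, qid, src, dst in sorted(log):  # replay in sequence-number order
        path = paths[qid]
        if not path:
            path.append(src)
        elif path[-1] != src:
            # A record that does not continue from the last known state
            # means a transition happened without being logged.
            raise ValueError(f"{qid}: log gap between {path[-1]!r} and {src!r}")
        path.append(dst)
    return dict(paths)


def stuck_queries(log):
    """Queries whose last logged state is not terminal."""
    return sorted(q for q, path in trajectories(log).items() if path[-1] not in TERMINAL)


print(trajectories(LOG)["q-1"])
print(stuck_queries(LOG))
```

With every transition logged, a stuck query shows up as a path that ends in a non-terminal state, and an unlogged mutation shows up as a gap between consecutive records.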

Composability

The query lifecycle manager is one instance of a broader pattern: a stateful resource managed through its full lifetime by a per-resource state machine. Related lifecycle-management substrates:

Consensus request lifecycle: see concepts/two-phase-completion-protocol + the tentative/durable/cancelled states.
Leader lifecycle: see concepts/leader-establishment + concepts/leader-revocation.
Workflow lifecycle: see concepts/fault-tolerant-long-running-workflow.
Connection lifecycle: TCP state machine (classic formal example).

All share the deterministic-state-machine + explicit-transitions + terminal-teardown structure.
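
The shared structure can be sketched as one generic helper driven by different transition tables. Everything here is hypothetical: the `Lifecycle` class, the event names, and both tables. In particular, the connection table is a three-state toy and not the TCP state machine.

```python
class Lifecycle:
    """Generic per-resource state machine (hypothetical helper for illustration)."""

    def __init__(self, initial, transitions, terminal, teardown):
        self.state = initial
        self.transitions = transitions  # {(state, event): next_state}
        self.terminal = terminal        # states assumed to have no outgoing entries
        self.teardown = teardown        # called on entering a terminal state
        self.history = [initial]

    def fire(self, event):
        nxt = self.transitions.get((self.state, event))
        if nxt is None:
            raise ValueError(f"illegal event {event!r} in state {self.state!r}")
        self.state = nxt
        self.history.append(nxt)
        if nxt in self.terminal:
            self.teardown()
        return nxt


# Two different resources, same machinery, different tables.
QUERY = {
    ("scheduled", "start"): "executing",
    ("scheduled", "cancel"): "cancelled",
    ("executing", "finish"): "finished",
    ("executing", "cancel"): "cancelled",
}
CONNECTION = {  # deliberately simplified; not the real TCP states
    ("connecting", "connected"): "established",
    ("connecting", "close"): "closed",
    ("established", "close"): "closed",
}

events = []
query = Lifecycle("scheduled", QUERY, {"finished", "cancelled"},
                  lambda: events.append("query resources released"))
conn = Lifecycle("connecting", CONNECTION, {"closed"},
                 lambda: events.append("socket released"))

query.fire("start")
query.fire("finish")
conn.fire("connected")
conn.fire("close")
print(query.history, conn.history, events)
```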

Seen in

sources/2026-01-27-redpanda-engineering-den-query-manager-implementation-demo — canonical wiki instance. Oxla 2026-01-27 query-manager rewrite disclosure. First wiki canonicalisation of query-lifecycle-manager as a named substrate.