SYSTEM Cited by 1 source

VStream

What it is

VStream (vitess.io docs) is Vitess's low-level gRPC change-stream primitive — the substrate under every VReplication workflow and the public API under every third-party CDC driver for Vitess. VStream exposes a Vitess keyspace's committed changes as a single ordered stream, papering over the fact that the keyspace may be backed by hundreds or thousands of MySQL shards.

Two RPCs, two altitudes

VStream has two surfaces in the Vitess codebase, both named "VStream":

  • Tablet-level VStream (queryservice.proto L103-L113) — per-VTTablet RPC: stream changes from one specific MySQL shard. Consumed internally by VReplication workflows (MoveTables, Reshard, Materialize) that need per-source-shard granularity.
  • VTGate-level VStream (vtgateservice.proto L55-L56) — keyspace-wide RPC exposed by every VTGate. Fans out to every shard's tablet-level VStream, interleaves the per-shard event streams, attaches keyspace-wide VGTID checkpoints. This is the API external CDC drivers call.

The layering is tidy: one primitive, two consumer altitudes. Internal data motion and external CDC ride the same mechanism.
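
As a rough illustration of the VTGate-level fan-out described above, the following toy model merges per-shard streams and attaches a keyspace-wide position map to each event. Everything here is an assumption made for illustration: `ShardEvent`, `vtgate_fan_out`, the round-robin scheduling, and the placeholder GTID strings are hypothetical and are not the Vitess implementation or its protobuf types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ShardEvent:
    """Hypothetical stand-in for one tablet-level change event."""
    shard: str
    gtid: str      # shard position after this event (placeholder format)
    change: str


def vtgate_fan_out(
    tablet_streams: Dict[str, Iterable[ShardEvent]],
) -> Iterator[Tuple[ShardEvent, Dict[str, str]]]:
    """Interleave per-shard streams round-robin (a toy scheduling policy).

    Yields each event with a copy of the keyspace-wide position map,
    which plays the role of a VGTID checkpoint in this sketch.
    """
    iterators = {shard: iter(s) for shard, s in tablet_streams.items()}
    positions: Dict[str, str] = {}
    while iterators:
        for shard in list(iterators):
            event = next(iterators[shard], None)
            if event is None:
                del iterators[shard]   # this shard's toy stream is exhausted
                continue
            positions[shard] = event.gtid
            yield event, dict(positions)


streams = {
    "-80": [ShardEvent("-80", "uuid-a:1", "insert customer 1"),
            ShardEvent("-80", "uuid-a:2", "insert customer 3")],
    "80-": [ShardEvent("80-", "uuid-b:1", "insert customer 2")],
}
merged = list(vtgate_fan_out(streams))
```

The consumer of `merged` sees one stream, yet every element still names its originating shard, and the attached map always holds the latest position seen for each shard.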

Stream shape

Events in a VStream fall into a small vocabulary (from the output logs in the 2024-07-29 post's worked example):

  • FIELD — schema-for-table announcement (column list, types, charsets). Emitted once per table per stream start + on schema changes.
  • ROW — a per-row change (insert / update / delete). Carries before / after images of the row plus the shard the row originated from.
  • VGTID — keyspace-wide progress checkpoint; payload is a set of per-shard GTID positions. Persist this on the consumer side; resubmit on reconnect.
  • BEGIN / COMMIT — transaction boundaries. Events between BEGIN/COMMIT on the same shard belong to one source MySQL transaction.
  • COPY_COMPLETED — per-shard sentinel marking the end of the initial row-copy phase and the transition to live binlog streaming.

Each event carries keyspace + shard so a consumer can either process per-shard or treat the full keyspace as one ordered logical stream.
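
A minimal consumer-side sketch of this vocabulary might look like the following. The `Event` and `Consumer` classes are hypothetical simplifications, not the real Vitess protobuf messages or any driver's API, and the GTID strings are placeholders. The sketch also assumes a VGTID may arrive before its transaction's COMMIT, so it stages the VGTID and only promotes it to the checkpoint once the buffered rows have been applied.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class EventType(Enum):
    FIELD = auto()
    ROW = auto()
    VGTID = auto()
    BEGIN = auto()
    COMMIT = auto()
    COPY_COMPLETED = auto()


@dataclass
class Event:
    """Hypothetical, simplified event; not the real Vitess protobuf."""
    type: EventType
    shard: str = ""
    table: str = ""
    columns: Tuple[str, ...] = ()       # FIELD: column names
    before: Optional[tuple] = None      # ROW: None for an insert
    after: Optional[tuple] = None       # ROW: None for a delete
    shard_gtids: Dict[str, str] = field(default_factory=dict)  # VGTID


class Consumer:
    def __init__(self) -> None:
        self.schemas: Dict[str, Tuple[str, ...]] = {}
        self.open_txns: Dict[str, List[dict]] = {}   # shard -> buffered rows
        self.applied: List[dict] = []
        self.staged: Optional[Dict[str, str]] = None
        self.checkpoint: Dict[str, str] = {}  # would be stored durably in practice

    def handle(self, ev: Event) -> None:
        if ev.type is EventType.FIELD:
            self.schemas[ev.table] = ev.columns
        elif ev.type is EventType.BEGIN:
            self.open_txns[ev.shard] = []
        elif ev.type is EventType.ROW:
            # Decode whichever image exists using the cached FIELD columns.
            image = ev.after if ev.after is not None else ev.before
            row = dict(zip(self.schemas[ev.table], image))
            row["_shard"] = ev.shard
            row["_deleted"] = ev.after is None
            self.open_txns.setdefault(ev.shard, []).append(row)
        elif ev.type is EventType.VGTID:
            self.staged = dict(ev.shard_gtids)
        elif ev.type is EventType.COMMIT:
            # Apply the shard's buffered rows, then advance the checkpoint.
            self.applied.extend(self.open_txns.pop(ev.shard, []))
            if self.staged is not None:
                self.checkpoint = self.staged
        # COPY_COMPLETED needs no action in this sketch.


consumer = Consumer()
for ev in [
    Event(EventType.BEGIN, shard="-80"),
    Event(EventType.FIELD, shard="-80", table="customer", columns=("id", "email")),
    Event(EventType.ROW, shard="-80", table="customer", after=(1, "a@example.com")),
    Event(EventType.VGTID, shard_gtids={"-80": "uuid-a:1", "80-": "uuid-b:0"}),
    Event(EventType.COMMIT, shard="-80"),
]:
    consumer.handle(ev)
```

On reconnect, a consumer built along these lines would resubmit `consumer.checkpoint` as its starting position. Promoting the checkpoint only after rows are applied leans toward at-least-once delivery: a crash can replay a transaction but should not skip one.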

Copy phase → replication phase

VStream executes the canonical snapshot + catch-up replication shape at the public-API altitude, per shard in parallel:

  1. Copy phase — each shard emits FIELD events for the tracked tables, then ROW events for every existing row (ordered by primary key), interleaved with VGTID messages capturing the GTID the snapshot was taken at. COPY_COMPLETED marks the end of this phase for that shard.
  2. Replication phase — each shard then tails its MySQL binlog starting from the VGTID captured in step 1, emitting ROW events for every committed transaction and advancing its VGTID entry on every batch commit.

Because each shard runs this lifecycle independently and VStream interleaves their outputs at the VTGate level, COPY_COMPLETED arrives per shard and the overall transition from "backfilling" to "caught up" is observable shard by shard.
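
The shard-by-shard transition can be observed with a small tracker like the one below. This is a sketch under stated assumptions: `PhaseTracker` is a hypothetical helper, events are reduced to `(event_type, shard)` pairs, and the consumer is assumed to know the shard list up front.

```python
from typing import Dict, Iterable, List

COPY, REPLICATION = "copy", "replication"


class PhaseTracker:
    """Tracks which shards are still in the copy phase (hypothetical helper)."""

    def __init__(self, shards: Iterable[str]) -> None:
        # Every shard starts in the copy phase.
        self.phase: Dict[str, str] = {shard: COPY for shard in shards}

    def observe(self, event_type: str, shard: str) -> None:
        # Only the per-shard sentinel moves a shard to the replication phase.
        if event_type == "COPY_COMPLETED" and shard in self.phase:
            self.phase[shard] = REPLICATION

    def backfilling(self) -> List[str]:
        return sorted(s for s, p in self.phase.items() if p == COPY)

    def caught_up(self) -> bool:
        return not self.backfilling()


tracker = PhaseTracker(["-80", "80-"])
timeline = []
for event_type, shard in [
    ("ROW", "-80"),
    ("COPY_COMPLETED", "-80"),
    ("ROW", "80-"),
    ("COPY_COMPLETED", "80-"),
]:
    tracker.observe(event_type, shard)
    timeline.append(tracker.backfilling())
```

Here `timeline` records the set of shards still backfilling after each event, so the keyspace counts as caught up only once the last shard has reported COPY_COMPLETED.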

Consumers

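Canonical wiki instance of patterns/cdc-driver-ecosystem: one vendor API, N driver ecosystems. Debezium, Airbyte, Fivetran, and PlanetScale Connect all compose on the one VTGate-level VStream API.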

Why it shows up on this wiki

VStream is the public-facing CDC entrypoint of Vitess and therefore of PlanetScale MySQL. It is the mechanism by which downstream ETL tools ride on a Vitess-backed database without needing to understand shard topology, shard-count, or per-shard GTID bookkeeping — all of which are papered over by VGTID and the VTGate fan-out.

Seen in

  • sources/2026-04-21-planetscale-building-data-pipelines-with-vitess: canonical wiki disclosure of VStream as Vitess's CDC primitive. Matt Lord (Vitess core maintainer) walks through the two-RPC layering (tablet-level used by VReplication internally; VTGate-level exposed to external CDC drivers), the event vocabulary (FIELD / ROW / VGTID / BEGIN / COMMIT / COPY_COMPLETED) via a worked two-shard customer keyspace example with full captured output, the VGTID keyspace-wide progress-checkpoint shape, and the driver-ecosystem payoff (Debezium + Airbyte + Fivetran + PlanetScale Connect all composing on the one API). Canonical wiki framing: "use a Vitess variant of the connector/driver rather than the MySQL one", because engine-native CDC tools see a per-shard view of a sharded cluster and can't reconstruct keyspace-level semantics.