VStream
What it is
VStream (vitess.io docs) is Vitess's low-level gRPC change-stream primitive — the substrate under every VReplication workflow and the public API under every third-party CDC driver for Vitess. VStream exposes a Vitess keyspace's committed changes as a single ordered stream, papering over the fact that the keyspace may be backed by hundreds or thousands of MySQL shards.
Two RPCs, two altitudes
VStream has two surfaces in the Vitess codebase, both named "VStream":
- Tablet-level VStream (queryservice.proto L103-L113): per-VTTablet RPC that streams changes from one specific MySQL shard. Consumed internally by VReplication workflows (MoveTables, Reshard, Materialize) that need per-source-shard granularity.
- VTGate-level VStream (vtgateservice.proto L55-L56): keyspace-wide RPC exposed by every VTGate. It fans out to every shard's tablet-level VStream, interleaves the per-shard event streams, and attaches keyspace-wide VGTID checkpoints. This is the API external CDC drivers call.
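The VTGate-level fan-out can be pictured as a merge of per-shard event streams. The Python below is a toy model under stated assumptions, not Vitess code: the per-shard streams are in-memory lists of hypothetical event dicts, `fan_out` and `_transactions` are made-up names, transactions are interleaved round-robin (a real VTGate forwards whatever arrives from each tablet), and the placement of the VGTID event after each COMMIT is chosen purely for illustration.

```python
from typing import Dict, Iterator, List

Event = Dict[str, object]  # hypothetical decoded event: {"type": ..., other fields}


def _transactions(events: List[Event]) -> Iterator[List[Event]]:
    """Split one shard's event list into BEGIN..COMMIT groups (trailing partials are dropped)."""
    txn: List[Event] = []
    for ev in events:
        txn.append(ev)
        if ev["type"] == "COMMIT":
            yield txn
            txn = []


def fan_out(keyspace: str, shards: Dict[str, List[Event]]) -> Iterator[Event]:
    """Toy VTGate merge: interleave per-shard transactions, stamp keyspace/shard,
    and emit a keyspace-wide VGTID (shard -> last committed GTID) after each one."""
    vgtid: Dict[str, str] = {}
    iters = {shard: _transactions(evs) for shard, evs in shards.items()}
    while iters:
        for shard in list(iters):  # round-robin over shards that still have data
            txn = next(iters[shard], None)
            if txn is None:
                del iters[shard]  # this shard's stream is exhausted
                continue
            for ev in txn:
                yield {**ev, "keyspace": keyspace, "shard": shard}
                if ev["type"] == "COMMIT":
                    vgtid[shard] = str(ev["gtid"])
            # Copy the dict so later updates don't mutate checkpoints already emitted.
            yield {"type": "VGTID", "keyspace": keyspace, "positions": dict(vgtid)}
```

Interleaving whole transactions rather than single events keeps each source transaction contiguous in the merged output of this sketch, and the accumulated `vgtid` dict plays the role of the keyspace-wide checkpoint.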
The layering is tidy: one primitive, two consumer altitudes. Internal data motion and external CDC ride the same mechanism.
Stream shape
Events in a VStream fall into a small vocabulary (from the output logs in the 2024-07-29 post's worked example):
- FIELD: schema-for-table announcement (column list, types, charsets). Emitted once per table at stream start and again on schema changes.
- ROW: a per-row change (insert / update / delete). Carries before/after images of the row plus the shard the row originated from.
- VGTID: keyspace-wide progress checkpoint; the payload is a set of per-shard GTID positions. Persist this on the consumer side and resubmit it on reconnect.
- BEGIN / COMMIT: transaction boundaries. Events between a BEGIN and its COMMIT on the same shard belong to one source MySQL transaction.
- COPY_COMPLETED: per-shard sentinel marking the end of the initial row-copy phase and the transition to live binlog streaming.
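The VGTID advice above ("persist, then resubmit on reconnect") can be sketched as a small consumer-side checkpoint store. `VgtidCheckpoint` is a hypothetical class, not the Vitess protobuf; it assumes each VGTID event has already been decoded into (keyspace, shard, gtid) triples and treats the GTID strings as opaque values.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class VgtidCheckpoint:
    """Hypothetical store for the latest keyspace-wide VGTID seen by a consumer.

    Keys are (keyspace, shard); values are opaque GTID-set strings that are
    stored and handed back on reconnect, never parsed.
    """

    positions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def update(self, shard_gtids: Iterable[Tuple[str, str, str]]) -> None:
        # Overwrite the position of every shard named in this VGTID event.
        for keyspace, shard, gtid in shard_gtids:
            self.positions[(keyspace, shard)] = gtid

    def dumps(self) -> str:
        # JSON has no tuple keys, so serialize as a sorted list of records.
        rows = [
            {"keyspace": k, "shard": s, "gtid": g}
            for (k, s), g in sorted(self.positions.items())
        ]
        return json.dumps(rows)

    @classmethod
    def loads(cls, text: str) -> "VgtidCheckpoint":
        rows = json.loads(text)
        return cls({(r["keyspace"], r["shard"]): r["gtid"] for r in rows})
```

Keeping the GTIDs opaque means the consumer never has to understand MySQL GTID-set syntax. Ideally the serialized checkpoint is written together with the consumer's own output, so a crash cannot persist one without the other.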
Each event carries its keyspace and shard, so a consumer can either process per shard or treat the full keyspace as one ordered logical stream.
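Because every event names its keyspace and shard, per-shard transaction assembly reduces to a small buffering loop. This is a sketch assuming decoded events are dicts with hypothetical `type`, `keyspace`, and `shard` keys; `assemble_transactions` is an illustrative name. Each shard's open transaction is buffered separately, so the loop also tolerates events from different shards interleaving.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (keyspace, shard)


def assemble_transactions(events: Iterable[dict]) -> List[Tuple[Key, List[dict]]]:
    """Group ROW events into per-shard transactions using BEGIN/COMMIT boundaries.

    Returns (shard key, rows) pairs in the order their COMMITs arrived.
    """
    open_txns: Dict[Key, List[dict]] = defaultdict(list)
    committed: List[Tuple[Key, List[dict]]] = []
    for ev in events:
        kind = ev["type"]
        if kind not in ("BEGIN", "ROW", "COMMIT"):
            continue  # FIELD, VGTID, COPY_COMPLETED are not needed for grouping
        key = (ev["keyspace"], ev["shard"])
        if kind == "BEGIN":
            open_txns[key] = []
        elif kind == "ROW":
            open_txns[key].append(ev)
        else:  # COMMIT: release this shard's buffered rows as one transaction
            committed.append((key, open_txns.pop(key, [])))
    return committed
```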
Copy phase → replication phase
VStream executes the canonical snapshot + catch-up replication shape at the public-API altitude, per shard in parallel:
- Copy phase: each shard emits FIELD events for the tracked tables, then ROW events for every existing row (ordered by primary key), interleaved with VGTID messages capturing the GTID the snapshot was taken at. COPY_COMPLETED marks the end of this phase for that shard.
- Replication phase: each shard then tails its MySQL binlog starting from the VGTID captured during the copy phase, emitting ROW events for every committed transaction and advancing its VGTID entry on every batch commit.
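The two phases for a single shard can be simulated end to end. This toy, built around a hypothetical `shard_stream` function, makes simplifications the real system does not: GTIDs are plain increasing integers rather than GTID sets, the snapshot is a dict keyed by primary key, each binlog entry is a single-row transaction, and the copy phase emits one VGTID after the rows instead of interleaving several.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical binlog entry: (gtid_seq, op, pk, after_image).
BinlogEntry = Tuple[int, str, int, dict]


def shard_stream(table: str, snapshot: Dict[int, dict], snapshot_gtid: int,
                 binlog: List[BinlogEntry]) -> Iterator[dict]:
    """Toy per-shard lifecycle: copy a consistent snapshot, then tail the binlog."""
    # Copy phase: announce the schema, then every existing row in primary-key order.
    yield {"type": "FIELD", "table": table}
    for pk in sorted(snapshot):
        yield {"type": "ROW", "table": table, "op": "insert", "after": snapshot[pk]}
    yield {"type": "VGTID", "gtid": snapshot_gtid}  # position the snapshot was taken at
    yield {"type": "COPY_COMPLETED"}
    # Replication phase: replay only changes committed after the snapshot position.
    for gtid, op, _pk, after in binlog:
        if gtid <= snapshot_gtid:
            continue  # already reflected in the copied rows
        yield {"type": "BEGIN"}
        yield {"type": "ROW", "table": table, "op": op, "after": after}
        yield {"type": "COMMIT"}
        yield {"type": "VGTID", "gtid": gtid}  # advance the checkpoint per commit
```

The `gtid <= snapshot_gtid` filter is the crux of the snapshot-plus-catch-up shape in this model: changes already visible in the copied rows are skipped, and everything committed afterward is replayed, so nothing is lost or applied twice.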
Because each shard runs this lifecycle independently and VStream interleaves their outputs at the VTGate level, COPY_COMPLETED arrives per shard, and the overall transition from "backfilling" to "caught up" is observable shard by shard.
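On the consumer side, observing that shard-by-shard transition amounts to counting COPY_COMPLETED sentinels. `CopyProgress` is a hypothetical helper; it assumes the consumer already knows which shard names to expect (for example, from the shards listed in the VGTID it started from) and that each COPY_COMPLETED event carries its shard.

```python
from typing import Iterable, Set


class CopyProgress:
    """Hypothetical tracker for the keyspace-wide backfilling -> caught-up transition."""

    def __init__(self, shards: Iterable[str]) -> None:
        self.pending: Set[str] = set(shards)  # shards still in the copy phase
        self.done: Set[str] = set()

    def observe(self, event: dict) -> None:
        # Only a COPY_COMPLETED for an expected, still-pending shard changes progress.
        shard = event.get("shard")
        if event.get("type") == "COPY_COMPLETED" and shard in self.pending:
            self.pending.discard(shard)
            self.done.add(shard)

    @property
    def caught_up(self) -> bool:
        return not self.pending

    def summary(self) -> str:
        total = len(self.pending) + len(self.done)
        return f"{len(self.done)}/{total} shards past copy phase"
```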
Consumers
- VReplication itself: the substrate under MoveTables, Reshard, Materialize, and online schema changes. Uses the tablet-level RPC.
- Debezium Vitess connector: streams from the VTGate VStream into Kafka (docs).
- PlanetScale Connect: PlanetScale's own ELT feature, built on VStream.
- Airbyte Vitess source: planetscale/airbyte-source.
- Fivetran Vitess source: planetscale/fivetran-source.
- Bespoke clients: the example VStream client in the Vitess repo shows the raw gRPC shape.
VStream is the canonical instance of the new wiki pattern patterns/cdc-driver-ecosystem: one vendor API, N driver ecosystems.
Why it shows up on this wiki
VStream is the public-facing CDC entry point of Vitess and therefore of PlanetScale MySQL. It is the mechanism by which downstream ETL tools ride on a Vitess-backed database without needing to understand shard topology, shard count, or per-shard GTID bookkeeping, all of which VGTID and the VTGate fan-out paper over.
Seen in
- sources/2026-04-21-planetscale-building-data-pipelines-with-vitess: canonical wiki disclosure of VStream as Vitess's CDC primitive. Matt Lord (Vitess core maintainer) walks through the two-RPC layering (tablet-level used by VReplication internally; VTGate-level exposed to external CDC drivers), the event vocabulary (FIELD / ROW / VGTID / BEGIN / COMMIT / COPY_COMPLETED) via a worked two-shard customer keyspace example with full captured output, the VGTID keyspace-wide progress-checkpoint shape, and the driver-ecosystem payoff (Debezium + Airbyte + Fivetran + PlanetScale Connect all composing on the one API). Canonical wiki framing: "use a Vitess variant of the connector/driver rather than the MySQL one"; engine-native CDC tools see a per-shard view of a sharded cluster and can't reconstruct keyspace-level semantics.
Related
- systems/vitess
- systems/vitess-vreplication
- systems/vitess-vtgate
- systems/planetscale-connect
- systems/debezium
- systems/mysql
- concepts/vgtid
- concepts/gtid-position
- concepts/binlog-replication
- concepts/change-data-capture
- concepts/unified-change-stream-across-shards
- patterns/snapshot-plus-catchup-replication
- patterns/cdc-driver-ecosystem
- companies/planetscale