CONCEPT
VGTID¶
Definition¶
VGTID is Vitess's keyspace-wide progress token in a VStream — a set of per-shard MySQL GTID positions, one entry per shard in the target keyspace, wrapped in a single message the VStream consumer can persist as a one-token restart checkpoint. It lifts the per-MySQL-server GTID position up to the level of a sharded keyspace.
The problem VGTID solves¶
Before VGTID, a consumer of change events from a sharded MySQL fleet had to:
- Know the shard layout.
- Track a separate GTID position per shard.
- Glue per-shard streams together into a reasonable replay order.
- Serialize the entire per-shard-GTID map itself and manage its evolution as shards are added / removed / split.
That is exactly the engine-specific bookkeeping a sharding layer should hide from downstream consumers.
VGTID moves that responsibility server-side: VTGate periodically emits a VGTID message into the VStream whose payload is the full per-shard GTID map. The consumer persists that one token and resubmits it on reconnect; the VStream server does the rest.
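The consumer side of that contract is just a persist-and-resubmit loop. A minimal sketch of the checkpoint half (the atomic-write helper and file layout are assumptions for illustration, not a Vitess API):

```python
import json
import os

def save_checkpoint(path, vgtid):
    # Persist the single VGTID token atomically: write a temp file,
    # then rename over the previous checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(vgtid, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # On reconnect, resubmit whatever was last persisted.
    # None means "no checkpoint yet": start from the beginning.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

The point is that the persisted state is exactly one token, whatever the shard count; the consumer never manages per-shard cursors.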
Shape of a VGTID¶
From the 2024-07-29 worked example (two-shard customer keyspace):
type:VGTID vgtid:{
  shard_gtids:{keyspace:"customer" shard:"-80"}
  shard_gtids:{keyspace:"customer" shard:"80-"
    gtid:"MySQL56/97257a80-3e14-11ef-b24b-2eaacc019d10:1-58"}
}
Each shard_gtids entry binds {keyspace, shard} to an optional gtid value. An absent gtid means "start from the beginning of this shard's binlog history"; a present gtid means "resume from after this position on this shard."
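That absent-vs-present semantics can be modeled in a few lines (illustrative Python dataclass, not the Vitess protobuf types):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShardGtid:
    # Mirrors a shard_gtids entry: {keyspace, shard} plus an optional gtid.
    keyspace: str
    shard: str
    gtid: Optional[str] = None

def start_position(entry: ShardGtid) -> str:
    # Absent gtid: replay this shard's full binlog history.
    # Present gtid: resume from after that position.
    if entry.gtid is None:
        return "beginning"
    return "after " + entry.gtid
```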
Mid-copy-phase VGTIDs additionally carry a table_p_ks sub-message capturing the last-copied primary-key rows per shard per table — the resumable-snapshot bookkeeping that lets the copy phase resume exactly where it left off on reconnect:
vgtid:{
  shard_gtids:{keyspace:"customer" shard:"80-"
    gtid:"MySQL56/97257a80-..."
    table_p_ks:{table_name:"customer"
      lastpk:{fields:{...} rows:{...}}}}}
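The lastpk bookkeeping is what makes the snapshot resumable: the next copy batch starts strictly after the last copied primary key. A hedged sketch of that idea (illustrative SQL generation; the real vstreamer builds comparable predicates from the lastpk fields/rows, not this helper):

```python
def copy_resume_query(table, pk_col, lastpk, batch=1000):
    # Resume the snapshot scan strictly after the last copied primary key.
    # lastpk=None means the copy of this table has not started yet.
    if lastpk is None:
        return f"SELECT * FROM {table} ORDER BY {pk_col} LIMIT {batch}"
    return (f"SELECT * FROM {table} WHERE {pk_col} > {lastpk} "
            f"ORDER BY {pk_col} LIMIT {batch}")
```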
Cardinality + size scaling¶
The VGTID payload is O(shard count). For a keyspace with hundreds or thousands of shards the VGTID message is correspondingly larger — that is the trade-off: a single checkpoint token that scales with the topology, versus N independent cursors the consumer has to maintain. Vitess absorbs the serialisation cost; the consumer gets a simple one-token restart contract.
The 2024-07-29 post does not publish per-shard-count size measurements; the concept is defined by behaviour, not by a published size bound.
Emission cadence¶
VStream emits a VGTID at natural boundaries in the event stream:
- During the copy phase, after a batch of rows is written for a shard (capturing the last-copied primary-key position and the shard's current GTID).
- On COPY_COMPLETED sentinels marking the end of the per-shard snapshot.
- During the replication phase, on each batch commit as shards advance their binlog positions.
This aligns the VGTID with the event-group boundaries a consumer would want to checkpoint at anyway — one durable write of (last-processed event batch, VGTID) persists the full consumer state.
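That one-durable-write pattern can be illustrated with a transactional sink. SQLite stands in for the consumer's datastore here, and the table names are hypothetical:

```python
import json
import sqlite3

def init(conn):
    # Hypothetical consumer schema: the target table plus a
    # single-row checkpoint table holding the VGTID token.
    conn.execute("CREATE TABLE IF NOT EXISTS customer("
                 "customer_id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint("
                 "id INTEGER PRIMARY KEY, vgtid TEXT)")

def apply_batch(conn, rows, vgtid):
    # Apply the event batch and persist the new VGTID in the SAME
    # transaction, so a crash can never separate data from checkpoint.
    with conn:  # one atomic commit
        conn.executemany(
            "INSERT OR REPLACE INTO customer(customer_id, email) "
            "VALUES (?, ?)", rows)
        conn.execute(
            "INSERT OR REPLACE INTO checkpoint(id, vgtid) VALUES (1, ?)",
            (json.dumps(vgtid),))
```

On restart the consumer reads the checkpoint row and resubmits the stored VGTID — nothing else needs to be recovered.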
Relationship to MySQL GTID¶
The gtid: field inside each shard_gtids entry is a literal MySQL GTID set (MySQL56/<source_uuid>:<transaction_range>). VGTID is purely a compositional wrapper — it doesn't invent a new identity or ordering scheme; it just bundles per-shard MySQL GTIDs with shard-identity metadata.
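Because the wrapped positions are plain MySQL GTID sets, pulling one apart for inspection is trivial. A sketch, assuming only the MySQL56/<source_uuid>:<transaction_range> shape shown above:

```python
def parse_mysql56(gtid):
    # Split "MySQL56/<source_uuid>:<transaction_range>" into its parts.
    flavor, rest = gtid.split("/", 1)
    source_uuid, txn_range = rest.split(":", 1)
    return flavor, source_uuid, txn_range
```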
This means every tool that understands MySQL GTIDs (MySQL server replication, pt-heartbeat, Percona toolkit observability, legacy Debezium MySQL connector tooling) can still inspect the underlying positions inside a VGTID payload for debugging — the wrapping is additive, not opaque.
Seen in¶
- sources/2026-04-21-planetscale-building-data-pipelines-with-vitess — canonical wiki disclosure of VGTID as VStream's unified change-stream progress token. Matt Lord's worked two-shard customer keyspace example shows the VGTID payload shape, the per-shard gtid + table_p_ks sub-fields, and the interleaving of VGTID messages with FIELD/ROW/COPY_COMPLETED events. The post doesn't name "VGTID" as a defined concept — the name comes from the literal protobuf type tag in the output logs — but its role is unambiguous: a keyspace-wide restart checkpoint that a consumer persists once per batch instead of maintaining per-shard cursor state itself.