
CONCEPT Cited by 1 source

Extensibility in protocol design

Definition

A design principle stating that schemas should be structured with foreseeable future change in mind, so that additions and type evolutions can be accommodated with a new field number or a new oneof branch rather than a breaking-change migration. It is the second half of the two-principle framing on which the 2024-09-16 Lyft Media post builds its protobuf practices (clarity is the first).

From the post:

"It is crucial that protocol structure is built with future vision and potential roadmap in mind. This way, some foreseeable additions and breaking changes can be accounted for in advance. While it's impossible to predict and stay safe from all potential breaking changes, there's a few common pitfalls in protobuf, which often revolve around changing the type of a field and rearranging oneof groupings."

Why extensibility matters upfront

A protobuf schema change falls into one of three classes:

Class | Example | Cost
Additive | New field, new oneof branch, new enum value | Wire-compatible; old consumers ignore the addition.
Rearranging | Move a field in or out of a oneof, change int32 → int64 | Language-specific; often looks compatible but breaks one stack.
Breaking | Change a field's number, delete a required field, swap string → bytes | All clients must redeploy; stale data decodes incorrectly.

Extensibility work before the first message ships maximises the chance that all future change falls in the additive class. Retrofitting extensibility after a schema is in production is expensive because every already-deployed client has to cooperate with the migration.
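As a rough illustration of the additive class, the sketch below shows a second revision of a message that only gains a field under a previously unused number. The package, message, and field names are hypothetical and do not come from the post.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

// Hypothetical message, revision 2. Revision 1 had only fields 1 and 2.
message Ride {
    string id = 1;
    string rider_id = 2;

    // Added in revision 2 under a field number that was never used before.
    // Consumers built against revision 1 skip it as an unknown field.
    string promo_code = 3;
}
```

Nothing about fields 1 and 2 changes, which is what keeps the revision in the additive class.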

Specific pitfalls the post calls out

1. Integer ID that later wants to be UUID

// Bad — locks in integer semantics
message Entity {
    uint64 id = 1;
}

A future migration to UUIDs forces a new field (string uuid = 2;), a deprecated id, and parallel handling across every consumer.
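A minimal sketch of what that forced migration might look like mid-flight, assuming both fields are populated during the transition window. The package name and the comments on writer behaviour are assumptions made for illustration.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

message Entity {
    // Original integer ID, kept so already-deployed readers keep working.
    uint64 id = 1 [deprecated = true];

    // New identifier. Writers would populate both fields until every
    // consumer has switched over (assumed migration policy).
    string uuid = 2;
}
```

Every consumer then needs logic for which of the two fields to trust, which is the parallel handling the string-first design avoids.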

Extensible alternative:

message Entity {
    string id = 1;   // accepts any ID scheme, including UUID
}

string is a wider contract than uint64: the cost is a few extra bytes and some parsing, and the benefit is that any future ID migration becomes a client-side implementation detail rather than a schema change.

2. Discriminator enum + sibling fields

// Bad — implicit which-fields-belong-to-which-kind contract
message Event {
    enum Kind { A = 0; B = 1; C = 2; }
    Kind kind = 1;
    uint32 payload_size = 2;   // only for kind C
}

Extending to a new kind requires either adding more top-level fields or overloading existing ones. Both make the contract harder to read, and older consumers may see a new kind as an unrecognized or default enum value, depending on the language and protobuf syntax version.
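To make the cost concrete, here is a sketch of the same message after a hypothetical fourth kind is added. The kind D and its label field are invented for illustration.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

// Still the discouraged shape: each new kind adds another sibling field
// whose validity depends on a convention stated only in comments.
message Event {
    enum Kind { A = 0; B = 1; C = 2; D = 3; }
    Kind kind = 1;
    uint32 payload_size = 2;   // only for kind C
    string label = 3;          // only for kind D (hypothetical)
}
```

Nothing in the schema stops a writer from setting label on a kind C event; the pairing is enforced only by reader discipline.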

Extensible alternative:

message Event {
    oneof data_kind {
        EventDataA data_a = 1;
        EventDataB data_b = 2;
        EventDataC data_c = 3;
    }
}

Adding EventDataD is a one-line additive change; see patterns/oneof-over-enum-plus-field.
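A self-contained sketch of that change, assuming placeholder payload messages. The fields inside each EventData message are hypothetical; only the oneof shape follows the example above.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

message EventDataA {}
message EventDataB {}
message EventDataC { uint32 payload_size = 1; }
message EventDataD { string label = 1; }  // hypothetical new payload

message Event {
    oneof data_kind {
        EventDataA data_a = 1;
        EventDataB data_b = 2;
        EventDataC data_c = 3;
        EventDataD data_d = 4;  // the single line added to Event
    }
}
```

Each payload carries its own fields, so the set branch identifies both the kind and which fields are meaningful.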

3. Raw primitives for semantic types

uint64 timestamp = 1;

If the semantics of timestamp later expand (different epoch? timezone? precision?), the field number is already taken. Using google.protobuf.Timestamp from the start puts the semantics behind a well-known type, which defines the value as seconds and nanoseconds since the Unix epoch in UTC, so epoch and precision questions are settled by the type rather than renegotiated at the schema boundary.
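The well-known-type version might be declared as follows. The message and field names are hypothetical; the import path and type name are the standard ones shipped with protobuf.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

import "google/protobuf/timestamp.proto";

message Event {
    // Seconds and nanoseconds since the Unix epoch, as defined by the
    // well-known type rather than by a comment on a raw integer.
    google.protobuf.Timestamp created_at = 1;
}
```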

Rules of thumb

From the post's catalogue:

  • Prefer string IDs over integer IDs unless the schema is strictly internal and you control both ends of the deployment.
  • Prefer oneof over enum + sibling fields — the discriminator and payload become the same field.
  • Prefer well-known types over primitives for any field with semantics beyond the primitive (timestamps, durations, currency, coordinates).
  • Reserve field numbers and names when deleting fields (reserved 3, 4; reserved "old_name";) so they can't be accidentally reassigned with incompatible semantics.
  • Never change the type of a deployed field. Even "compatible" type changes (int32 → int64) break one or another language stack.
  • Never move a field between oneof groupings. Wire-compat doesn't imply semantic compat; some receivers may have cached which-branch-was-set state.
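The reservation rule from the list above can be sketched as follows, assuming a hypothetical message whose fields 3 and 4 were deleted. The message and the surviving field names are invented for illustration.

```proto
syntax = "proto3";

package example.v1;  // hypothetical package name

message Profile {
    // Numbers and names of deleted fields. The compiler rejects any later
    // attempt to reuse them in this message.
    reserved 3, 4;
    reserved "old_name";

    string id = 1;
    string display_name = 2;
    string locale = 5;  // new fields continue from the next free number
}
```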

Balanced with clarity

Extensibility and clarity pull in the same direction more often than they conflict: oneof is more extensible and more explicit; well-known types are more extensible and more self-documenting; UUID-compatible strings are more extensible and more descriptive.

When the two do conflict, for example a field you know you will want to extend but whose most natural name implies a specific primitive, the post's implicit resolution is to prefer clarity for this generation of the schema and migrate additively when the extension lands, rather than pre-committing to a generic type that obscures current semantics.

Seen in

  • sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices: canonical statement on the wiki. Lyft Media's post names extensibility as the second of two load-bearing principles, ranking it equal to clarity for "long-term protocol maintainability." The specific pitfalls (integer-ID-to-UUID, oneof rearrangement, primitive-to-well-known-type) come from the post's explicit list.