CONCEPT Cited by 1 source
Extensibility in protocol design¶
Definition¶
Design principle stating that schemas should be structured with
foreseeable future change in mind, so that additions and
type-evolutions can be accommodated with a new field number or a
new oneof branch rather than a breaking-change migration.
It is the second half of the two-principle framing on which the 2024-09-16 Lyft Media
post builds its protobuf practices (with
clarity as
the first).
From the post:
"It is crucial that protocol structure is built with future vision and potential roadmap in mind. This way, some foreseeable additions and breaking changes can be accounted for in advance. While it's impossible to predict and stay safe from all potential breaking changes, there's a few common pitfalls in protobuf, which often revolve around changing the type of a field and rearranging oneof groupings."
Why extensibility matters upfront¶
A protobuf schema change falls into one of three classes:
| Class | Example | Cost |
|---|---|---|
| Additive | New field, new oneof branch, new enum value | Wire-compatible; old consumers ignore the addition. |
| Rearranging | Move field in/out of oneof, change int32 → int64 | Language-specific; often compatible-looking but breaks one stack. |
| Breaking | Change a field's number, delete required field, swap string → bytes | All clients must redeploy; stale data corrupts. |
Extensibility work before the first message ships maximises the chance that all future change falls in the additive class. Retrofitting extensibility after a schema is in production is expensive because every already-deployed client has to cooperate with the migration.
Specific pitfalls the post calls out¶
1. Integer ID that later wants to be UUID¶
When an ID is declared as an integer (for example uint64 id = 1;), a future
migration to UUIDs forces a new field (string uuid = 2;),
a deprecated id, and parallel handling across every consumer.
Extensible alternative: declare the ID as a string from the start.
string is a wider contract than uint64: the cost is a few extra
bytes and parsing; the benefit is that any future ID migration is a
client-side implementation detail, not a schema change.
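The two shapes described above can be sketched as follows. This is a minimal illustration, not the post's own schema: the message names `UserIntegerId` and `UserStringId` and the package name are hypothetical, and only the field declarations `uint64 id`, `string uuid = 2;` and a string ID come from the text.

```protobuf
syntax = "proto3";

package example.ids;  // hypothetical package name

// Hypothetical message that shipped with an integer ID and later needed UUIDs.
message UserIntegerId {
  // Original identifier, kept only for consumers that have not migrated yet.
  uint64 id = 1 [deprecated = true];
  // Added during the migration; every consumer now has to handle both fields.
  string uuid = 2;
}

// Hypothetical extensible alternative: a single string ID from the first release.
message UserStringId {
  // Can carry a decimal integer today and a UUID later without a schema change.
  string id = 1;
}
```

With the second shape, a switch from numeric IDs to UUIDs changes only what producers put in the string and how consumers interpret it.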
2. Discriminator enum + sibling fields¶
// Bad: implicit which-fields-belong-to-which-kind contract
message Event {
  enum Kind { A = 0; B = 1; C = 2; }
  Kind kind = 1;
  uint32 payload_size = 2; // only for kind C
}
Extending to a new kind requires either adding more top-level fields or overloading existing ones. Both make the contract harder to read, and older consumers may see a new kind as an unrecognized or default enum value, depending on the protobuf syntax and language runtime.
Extensible alternative:
message Event {
  oneof data_kind {
    EventDataA data_a = 1;
    EventDataB data_b = 2;
    EventDataC data_c = 3;
  }
}
Adding EventDataD is a one-line additive change; see
patterns/oneof-over-enum-plus-field.
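As a sketch of what that one-line change might look like, the file below extends the oneof with a fourth branch. The branch name `data_d`, the field number 4, the package name, and the payload message bodies are assumptions made for illustration; only the message and branch names for A to C come from the example above.

```protobuf
syntax = "proto3";

package example.events;  // hypothetical package name

// Placeholder payloads; real messages would carry kind-specific fields.
message EventDataA {}
message EventDataB {}
message EventDataC {
  // The size that used to sit at the top level now lives with the kind it belongs to.
  uint32 payload_size = 1;
}
message EventDataD {}

message Event {
  oneof data_kind {
    EventDataA data_a = 1;
    EventDataB data_b = 2;
    EventDataC data_c = 3;
    // The additive change: a new branch with a previously unused field number.
    EventDataD data_d = 4;
  }
}
```

A consumer built against the three-branch schema does not know field 4, so it sees the oneof as unset instead of misreading the payload as one of the older kinds.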
3. Raw primitives for semantic types¶
If a timestamp is declared as a raw primitive and its semantics later
expand (different epoch? timezone? precision?), the field number is
already taken. Using google.protobuf.Timestamp from the start means the
semantics are pinned by a well-known type: it defines the epoch (Unix),
the timezone (UTC), and the precision (seconds plus nanoseconds), so
none of them has to be negotiated at the schema boundary.
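A minimal sketch of the contrast in this section, under stated assumptions: the message names, the package name, the `int64` type of the raw field, and the `created_at` field name are hypothetical. `google.protobuf.Timestamp` and its import path are the standard well-known type.

```protobuf
syntax = "proto3";

package example.timestamps;  // hypothetical package name

import "google/protobuf/timestamp.proto";

// Hypothetical message using a raw primitive.
message UploadRawTimestamp {
  // Unit, epoch, and timezone exist only in comments and team convention.
  int64 timestamp = 1;
}

// Hypothetical message using the well-known type.
message UploadWellKnownTimestamp {
  // Seconds and nanoseconds since the Unix epoch, in UTC, as defined by the type.
  google.protobuf.Timestamp created_at = 1;
}
```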
Rules of thumb¶
From the post's catalogue:
- Prefer string IDs over integer IDs unless the schema is strictly internal and you control both ends of the deployment.
- Prefer oneof over enum + sibling fields: the discriminator and payload become the same field.
- Prefer well-known types over primitives for any field with semantics beyond the primitive (timestamps, durations, currency, coordinates).
- Reserve field numbers and names when deleting fields (reserved 3, 4; reserved "old_name";) so they can't be accidentally reassigned with incompatible semantics.
- Never change the type of a deployed field. Even "compatible" type changes (int32 → int64) change the generated type and can truncate values for readers still on the old schema, so they break one or another language stack.
- Never move a field between oneof groupings. Wire-compat doesn't imply semantic compat; some receivers may have cached which-branch-was-set state.
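The reservation rule from the list above can be sketched as below. The message name `Profile`, the package name, and the surviving fields are hypothetical; the `reserved` statements are the ones quoted in the rule.

```protobuf
syntax = "proto3";

package example.reserved;  // hypothetical package name

// Hypothetical message from which fields 3 and 4 (one of them named old_name) were deleted.
message Profile {
  // Reserving the numbers and the name makes protoc reject any later reuse.
  reserved 3, 4;
  reserved "old_name";

  string id = 1;
  string display_name = 2;
}
```

Without the reservation, a later field could be assigned number 3 with a different type, and old serialized data would be decoded against the wrong meaning.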
Balanced with clarity¶
Extensibility and
clarity pull in the same direction more often than they conflict:
oneof is more extensible and more explicit; well-known types
are more extensible and more self-documenting; UUID-compatible
strings are more extensible and more descriptive.
When the two do conflict (for example, a field you know you will want to extend, but whose most natural name and type point at a specific primitive), the post's implicit resolution is: prefer clarity for this generation of the schema and migrate additively when the extension lands, rather than pre-committing to a generic type that obscures current semantics.
Seen in¶
- sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices — canonical statement on the wiki. Lyft Media's post names extensibility as the second of two load-bearing principles, ranking it equal to clarity for "long-term protocol maintainability." The specific pitfalls (integer-ID-to-UUID, oneof rearrangement, primitive-to-well-known-type) come from the post's explicit list.
Related¶
- concepts/clarity-over-efficiency-in-protocol-design — sibling principle
- concepts/backward-compatibility — the property extensibility defends
- concepts/schema-evolution — the axis extensibility is designed against
- concepts/contract-first-design — design method this principle frames
- systems/protobuf — the schema system this principle is grounded in
- patterns/oneof-over-enum-plus-field — concrete extensible pattern