PATTERN Cited by 1 source
Protobuf validation rules¶
Summary¶
Declare field- and message-level validation constraints inline in
the .proto file using
protoc-gen-validate
(PGV) or its stable successor
protovalidate. One
schema definition produces validators in every language the team
generates code for; rule evolution is reviewable at schema-change
time; and the critical operational rule is that generated
validators are not invoked automatically on parse — callers must
call Validate() at every trust boundary.
Problem¶
Cross-language services sharing protobuf schemas have four ways to spell validation:
- No validation. Any well-formed wire message is accepted; the business-logic layer discovers inconsistencies at runtime (bad UUIDs, empty strings that should be non-empty, enums with unsupported values).
- Hand-rolled per-language. Each codebase adds its own checks. The rules drift; a producer team tightens a rule its consumers haven't heard about; bugs.
- External validation service. Centralised service or library
checks messages; now there are two sources of truth (
.protocontract + validation service) that can disagree. - Rules attached to the schema itself. One source of truth. Generated validators land in every language stack. Rule evolution is a schema-change CR.
(4) is the pattern. PGV (now in maintenance) and protovalidate (the successor) implement it.
Solution¶
Declare rules as protobuf options inside the .proto:
syntax = "proto3";
import "google/protobuf/timestamp.proto";
import "validate/validate.proto";
message Event {
string id = 1 [(validate.rules).string = {
min_len: 1,
uuid: true
}];
google.protobuf.Timestamp timestamp_utc = 2 [(validate.rules).timestamp = {
required: true
}];
oneof data_kind {
option (validate.required) = true; // force one branch set
EventDataA data_a = 3;
EventDataB data_b = 4;
EventDataC data_c = 5;
}
}
message EventDataC {
optional uint32 payload_size_bytes = 1 [(validate.rules).uint32 = {
lt: 10000000 // < 10MB
}];
}
Run the validator plugin; generated code includes a Validate() (Go,
Java, C++) or equivalent per-language entry point.
Rule families the Lyft post calls out¶
From the 2024-09-16 post's explicit recommendations:
| Rule family | Key rules |
|---|---|
oneof |
option (validate.required) = true; — force one branch set. Critical; default is none required. |
message |
required: true for sub-message presence. |
enum |
defined_only: true (closes proto3's open-enum semantics), in: [...], not_in: [0] (exclude UNKNOWN). |
string |
min_len: 1 (non-empty), uuid, email, ipv4/ipv6, uri, pattern: "..." (regex). |
repeated |
min_len: 1 (non-empty), items: { ... } (per-element rules), unique: true (set semantics). |
map |
min_pairs, keys: { ... }, values: { ... }, no_sparse: true (require non-null values). |
| wrapper types | Same rule family as the wrapped primitive — google.protobuf.StringValue + (validate.rules).string. |
Critical caveat — validation is not automatic¶
The generated validator has to be called explicitly. Parsing a protobuf wire message does not run validation. From the 2024-09-16 Lyft post:
"it is important to understand that the generated validation methods still need to be called manually — if a message is formed in violation of the stated rules, nothing will fail until its validator is invoked!"
Operational implications:
- Every handler that accepts a message from an untrusted source
(mobile → backend, external API → internal service, cross-service
RPC where the peer is in a different trust zone) must call
Validate(). - Forgetting the call is a silent gap, not a compile error. No
logline, no metric, no runtime assertion surfaces the omission. - Code review + linting should treat "deserialise-without-validate" as a checkable anti-pattern at trust boundaries.
- A centralised deserialise helper that always calls
Validate()is a cheap safety net; pushing responsibility to every handler doesn't scale.
Canonical invocation in Python (PGV):
import protoc_gen_validate.validator
from your_pb.event_pb2 import Event as EventPB
event_pb = EventPB(...)
try:
protoc_gen_validate.validator.validate(event_pb)
except protoc_gen_validate.validator.ValidationFailed as ex:
raise ValueError(f'Protobuf validation error: {ex}')
Composition with other protobuf practices¶
Validation rules compose naturally with the other practices in the same source:
- patterns/oneof-over-enum-plus-field +
option (validate.required) = true;— theoneofmakes the contract explicit; the rule closes the "zero branches set" loophole. - concepts/unknown-zero-enum-value +
(validate.rules).enum = { not_in: [0] }— the sentinel survives wire parsing; the rule rejects it at the trust boundary. - concepts/proto3-explicit-optional + wrapper-type rules —
google.protobuf.StringValuewith(validate.rules).string = {min_len: 1}validates both "must be present" and "non-empty." - Unit-suffixed names +
range rules —
payload_size_byteswith(validate.rules).uint32 = { lt: 10000000 }pins both meaning and range.
CI-time complement¶
Runtime validation is the last-line-of-defence at trust boundaries.
CI-time validation of schemas themselves — checking that a proposed
.proto change is backwards-compatible — is a separate, orthogonal
gate. See patterns/schema-validation-before-deploy.
The two levels compose:
- CI-time: "This schema change doesn't break deployed consumers."
- Runtime: "This specific message satisfies this specific schema's rules."
Both are needed; neither subsumes the other.
PGV → protovalidate¶
Per the 2024-09-16 post, PGV is stable and succeeded by protovalidate; new projects should start with protovalidate. The rule-family concepts transfer directly; the rule namespace and runtime change. See systems/protoc-gen-validate.
Seen in¶
- sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices — canonical pattern statement on the wiki. Full rule-family taxonomy, the "validators must be invoked manually" caveat in bold, and the PGV → protovalidate migration note. The post's consolidated closing example combines this pattern with every other practice it covers.
Related¶
- systems/protobuf — the schema system
- systems/protoc-gen-validate — the validation plugin (PGV / protovalidate)
- patterns/oneof-over-enum-plus-field — the pattern this one pairs with most often
- patterns/schema-validation-before-deploy — the CI-time complement (checks schema evolution, not per-message correctness)
- patterns/client-side-schema-validation — related pattern on other schema systems (TypeScript + Zod, JSON-Schema)
- concepts/unknown-zero-enum-value + concepts/proto3-explicit-optional — sibling clarity practices this pattern composes with
- concepts/clarity-over-efficiency-in-protocol-design — the principle that justifies inline declarative rules