Skip to content

PATTERN Cited by 1 source

Protobuf validation rules

Summary

Declare field- and message-level validation constraints inline in the .proto file using protoc-gen-validate (PGV) or its stable successor protovalidate. One schema definition produces validators in every language the team generates code for; rule evolution is reviewable at schema-change time; and the critical operational rule is that generated validators are not invoked automatically on parse — callers must call Validate() at every trust boundary.

Problem

Cross-language services sharing protobuf schemas have four ways to spell validation:

  1. No validation. Any well-formed wire message is accepted; the business-logic layer discovers inconsistencies at runtime (bad UUIDs, empty strings that should be non-empty, enums with unsupported values).
  2. Hand-rolled per-language. Each codebase adds its own checks. The rules drift; a producer team tightens a rule its consumers haven't heard about; bugs.
  3. External validation service. Centralised service or library checks messages; now there are two sources of truth (.proto contract + validation service) that can disagree.
  4. Rules attached to the schema itself. One source of truth. Generated validators land in every language stack. Rule evolution is a schema-change CR.

(4) is the pattern. PGV (now in maintenance) and protovalidate (the successor) implement it.

Solution

Declare rules as protobuf options inside the .proto:

syntax = "proto3";
import "google/protobuf/timestamp.proto";
import "validate/validate.proto";

message Event {
    string id = 1 [(validate.rules).string = {
        min_len: 1,
        uuid: true
    }];

    google.protobuf.Timestamp timestamp_utc = 2 [(validate.rules).timestamp = {
        required: true
    }];

    oneof data_kind {
        option (validate.required) = true;   // force one branch set
        EventDataA data_a = 3;
        EventDataB data_b = 4;
        EventDataC data_c = 5;
    }
}

message EventDataC {
    optional uint32 payload_size_bytes = 1 [(validate.rules).uint32 = {
        lt: 10000000   // < 10MB
    }];
}

Run the validator plugin; generated code includes a Validate() (Go, Java, C++) or equivalent per-language entry point.

Rule families the Lyft post calls out

From the 2024-09-16 post's explicit recommendations:

Rule family Key rules
oneof option (validate.required) = true; — force one branch set. Critical; default is none required.
message required: true for sub-message presence.
enum defined_only: true (closes proto3's open-enum semantics), in: [...], not_in: [0] (exclude UNKNOWN).
string min_len: 1 (non-empty), uuid, email, ipv4/ipv6, uri, pattern: "..." (regex).
repeated min_len: 1 (non-empty), items: { ... } (per-element rules), unique: true (set semantics).
map min_pairs, keys: { ... }, values: { ... }, no_sparse: true (require non-null values).
wrapper types Same rule family as the wrapped primitive — google.protobuf.StringValue + (validate.rules).string.

Critical caveat — validation is not automatic

The generated validator has to be called explicitly. Parsing a protobuf wire message does not run validation. From the 2024-09-16 Lyft post:

"it is important to understand that the generated validation methods still need to be called manually — if a message is formed in violation of the stated rules, nothing will fail until its validator is invoked!"

Operational implications:

  • Every handler that accepts a message from an untrusted source (mobile → backend, external API → internal service, cross-service RPC where the peer is in a different trust zone) must call Validate().
  • Forgetting the call is a silent gap, not a compile error. No log line, no metric, no runtime assertion surfaces the omission.
  • Code review + linting should treat "deserialise-without-validate" as a checkable anti-pattern at trust boundaries.
  • A centralised deserialise helper that always calls Validate() is a cheap safety net; pushing responsibility to every handler doesn't scale.

Canonical invocation in Python (PGV):

import protoc_gen_validate.validator
from your_pb.event_pb2 import Event as EventPB

event_pb = EventPB(...)
try:
    protoc_gen_validate.validator.validate(event_pb)
except protoc_gen_validate.validator.ValidationFailed as ex:
    raise ValueError(f'Protobuf validation error: {ex}')

Composition with other protobuf practices

Validation rules compose naturally with the other practices in the same source:

  • patterns/oneof-over-enum-plus-field + option (validate.required) = true; — the oneof makes the contract explicit; the rule closes the "zero branches set" loophole.
  • concepts/unknown-zero-enum-value + (validate.rules).enum = { not_in: [0] } — the sentinel survives wire parsing; the rule rejects it at the trust boundary.
  • concepts/proto3-explicit-optional + wrapper-type rules — google.protobuf.StringValue with (validate.rules).string = {min_len: 1} validates both "must be present" and "non-empty."
  • Unit-suffixed names + range rules — payload_size_bytes with (validate.rules).uint32 = { lt: 10000000 } pins both meaning and range.

CI-time complement

Runtime validation is the last-line-of-defence at trust boundaries. CI-time validation of schemas themselves — checking that a proposed .proto change is backwards-compatible — is a separate, orthogonal gate. See patterns/schema-validation-before-deploy.

The two levels compose:

  • CI-time: "This schema change doesn't break deployed consumers."
  • Runtime: "This specific message satisfies this specific schema's rules."

Both are needed; neither subsumes the other.

PGV → protovalidate

Per the 2024-09-16 post, PGV is stable and succeeded by protovalidate; new projects should start with protovalidate. The rule-family concepts transfer directly; the rule namespace and runtime change. See systems/protoc-gen-validate.

Seen in

Last updated · 319 distilled / 1,201 read