
PATTERN Cited by 1 source

Oneof over enum-plus-field

Summary

When a protobuf message has a variant nature — different kinds carry different payloads — model it as a oneof tagged union rather than as a discriminator enum plus a grab-bag of per-kind optional fields. The oneof makes the kind and the payload the same field, eliminates the implicit "kind == C means only payload_size is valid" contract, and drops wire size because unset branches don't serialise.

Problem

The naive schema for a variant-nature message:

// ⚠️ Anti-pattern
message Event {
    enum Kind {
        EVENT_KIND_A = 0;
        EVENT_KIND_B = 1;
        EVENT_KIND_C = 2;
    }
    uint64 id           = 1;
    uint64 timestamp    = 2;
    Kind   kind         = 3;
    uint32 payload_size = 4;   // specific to EVENT_KIND_C
}

Multiple problems compound with every new kind:

  • Implicit correctness contract. If kind == EVENT_KIND_C, is payload_size required to be set? If kind == EVENT_KIND_A, is it required to be absent? The schema doesn't say. Every consumer and producer has to maintain this correspondence in code.
  • Conditional branches on every access. Readers can't access payload_size without first checking kind, and must keep the check in sync with every new kind.
  • Cross-kind field pollution. Kind-A-only fields, kind-B-only fields, and kind-C-only fields all sit at the same level; the schema can't express which belong together.
  • Every new kind compounds the problem. Adding EVENT_KIND_D means adding its per-kind fields to the top level and updating the cross-field validation logic everywhere it exists.
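  • Wire overhead. The discriminator is serialised as its own field next to the payload, and nothing stops a producer from shipping payload_size on a kind-A message, where it is dead weight. (Unset proto3 scalars are not written to the wire, so the cost comes from the separate discriminator and stray fields, not from empty ones.)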
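
The implicit contract from the first bullet tends to end up as hand-written checks like the ones below. This is a minimal plain-Python sketch rather than generated protobuf code: Event, Kind, and check_contract are hypothetical stand-ins that mirror the anti-pattern schema, and using None to model "field not set" is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Kind(IntEnum):
    # Mirrors Event.Kind from the anti-pattern schema.
    EVENT_KIND_A = 0
    EVENT_KIND_B = 1
    EVENT_KIND_C = 2


@dataclass
class Event:
    # Hypothetical stand-in for the generated message class.
    id: int
    timestamp: int
    kind: Kind
    payload_size: Optional[int] = None  # None models "not set"


def check_contract(event: Event) -> List[str]:
    """Return violations of the kind/payload rule the schema cannot express."""
    problems = []
    if event.kind == Kind.EVENT_KIND_C and event.payload_size is None:
        problems.append("EVENT_KIND_C requires payload_size")
    if event.kind != Kind.EVENT_KIND_C and event.payload_size is not None:
        problems.append(f"{event.kind.name} must not carry payload_size")
    return problems


good = Event(id=1, timestamp=0, kind=Kind.EVENT_KIND_C, payload_size=512)
bad = Event(id=2, timestamp=0, kind=Kind.EVENT_KIND_A, payload_size=512)
```

Every producer and consumer needs its own copy of check_contract, and each new kind adds branches to it. That is the maintenance cost the oneof removes.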

As the 2024-09-16 Lyft post frames it:

"It's nice to work with a protocol that's structured in a way where it explains itself; both as perceived immediately and as proven by iterating on it long term."

The anti-pattern violates that at the schema layer and pushes the burden into caller code.

Solution: oneof

Model the variant as a oneof union with one sub-message per kind:

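import "google/protobuf/timestamp.proto";   // google.protobuf.Timestamp
import "validate/validate.proto";           // protoc-gen-validate options

message Event {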
    uint64 id               = 1;
    google.protobuf.Timestamp timestamp_utc = 2;
    oneof data_kind {
        option (validate.required) = true;   // enforce one branch set
        EventDataA data_a = 3;
        EventDataB data_b = 4;
        EventDataC data_c = 5;
    }
}

message EventDataA {}   // empty when no per-kind fields
message EventDataB {}
message EventDataC {
    optional uint32 payload_size_bytes = 1;
}

Properties this buys:

  • The discriminator and the payload are the same field. No way to desync them; no way to set a B-payload for an A-kind.
  • Each kind's fields live together in their own message. Adding a new kind-specific field only needs to touch one message.
  • Generated code exposes WhichOneof('data_kind') or equivalent, making dispatch explicit and exhaustive-checkable in strongly-typed languages.
  • Only the set branch serialises. data_a messages don't carry a data_c.payload_size_bytes wire tag.
  • Adding EventDataD is a one-line additive change and doesn't touch existing consumers' code until they opt in to handle it.
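
To make the "only the set branch serialises" point concrete, the sketch below hand-encodes two Event messages using the standard protobuf wire format (varint tags, length-delimited sub-messages). It is an illustration under assumptions, not how generated code is written: the helper names are hypothetical, and timestamp_utc is left unset to keep the bytes short.

```python
def varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    """Tag with wire type 0, followed by the varint value."""
    return varint((number << 3) | 0) + varint(value)


def message_field(number: int, body: bytes) -> bytes:
    """Tag with wire type 2, the byte length, then the embedded message."""
    return varint((number << 3) | 2) + varint(len(body)) + body


# Event{id: 1, data_a: {}}: the empty branch is still marked as set.
event_a = varint_field(1, 1) + message_field(3, b"")

# Event{id: 1, data_c: {payload_size_bytes: 300}}
event_c = varint_field(1, 1) + message_field(5, varint_field(1, 300))
```

Here event_a is the four bytes 08 01 1a 00 and event_c is 08 01 2a 03 08 ac 02. The data_a message contains no trace of field 5, though it does pay two bytes (tag plus zero length) to record which branch is set.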

Example consumer code:

kind = event_pb.WhichOneof('data_kind')
if kind == 'data_a':
    handle_a(event_pb.data_a)
elif kind == 'data_b':
    handle_b(event_pb.data_b)
elif kind == 'data_c':
    handle_c(event_pb.data_c.payload_size_bytes
              if event_pb.data_c.HasField('payload_size_bytes')
              else None)
else:
    handle_unknown(kind)   # kind is None: no branch set, or the branch
                           # was added after this consumer was built

Enforce one-branch-required at the validation layer

By default a oneof does NOT require any of its branches to be set, so WhichOneof can return None. The 2024-09-16 Lyft post flags this as a counterintuitive default and prescribes the fix via protoc-gen-validate / protovalidate:

oneof data_kind {
    option (validate.required) = true;
    EventDataA data_a = 3;
    EventDataB data_b = 4;
    EventDataC data_c = 5;
}

With option (validate.required) = true; the generated validator rejects a message where none of data_a / data_b / data_c is set. See patterns/protobuf-validation-rules — validation still has to be explicitly invoked.
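
Because the validator only runs when called, a consumer that cannot rely on upstream validation may want a cheap guard of its own. The following is a sketch under assumptions: require_branch is a hypothetical helper, and FakeEvent is a stand-in that mimics only the WhichOneof method of a generated Python message so the example runs without generated code.

```python
class FakeEvent:
    """Hypothetical stand-in for a generated Event; only WhichOneof is mimicked."""

    def __init__(self, set_branch=None):
        self._set_branch = set_branch

    def WhichOneof(self, oneof_name):
        # Generated Python messages return the set field's name, or None
        # when no branch of the oneof is set.
        if oneof_name != "data_kind":
            raise ValueError(f"no oneof named {oneof_name!r}")
        return self._set_branch


def require_branch(message, oneof_name):
    """Return the name of the set branch; raise if the oneof is empty."""
    branch = message.WhichOneof(oneof_name)
    if branch is None:
        raise ValueError(f"oneof {oneof_name!r} has no branch set")
    return branch


branch = require_branch(FakeEvent("data_c"), "data_kind")
```

A guard like this complements the schema-level rule rather than replacing it; the generated validator remains the single source of truth for what counts as a valid message.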

Shared-field refactor

If two or three kinds share a common subset of fields, duplicating the fields across their per-kind messages is worth the minor repetition cost:

message EventDataA {
    string actor_id = 1;     // A and B both have actor_id
}
message EventDataB {
    string actor_id = 1;
    string target_id = 2;    // B-specific
}

Alternative: promote the shared field to the outer Event message when it's genuinely present for every kind. The trap to avoid is promoting a field to the outer level when most but not all kinds have it — that reintroduces the optional-field ambiguity the oneof was supposed to eliminate.

Extensibility note

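Moving a field into or out of a oneof keeps field numbers and encodings intact, but it is not behaviour-preserving: a parser keeps only the last oneof member it sees on the wire, so values written as independent fields can be silently cleared, and consumers may observe a different WhichOneof() result than the producer intended. The 2024-09-16 Lyft post calls this out as a common protobuf pitfall: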

"there's a few common pitfalls in protobuf, which often revolve around changing the type of a field and rearranging oneof groupings."

Design the oneof membership up-front and avoid moving fields in and out after deploy.
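
The hazard can be modelled in a few lines. This is a toy sketch, not a real protobuf parser: read_fields is hypothetical, the fields note and error_code are invented for illustration, and the only behaviour borrowed from protobuf is that setting one oneof member clears its siblings, so the last member on the wire wins.

```python
def read_fields(wire_fields, oneof_members=frozenset()):
    """Collapse decoded (name, value) pairs the way a reader would.

    Names in `oneof_members` share one slot: setting one clears the
    others, so only the last member seen survives.
    """
    result = {}
    for name, value in wire_fields:
        if name in oneof_members:
            for other in oneof_members - {name}:
                result.pop(other, None)
        result[name] = value
    return result


# A writer built before the change sets two independent fields.
old_payload = [("id", 7), ("note", "retry"), ("error_code", 42)]

# Reader on the old schema: both fields survive.
before = read_fields(old_payload)

# Reader on a schema that moved both fields into one oneof: note is lost.
after = read_fields(old_payload, oneof_members=frozenset({"note", "error_code"}))
```

In this model the same bytes yield both note and error_code before the regrouping, and only error_code after it, with no parse error to signal the loss.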

When NOT to use oneof

  • No branch-specific fields exist — if every kind has the same shape and only the discriminator differs, a simple enum (with _UNKNOWN = 0) is fine.
  • Orthogonal, non-exclusive state. If two values can legitimately coexist (e.g. status + reason), they are separate fields, not a oneof.
  • More than 10-20 branches — at that cardinality, a oneof gets awkward to maintain; consider a dedicated container or a polymorphic message ecosystem.

Seen in

  • sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices: canonical motivation. Lyft Media's post uses the discriminator-enum anti-pattern as the opening example and evolves it into a oneof across three code-block iterations; flags option (validate.required) = true as the missing-default fix; the final consolidated example combines the pattern with unit-suffixed field names, UNKNOWN enum sentinels, explicit optional, and inline validation.