
SYSTEM Cited by 1 source

Protocol Buffers (protobuf)

Protocol Buffers (protobuf.dev) is Google's open-source, language-neutral, schema-driven binary serialization format and the default IDL for gRPC. A .proto file declares messages, enums, and service RPCs; the protoc compiler generates typed code for C++ / Java / Python / Go / C# / Swift / Kotlin / JavaScript / etc. Messages serialise to a compact binary format of tagged fields: each field is prefixed by a varint key combining its field number and wire type, and only length-delimited fields (strings, bytes, sub-messages, packed repeated fields) carry an explicit length. The schema is designed for forwards-and-backwards-compatible evolution: unknown fields are preserved on deserialise/re-serialise, field numbers (not names) are load-bearing on the wire, and the default-value semantics are chosen to tolerate added and removed fields.
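
To make "field numbers, not names, are on the wire" concrete, here is a minimal hand-rolled sketch of the encoding for two wire types, varint (0) and length-delimited (2), in plain Python with no protobuf library. The schema `message Example { int32 id = 1; string name = 2; }` and all helper names are hypothetical illustrations; real code would use protoc-generated classes.

```python
# Minimal sketch of protobuf wire encoding, hand-rolled for illustration.
# Hypothetical schema: message Example { int32 id = 1; string name = 2; }

WIRETYPE_VARINT = 0  # int32, int64, uint32, uint64, bool, enum, ...
WIRETYPE_LEN = 2     # string, bytes, embedded messages, packed repeated


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, least-significant group first."""
    if value < 0:
        # Negative int32/int64 values are sign-extended to 64 bits (10 bytes).
        value &= (1 << 64) - 1
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """Each field starts with (field_number << 3) | wire_type; no name is written."""
    return encode_varint((field_number << 3) | wire_type)


def encode_varint_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, WIRETYPE_VARINT) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # Length-delimited: key, then the byte length as a varint, then the bytes.
    return encode_key(field_number, WIRETYPE_LEN) + encode_varint(len(data)) + data


payload = encode_varint_field(1, 150) + encode_string_field(2, "testing")
print(payload.hex(" "))  # 08 96 01 12 07 74 65 73 74 69 6e 67
```

Reading the output: `08` is the key for field 1 / wire type 0, `96 01` is 150 as a varint, `12` is the key for field 2 / wire type 2, `07` is the length, and the rest is UTF-8 "testing". Renaming either field in the .proto would not change a single byte.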

Key language features (proto3)

  • message — record type with numbered fields. Field number, not name, is what appears on the wire.
  • enum — named integer constants. In proto3 the first value must be 0, and it is the default for enum-typed fields; the canonical convention is to reserve 0 as UNKNOWN (see concepts/unknown-zero-enum-value).
  • oneof — tagged union; at most one of the contained fields is set at a time, and setting one clears the others. The wire encoding only includes the set branch. See patterns/oneof-over-enum-plus-field.
  • optional (reintroduced in proto3 by protobuf 3.15, 2021) — on singular scalars, adds explicit presence (e.g. HasField() in Python) so callers can distinguish "absent" from "equals type default". Before 3.15, the workaround was the google.protobuf.*Value wrapper types (StringValue, UInt32Value, …). See concepts/proto3-explicit-optional.
  • map<K, V> — key/value collection; equivalent to a repeated message of {K, V} pairs on the wire.
  • Well-known types: google.protobuf.Timestamp, Duration, Any, and the *Value wrappers; standardised imports that every codegen supports.
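
The presence difference behind optional shows up directly in the bytes. The sketch below is hand-rolled (no protobuf library) and assumes a hypothetical field `int32 retries = 1;`: a plain proto3 scalar never writes its default value, so "never set" and "set to 0" are byte-identical, while an optional scalar writes the value whenever it is set.

```python
# Sketch contrasting proto3 implicit presence with `optional` (explicit presence).
# Hand-rolled for illustration; field 1 is a hypothetical `int32 retries = 1;`.
from typing import Optional


def encode_varint(value: int) -> bytes:
    value &= (1 << 64) - 1  # sign-extend negatives to 64 bits, as int32/int64 do
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(field_number: int, value: int) -> bytes:
    # Wire type 0 (varint), so the key is simply field_number << 3.
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_implicit(field_number: int, value: int) -> bytes:
    """Plain proto3 scalar: the default value 0 is never written."""
    return b"" if value == 0 else encode_int32(field_number, value)


def encode_explicit(field_number: int, value: Optional[int]) -> bytes:
    """proto3 `optional` scalar: written whenever set, even when set to 0."""
    return b"" if value is None else encode_int32(field_number, value)


cases = [
    ("implicit, unset or 0", encode_implicit(1, 0)),
    ("implicit, set to 3", encode_implicit(1, 3)),
    ("optional, unset", encode_explicit(1, None)),
    ("optional, set to 0", encode_explicit(1, 0)),
]
for label, data in cases:
    print(f"{label:22} {data.hex() or '(no bytes)'}")
```

Here `encode_implicit(1, 0)` returns `b""` while `encode_explicit(1, 0)` returns `b"\x08\x00"`; that extra key is what lets a reader report the field as present.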

Why protobuf (vs JSON, vs Cap'n Proto, vs Avro)

From the 2024-09-16 Lyft Media post:

  1. Efficiency — denser on wire than JSON, faster encode/decode.
  2. Forward/backward compatibility — tag-based encoding tolerates schema drift by design.
  3. Multi-language codegen — Python / Swift / Kotlin / Java / Go / C++ / TS all first-class.
  4. Rich tooling — Lyft specifically calls out "rich internal tooling at Lyft, and widespread use in both mobile-to-server and server-to-server domains."
  5. Validation extensions — declarative validation via protoc-gen-validate / protovalidate over the same .proto surface.

Contrast:

  • JSON — human-readable, ubiquitous, but larger wire size, weaker typing, and no first-class schema evolution story.
  • Cap'n Proto — zero-copy memory representation ("the in-memory layout is the wire"); protobuf needs an explicit encode/decode step, which costs CPU but gives flexibility. Its JS-native successor, Cap'n Web, drops the schema language entirely.
  • Avro — schema stored with the data or in a registry; no field identifiers on the wire (values are written in schema order and resolved by field name against the writer's schema), which makes it more dynamic; protobuf uses static codegen and field numbers.
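
A rough illustration of the JSON size gap, using one hypothetical record `message Record { int32 id = 1; string name = 2; }` and a hand-rolled encoder. One record is not a benchmark: the ratio depends on field-name lengths, numeric magnitudes, and whether the transport compresses.

```python
# Rough size comparison for one record; hand-rolled protobuf encoding,
# hypothetical schema: message Record { int32 id = 1; string name = 2; }
import json


def encode_varint(value: int) -> bytes:
    """Base-128 varint for a non-negative int."""
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_record(record_id: int, name: str) -> bytes:
    # Assumes record_id >= 0 to keep the varint helper simple.
    name_bytes = name.encode("utf-8")
    return (
        encode_varint((1 << 3) | 0) + encode_varint(record_id)  # id: key + varint
        + encode_varint((2 << 3) | 2)                          # name: key
        + encode_varint(len(name_bytes)) + name_bytes          # name: length + bytes
    )


as_json = json.dumps({"id": 150, "name": "testing"}, separators=(",", ":")).encode("utf-8")
as_proto = encode_record(150, "testing")
print(len(as_json), len(as_proto))  # 27 12
```

Most of the saving comes from replacing quoted field names with one-byte keys and writing numbers as varints instead of decimal text.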

Compatibility rules (must-know)

  • Never reuse a field number. Reserve numbers / names of removed fields (reserved 3, 4; reserved "old_name";) so future edits can't accidentally reassign them.
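  • Changing a field's type is usually breaking even when it looks compatible. For example, int32 → int64 is wire-compatible (both are varints), but an older reader still decoding the field as int32 silently truncates values outside the 32-bit range, and generated code changes type (e.g. int to long in Java).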
  • Enums are "open" by default in proto3 — a consumer may receive unknown numeric values; use (validate.rules).enum.defined_only to close the set at the validation layer.
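  • Adding a field to an existing oneof, or moving fields into or out of a oneof, produces bytes that still parse but can change which-branch-is-set semantics in subtle ways (an older reader sees no branch set; fields moved into a shared oneof can be silently cleared on parse); avoid.
  • required was dropped in proto3 because removing it from a proto2 field was effectively impossible: older readers reject any message missing a required field (see concepts/proto3-explicit-optional).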
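
The int32 → int64 and unknown-field rules above can be seen with a tiny hand-rolled decoder standing in for an older reader. Everything here is hypothetical for illustration: the old schema is `message Job { int32 id = 1; }`, while the newer writer uses `int64 id = 1;` and adds `string owner = 3;`. The int32 cast mimics the C++-style truncation the protobuf docs describe; it is not any library's API.

```python
# Sketch of an "old" reader coping with bytes from a newer writer.
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def to_int32(raw: int) -> int:
    """Keep the low 32 bits as two's complement, like a C++ cast to int32."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def parse_old(buf: bytes) -> tuple[dict, list]:
    """Old reader: knows only field 1; skips and keeps anything else verbatim."""
    known, unknown = {}, []
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field_number == 1 and wire_type == 0:
            known["id"] = to_int32(value)
        else:
            unknown.append(buf[start:pos])  # preserved for re-serialisation
    return known, unknown


# New writer: id = 3_000_000_000 (fits int64, not int32) plus new field 3.
owner = "ops".encode("utf-8")
new_bytes = (encode_varint(1 << 3) + encode_varint(3_000_000_000)
             + encode_varint((3 << 3) | 2) + encode_varint(len(owner)) + owner)

known, unknown = parse_old(new_bytes)
print(known)    # {'id': -1294967296}
print(unknown)  # [b'\x1a\x03ops']
```

The unknown field survives because the wire type alone tells the reader how many bytes to skip; the id, by contrast, decodes without any error into a wrong negative number, which is why the type change counts as breaking.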

Design practices (Lyft Media, 2024-09-16)

Two principles + five practices from sources/2024-09-16-lyft-protocol-buffer-design-principles-and-practices:

Seen in

Last updated · 319 distilled / 1,201 read