
CONCEPT Cited by 1 source

Unit-suffix field naming

Definition

Name schema fields with the explicit unit (or concrete semantic qualifier) as a suffix: payload_size_bytes, not payload_size; timestamp_ms_utc, not timestamp; duration_seconds, not duration. Consumers of the schema should not have to read documentation, a runbook, or source code to answer "what unit is this?"
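As a minimal before/after sketch of the definition, the proto3 file below applies the three renames to one message. The package name, message names, and field numbers are hypothetical and chosen only for illustration.

```protobuf
syntax = "proto3";

package example.media.v1;  // hypothetical package name

// Before: every numeric field leaves the unit to the reader's guess.
message UploadEventAmbiguous {
  uint64 payload_size = 1;  // bytes? KiB? MiB?
  int64 timestamp = 2;      // seconds or milliseconds? UTC or local time?
  uint32 duration = 3;      // seconds? milliseconds?
}

// After: the unit is part of the contract and is carried into the
// generated code of every client language.
message UploadEvent {
  uint64 payload_size_bytes = 1;
  int64 timestamp_ms_utc = 2;
  uint32 duration_seconds = 3;
}
```

Because generated accessors are derived from the field name, the suffix shows up at every call site, not only in the schema file.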

Why

A schema is a contract. Every field name is consumed by every downstream implementer across every language stack that generates code from the schema. Ambiguity at the name layer gets paid back across N client teams × K years × M misinterpretation-incidents.

The 2024-09-16 Lyft Media post frames it as a corollary of the clarity principle:

"What is the unit specifying this value? One may assume bytes, but one shouldn't have to — and this makes the difference between a good protocol and a lacking one … it proves much more practical to name these fields appropriately … e.g. payload_size_bytes and timestamp_ms_utc."

A classic real-world cost of the opposite convention: the Mars Climate Orbiter was lost in 1999 because one piece of software produced thruster impulse values in pound-force seconds while another consumed the same values as newton-seconds. Naming the field impulse_newton_seconds would not have been a complete fix, but it would have surfaced the disagreement at a code-review desk instead of in orbital mechanics.

Examples

| ❌ Ambiguous | ✅ Unit-suffixed     | Notes                                         |
|--------------|---------------------|-----------------------------------------------|
| payload_size | payload_size_bytes  | or _kib, _mib if that is the actual unit      |
| timestamp    | timestamp_ms_utc    | both unit + timezone                          |
| duration     | duration_seconds    | or duration_ms                                |
| price        | price_usd_cents     | currency + sub-unit                           |
| distance     | distance_km         | or _m, _mi                                    |
| rate         | rate_per_second     | or _per_minute                                |
| temperature  | temperature_celsius | or _fahrenheit / _kelvin                      |
| ttl          | ttl_seconds         | TTL alone is already ambiguous                |
| latency      | latency_p99_ms      | statistic + unit                              |

Better: use the type system instead

Where a well-known type exists, prefer the type over the suffix. In protobuf:

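// Imports needed for the well-known types used below:
import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

// Good:
uint64 timestamp_ms_utc = 2;

// Better: the type now encodes both unit and timezone
// (Timestamp is seconds + nanos since the Unix epoch, in UTC):
google.protobuf.Timestamp timestamp_utc = 2;

// Duration:
google.protobuf.Duration timeout = 3;   // was uint32 timeout_seconds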

For domain-specific recurring shapes (lat/lng, money, etc.), a team can define and standardise its own well-known types:

message LatLng {
    double lat_degrees = 1;
    double lng_degrees = 2;
}

message Money {
    string currency_code = 1;   // ISO 4217
    int64  units         = 2;   // whole units
    int32  nanos         = 3;   // 1e-9 of a unit
}

The type-system move is stronger than the naming convention because the compiler enforces it; a name-only suffix is a review-time gate.
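A sketch of how the two approaches can coexist in one message, assuming a hypothetical RideQuote message and package name: typed fields wherever a well-known or team-defined type exists, and a unit suffix as the fallback where none does. The LatLng and Money definitions are repeated only so the file compiles on its own.

```protobuf
syntax = "proto3";

package example.ride.v1;  // hypothetical package name

import "google/protobuf/duration.proto";
import "google/protobuf/timestamp.proto";

// Team-defined shared types, as described in the text.
message LatLng {
  double lat_degrees = 1;
  double lng_degrees = 2;
}

message Money {
  string currency_code = 1;  // ISO 4217
  int64 units = 2;           // whole units
  int32 nanos = 3;           // 1e-9 of a unit
}

// Hypothetical message composing the types above.
message RideQuote {
  LatLng pickup = 1;
  LatLng dropoff = 2;
  Money estimated_fare = 3;                          // no price_usd_cents needed
  google.protobuf.Duration estimated_trip_time = 4;  // no _seconds suffix needed
  google.protobuf.Timestamp quoted_at = 5;           // no _ms_utc suffix needed

  // No shared distance type is assumed here, so the suffix carries the unit.
  double estimated_distance_km = 6;
}
```

With this shape, assigning a bare number to estimated_fare fails to compile in generated code, whereas a mislabelled estimated_distance_km can only be caught by a reviewer.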

Where this applies

  • Protobuf / gRPC schemas
  • GraphQL field definitions
  • JSON Schema / OpenAPI payloads
  • Database column names (less critical because ORM mappings usually paper over it, but still valuable for ad-hoc SQL queries)
  • Wire protocols generally

Cost

  • Slightly longer names. payload_size_bytes is longer than payload_size by one suffix (six characters) on every reference. Not a cost worth optimising against.
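  • Migration pain for pre-existing schemas. A protobuf rename is compatible on the binary wire format (only the field number is serialized), but it breaks generated code and the JSON and text formats, so in practice it requires a deprecated-old-plus-new-alongside cutover; adopting the convention retroactively is multi-release work.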
  • Unit drift risk. Once a field is named _ms, changing the producer to emit seconds silently corrupts every consumer that didn't redeploy. The unit suffix is a contract — it has to be preserved across schema versions.
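The migration and unit-drift points above can be sketched in a single proto3 message. This is an illustration under assumptions, not a prescribed procedure: the UploadRequest message, its package, and its field numbers are hypothetical.

```protobuf
syntax = "proto3";

package example.media.v1;  // hypothetical package name

message UploadRequest {
  // Migration: the old ambiguous field stays during the cutover window,
  // marked deprecated so generated code warns at each remaining use.
  uint64 payload_size = 1 [deprecated = true];

  // The replacement gets a new field number. Producers populate both
  // fields until every consumer reads the new one.
  uint64 payload_size_bytes = 4;

  // Unit drift: this field keeps emitting milliseconds for as long as
  // it exists. Its unit is never changed in place.
  uint32 timeout_ms = 2;

  // If seconds are wanted later, they arrive as a separate field.
  uint32 timeout_seconds = 5;

  // Once the old field is removed, reserving its number and name stops
  // them from being reused with a different meaning:
  //   reserved 1;
  //   reserved "payload_size";
}
```

Adding a field instead of reinterpreting one means a consumer that has not redeployed keeps reading correct values from the field it already knows.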

Seen in
