
Model signature as source of truth

Definition

Model signature as source of truth is the design discipline in which an ML model's input/output declaration — exported as a first-class artefact alongside the model weights — is the canonical authority for "what features does this model consume?" Other systems (serving tier, training tier, feature-trim tier, observability tier) read the signature rather than maintaining parallel hand-curated feature lists.

Canonicalised on the wiki from Pinterest's 2026-05-01 Feature Trimmer post. (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer)

Pinterest's instance

Pinterest exports the signature as module_info.json inside the .pt archive, alongside the TorchScript artefact:

unzip -p model.pt archive/extra/module_info.json | jq
{
  "input_names": [
    "feature_id_1",
    "feature_id_2",
    "feature_id_3",
    ...
  ],
  "output_names": [
    "output_score_1",
    "output_score_2"
  ]
}
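Because the signature lives as a plain JSON member inside the zip archive, any consumer can read it without loading (or even having) PyTorch. A minimal sketch of that read, assuming the Pinterest layout above (the archive/extra/module_info.json path; the function name is illustrative):

```python
import json
import zipfile

def read_signature(model_path: str) -> dict:
    """Read the exported signature without loading the model itself.

    Assumes module_info.json is stored at archive/extra/ inside the
    .pt zip archive, as in Pinterest's layout.
    """
    with zipfile.ZipFile(model_path) as zf:
        with zf.open("archive/extra/module_info.json") as f:
            return json.load(f)

# signature = read_signature("model.pt")
# signature["input_names"] is the list of required feature ids
```

This is the Python equivalent of the `unzip -p … | jq` one-liner above: no GPU, no TorchScript runtime, just zip + JSON.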

The same artefact is consumed by:

  1. The leaf's feature converter — transforms internal-company-format features into PyTorch tensors before inference. Because it knows the inputs, "it converts only the required features and discards the rest."
  2. The root's Feature Trimmer — uses input_names as the allowlist for that model version, trimming the fan-out payload before RPC transmission.

One source of truth, two consumers. If the feature converter and the trimmer used parallel lists, drift between them would manifest as silent mis-scoring.
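The two consumers can be sketched as functions over the same signature dict. This is an illustrative simplification, not Pinterest's code: the function names are hypothetical, and the leaf-side tensor construction is elided.

```python
def build_allowlist(signature: dict) -> set:
    # Root-side Feature Trimmer: the allowlist for a model version
    # is exactly the signature's input_names.
    return set(signature["input_names"])

def trim_payload(payload: dict, allowlist: set) -> dict:
    # Drop every feature the model will not consume, before the
    # fan-out RPC to the leaves.
    return {k: v for k, v in payload.items() if k in allowlist}

def convert_features(payload: dict, signature: dict) -> list:
    # Leaf-side feature converter: materialise inputs in signature
    # order, discarding anything else in the payload. (Real converter
    # builds PyTorch tensors; here we just select values.)
    return [payload[name] for name in signature["input_names"]]
```

Both functions derive from `signature["input_names"]`, so by construction they cannot drift apart — the property the parallel-lists alternative would lose.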

Load-bearing invariants

Pinterest names two invariants that make the signature usable as a source of truth:

  1. Signature version-stability — "A model's signature remains unchanged across different versions. If a signature modification is necessary — for instance, to introduce a new input feature — a new model is forked from the original." This discipline is enforced socially by the convention that a signature change forces a new model name, not by tooling disclosed in the post.
  2. Signature artefact availability before deploy — Pinterest ships module_info.json as a training-pipeline output alongside other model files, so the signature is available to the bundle-build step without any runtime dependency on the model loading.

The version-stability invariant is load-bearing for the trimmer's latest-version fallback: when a score request arrives with no version specified (or a version whose allowlist isn't in the consolidated map), the trimmer falls back to the latest-version allowlist. This only yields correct trimming if signatures really are stable across versions of the same model.
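The fallback logic can be sketched as a small resolver. The map shapes and names here are assumptions for illustration, not Pinterest's actual data structures:

```python
from typing import Optional

def allowlist_for(model: str,
                  version: Optional[str],
                  allowlists: dict,
                  latest: dict) -> set:
    """Resolve the trimmer allowlist for a score request.

    allowlists: {(model, version): set of feature ids}  — consolidated map
    latest:     {model: latest known version}

    Falls back to the latest version's allowlist when no version is
    given, or when the requested version is missing from the map.
    This is only correct under the version-stability invariant:
    every version of `model` must share one signature.
    """
    if version is not None and (model, version) in allowlists:
        return allowlists[(model, version)]
    return allowlists[(model, latest[model])]
```

If the invariant is violated — an old version with a different signature is still serving — this resolver silently trims against the wrong allowlist, which is exactly the failure mode named in the caveats below.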

Why it's the right locus

Alternative sources of truth Pinterest could have used:

  • Hand-curated YAML / JSON config — maintained by the ML team, deployed separately. Drifts from the actual model as features come and go during experimentation.
  • Derived-from-training-code static analysis — fragile; training code is research-grade and changes frequently.
  • Model weights themselves (inspect at load time) — not knowable by the root tier without loading the model, which undoes the root / leaf separation.

The signature artefact is both:

  • Auto-emitted by training (so it can't drift from the model).
  • Externally parseable without executing the model (so the root can read it without a GPU).

These two properties together make it the natural locus for cross-system consumption.

Sibling shapes

  • Protobuf / Thrift IDL as schema source of truth — service-interface altitude analogue. Producer + consumer both read the IDL; drift between them is structurally impossible because they regenerate from the same source.
  • OpenAPI / GraphQL schema as API source of truth — REST / GraphQL altitude.
  • TorchScript / ONNX metadata — same altitude as Pinterest's module_info.json but different format; both carry the input/output contract externally to the weights.

Caveats

  • Version-stability is a convention, not a check. Pinterest enforces it socially ("a new model is forked from the original"); no tooling to reject a signature-changing refresh of an existing model name is mentioned. If the convention breaks, the latest-version fallback silently sends wrong features.
  • Scope: names only, nothing richer. Pinterest's module_info.json includes input_names and output_names only — not types, shapes, valid ranges, or semantic versioning. More elaborate signatures (TorchScript carries shape + dtype; ONNX has more structure) could carry more, but that isn't necessary for the feature-trim use case.
  • Signature as source of truth is a discipline, not a mechanism. The mechanism is whatever pipeline publishes the artefact (Pinterest's training workflow) + whatever consumers read it (feature converter, trimmer). The discipline is "treat this as authoritative; don't maintain parallel lists."