# Model signature as source of truth

## Definition
Model signature as source of truth is the design discipline in which an ML model's input/output declaration — exported as a first-class artefact alongside the model weights — serves as the canonical answer to "what features does this model consume?" Other systems (serving tier, training tier, feature-trim tier, observability tier) read the signature rather than maintaining parallel hand-curated feature lists.
Canonicalised on the wiki from Pinterest's 2026-05-01 Feature Trimmer post. (Source: sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer)
## Pinterest's instance
Pinterest exports the signature as module_info.json inside the .pt archive, alongside the TorchScript artefact:
```
unzip -p model.pt archive/extra/module_info.json | jq
{
  "input_names": [
    "feature_id_1",
    "feature_id_2",
    "feature_id_3",
    ...
  ],
  "output_names": [
    "output_score_1",
    "output_score_2"
  ]
}
```
The same artefact is consumed by:
- The leaf's feature converter — transforms internal-company-format features into PyTorch tensors before inference. Because it knows the inputs, "it converts only the required features and discards the rest."
- The root's Feature Trimmer — uses `input_names` as the allowlist for that model version, trimming the fan-out payload before RPC transmission.
One source of truth, two consumers. If the feature converter and the trimmer used parallel lists, drift between them would manifest as silent mis-scoring.
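A minimal sketch of the shared-artefact read, assuming Python, the archive layout shown above, and illustrative helper names and payload (not Pinterest's code):

```python
import json
import zipfile

def load_signature(model_path: str) -> dict:
    """Read module_info.json out of the .pt zip without loading the model."""
    with zipfile.ZipFile(model_path) as zf:
        with zf.open("archive/extra/module_info.json") as f:
            return json.load(f)

def trim_payload(features: dict, allowlist: frozenset) -> dict:
    """Root-side trimming: drop features the model never reads before fan-out."""
    return {name: value for name, value in features.items() if name in allowlist}

signature = load_signature("model.pt")
allowlist = frozenset(signature["input_names"])

# The leaf's converter filter and the root's trimmer both key off the same list.
payload = {"feature_id_1": 0.3, "feature_id_2": 1.7, "stale_feature": 9.0}
trimmed = trim_payload(payload, allowlist)
```

Because both consumers derive their view from the same `input_names` list, there is no second hand-maintained list to fall out of sync.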
## Load-bearing invariants
Pinterest names two invariants that make the signature usable as a source of truth:
- Signature version-stability — "A model's signature remains unchanged across different versions. If a signature modification is necessary — for instance, to introduce a new input feature — a new model is forked from the original." The discipline is enforced socially, by the convention that a signature change forces a new model name; the post discloses no tooling that checks it.
- Signature artefact availability before deploy — Pinterest ships `module_info.json` as a training-pipeline output alongside other model files, so the signature is available to the bundle-build step without any runtime dependency on model loading.
The version-stability invariant is load-bearing for the trimmer's latest-version fallback: when a score request arrives with no version specified (or a version whose allowlist isn't in the consolidated map), the trimmer falls back to the latest-version allowlist. This only yields correct trimming if signatures really are stable across versions of the same model.
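A minimal sketch of that fallback, with assumed names throughout (the registry class, its map shape, and the lookup signature are illustrative; the post describes the behaviour, not code):

```python
from typing import Optional

class AllowlistRegistry:
    """Consolidated map of per-model-version allowlists (illustrative)."""

    def __init__(self) -> None:
        # (model name, version) -> frozenset of input feature names
        self._allowlists: dict[tuple[str, str], frozenset] = {}
        self._latest: dict[str, str] = {}  # model name -> latest known version

    def register(self, model: str, version: str, input_names: list) -> None:
        self._allowlists[(model, version)] = frozenset(input_names)
        self._latest[model] = version  # assumes versions register in order

    def lookup(self, model: str, version: Optional[str]) -> frozenset:
        # Unspecified or unknown version: fall back to the latest allowlist.
        # Correct only while signatures stay stable across versions.
        if version is None or (model, version) not in self._allowlists:
            version = self._latest[model]
        return self._allowlists[(model, version)]
```

If a signature-changing refresh slipped through under the same model name, `lookup` would trim with the wrong allowlist, which is exactly the failure mode the caveats below describe.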
## Why it's the right locus
Alternative sources of truth Pinterest could have used:
- Hand-curated YAML / JSON config — maintained by the ML team, deployed separately. Drifts from the actual model as features come and go during experimentation.
- Derived-from-training-code static analysis — fragile; training code is research-grade and changes frequently.
- Model weights themselves (inspect at load time) — not knowable by the root tier without loading the model, which undoes the root / leaf separation.
The signature artefact is both:

- Auto-emitted by training (so it can't drift from the model).
- Externally parseable without executing the model (so the root can read it without a GPU).
These two properties together make it the natural locus for cross-system consumption.
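One plausible emission mechanism, sketched under the assumption that the pipeline uses TorchScript's standard extra-files hook (the post doesn't disclose the training code; the toy module and output name are invented):

```python
import json
import torch

class Toy(torch.nn.Module):
    def forward(self, feature_id_1: torch.Tensor, feature_id_2: torch.Tensor):
        return feature_id_1 + feature_id_2

scripted = torch.jit.script(Toy())

# Derive input names from the compiled graph rather than hand-writing them,
# so the artefact can't drift from the model (graph input 0 is `self`; the
# debug names normally match the forward() parameter names).
input_names = [inp.debugName() for inp in scripted.graph.inputs()][1:]

signature = {"input_names": input_names, "output_names": ["output_score_1"]}

# _extra_files stores the JSON inside the zip under <archive>/extra/, which is
# where the unzip command earlier finds module_info.json.
torch.jit.save(scripted, "model.pt",
               _extra_files={"module_info.json": json.dumps(signature)})
```

The artefact can then be read back with `zipfile` alone, as in the earlier sketch: no `torch` import, no GPU.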
## Sibling shapes
- Protobuf / Thrift IDL as schema source of truth — service-interface altitude analogue. Producer + consumer both read the IDL; drift between them is structurally impossible because they regenerate from the same source.
- OpenAPI / GraphQL schema as API source of truth — REST / GraphQL altitude.
- TorchScript / ONNX metadata — same altitude as Pinterest's `module_info.json` but different format; both carry the input/output contract externally to the weights.
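For the ONNX sibling, the same contract is readable from metadata without executing the model (a sketch using the standard `onnx` package; the file name is illustrative):

```python
import onnx

model = onnx.load("model.onnx")  # parses the protobuf; no inference runtime needed
input_names = [i.name for i in model.graph.input]
output_names = [o.name for o in model.graph.output]
```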
## Seen in
- 2026-05-01 Pinterest — Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer (sources/2026-05-01-pinterest-optimizing-ml-workload-network-efficiency-part-i-feature-trimmer) — canonical articulation; both consumers (feature converter, Feature Trimmer) named explicitly; version-stability invariant named + fork-on-incompat rule stated; deploy-pipeline integration described.
## Caveats
- Version-stability is a convention, not a check. Pinterest enforces it socially ("a new model is forked from the original"); no tooling to reject a signature-changing refresh of an existing model name is mentioned. If the convention breaks, the latest-version fallback silently sends wrong features. (A hypothetical guardrail is sketched after this list.)
- Scope: input features only, outputs at roughly the same fidelity. Pinterest's `module_info.json` includes `input_names` and `output_names` only — not types, shapes, valid ranges, or semantic versioning. More elaborate signatures (TorchScript carries shapes + dtypes; ONNX has more structure) could say more, but that isn't necessary for the feature-trim use case.
- Signature as source of truth is a discipline, not a mechanism. The mechanism is whatever pipeline publishes the artefact (Pinterest's training workflow) plus whatever consumers read it (feature converter, trimmer). The discipline is "treat this as authoritative; don't maintain parallel lists."
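A hypothetical guardrail for the first caveat (nothing like it appears in the post) would compare a refreshed artefact's signature against the deployed one at bundle-build time and force the fork:

```python
def check_signature_stable(model_name: str, new_sig: dict, deployed_sig: dict) -> None:
    """Reject an in-place refresh that changes the signature; fork instead."""
    if (new_sig["input_names"] != deployed_sig["input_names"]
            or new_sig["output_names"] != deployed_sig["output_names"]):
        raise ValueError(
            f"{model_name}: signature changed across versions; "
            "fork a new model name instead of refreshing in place."
        )
```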
## Related
- concepts/send-what-you-use — the principle this source of truth enables.
- concepts/root-leaf-ml-serving-architecture — the architecture whose trimming is gated on this signature.
- systems/pinterest-feature-trimmer — the production consumer.
- systems/pytorch — the `.pt` archive format and TorchScript substrate.
- patterns/artifact-rides-model-deploy-pipeline — how the signature reaches the root.
- patterns/feature-allowlist-over-blocklist — how the signature's `input_names` is used on the producer side.