SYSTEM Cited by 1 source

Instacart Griffin 2.0

Definition

Griffin 2.0 is Instacart's machine-learning serving platform: the substrate that hosts production ML models, including the GPU serving stack of the generative ads retrieval model. It is disclosed in sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart as the integration target for that new GPU serving stack:

"It is fully integrated with Griffin 2.0, Instacart's next-gen ML platform, streamlining deployment and maintenance within our ecosystem."

Griffin 2.0 hosts Go-native services with embedded GPU inference, backed by TensorRT-LLM and NVIDIA Triton Inference Server, for autoregressive-decoding workloads. This stack replaces the legacy "Python and CPU inference" shape.

What we know

The 2026-06-02 post is the first wiki disclosure naming Griffin 2.0; architectural detail is shallow:

Hosts production ML services: generative ads retrieval is "fully integrated with Griffin 2.0."
Streamlines deployment and maintenance: the post names this as the value the platform provides, presumably in contrast to a bespoke serving deployment per model.
Supports Go-native services: the new GPU-stack serving shell is described as "Implemented as a Go-native service" delivering "higher throughput and lower latency compared to the legacy Python environment." This signals that Griffin 2.0 supports non-Python service shells around GPU inference engines.
Implicit "1.0" predecessor: the "2.0" naming implies a prior generation. That this generation focused on Python + CPU serving is an inference from the legacy stack the new GPU stack replaces, not something the source states.

Companion serving substrate

The 2026-06-02 post explicitly stacks Griffin 2.0 with three other systems for generative ads retrieval:

TensorRT-LLM: NVIDIA's library for compiling and optimizing LLM inference into TensorRT engines.
NVIDIA Triton Inference Server: NVIDIA's serving runtime above TensorRT-LLM.
Go-native service shell: the request-handling layer above Triton.

Together these are canonicalised as the patterns/gpu-serving-stack-tensorrt-llm-triton pattern.
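
To make the layering concrete, here is a minimal Go sketch of how a request-handling shell might call an inference server sitting below it. The source does not say how the Go service reaches Triton (it describes the inference as "embedded"), so this assumes a plain HTTP call using the KServe v2 inference protocol that Triton exposes (POST /v2/models/{model}/infer). The model name ads_retrieval, the tensor names text_input and text_output, and the stub backend are all hypothetical; the stub only echoes the prompt so the example runs without a GPU.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// inferTensor is one named tensor in a KServe v2 inference request or response.
// Data is typed as strings here because the sketch only sends BYTES tensors.
type inferTensor struct {
	Name     string   `json:"name"`
	Shape    []int    `json:"shape"`
	Datatype string   `json:"datatype"`
	Data     []string `json:"data"`
}

type inferRequest struct {
	Inputs []inferTensor `json:"inputs"`
}

type inferResponse struct {
	ModelName string        `json:"model_name"`
	Outputs   []inferTensor `json:"outputs"`
}

// generate posts one prompt to the model's infer endpoint and returns the
// first string of the first output tensor. Tensor names are assumptions.
func generate(client *http.Client, baseURL, model, prompt string) (string, error) {
	body, err := json.Marshal(inferRequest{Inputs: []inferTensor{{
		Name: "text_input", Shape: []int{1, 1}, Datatype: "BYTES", Data: []string{prompt},
	}}})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/v2/models/"+model+"/infer", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("infer: unexpected status %s", resp.Status)
	}
	var out inferResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Outputs) == 0 || len(out.Outputs[0].Data) == 0 {
		return "", fmt.Errorf("infer: empty output")
	}
	return out.Outputs[0].Data[0], nil
}

// newStubBackend stands in for the inference server so the sketch is runnable.
// It echoes the prompt instead of decoding tokens; it is not Triton.
func newStubBackend() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req inferRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil ||
			len(req.Inputs) == 0 || len(req.Inputs[0].Data) == 0 {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(inferResponse{
			ModelName: "ads_retrieval",
			Outputs: []inferTensor{{
				Name: "text_output", Shape: []int{1}, Datatype: "BYTES",
				Data: []string{"echo: " + req.Inputs[0].Data[0]},
			}},
		})
	}))
}

func main() {
	backend := newStubBackend()
	defer backend.Close()

	out, err := generate(backend.Client(), backend.URL, "ads_retrieval", "organic milk")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Run against the stub, this prints "echo: organic milk". A real TensorRT-LLM model behind Triton would typically require further inputs (for example a token limit), and a production shell might use gRPC or in-process bindings instead; none of that is described in the source.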

Caveats

Architecture, runtime, scheduling internals not disclosed.
Coverage across model types (LLM vs classical ranker vs feature store) not disclosed.
Multi-tenancy / capacity-allocation / autoscaling internals not disclosed.
Predecessor "Griffin 1.0" not linked or characterised in the source.
Prior Instacart write-up "Introducing Griffin 2.0: Instacart's Next-gen ML Platform" is linked from the source post but not yet ingested on the wiki — full architectural disclosure deferred.

Seen in

sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart: first canonical wiki mention; ML platform host for the new GPU serving stack.

systems/instacart-generative-ads-retrieval: first wiki-disclosed Griffin 2.0 production tenant.
systems/tensorrt-llm / systems/nvidia-triton-inference-server: the GPU inference substrate Griffin 2.0 hosts.
patterns/gpu-serving-stack-tensorrt-llm-triton / patterns/go-native-ml-serving: canonical patterns Griffin 2.0 implements.