SYSTEM Cited by 1 source
Instacart Griffin 2.0¶
Definition¶
Griffin 2.0 is Instacart's machine-learning serving platform: the substrate that hosts production ML models, including the GPU serving stack of the generative ads retrieval model. It is disclosed in sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart as the integration target for the new GPU serving stack:
"It is fully integrated with Griffin 2.0, Instacart's next-gen ML platform, streamlining deployment and maintenance within our ecosystem."
Per the source, the stack hosted on Griffin 2.0 is a Go-native service with embedded GPU inference, backed by TensorRT-LLM and NVIDIA Triton Inference Server for autoregressive-decoding workloads. It replaces the legacy "Python and CPU inference" setup.
What we know¶
The 2026-06-02 post is the first wiki disclosure naming Griffin 2.0; architectural detail is shallow:
- Hosts production ML services: generative ads retrieval is "fully integrated with Griffin 2.0."
- Streamlines deployment and maintenance: the source cites this as the value the platform provides, as opposed to a bespoke serving deployment per model.
- Supports Go-native services: the serving shell of the new GPU stack is described as "Implemented as a Go-native service" delivering "higher throughput and lower latency compared to the legacy Python environment", which suggests Griffin 2.0 supports non-Python service shells around GPU inference engines.
- Implicit "1.0" predecessor: the "2.0" naming implies a prior generation, plausibly the Python+CPU serving that the new GPU stack replaces. The source does not confirm this.
Companion serving substrate¶
The 2026-06-02 post explicitly stacks Griffin 2.0 with three other components for generative ads retrieval:
- TensorRT-LLM — NVIDIA's high-performance LLM inference compiler.
- NVIDIA Triton Inference Server — NVIDIA's serving runtime above TensorRT-LLM.
- Go-native service shell — the request-handling layer above Triton.
Together these are canonicalised as the patterns/gpu-serving-stack-tensorrt-llm-triton pattern.
Caveats¶
- Architecture, runtime, scheduling internals not disclosed.
- Coverage across model types (LLM vs classical ranker vs feature store) not disclosed.
- Multi-tenancy / capacity-allocation / autoscaling internals not disclosed.
- Predecessor "Griffin 1.0" not linked or characterised in the source.
- Prior Instacart write-up "Introducing Griffin 2.0: Instacart's Next-gen ML Platform" is linked from the source post but not yet ingested on the wiki — full architectural disclosure deferred.
Seen in¶
- sources/2026-06-02-instacart-from-scoring-to-spelling-rebuilding-ads-retrieval-at-instacart — first canonical wiki mention; ML platform host for the new GPU serving stack.
Related¶
- systems/instacart-generative-ads-retrieval: first wiki-disclosed Griffin 2.0 production tenant.
- systems/tensorrt-llm / systems/nvidia-triton-inference-server: the GPU inference substrate Griffin 2.0 hosts.
- patterns/gpu-serving-stack-tensorrt-llm-triton / patterns/go-native-ml-serving: canonical patterns Griffin 2.0 implements.