
Co-located inference GPU and object storage

Pattern

Run inference GPUs and the object storage that holds their model weights / datasets on the same platform, in the same regions, over the platform's internal fabric — rather than inference on one cloud (e.g. a specialist GPU provider) and object storage on another (e.g. S3 behind CloudFront).

The pattern eliminates two cost + latency boundaries at once:

  1. GPU-instance surcharges on hyperscalers that run GPU VMs but don't specialise in them.
  2. Egress fees paid every time an inference job pulls model weights / datasets cross-cloud.

Combined with Anycast ingress, all three locality axes — GPU + object storage + end-user network — collapse onto one platform.

When to use

  • Transaction-shaped inference (per-HTTP-request model invocations). Sensitivity to WAN round-trips + egress per request is high. (inference-vs-training workload shape)
  • Globally-distributed inference where requests arrive from every geography and the objective is to land the request, the model weights, and the answer all inside one region.
  • Cost-sensitive inference workloads where hyperscaler GPU + egress combined price exceeds the platform's flat rate.
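The cost-sensitivity bullet reduces to a small break-even calculation. A minimal sketch: the $1.25/hr L40S rate is quoted later on this page, while the hyperscaler GPU rate, the request volume, and the ~$0.09/GB egress price are illustrative assumptions, not quotes.

```python
# Break-even arithmetic for "cost-sensitive inference workloads".
# Only the $1.25/hr L40S rate comes from this page; all other
# numbers are illustrative assumptions.

def per_request_cost(gpu_hourly: float, req_per_hour: float,
                     egress_gb_per_req: float = 0.0,
                     egress_per_gb: float = 0.09) -> float:
    """GPU time amortised per request, plus any cross-cloud egress."""
    return gpu_hourly / req_per_hour + egress_gb_per_req * egress_per_gb

# Co-located: flat GPU-hour, no egress line item.
colocated = per_request_cost(1.25, req_per_hour=3600)

# Separated: assumed $2/hr hyperscaler GPU + 10 MB of cross-cloud
# transfer per request (weights/dataset reads amortised per call).
separated = per_request_cost(2.00, req_per_hour=3600,
                             egress_gb_per_req=0.01)
```

At these assumed numbers the egress term alone ($0.0009/request) dwarfs the amortised GPU time, which is the point of the pattern: the per-GB line item scales with traffic while the flat rate does not.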

When not to use

  • Training workloads. Batch-shaped; egress + WAN round-trips amortise over the job, so they are better served by hyperscaler GPU capacity plus cross-cloud dataset replication (see patterns/cross-cloud-replica-cache).
  • Ultra-large-model inference requiring H100 SXM fleets / NVLink-ganged topologies the platform doesn't provide.
  • Workloads that require the hyperscaler's adjacent managed services (e.g. AWS Bedrock, proprietary model APIs) — moving compute away forces an identity / network boundary recrossing.

Canonical instance: Fly.io × Tigris

Fly.io explicitly describes this as "pretty killer":

  • GPU compute: A10 / L40S / A100 / H100 attached to Fly Machines via whole-GPU passthrough.
  • Object storage: Tigris, the Fly.io-native S3-compatible object store whose byte plane is cached on the same Fly.io NVMe volumes that GPU Machines run on.
  • Network: Fly.io's Anycast edge.
  • Billing: one invoice; no per-request egress line item at steady state.

Fly.io's 2024-08-15 L40S price cut to $1.25/hr is the market move that makes this pattern economically the default on the platform, not a power-user configuration. (Source: sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half)

Structural parts

  • GPU compute primitive — Fly Machine. Whole-GPU passthrough per instance (fractional-GPU path via MIG / vGPU was tried and abandoned on Fly.io — the whole-GPU path is what works.)
  • Object storage primitive — Tigris. Regional-first; metadata on FoundationDB; byte cache on Fly.io NVMe; demand-driven replication via a QuiCK-style queue; pluggable S3 origin.
  • Network primitive — Anycast at the Fly edge; private WireGuard mesh (6PN) between Machines.
  • Integration glue — fly storage create as a single command that injects S3-compat secrets so application code speaks to Tigris identically to how it would speak to S3.
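The integration-glue point can be sketched as follows. A hedged sketch assuming the standard AWS-style environment variable names (AWS_ENDPOINT_URL_S3, AWS_ACCESS_KEY_ID, etc.); the exact names injected by fly storage create should be checked against the Tigris docs. Application code passes these kwargs to its usual S3 client and talks to Tigris exactly as it would to S3.

```python
# Sketch of the "integration glue": after `fly storage create`, Machines
# see ordinary AWS-style env vars, so any S3-compatible client works
# unchanged. Env var names here are assumptions, not verified output.
import os

def tigris_client_kwargs() -> dict:
    """Keyword args for an S3-compatible client, e.g.
    boto3.client("s3", **tigris_client_kwargs()).
    Nothing Tigris-specific beyond the endpoint URL."""
    return {
        "endpoint_url": os.environ.get("AWS_ENDPOINT_URL_S3",
                                       "https://fly.storage.tigris.dev"),
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID", ""),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
        "region_name": os.environ.get("AWS_REGION", "auto"),
    }
```

The design choice worth noting: because the only delta from stock S3 is the endpoint URL, the same code runs against AWS S3 by simply not setting the endpoint override.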

Trade-offs vs separation

| Axis | Co-location (this pattern) | Separation (hyperscaler GPU + remote object store) |
| --- | --- | --- |
| Per-request cost | Flat GPU-hour | GPU-hour + per-GB egress |
| Cold-start weight fetch | Regional NVMe read | Cross-cloud fetch |
| Failure-isolation blast radius | One platform | Two platforms |
| Supply flexibility | Bounded to the platform's GPU fleet | Any hyperscaler's inventory |
| Managed-service access | Platform-local only | Full hyperscaler catalogue |
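The cold-start row is the one that bites first in practice. A hedged back-of-envelope, with every number an illustrative assumption (local NVMe reads in the GB/s range versus a cross-cloud fetch in the tens of MB/s):

```python
# Back-of-envelope for the "cold-start weight fetch" row.
# All numbers are illustrative assumptions, not platform measurements.

def fetch_seconds(weights_gb: float, throughput_gb_per_s: float) -> float:
    """Time to pull model weights at a sustained throughput."""
    return weights_gb / throughput_gb_per_s

WEIGHTS_GB = 14.0                        # e.g. a 7B model in fp16 (assumption)
nvme = fetch_seconds(WEIGHTS_GB, 2.0)    # regional NVMe read at ~2 GB/s
wan = fetch_seconds(WEIGHTS_GB, 0.05)    # cross-cloud fetch at ~50 MB/s
# nvme: 7.0 s of cold start; wan: ~280 s — and the WAN path also
# accrues per-GB egress on every cold start, per the cost row above.
```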

Known uses

  • Fly.io × Tigris — the canonical wiki instance. Anchored by the 2024-02-15 Tigris public-beta post and the 2024-08-15 L40S price-cut post.
  • Shape-adjacent: hyperscaler bundles such as AWS EC2 G-instances + S3 same-region + CloudFront; Azure AKS + Blob + Front Door; GCP GKE + GCS + Cloud CDN. These realise the same pattern in principle but bill the three axes separately and do not publish an "inference locality" narrative around them.