Cited by 4 sources

Tigris

Tigris (Tigris Data, Inc.) is a globally distributed, S3-compatible object storage platform integrated into Fly.io as the fly storage create primitive. Announced in public beta on 2024-02-15 on the Fly.io blog. Architecturally, Tigris is not "S3 + CDN cache" — it is a regional-first distributed object store whose metadata and byte planes are both partitioned across the regions Fly.io runs in, with a queuing system propagating object data to where it is in demand.

(Source: sources/2024-02-15-flyio-globally-distributed-object-storage-with-tigris)

Architecture (as described by Fly.io)

Three layers, each a distinct system:

  1. Metadata plane — FoundationDB. "Tigris runs redundant FoundationDB clusters in our regions to track objects." FDB's strictly-serializable multi-key ACID transactions are the primitive for atomic object-state transitions (write-commits, delete-markers, version-bumps, region-membership) across the keyspace — the same shape that Datadog's Husky uses FDB for, at the fragment-metadata layer.

  2. Byte plane — Fly.io NVMe volumes. "They use Fly.io's NVMe volumes as a first level of cached raw byte store." Each Tigris-hosting Fly.io region caches object bytes on local NVMe; reads within the region short-circuit to local NVMe.

  3. Distribution plane — QuiCK-style queue. "A queuing system modelled on Apple's QuiCK paper to distribute object data to multiple replicas, to regions where the data is in demand, and to 3rd party object stores… like S3." This is where cross-region replication and archival-tier offload happen asynchronously from writes.

The overall shape is a named instance of metadata-DB + object-cache-tier: strongly-consistent metadata in one layer, eventually-replicated byte caches in another, and a pluggable origin/archival tier (S3 or similar) behind.
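The three-layer shape can be caricatured in a few lines of Python. This is a toy sketch of the pattern, not Tigris's implementation (which is not public): every name and data structure below is invented for illustration. The point is the division of labor — object-state transitions commit atomically in the metadata plane, while byte replication is a queue task drained after the commit.

```python
import threading
from collections import deque

class MetadataPlane:
    """Toy stand-in for the FoundationDB layer: all object-state
    transitions happen under one lock, mimicking atomic transactions."""
    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}  # key -> {"version": int, "regions": set}

    def commit_write(self, key, home_region):
        # Atomic transition: bump the version and reset region membership.
        with self._lock:
            entry = self._objects.get(key, {"version": 0, "regions": set()})
            entry["version"] += 1
            entry["regions"] = {home_region}
            self._objects[key] = entry
            return entry["version"]

class DistributionQueue:
    """Toy stand-in for the QuiCK-style queue: replication tasks are
    enqueued at write time and drained asynchronously."""
    def __init__(self, byte_plane):
        self._tasks = deque()
        self._byte_plane = byte_plane

    def enqueue(self, key, src, dst):
        self._tasks.append((key, src, dst))

    def drain(self):
        # In the real system this runs in the background, decoupled
        # from the write path; here we drain synchronously for clarity.
        while self._tasks:
            key, src, dst = self._tasks.popleft()
            self._byte_plane[dst][key] = self._byte_plane[src][key]

# Byte plane: region -> {key: bytes}, standing in for per-region NVMe caches.
byte_plane = {"iad": {}, "syd": {}}
meta = MetadataPlane()
queue = DistributionQueue(byte_plane)

# A write lands in iad; the copy to syd happens after the metadata commit.
byte_plane["iad"]["avatar.png"] = b"\x89PNG..."
version = meta.commit_write("avatar.png", "iad")
queue.enqueue("avatar.png", "iad", "syd")
queue.drain()
```

The decoupling is the load-bearing choice: the write path only pays for the local NVMe write plus one metadata transaction, and cross-region propagation cost is absorbed off the critical path.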

Replication policy

  • Objects <~128 KB: instantly global by default. Below the size threshold, bytes are proactively distributed across regions at write time. "Things are just snappy, all over the world, automatically."
  • Objects ≥ threshold: demand-driven replication. Larger objects propagate to a region when they are read there, via the QuiCK-style queue. See concepts/demand-driven-replication.
  • Pluggable archival tier. The same queue can propagate bytes out to "3rd party object stores… like S3" — i.e. S3 remains a viable origin for cold objects, while Tigris handles the geographic distribution layer in front.
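The size-threshold policy above is simple enough to state as code. A minimal sketch, assuming only what the post says (small objects pushed everywhere at write time, larger objects staying in the write region until read elsewhere); the function name and region identifiers are invented:

```python
EAGER_THRESHOLD = 128 * 1024  # the ~128 KB cut-off described in the post

def replication_plan(size_bytes, all_regions, write_region):
    """Return the set of regions to push bytes to at write time.

    Small objects go everywhere immediately ("instantly global");
    large objects stay in the write region, propagating later only
    to regions that actually read them (demand-driven)."""
    if size_bytes < EAGER_THRESHOLD:
        return set(all_regions)   # eager: global at write time
    return {write_region}         # lazy: QuiCK-style queue fills in on demand

regions = {"iad", "lhr", "syd", "nrt"}
small_plan = replication_plan(4 * 1024, regions, "iad")    # all four regions
large_plan = replication_plan(512 * 1024, regions, "iad")  # just {"iad"}
```

Note the asymmetry this buys: the common web-asset case (thumbnails, HTML fragments, small JSON) is fast everywhere by default, while the bandwidth-expensive case (large blobs) only pays cross-region transfer where demand proves it is worth it.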

Interface

S3-compatible API. "If your framework can talk to S3, it can use Tigris." Published Tigris SDK docs instruct customers to use the AWS libraries unchanged. fly storage create injects five env-var app secrets:

  • AWS_REGION
  • BUCKET_NAME
  • AWS_ENDPOINT_URL_S3
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

The AWS_ENDPOINT_URL_S3 override is what redirects AWS-SDK calls from S3's regional endpoints to Tigris's. No code changes to the application are required — a concrete instance of patterns/protocol-compatible-drop-in-proxy applied at the API-compat layer, and patterns/partner-managed-service-as-native-binding applied at the config-injection layer.
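To make the redirect concrete, here is a stdlib-only illustration of what the endpoint override changes. Real AWS SDKs read AWS_ENDPOINT_URL_S3 themselves (that is the whole point — no application code changes); this function just shows which base URL a request would target with and without the override. The endpoint and bucket values are example placeholders:

```python
import os

def s3_request_url(bucket, key):
    """Illustrative only: compose the base URL an S3-style request
    would target, honoring the AWS_ENDPOINT_URL_S3 override the way
    AWS SDKs do."""
    endpoint = os.environ.get("AWS_ENDPOINT_URL_S3")
    if endpoint:
        # Override set: requests go to the S3-compatible endpoint (Tigris).
        return f"{endpoint}/{bucket}/{key}"
    # No override: default to S3's regional endpoint.
    region = os.environ.get("AWS_REGION", "us-east-1")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

# Simulate two of the secrets fly storage create injects (values are examples):
os.environ["AWS_ENDPOINT_URL_S3"] = "https://fly.storage.tigris.dev"
os.environ["BUCKET_NAME"] = "my-bucket"

url = s3_request_url(os.environ["BUCKET_NAME"], "avatar.png")
# url is now rooted at the Tigris endpoint, not an amazonaws.com host
```

Everything else — request signing with AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, bucket and key semantics — rides on unmodified S3 wire protocol, which is what makes the drop-in claim work.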

Positioning against alternatives

  • vs. single-region S3 + no CDN: globalized, no need to pick "one building in Ashburn".
  • vs. S3 + CloudFront caching CDN: Tigris is distributed from the start, not a cache in front of a single-write-region origin; it also brings consistency guarantees, instant purge, and relay regions.
  • vs. multi-region S3 buckets: application-transparent — no per-region bucket configuration or write-region routing in app code.
  • vs. building your own mini-CDN: Fly.io quotes this as the "week of tinkering" baseline; Tigris collapses it to single-digit-minute onboarding.

The article explicitly contrasts Tigris with CloudFront: CloudFront "will, at some variably metered expense, optimize the read side of a single-write-region bucket"; Tigris "is a toolset that you can use to build arbitrary CDNs, with consistency guarantees, instant purge and relay regions".

Integration with Fly.io

  • fly storage create CLI — one command to provision a bucket + wire credentials into the app.
  • Unified billing — Tigris usage is charged to the Fly.io bill alongside compute, block storage, databases, networking. See patterns/unified-billing-across-providers.
  • Runs on Fly.io's infrastructure — Tigris uses Fly.io NVMe volumes as its byte-cache tier and FoundationDB on Fly.io regions as its metadata store. Tigris is a third-party company, but physically co-resident with Fly.io compute for low-latency in-region reads.

Underlying Fly.io hosts run systems/firecracker-based micro-VMs ("we transmute containers into VMs, running them on our hardware around the world with the power of Firecracker alchemy") — but Firecracker is not part of the Tigris architecture per se, just Fly.io's compute substrate.

Why Fly.io partnered instead of built

Fly.io's stated rationale: "Compute and networking: those are things we love and understand. Object storage? We did not want to half-ass it. So we partnered with Tigris, so that they can put their full resources into making object storage as magical as Fly.io is." The post explicitly rejects the "vertically integrate all layers ourselves" strategy in favour of a Unix-philosophy partnership: "you have individual parts that do one thing very well that are then chained together to create a composite result."

Operational numbers

  • <128 KB object threshold for instantly-global replication.
  • Single-digit-minute setup time (fly storage create).
  • No latency, throughput, durability, consistency, replica count, region count, or cost disclosures in this piece.

Caveats

  • Third-party product. Tigris Data, Inc. is the builder; Fly.io is the integration point. Architectural statements here come from Fly.io's description, not a Tigris engineering deep-dive.
  • Minimal architectural disclosure. The three-layer (FoundationDB + NVMe + QuiCK-queue) description is one sentence of the piece; specifics on how each layer is configured, replicated, or sized are not public in this source.
  • Consistency guarantees unstated. The piece mentions "consistency guarantees" as a Tigris property but doesn't specify what model (strong / bounded-staleness / eventual per-key / read-your-writes cross-region).
  • Durability class unstated.
  • No incident or production-numbers retrospective. This is a public-beta announcement; operational character at scale is not documented here.

Seen in

  • sources/2024-02-15-flyio-globally-distributed-object-storage-with-tigris — the partnership / public-beta announcement on the Fly.io blog. Canonical wiki entry for Tigris's architecture (such as is publicly disclosed).
  • sources/2024-08-15-flyio-were-cutting-l40s-prices-in-half — GPU-inference-locality instance. Fly.io names Tigris as the object-storage axis of its inference compute-storage-network locality thesis: combined with GPU Fly Machines (L40S at $1.25/hr) and the Anycast edge, "model parameters, data sets, and compute are all close together … on VM instances that have enough memory to actually run real frameworks on … priced like we actually want you to use it." Tigris is the load-bearing storage-side component that lets inference GPUs hydrate model weights from regional-local bytes instead of cross-cloud S3-behind-CloudFront. Canonical paired instance with patterns/co-located-inference-gpu-and-object-storage.
  • sources/2025-02-26-flyio-taming-a-voracious-rust-proxy — Tigris-as-incident-trigger entry. Tigris Data was running a load test whose connection pattern ("tens of thousands of connections" sending small HTTP bodies and terminating early) exposed a close_notify-with-buffered-trailer bug in tokio-rustls TlsStream inside fly-proxy — a CPU-pegging busy-loop on two IAD edges. Tigris is explicitly "not the cause, the trigger": Fly.io asked Tigris to pause the load test while investigating, then resumed with "no spin-outs" after the upstream rustls fix deployed. Illustrates how partner traffic patterns make rare state-machine edge cases reproducible at platform scale.
  • sources/2025-05-20-flyio-litestream-revamped — CAS-as-a-service enabler entry. Tigris is named alongside S3 as an object store supporting the conditional writes primitive that revamped Litestream uses to implement its single-writer time-based lease. Direct quote: "Modern object stores like S3 and Tigris solve this problem for us: they now offer conditional write support. With conditional writes, we can implement a time-based lease." Canonical wiki disclosure of Tigris supporting the conditional-write semantics, and the first wiki instance of Tigris as coordination substrate rather than just byte store — the 2025-05-20 post is also the structural reason tkdb-style Fly.io deployments can now drop Consul (LiteFS's original primary-election dependency) for Litestream-shipped workloads.
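The lease-over-conditional-writes idea from the Litestream entry is worth pinning down with a sketch. This is not Litestream's code — the store, the lease layout, and the timings are all invented — just the shape of a time-based single-writer lease built on a put-if-absent primitive (S3/Tigris PutObject with If-None-Match):

```python
class FakeObjectStore:
    """In-memory stand-in for an S3/Tigris bucket with conditional
    writes: put_if_absent fails if the key already exists (the
    real store returns HTTP 412 on the failed precondition)."""
    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, value):
        if key in self._objects:
            return False  # precondition failed: someone wrote first
        self._objects[key] = value
        return True

    def get(self, key):
        return self._objects.get(key)

    def delete(self, key):
        self._objects.pop(key, None)

def try_acquire_lease(store, key, owner, ttl, now):
    """Time-based single-writer lease: whoever wins the conditional
    PUT holds the lease until it expires. (A production version would
    also need a conditional delete/overwrite to close the race between
    observing expiry and re-acquiring; elided here for brevity.)"""
    existing = store.get(key)
    if existing is not None:
        if existing["expires"] > now:
            return False          # a live lease is held by someone else
        store.delete(key)         # lease expired: clear, then retry the PUT
    return store.put_if_absent(key, {"owner": owner, "expires": now + ttl})

store = FakeObjectStore()
ok_a = try_acquire_lease(store, "db/lease", "writer-a", ttl=30, now=1000.0)
ok_b = try_acquire_lease(store, "db/lease", "writer-b", ttl=30, now=1010.0)
ok_c = try_acquire_lease(store, "db/lease", "writer-c", ttl=30, now=1031.0)
# writer-a wins; writer-b is blocked while the lease is live;
# writer-c takes over once the lease has expired
```

The structural payoff named in the source follows directly: once the object store itself can arbitrate "exactly one writer", a separate coordination service (Consul, in LiteFS's original design) is no longer required for that job.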