FLYIO 2024-06-19

Fly.io — AWS without Access Keys

Summary

Fly.io's 2024-06-19 post (oidc-cloud-roles) shows how to give a Fly Machine access to an AWS S3 bucket without ever minting an AWS keypair. Fly.io runs an OIDC identity provider at oidc.fly.io that issues short-lived OIDC tokens bound to a specific Fly Machine (the sub claim carries the org, app, and machine name). AWS IAM is configured to trust that IdP via an Identity Provider resource; an IAM Role whose trust policy allows sts:AssumeRoleWithWebIdentity names the oidc.fly.io/<org> provider as its Federated principal and constrains the aud and sub claims. The developer sets a single environment variable, AWS_ROLE_ARN, on their Fly App. Fly's init binary detects it, fetches an OIDC token from the Machines API (through a Unix-socket proxy bound to that specific Machine), writes it to /.fly/oidc_token, and exports AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME into every child process. The AWS SDK's standard credential provider chain picks these up, calls AssumeRoleWithWebIdentity against STS, receives short-lived STS credentials, and uses them to talk to S3. No AWS keypair is stored anywhere: not in fly secrets, not in the container, not in the user's shell.

Key takeaways

  1. ARNs are not credentials; the trust relationship is the credential. "Here, we're not setting any secrets at all; we're just adding an ARN — which is not a credential — to the Machine. […] That's the 'secret credential': the pre-configured trust relationship in IAM, and the public keypairs it manages." The ARN is a public identifier; what makes it powerful is the AssumeRoleWithWebIdentity trust policy that binds it to a specific OIDC issuer + audience + subject prefix. Canonical instance of workload identity — the identity is the Machine's platform-attested identity, not a user-managed secret. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  2. The OIDC token's sub claim has the shape org:app:machine, and AWS trust policies can constrain it by prefix. Fly.io's OIDC tokens always carry a subject of the form <org_name>:<app_name>:<machine_name>. The AWS trust policy example uses StringLike on oidc.fly.io/example:sub with value "example:weather-cat:*", scoping the Role to any Machine of the weather-cat app in the example org. To lock the Role to a single Machine, replace the * with that Machine's name. Canonical sub-field-scoped role trust instance. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  3. STS credentials' value is their short lifetime, their rotation behaviour, and their dynamic provenance. "AWS STS credentials are short-lived. Because they're generated dynamically, rather than stored in a configuration file or environment variable, they're already a little bit annoying for an attacker to recover. But they're also dead in minutes. They have a sharply limited blast radius. They rotate themselves, and fail closed." Canonical short-lived-credential auth framing. Complements ephemeral credentials at the cross-cloud-federation layer. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  4. The init binary is the credential broker. "If, when init starts in a Fly Machine, it sees an AWS_ROLE_ARN environment variable set, it initiates a little dance; it: goes off and generates an OIDC token … saves that OIDC token in a file … and sets the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME environment variables for every process it launches. The AWS SDK, linked to your application, does all the rest." Canonical init-as-credential-broker instance — the guest-side component that closes the loop between platform-managed identity and the SDK's credential-provider chain without a code change in the application. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  5. The token-source Unix socket is privilege-attenuated via a Macaroon the guest never sees. "Every time we boot a Fly Machine, we pass it a Macaroon token locked to that particular Machine; init's server for /.fly/api is a proxy that attaches that token to requests." Two layers of protection: (a) the platform won't honour the Macaroon unless it comes from that specific Machine (flyd knows who it's talking to), and (b) "ordinary code running in a Fly Machine never gets a copy of the token to begin with" — the Macaroon is held by init, not the application. "You could rig up a local privilege escalation vulnerability and work out how to steal the Macaroon, but you can't exfiltrate it productively." Canonical machine-metadata-service shape on Fly.io — Fly's self-described answer to "the EC2 Instance Metadata Service", but reached via Unix socket (/.fly/api) not an HTTP link-local address. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  6. The twelve-step happy path is fully disclosed. The full chain: (1) init detects AWS_ROLE_ARN; (2) init hits /v1/tokens/oidc via the /.fly/api proxy socket; (3) init writes the token to /.fly/oidc_token; (4) init sets AWS_WEB_IDENTITY_TOKEN_FILE + AWS_ROLE_SESSION_NAME; (5) the entrypoint boots; (6) the app runs e.g. aws s3 get-object; (7) the AWS SDK walks the credential provider chain; (8) it sees AWS_WEB_IDENTITY_TOKEN_FILE and calls AssumeRoleWithWebIdentity with the file's contents; (9) AWS STS verifies the token against https://oidc.fly.io/<org>/.well-known/openid-configuration (whose signing keys Fly.io manages on isolated hardware); (10) STS vends the Role's credentials; (11) the SDK uses them to call S3; (12) IAM evaluates the Role's attached policy and S3 returns the object. This is the canonical walkthrough for what OIDC role assumption across clouds actually looks like end-to-end. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  7. Pattern generalises beyond S3 and beyond AWS. "In case it's not obvious: this pattern works for any AWS API, not just S3. Our OIDC support on the platform and in Fly Machines will set arbitrary OIDC audience strings, so you can use it to authenticate to any OIDC-compliant cloud provider." GCP and Azure aren't yet covered by the init-binary environment-variable sugar ("those features are easy, and we're just waiting for people to tell us what they need") but the OIDC IdP + token endpoint are general. (Source: sources/2024-06-19-flyio-aws-without-access-keys)

  8. Fly.io's self-assessed position: OIDC tokens asymptotically approach Macaroons on security but don't equal them. "For us, the gold standard for least-privilege, conditional access tokens remains Macaroons, and it's unlikely that we're going to do a bunch of internal stuff using OIDC. We even snuck Macaroons into this feature." OIDC is pitched as a significant improvement over "hardcoded user credentials" (AWS access keys) that closes "a lot of the gap between hardcoded user credentials and Macaroons" — not as a replacement for Macaroons. Honest positioning. (Source: sources/2024-06-19-flyio-aws-without-access-keys)
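The init behaviour in takeaways 4 and 6 (steps 1 to 4) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Fly.io's implementation: the real init is not written in Python, and the POST method, the `{"aud": ...}` request body, the plain-text response, and the fallback session name `fly-machine` are assumptions made for the sketch. Only the socket path `/.fly/api`, the endpoint `/v1/tokens/oidc`, the file `/.fly/oidc_token`, and the environment variable names come from the post.

```python
import http.client
import json
import socket
from typing import Callable, Dict

FLY_API_SOCKET = "/.fly/api"     # Unix socket served by init (from the post)
TOKEN_FILE = "/.fly/oidc_token"  # where init stores the token (from the post)


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that speaks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str) -> None:
        super().__init__("localhost")
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self._socket_path)
        self.sock = sock


def fetch_oidc_token(audience: str, socket_path: str = FLY_API_SOCKET) -> str:
    """Ask the Machines API proxy for an OIDC token.

    Assumption: the endpoint accepts a POST with a JSON body naming the
    audience and returns the raw JWT as the response body.
    """
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(
            "POST",
            "/v1/tokens/oidc",
            body=json.dumps({"aud": audience}),
            headers={"Content-Type": "application/json"},
        )
        resp = conn.getresponse()
        body = resp.read().decode()
        if resp.status != 200:
            raise RuntimeError(f"token request failed: {resp.status} {body}")
        return body.strip()
    finally:
        conn.close()


def broker_env(
    env: Dict[str, str],
    fetch: Callable[[str], str] = fetch_oidc_token,
    token_file: str = TOKEN_FILE,
) -> Dict[str, str]:
    """Return the environment to hand to child processes."""
    child_env = dict(env)
    if "AWS_ROLE_ARN" not in env:
        return child_env  # nothing to broker
    token = fetch("sts.amazonaws.com")
    with open(token_file, "w") as f:
        f.write(token)
    child_env["AWS_WEB_IDENTITY_TOKEN_FILE"] = token_file
    # Hypothetical default; the post does not say how the session name is built.
    child_env.setdefault("AWS_ROLE_SESSION_NAME", "fly-machine")
    return child_env
```

Passing the token fetcher in as a parameter keeps the environment-shaping logic testable without a real Machine; the application itself still needs no code change, because the AWS SDK reads the two exported variables on its own.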

OIDC token shape

Real token payload disclosed in the post:

{
  "app_id": "3671581",
  "app_name": "weather-cat",
  "aud": "sts.amazonaws.com",
  "image": "image:latest",
  "image_digest": "sha256:dff79c6da8dd4e282ecc6c57052f7cfbd684039b652f481ca2e3324a413ee43f",
  "iss": "https://oidc.fly.io/example",
  "machine_id": "3d8d377ce9e398",
  "machine_name": "ancient-snow-4824",
  "machine_version": "01HZJXGTQ084DX0G0V92QH3XW4",
  "org_id": "29873298",
  "org_name": "example",
  "region": "yyz",
  "sub": "example:weather-cat:ancient-snow-4824"
}

Notable fields:

  • iss is per-org — https://oidc.fly.io/<org_name> — so AWS IAM Identity Providers are scoped to a Fly org.
  • aud is chosen by whoever requests the token (for the AWS flow, init asks for sts.amazonaws.com); the trust policy enforces StringEquals on the aud claim.
  • sub is org:app:machine — the identity-scoping pivot for trust policies.
  • image + image_digest + machine_version give the trust policy further hooks (not demonstrated in the post, but the claims are there for future policy shape).
  • region is the Fly region where the Machine runs.
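To inspect these claims on a running Machine, the token's middle segment can be base64url-decoded. The sketch below is a debugging aid only: it does not verify the signature, so its output must never be used for authorization decisions. The helper names `decode_claims`, `parse_sub`, and `FlySubject` are hypothetical.

```python
import base64
import json
from typing import Any, Dict, NamedTuple


class FlySubject(NamedTuple):
    org: str
    app: str
    machine: str


def decode_claims(jwt: str) -> Dict[str, Any]:
    """Decode a JWT payload WITHOUT verifying its signature."""
    parts = jwt.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def parse_sub(sub: str) -> FlySubject:
    """Split an org:app:machine subject into its three parts."""
    org, app, machine = sub.split(":", 2)
    return FlySubject(org, app, machine)
```

Given the payload shown above, `parse_sub(decode_claims(token)["sub"])` would yield the org `example`, the app `weather-cat`, and the machine `ancient-snow-4824`.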

AWS trust policy shape

Canonical trust-policy snippet disclosed:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456123456:oidc-provider/oidc.fly.io/example"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.fly.io/example:aud": "sts.amazonaws.com"
                },
                "StringLike": {
                    "oidc.fly.io/example:sub": "example:weather-cat:*"
                }
            }
        }
    ]
}

Notable fields:

  • Principal.Federated is the ARN AWS produced when the org registered oidc.fly.io/example as an Identity Provider.
  • aud must equal sts.amazonaws.com — guarantees STS only honours tokens Fly deliberately issued for STS.
  • sub lock-down is the per-org/per-app/per-machine scoping lever.
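The effect of the two conditions can be illustrated with a small Python model. This is a simplification, not IAM's actual evaluator: it assumes the token's signature and expiry have already been verified, and it mirrors only the issuer match, the StringEquals check on aud, and the StringLike check on sub. The function names are hypothetical.

```python
import re
from typing import Any, Dict


def string_like(pattern: str, value: str) -> bool:
    """IAM-style StringLike: '*' matches any run of characters, '?' exactly one.

    Matching is case-sensitive and must cover the whole value.
    """
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def trust_policy_allows(
    claims: Dict[str, Any],
    issuer_path: str = "oidc.fly.io/example",
    expected_aud: str = "sts.amazonaws.com",
    sub_pattern: str = "example:weather-cat:*",
) -> bool:
    """Model the trust policy above against already-verified token claims."""
    if claims.get("iss") != "https://" + issuer_path:
        return False  # token came from a different org's issuer
    if claims.get("aud") != expected_aud:
        return False  # StringEquals on aud
    return string_like(sub_pattern, str(claims.get("sub", "")))  # StringLike on sub
```

With the default pattern, any Machine in the weather-cat app passes; setting `sub_pattern` to a full subject such as `example:weather-cat:ancient-snow-4824` narrows the Role to that one Machine.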

Numbers disclosed

  • STS credential lifetime: "dead in minutes" — no specific number, but the AWS default of 1 hour (and DurationSeconds minimum of 15 minutes) is standard.
  • OIDC token issuance: as described, init makes a single /v1/tokens/oidc call at Machine boot; no other issuance figures are given.
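The lifetime bounds are easier to see in the request the SDK ends up sending. AssumeRoleWithWebIdentity is an unsigned STS Query API call, and the sketch below only builds its form-encoded parameters without sending anything. The parameter names and API version are those of the STS API; the function name is hypothetical, and the 43200-second ceiling is the largest value a Role's maximum session duration can be configured to, so a given Role may allow less.

```python
from urllib.parse import parse_qs, urlencode

MIN_DURATION = 900     # 15 minutes, the DurationSeconds minimum
MAX_DURATION = 43200   # 12 hours; the Role's own maximum may be lower


def build_sts_request(
    role_arn: str,
    session_name: str,
    web_identity_token: str,
    duration_seconds: int = 3600,  # the STS default of one hour
) -> str:
    """Return the form-encoded body of an AssumeRoleWithWebIdentity call."""
    if not MIN_DURATION <= duration_seconds <= MAX_DURATION:
        raise ValueError(
            f"DurationSeconds must be between {MIN_DURATION} and {MAX_DURATION}"
        )
    return urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": str(duration_seconds),
    })
```

Note that no AWS access key appears anywhere in the request: the OIDC token is the only proof of identity, which is the point of the whole scheme.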

Numbers not disclosed

  • No QPS / fleet-scale numbers for OIDC token issuance across Fly.io.
  • No oidc.fly.io availability / latency SLO.
  • No refresh cadence for the OIDC token inside a long-running Machine (the post describes init generating one at boot; the AWS SDK then drives STS refresh with its own cadence, but the OIDC token's own lifetime + refresh policy isn't specified).
  • No kernel / OS-level details on how the /.fly/api Unix socket is mounted into the guest.

Caveats

  • The post is written in Fly.io's marketing-adjacent voice ("wash your neck and wait for the blade") — the security framing is accurate but the level of protocol detail is one step below a formal threat-model document. Key facts (IdP issuer URI scheme, trust-policy shape, OIDC token claim names, init behaviour) are disclosed verbatim; operational edge cases (token refresh, failure modes when OIDC IdP is unreachable, behaviour of a stale token on an STS refresh) are not.
  • No formal comparison of OIDC federation vs SPIFFE/SPIRE or Kubernetes service-account token projection — both of which are architecturally similar (platform-issued JWTs + cloud trust policies). The post's one comparison is OIDC vs AWS IAM static access keys (the predecessor pattern) and OIDC vs Macaroons (Fly.io's internal token format).
  • "Easier to manage than IAM service accounts" is a developer-UX claim — no data, just the assertion that the resulting ARN "isn't even a secret; your cloud team could just email or Slack message it back to you."
