
PATTERN

Runtime-fetched credentials and config

Problem

The target platform's API for injecting credentials, environment variables, config, or hyperparameters into a container cannot match what the source platform provided:

  • No webhook-based injection — e.g. K8s has webhooks that mutate pods at creation to inject secrets; SageMaker has no equivalent.
  • API size limits — e.g. SageMaker's API caps the number of env vars and the size of hyperparameter payloads; the caps are too small for production use cases that passed megabytes of params via K8s ConfigMaps.
  • No sidecars — e.g. metric aggregation previously provided by a sidecar container; the target platform doesn't support sidecars.
  • No ConfigMap mount — K8s-specific primitive; target platforms don't universally implement it.

Under a zero-code-change migration, user code must not notice. The platform has to re-synthesise these primitives somewhere.

Pattern

Fetch at container entrypoint. The compatibility layer in the base image's entrypoint script reaches out at startup to fetch:

  • Credentials — fetched from a secret store (e.g. Confidant, AWS Secrets Manager, Vault) and written into the container environment in the exact shape the source platform injected them (env vars / files / directories).
  • Environment variables that exceed the target API's cap — fetched from a side channel (S3 object, config service API, KV store) and exported before execing user code.
  • Hyperparameters or other large config payloads — uploaded to S3 (or equivalent) pre-job and downloaded by the substrate to a standard input path; or fetched directly by the entrypoint if the substrate supports it.
  • Metric / logging transport — the in-process client reconfigured to connect to an aggregation gateway instead of a sidecar.

The result: user code sees credentials / env / hyperparams in their expected places, not knowing that the platform had to fetch them at startup rather than having them pre-injected.
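The fetch-then-exec flow can be sketched as a minimal entrypoint wrapper. This is an illustration, not Lyft's actual implementation: the fetcher, the `/opt/config/extra_env.json` side-channel path, and the shape of the merged environment are all assumptions.

```python
import json
import os
import sys


def fetch_secrets():
    """Pull credentials from the secret store (Confidant, Vault, ...).

    Stubbed here; a real entrypoint would call the store's SDK and
    return a dict of env-var-shaped credentials.
    """
    raise NotImplementedError


def materialize(env, secrets, extra_config):
    """Merge fetched values into the environment in the exact shape
    the source platform used to inject them, so user code sees no
    difference."""
    merged = dict(env)
    merged.update(secrets)       # creds exposed as env vars
    merged.update(extra_config)  # oversized env vars from the side channel
    return merged


def main():
    secrets = fetch_secrets()
    # Side channel for config that exceeded the target API's env-var cap
    # (hypothetical path, populated by the substrate before startup).
    with open("/opt/config/extra_env.json") as f:
        extra_config = json.load(f)
    env = materialize(os.environ, secrets, extra_config)
    # Replace this process with the unchanged user command; user code
    # never learns the env was fetched at startup rather than injected.
    os.execvpe(sys.argv[1], sys.argv[1:], env)
```

The key property is the final `exec`: the wrapper disappears entirely, leaving user code as PID 1 with the environment it always expected.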

Lyft / LyftLearn 2.0 instance

All four sub-patterns appear in Lyft's K8s → SageMaker migration (Source: sources/2025-11-18-lyft-lyftlearn-evolution-rethinking-ml-platform-architecture):

  • Credentials — K8s previously injected Confidant creds via webhooks. SageMaker has no webhook primitive. The entrypoint now fetches Confidant creds at startup and exposes them identically to user code.
  • Environment variables — SageMaker caps the number of env vars passable via its API; Lyft moved "most environment setup to runtime, fetching additional configuration at job startup."
  • Hyperparameters — SageMaker's API cap on direct parameter passing is too small. Lyft uploads hyperparameters to S3 before each job and has SageMaker automatically download them to its standard input path — sidestepping the API limit while using SageMaker's native mechanism.
  • Metrics transport — K8s StatsD clients sent to sidecar containers. SageMaker has no sidecar support. Lyft reconfigured the runtime + networking to connect the StatsD client directly to the metrics aggregation gateway; the user-facing API was unchanged.
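The hyperparameter sub-pattern can be sketched as follows. The bucket, channel name, and helper are illustrative assumptions; the general mechanism — upload the full payload to object storage pre-job, have the substrate download it to a standard input path, and read it back in-container — is what the source describes.

```python
import json
import pathlib

# Pre-job, launcher side (sketch): upload the full payload to S3 instead
# of squeezing it through the size-capped direct-parameter API field, e.g.
#   s3.put_object(Bucket="ml-jobs", Key=f"{job_id}/params.json",
#                 Body=json.dumps(params))
# then configure the job with an input channel pointing at that key, so
# the substrate downloads it before user code starts.

# Hypothetical standard input path for an assumed "hyperparams" channel.
HYPERPARAMS_PATH = pathlib.Path("/opt/ml/input/data/hyperparams/params.json")


def load_hyperparams(path=HYPERPARAMS_PATH):
    """In-container: read the payload from the input path the substrate
    populated. User code gets a plain dict, same as on the old platform,
    with no awareness of the S3 round trip."""
    return json.loads(path.read_text())
```

Because the substrate performs the download natively, the entrypoint does not even need network access to the config side channel for this case.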

Trade-offs

  • + Parity achieved with no user-code changes.
  • + Single choke point — all compatibility concerns live in the entrypoint layer.
  • − Startup latency — every fetch adds to cold start; the cost must be weighed against warm-pool and lazy-image-loading latency budgets.
  • − Dependency fan-out at startup — entrypoint now depends on Confidant, S3, config service; outages in any of them block job starts. Mitigate with retries + fallbacks + aggressive timeouts.
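The mitigation in the last bullet can be sketched as a small wrapper applied to each startup fetch. Retry counts, backoff, and the fallback semantics here are illustrative; per-call network timeouts are assumed to be set on the underlying client.

```python
import time


def fetch_with_retry(fetch, *, attempts=3, backoff_s=0.5, fallback=None):
    """Run a startup fetch with bounded retries and a fallback, so a
    transient outage in a dependency (secret store, S3, config service)
    does not indefinitely block job starts.

    `fetch` is any zero-arg callable; it should carry its own aggressive
    per-call timeout so a hung dependency fails fast.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                break
            # Exponential backoff between attempts.
            time.sleep(backoff_s * 2 ** attempt)
    if fallback is not None:
        return fallback  # e.g. a cached copy of the last good config
    raise RuntimeError("startup fetch failed after retries, no fallback")
```

Whether a fallback is safe depends on the payload: stale config may be acceptable, stale credentials usually are not.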
