
CONCEPT Cited by 1 source

Cookbook artifact versioning

Definition

Cookbook artifact versioning is the Chef-ecosystem unit of configuration-management rollout: a pinned-version cookbook artifact (typically a .tar.gz of the cookbook tree + its dependency lock) is promoted per-environment. Rollbacks are re-promotions of the previous version. Changes to the cookbook are additive versions, not in-place edits.

Canonical wiki instance: Slack's 2025-10-23 Chef phase-2 design (Source: sources/2025-10-23-slack-advancing-our-chef-infrastructure-safety-without-disruption).

The promotion unit

  cookbook edit → CI build → new version (e.g., 20250728.1753666491.0)
                                        │
                     Chef Librarian (systems/chef-librarian)
                                        │
                  ┌─────────────────────┼─────────────────────┐
                  │                     │                     │
             promote to             promote to          promote to
              sandbox                 dev                 prod-1
             (hourly)              (hourly)             (:30 past hour)

Each promotion is atomic per environment: the environment's version pin flips from the previous version to the new one. Every node mapped to that environment will, at its next chef-client run, resolve the pin and pull the new version from the Chef server.
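Here is a minimal sketch of the per-environment pin. It assumes a plain in-memory map from environment name to version stands in for the promotion system's real state, which the source does not describe. `EnvironmentPins`, its methods, and the earlier version `20250727.1753580000.0` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EnvironmentPins:
    # Hypothetical stand-in for the promotion system's state: env -> pinned version.
    pins: dict[str, str] = field(default_factory=dict)
    # Previously pinned versions per environment, newest last.
    history: dict[str, list[str]] = field(default_factory=dict)

    def promote(self, env: str, version: str) -> None:
        # One assignment flips the whole environment; there is no partial state.
        previous = self.pins.get(env)
        if previous is not None:
            self.history.setdefault(env, []).append(previous)
        self.pins[env] = version

    def rollback(self, env: str) -> str:
        # Rollback re-pins the previous version; nothing is rebuilt.
        previous = self.history[env].pop()
        self.pins[env] = previous
        return previous

    def resolve(self, env: str) -> str:
        # What any node in `env` would fetch on its next chef-client run.
        return self.pins[env]


pins = EnvironmentPins()
pins.promote("prod-1", "20250727.1753580000.0")  # hypothetical V1
pins.promote("prod-1", "20250728.1753666491.0")  # V2, the version from the diagram
print(pins.resolve("prod-1"))  # 20250728.1753666491.0
pins.rollback("prod-1")
print(pins.resolve("prod-1"))  # 20250727.1753580000.0
```

Because nodes only ever read the pin, "promote" and "roll back" are the same operation pointed in different directions.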

The manifest record (as seen in Slack's S3 signal payload)

Slack's signal schema exposes the full manifest for operators and consumer agents (example from the post):

{
  "Splay": 15,
  "Timestamp": "2025-07-28T02:02:31.054989714Z",
  "ManifestRecord": {
    "version": "20250728.1753666491.0",
    "chef_shard": "basalt",
    "datetime": 1753666611,
    "latest_commit_hash": "XXXXXXXXXXXXXX",
    "manifest_content": {
      "base_version": "20250728.1753666491.0",
      "latest_commit_hash": "XXXXXXXXXXXXXX",
      "author": "Archie Gunasekara <agunasekara@slack-corp.com>",
      "cookbook_versions": {
        "apt": "7.5.23",
        "aws": "9.2.1"
      },
      "site_cookbook_versions": {
        "apache2": "20250728.1753666491.0",
        "squid": "20250728.1753666491.0"
      }
    },
    "s3_bucket": "BUCKET_NAME",
    "s3_key": "20250728.1753666491.0.tar.gz",
    "ttl": 1756085811,
    "upload_complete": true
  }
}
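The version in this record can be unpacked mechanically. The sketch below assumes the reading given in the observations: date, then Unix time of the build, then patch. It also assumes the date prefix is a UTC date. That holds for this sample, but the post does not state it. `decode_version` and `site_cookbooks_on_base` are hypothetical helpers, not part of Slack's tooling.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the ManifestRecord above (only the fields used here).
RECORD = json.loads("""
{
  "version": "20250728.1753666491.0",
  "manifest_content": {
    "base_version": "20250728.1753666491.0",
    "cookbook_versions": {"apt": "7.5.23", "aws": "9.2.1"},
    "site_cookbook_versions": {
      "apache2": "20250728.1753666491.0",
      "squid": "20250728.1753666491.0"
    }
  }
}
""")


def decode_version(version: str) -> tuple[str, datetime, int]:
    # Assumed layout: YYYYMMDD . Unix seconds of the build . patch counter.
    date_part, epoch_part, patch_part = version.split(".")
    built_at = datetime.fromtimestamp(int(epoch_part), tz=timezone.utc)
    return date_part, built_at, int(patch_part)


def site_cookbooks_on_base(content: dict) -> bool:
    # Site cookbooks should all carry the automated base version;
    # community cookbooks (cookbook_versions) keep their own semver and are skipped.
    base = content["base_version"]
    return all(v == base for v in content["site_cookbook_versions"].values())


date_part, built_at, patch = decode_version(RECORD["version"])
print(date_part, built_at.isoformat(), patch)  # 20250728 2025-07-28T01:34:51+00:00 0
print(built_at.strftime("%Y%m%d") == date_part)  # True
print(site_cookbooks_on_base(RECORD["manifest_content"]))  # True
```

Decoded this way, the build time (epoch 1753666491) falls 120 seconds before the record's `datetime` field (1753666611).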

Observations:

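  • The version string is a date + Unix-timestamp scheme, not semantic versioning; the commit hash lives in a separate field (latest_commit_hash), not in the version. Chef cookbook versions are traditionally semver, but Slack's automation encodes the build date and Unix time in the primary version (20250728.1753666491.0 = date + Unix time of build + patch), which still fits Chef's three-part numeric version format. The result is a monotonically increasing, per-commit numbering scheme suited to fully automated rollout.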
  • Cookbook versions are two-tier: upstream community cookbooks (apt, aws) retain their public semver; Slack's own "site cookbooks" (apache2, squid) share the automated timestamp version.
  • s3_bucket + s3_key separate the signal from the artifact. The artifact is uploaded to S3 first, then the signal is written with upload_complete: true as the ordering barrier.
  • ttl permits automatic expiry of stale signals; a consumer that sees a signal past its TTL should ignore it.
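The producer/consumer contract in the last two observations can be sketched as follows. Plain dicts stand in for S3 and for wherever the signal is stored. `publish` and `usable_signal` are hypothetical, and so is the choice to treat `now >= ttl` as expired. The 28-day TTL is not stated in the post. It is inferred from the sample, where `ttl` (1756085811) is exactly 2,419,200 seconds after `datetime` (1753666611).

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the S3 bucket and the signal location.
artifact_store: dict[str, bytes] = {}
signal_store: dict[str, dict] = {}


def publish(shard: str, version: str, tarball: bytes, now: int, ttl_seconds: int) -> None:
    key = f"{version}.tar.gz"
    # 1. Upload the artifact first ...
    artifact_store[key] = tarball
    # 2. ... and only then write the signal that points at it.
    signal_store[shard] = {
        "version": version,
        "s3_key": key,
        "datetime": now,
        "ttl": now + ttl_seconds,
        "upload_complete": True,
    }


def usable_signal(shard: str, now: int) -> Optional[dict]:
    record = signal_store.get(shard)
    if record is None or not record.get("upload_complete"):
        return None  # no barrier yet: the artifact may not exist
    if now >= record["ttl"]:
        return None  # stale signal: ignore it
    if record["s3_key"] not in artifact_store:
        return None  # defensive: pointer without an artifact
    return record


publish("basalt", "20250728.1753666491.0", b"<tarball>", now=1753666611,
        ttl_seconds=28 * 24 * 3600)
print(signal_store["basalt"]["ttl"])                        # 1756085811, as in the sample
print(usable_signal("basalt", now=1753670000) is not None)  # True
print(usable_signal("basalt", now=1756085811))              # None (expired)
```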

Why versioning is the unit of rollout

  • Atomic flip. Promoting environment A from version V1 to V2 is a single pin update, not an N-file edit. Either the whole environment sees V2 or it doesn't.
  • Rollback is trivial. Revert the pin to V1, and the next chef-client run applies the old version. No git revert → CI rebuild → re-promote chain is needed.
  • Non-prod testing is meaningful. You can promote V2 to dev, verify it, then promote the same artifact to prod. What dev ran is bit-identical to what prod runs.
  • Release trains work. Treating the version as the rollout unit enables the release-train-with-canary pattern (patterns/release-train-rollout-with-canary): V2 advances through prod-1 → prod-6 in a known sequence.
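The release-train bullet can be sketched as a loop over the prod environments in order. This assumes a plain dict of pins and a caller-supplied `healthy` check, which is hypothetical; the source does not describe Slack's gating signal. It also assumes the train halts and reverts on the first unhealthy stage. `PROD_TRAIN` and `run_release_train` are illustrative names.

```python
from typing import Callable

PROD_TRAIN = [f"prod-{i}" for i in range(1, 7)]  # prod-1 through prod-6


def run_release_train(
    pins: dict[str, str],
    new_version: str,
    healthy: Callable[[str], bool],
) -> list[str]:
    """Advance new_version through PROD_TRAIN in order; stop and revert on failure.

    Returns the environments that ended up on new_version.
    """
    advanced: list[str] = []
    for env in PROD_TRAIN:
        previous = pins.get(env)
        pins[env] = new_version  # atomic per-environment flip
        if not healthy(env):
            if previous is None:
                del pins[env]
            else:
                pins[env] = previous  # rollback = re-pin the prior version
            break  # later environments never see new_version
        advanced.append(env)
    return advanced


pins = {env: "V1" for env in PROD_TRAIN}
done = run_release_train(pins, "V2", healthy=lambda env: env != "prod-3")
print(done)            # ['prod-1', 'prod-2']
print(pins["prod-3"])  # V1 (reverted); prod-4 to prod-6 were never touched
```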

Contrasts

  • Vs. container image version. Container images are also versioned artifacts pinned per environment; the shape is functionally equivalent. The difference is altitude: cookbook artifacts describe host-level configuration (packages, services, files, users); container images describe workload-level configuration (process tree inside a namespace).
  • Vs. ECS task-definition revision. Task-defs are pinned per-service; they map N:1 to services (where cookbook artifacts map N:N to services via roles).
  • Vs. Lambda version alias. Lambda has immutable numeric versions with mutable aliases pointing at them — a similar "pin via name, revert via re-alias" shape.
  • Vs. Chef Policyfiles. Policyfiles also produce immutable artifacts (.lock.json) that pin per-environment, but they merge roles + environments + run-lists into one artifact. Slack rejected Policyfiles on migration-cost grounds; see systems/chef-policyfiles.

Caveats

  • Versioning discipline depends on the build pipeline. If the cookbook CI / build process doesn't produce reproducible artifacts, version pinning is just a name — the actual apply can diverge per-node. Slack's timestamp-encoded version numbering is a strong signal that reproducible builds are assumed.
  • Rollback has limits. Re-promoting V1 reverts the version pin, but side effects of V2 (e.g., a V2 recipe that ran rm -rf) may not be undone by re-applying V1. Cookbook idempotency discipline matters.
  • Slack's post discloses the manifest schema, not the build process. How cookbooks become artifacts — CI, build tool, dependency resolver — is not covered.
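One way to stop a version from being only a name, at least at the artifact level, is to bind each version to a content digest and refuse to rebind it. This is an illustrative technique, not something the post says Slack does. `ArtifactRegistry` is hypothetical. The sketch does not address apply-time divergence, such as recipes that install unpinned packages.

```python
import hashlib


class ArtifactRegistry:
    """Hypothetical registry that makes each version name immutable.

    Binding a version to a SHA-256 digest means "the same version" always
    refers to the same bytes, even if the build itself is not reproducible.
    """

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    def register(self, version: str, tarball: bytes) -> str:
        digest = hashlib.sha256(tarball).hexdigest()
        existing = self._digests.get(version)
        if existing is not None and existing != digest:
            # Changes must ship as a new, additive version, never an in-place edit.
            raise ValueError(f"version {version} already bound to different content")
        self._digests[version] = digest
        return digest

    def verify(self, version: str, tarball: bytes) -> bool:
        # Node-side check that the downloaded bytes match what was registered.
        return self._digests.get(version) == hashlib.sha256(tarball).hexdigest()


reg = ArtifactRegistry()
reg.register("20250728.1753666491.0", b"tarball-v2")
reg.register("20250728.1753666491.0", b"tarball-v2")  # same bytes: accepted
print(reg.verify("20250728.1753666491.0", b"tarball-v2"))  # True
try:
    reg.register("20250728.1753666491.0", b"edited-in-place")
except ValueError as err:
    print(err)  # version 20250728.1753666491.0 already bound to different content
```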

Seen in
