
PATTERN Cited by 1 source

jsonschema-validated config at commit and CI

Problem

A YAML or JSON configuration file with a mistyped field name or an unknown key often causes silent semantic bugs. The consuming application sees metadata as missing when the file actually says metadpata, and proceeds with some default, frequently "no entries" or "empty set". On destructive code paths that default can collapse into "every entry" (see supertool collapse-to-all).
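To make the failure mode concrete, here is a minimal Python sketch. Everything in it is hypothetical (ALL_ACCOUNTS, select_targets, the config shape) and it is not Zalando's code. It only illustrates how two rules combine to turn a one-letter typo into an operation on every account: a .get() default, plus a rule that "empty means all".

```python
# Hypothetical illustration of the failure mode, not code from the postmortem.
ALL_ACCOUNTS = ["account-a", "account-b", "account-c"]  # hypothetical inventory


def select_targets(config: dict) -> list[str]:
    # .get() silently falls back to {} when the key is misspelled.
    metadata = config.get("metadata", {})
    selected = metadata.get("accounts", [])
    # The dangerous step: an empty selection is treated as "no filter".
    return selected if selected else ALL_ACCOUNTS


intended = {"metadata": {"accounts": ["account-a"]}}
typoed = {"metadpata": {"accounts": ["account-a"]}}

print(select_targets(intended))  # ['account-a']
print(select_targets(typoed))    # ['account-a', 'account-b', 'account-c']
```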

Any single enforcement point leaves a gap:

  • IDE only: developers without the plugin, or who ignore the red squiggle, bypass it entirely.
  • Pre-commit only: git commit --no-verify bypasses it, and so do edits committed directly through a web UI.
  • CI only: PRs merged before CI finishes, emergency hot-fix paths, and temporarily disabled CI all bypass it.

Any one enforcement point can be bypassed; typos slip through.

Solution

Declare the config's shape once as a JSON Schema, then enforce the schema at three redundant points:

  1. Developer IDE: schema-driven autocompletion and live validation as the developer types. Catches typos at the earliest possible moment.
  2. Local pre-commit hook: refuses to create a commit with invalid config. Catches what the IDE missed (or what the developer dismissed).
  3. CI pipeline: the same schema check runs server-side on every push. Catches anything that reached the remote without passing pre-commit.

The load-bearing design choice is that all three enforcement points use the same JSON Schema file. JSON Schema is portable and language-agnostic, with validators available in every mainstream language, so the IDE, the hook, and CI cannot drift apart.

Zalando's canonical instance

From the 2024-01 metadpata postmortem:

"We have set up jsonschema validation for all our configuration files. All these checks run both locally (thanks to pre-commit hooks) and in the CI/CD pipelines. We also did some small quality of life improvements to enable autocompletion and schema validation in our local IDEs, which mitigates the possibility of typos and errors and is simple to set up:

```
# yaml-language-server: $schema=schema/config_schema.json

(your config)
```"

Source: sources/2024-01-22-zalando-tale-of-metadpata-the-revenge-of-the-supertools

Three enforcement points, one schema (schema/config_schema.json):

  1. IDE: the # yaml-language-server: comment at the top of each YAML file points the Red Hat YAML Language Server at the schema. Any LSP-aware editor (VS Code, Neovim with coc/lsp, IntelliJ, etc.) then lights up with autocompletion and validation.
  2. Pre-commit: a pre-commit hook runs the schema check against staged YAML files.
  3. CI/CD: the same check runs in the pipeline, where client-side flags such as --no-verify cannot skip it.

Mechanism

```
# Schema lives in the repo
schema/
    config_schema.json     # JSON Schema with
                           # additionalProperties: false
                           # at the right levels

# Every YAML config points at the schema
configs/
    account-a.yaml         # has "# yaml-language-server: $schema=..."
    account-b.yaml
    ...

# .pre-commit-config.yaml invokes a schema checker
repos:
-   repo: https://github.com/adrienverge/yamllint
    rev: vX.Y.Z            # placeholder: pin a real yamllint release tag
    hooks:
    -   id: yamllint
-   repo: local
    hooks:
    -   id: jsonschema-check
        name: validate configs against schema
        entry: python scripts/check_schema.py
        language: system   # run with the interpreter already on PATH
        files: ^configs/.*\.yaml$

# CI runs the same command
# .github/workflows/validate.yml (one step of the job)
-   name: validate configs
    run: python scripts/check_schema.py configs/
```

The same script runs locally (via pre-commit) and in CI. IDE validation is handled natively by the YAML Language Server via the schema comment.
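The page names scripts/check_schema.py but does not show it. Below is a minimal sketch of what such a script could look like. It assumes the third-party jsonschema and PyYAML packages, the schema at schema/config_schema.json, and a working directory at the repository root. The function names are illustrative. It accepts file paths (as pre-commit passes staged files) or directories (as the CI step passes configs/), so one script serves both enforcement points.

```python
#!/usr/bin/env python3
"""Sketch of scripts/check_schema.py: validate YAML configs against one JSON Schema.

Assumes jsonschema and PyYAML are installed and that the script runs from the
repository root (pre-commit does this; most CI runners do too).
"""
import json
import sys
from pathlib import Path

import yaml
from jsonschema.validators import validator_for

SCHEMA_PATH = Path("schema/config_schema.json")


def iter_yaml_files(args):
    """Yield YAML files from explicit paths (pre-commit) or directories (CI)."""
    for arg in args:
        path = Path(arg)
        if path.is_dir():
            yield from sorted(path.rglob("*.yaml"))
        else:
            yield path


def validate_document(data, schema):
    """Return one readable message per schema violation in a parsed document."""
    # Pick the validator class matching the schema's "$schema" draft.
    validator = validator_for(schema)(schema)
    messages = []
    for err in validator.iter_errors(data):
        location = "/".join(str(part) for part in err.absolute_path) or "<root>"
        messages.append(f"{location}: {err.message}")
    return messages


def main(args):
    schema = json.loads(SCHEMA_PATH.read_text(encoding="utf-8"))
    # Fail fast if the schema itself is malformed.
    validator_for(schema).check_schema(schema)
    failed = False
    for path in iter_yaml_files(args):
        with path.open(encoding="utf-8") as fh:
            data = yaml.safe_load(fh)
        for message in validate_document(data, schema):
            print(f"{path}: {message}", file=sys.stderr)
            failed = True
    # A non-zero exit status is what fails the commit or the CI job.
    return 1 if failed else 0


# With no paths there is nothing to validate, so only run when given some.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The same exit-code contract drives both callers. Pre-commit blocks the commit and the CI step fails the job whenever any file has a violation.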

Why it would have prevented metadpata

The metadpata typo is exactly the failure shape this pattern covers: a YAML key that matches no declared property, which additionalProperties: false rejects. Each of the three enforcement points would have caught it:

  • IDE: red squiggle under metadpata as soon as it's typed.
  • Pre-commit: git commit refused with "property metadpata not permitted".
  • CI: pipeline fails; PR cannot merge.

Zalando names this directly: the change "mitigates the possibility of typos and errors."

Prerequisites

  • Schema actually written and maintained. A validator with no schema catches only YAML syntax errors.
  • additionalProperties: false at the relevant object levels in the schema, to catch unknown keys. Without it, typos create new silent fields.
  • Every config file has the # yaml-language-server: $schema=… comment (team convention) or editor config maps schemas to file globs.
  • Schema kept in the same repo so edits to the schema and configs stay atomic.
  • Schema kept current with code. If the consumer accepts a field the schema doesn't declare, the schema blocks valid configs.

Caveats

  • Schemas catch shape, not semantics. A valid-but-dangerous config (e.g., an empty accounts: [] that the supertool reads as "all accounts") still passes a shape check. Combine the schema with defensive application logic, and with patterns/pr-preview-of-cloudformation-changeset for the "what would this actually do?" layer.
  • IDE enforcement is advisory. Developers can ignore red squiggles. Only pre-commit and CI are real enforcement.
  • Emergency bypass. Hot-fix paths that are allowed to land without CI can still ship an invalid config. Open such bypasses only when strictly necessary.
  • Schema complexity can explode. Large schemas with conditional requirements (if/then, allOf/oneOf) become hard to read; developers stop trusting them.
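The first caveat means a config can pass the schema and still be dangerous, so the consumer has to refuse ambiguous input itself. Below is a minimal sketch of that defensive side, using hypothetical names. "Everything" must be requested explicitly, and an empty selection raises an error instead of silently widening to all targets.

```python
class EmptySelectionError(ValueError):
    """Raised instead of silently widening an empty selection to every target."""


def resolve_targets(selected, all_targets, select_all=False):
    """Return the targets a destructive operation may touch."""
    # "Everything" must be asked for explicitly, never inferred from absence.
    if select_all:
        return list(all_targets)
    if not selected:
        raise EmptySelectionError("empty selection; pass select_all=True to act on all targets")
    return list(selected)
```

In this sketch, a misspelled key that slips past the schema surfaces as a loud runtime error rather than an operation on every account.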
