
CONCEPT Cited by 1 source

jsonschema config validation

Definition

jsonschema config validation is the practice of declaring a single JSON Schema as the authoritative description of a configuration file's shape, and running that schema against the file at every point it gets edited or loaded — the developer's IDE, a local pre-commit hook, the CI/CD pipeline, and optionally at application startup. One schema, multiple enforcement points, no drift between the shape reviewers expect and the shape the program accepts.

For YAML configs specifically, the de facto implementation uses JSON Schema (YAML is a superset of JSON, so a parsed YAML document can be checked like JSON data) plus the Red Hat YAML Language Server to bring schema-driven autocomplete and validation into editors.

Zalando's instantiation

From the 2024-01 metadpata postmortem:

"We have set up jsonschema validation for all our configuration files. All these checks run both locally (thanks to pre-commit hooks) and in the CI/CD pipelines. We also did some small quality of life improvements to enable autocompletion and schema validation in our local IDEs, which mitigates the possibility of typos and errors and is

```
# yaml-language-server: $schema=schema/config_schema.json

(your config)
```

" (sources/2024-01-22-zalando-tale-of-metadpata-the-revenge-of-the-supertools)

Three enforcement points with one schema:

  1. IDE: the comment # yaml-language-server: $schema=<path> at the top of the YAML file points the YAML Language Server at the schema. Autocomplete lists valid keys; unknown keys light up red.
  2. Pre-commit: a local hook runs the schema check against staged YAML; the commit is refused if it fails. (systems/pre-commit)
  3. CI/CD: the same schema check runs server-side in the pipeline, so changes that skipped the local hook, including direct pushes to main, are still caught.
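One way to keep the pre-commit and CI checks identical is to have both call a single entry point that takes file paths and returns a process exit code. The sketch below is an assumption about how such a script could look, not Zalando's implementation: check_configs and reject_unknown_top_level are hypothetical names, json.loads stands in for a YAML parser, and the tiny validator stands in for a real JSON Schema validator bound to the shared schema.

```python
import json
import os
import sys
import tempfile


def check_configs(paths, load, validate, out=sys.stderr):
    """Validate every file with the same callables and return an exit code.

    load and validate are placeholders: in a real setup, load would be a YAML
    parser and validate a JSON Schema validator using the one shared schema.
    """
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            config = load(handle.read())
        for message in validate(config):
            print(f"{path}: {message}", file=out)
            failed = True
    return 1 if failed else 0


def reject_unknown_top_level(config):
    """Stand-in validator (hypothetical): only 'metadata' is allowed."""
    allowed = {"metadata"}
    return [f"property {key} not permitted" for key in config if key not in allowed]


# Demo: a file containing the typo makes the check return a failing exit code.
with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "bad.json")
    with open(bad, "w", encoding="utf-8") as handle:
        handle.write('{"metadpata": {}}')
    exit_code = check_configs([bad], json.loads, reject_unknown_top_level)

print(exit_code)  # 1
```

A pre-commit hook would pass the staged file names as arguments and a CI job would pass every config file in the repository; because both invoke the same function with the same schema, the two checks cannot disagree.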

Why it matters for metadpata-class bugs

The metadpata incident's proximate cause was a YAML field that didn't match any declared property: metadpata is not in the schema, metadata is. A schema validator with additionalProperties: false at the relevant level would have flagged the file at every enforcement point:

  • IDE: red squiggle under metadpata as the developer types.
  • Pre-commit: git commit refused with "property metadpata not permitted".
  • CI: PR cannot merge because the pipeline fails.

Any one of the three would have prevented the incident. The stack exists because no single enforcement point is reliably on: IDE plugins are optional, pre-commit can be skipped with --no-verify, CI can be disabled on a Friday-night branch push. Three redundant enforcement points make it hard to slip a typo through all of them accidentally.
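To make the effect of additionalProperties: false concrete, here is a minimal sketch that hand-rolls only the two schema keywords involved (properties and additionalProperties). It is not a JSON Schema validator, and a real setup would use a complete implementation. The schema contents and the unknown_keys helper are hypothetical, chosen to mirror the metadata/metadpata example.

```python
def unknown_keys(config, schema, path=""):
    """Return the paths of keys that a closed schema does not declare.

    Illustration only: understands "properties" and
    "additionalProperties": false on nested objects, nothing else.
    """
    errors = []
    if not isinstance(config, dict):
        return errors
    declared = schema.get("properties", {})
    for key, value in config.items():
        here = f"{path}/{key}"
        if key in declared:
            # Declared key: recurse into its sub-schema.
            errors.extend(unknown_keys(value, declared[key], here))
        elif schema.get("additionalProperties", True) is False:
            # Undeclared key on a closed object: report it.
            errors.append(here)
    return errors


# Hypothetical schema: only "metadata" is declared at the top level.
SCHEMA = {
    "type": "object",
    "properties": {
        "metadata": {
            "type": "object",
            "properties": {"accounts": {"type": "array"}},
            "additionalProperties": False,
        }
    },
    "additionalProperties": False,
}

# The same schema without additionalProperties: unknown keys are allowed.
LOOSE_SCHEMA = {"type": "object", "properties": SCHEMA["properties"]}

typo_config = {"metadpata": {"accounts": ["a-123"]}}
good_config = {"metadata": {"accounts": ["a-123"]}}

print(unknown_keys(typo_config, SCHEMA))        # ['/metadpata']
print(unknown_keys(good_config, SCHEMA))        # []
print(unknown_keys(typo_config, LOOSE_SCHEMA))  # []
```

The last line shows why the keyword matters: with the open schema, the misspelled key passes without any error.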

Why one schema (not three)

A common anti-pattern: separate validators at each enforcement point, written in different languages or frameworks, drifting out of sync. The failure mode: an edit that passes IDE and pre-commit but fails CI (or vice versa), prompting someone to disable a check "because the checks don't agree". Using JSON Schema — a portable, language-agnostic standard with implementations in every mainstream language — keeps the checks identical.

Prerequisites

  • A schema actually written. A validator with no schema only catches YAML syntax errors, not field-name typos.
  • additionalProperties: false at the relevant object levels, to catch unknown keys. Without it, a mistyped key is silently accepted as a new, undeclared field.
  • Team convention to include the schema comment in every YAML file, or an editor setup that associates schemas to file globs.
  • Schema kept in the same repo so edits to the schema and the configs stay atomic.
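The convention of putting the schema comment in every YAML file can itself be checked mechanically. The following sketch looks for the modeline in the first few lines of a file's text; the function name and the five-line window are assumptions made for this example, not part of the YAML Language Server.

```python
import re

# Matches the modeline comment that points the YAML Language Server at a schema.
MODELINE = re.compile(r"^#\s*yaml-language-server:\s*\$schema=(\S+)\s*$")


def schema_from_modeline(text, max_lines=5):
    """Return the schema path named near the top of the text, or None.

    max_lines is an arbitrary choice for this sketch.
    """
    for line in text.splitlines()[:max_lines]:
        match = MODELINE.match(line.strip())
        if match:
            return match.group(1)
    return None


with_comment = "# yaml-language-server: $schema=schema/config_schema.json\nmetadata: {}\n"
without_comment = "metadata: {}\n"

print(schema_from_modeline(with_comment))     # schema/config_schema.json
print(schema_from_modeline(without_comment))  # None
```

A hook could fail the commit whenever this returns None for a config file, which turns the team convention into an enforced rule.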

Caveats

  • Schemas can't catch semantic bugs. A metadata object with an empty accounts: [] that gets interpreted as "all accounts" is valid YAML with a valid shape. A minItems constraint can reject the empty list where it is never legitimate, but a schema cannot know how the program interprets a value it allows, so the supertool still needs to handle the empty-set case explicitly. Schema validation is necessary but not sufficient.
  • IDE validation is advisory only — developers can ignore red squiggles. Only pre-commit and CI are enforcement.
  • Schema can lag the code. If the code accepts a new field before the schema declares it, the schema will reject valid configs. Schema maintenance has to be part of the feature PR.
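The empty-set caveat is a check that belongs in the program, not the schema. A minimal sketch of such a guard, assuming a hypothetical resolve_accounts helper and assuming the desired policy is to refuse an empty or missing list instead of widening it to every account:

```python
def resolve_accounts(metadata):
    """Return the accounts to act on, refusing an empty or missing list."""
    accounts = metadata.get("accounts")
    if not accounts:
        # An empty selection must never silently widen to "all accounts".
        raise ValueError("accounts must list at least one account")
    return list(accounts)


print(resolve_accounts({"accounts": ["a-123"]}))  # ['a-123']

try:
    resolve_accounts({"accounts": []})
except ValueError as error:
    print(f"refused: {error}")  # refused: accounts must list at least one account
```

Whether to refuse, warn, or require an explicit "all" marker is a design decision for the tool; the point is that the empty case is handled deliberately.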

Seen in
