PATTERN Cited by 1 source

Sandboxed domain-specific expression language

Summary

When users need to inject logic into a shared process (workflow orchestrator, admission controller, policy engine, configuration evaluator), give them a domain-specific subset of a familiar language rather than a general-purpose interpreter. Bound its runtime behaviour with hard limits (loop iterations, array size, memory), and run evaluation inside a platform sandbox that denies dangerous capabilities.

Canonical wiki instance: Netflix SEL, used inside Maestro for user-injected expressions in parameterized workflows (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator).

Problem

Multi-tenant control-plane services often need to evaluate tenant-supplied logic inline:

  • A workflow orchestrator evaluates conditional-branch conditions, foreach ranges, signal-matcher predicates.
  • An admission controller evaluates per-resource validation rules.
  • A policy engine evaluates authorisation decisions against policy documents.
  • A configuration system evaluates template expressions to compute values at deploy time.

The obvious approach — embed a general-purpose interpreter (Groovy, Python, JavaScript) — fails three ways:

  1. Availability — an infinite loop in one tenant's expression can stall the shared process. An unbounded array allocation OOMs the whole server.
  2. Security — general interpreters expose reflection, filesystem access, class loading, arbitrary syscalls. Each is a potential escape to the host.
  3. Reasoning — general languages are hard to reason about statically; you can't tell from reading an expression whether it'll terminate, what resources it'll consume, or what state it'll touch.

The Maestro post names the threat explicitly:

"Users might unintentionally write an infinite loop that creates an array and appends items to it, eventually crashing the server with out-of-memory (OOM) issues." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)

Solution

Three-layer defence:

Layer 1 — Language subset

Pick a familiar host language (Java, Go, Python) and define a subset that excludes dangerous constructs (unbounded recursion, reflection, unrestricted loops) while retaining the expressive power the domain needs. SEL supports a subset of the Java Language Specification (JLS), focused on Maestro parameter types, datetime handling, and predefined utility methods.
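
One common way to enforce a language subset is an allow-list over the parsed syntax tree: anything not explicitly permitted is rejected before evaluation. The sketch below uses Python's standard ast module as a stand-in host language. It is not how SEL is implemented; the allow-list contents and the function name subset_violations are assumptions made for illustration.

```python
import ast

# Hypothetical allow-list: only these node types may appear in an expression.
# Calls, attribute access, comprehensions, lambdas, etc. are absent on purpose.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.Not,
    ast.And, ast.Or, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def subset_violations(source):
    """Return the sorted names of node types in `source` that fall outside the subset."""
    tree = ast.parse(source, mode="eval")
    return sorted({
        type(node).__name__
        for node in ast.walk(tree)
        if not isinstance(node, ALLOWED_NODES)
    })


print(subset_violations("retries < 3 and size * 2 >= limit"))  # → []
print(subset_violations("__import__('os').system('id')"))      # → ['Attribute', 'Call']
```

An allow-list fails closed: a construct added to the host language later is rejected until someone decides it belongs in the subset, which a deny-list would not guarantee.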

Layer 2 — Runtime limits

Enforce bounds in the language runtime itself, not via caller checks:

  • Loop-iteration limit — caps the total iterations any single expression can execute.
  • Array-size limit — caps collection growth.
  • Object memory limit — caps total evaluation memory.

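The Maestro post describes these for SEL as "additional runtime checks, such as loop iteration limits, array size checks, object memory size limits and so on, to enhance security and reliability." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)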
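
The loop-iteration and array-size limits can be sketched as a per-evaluation budget that the interpreter consults on every iteration and every collection growth. This is a minimal illustration, not SEL's implementation: the names EvalBudget, BoundedList, and run_append_loop and the default limit values are assumptions, and the object memory limit is left out.

```python
class LimitExceeded(Exception):
    """Raised when an expression exceeds one of its evaluation limits."""


class EvalBudget:
    """Per-evaluation counters. The default limits are illustrative assumptions."""

    def __init__(self, max_iterations=10_000, max_array_size=1_000):
        self.max_iterations = max_iterations
        self.max_array_size = max_array_size
        self.iterations = 0

    def tick(self):
        """Charge one loop iteration; called by the interpreter, not by user code."""
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise LimitExceeded(f"loop iteration limit {self.max_iterations} exceeded")

    def check_size(self, size):
        """Reject a collection that would grow past the array-size limit."""
        if size > self.max_array_size:
            raise LimitExceeded(f"array size limit {self.max_array_size} exceeded")


class BoundedList(list):
    """List whose append() is checked against the budget.

    A real runtime would guard every growth path (extend, insert, +=), not just append.
    """

    def __init__(self, budget):
        super().__init__()
        self._budget = budget

    def append(self, item):
        self._budget.check_size(len(self) + 1)
        super().append(item)


def run_append_loop(budget, should_continue):
    """Stand-in for how an interpreter might execute `while (cond) { arr.add(i); i++; }`."""
    items = BoundedList(budget)
    i = 0
    while should_continue(i):
        budget.tick()        # loop-iteration limit
        items.append(i)      # array-size limit
        i += 1
    return items


try:
    run_append_loop(EvalBudget(), lambda i: True)  # the infinite loop that appends
except LimitExceeded as exc:
    print(exc)  # → array size limit 1000 exceeded
```

Because the checks live in the loop and collection primitives themselves, an expression author cannot skip them, which is the point of enforcing limits in the runtime rather than in the caller.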

Layer 3 — Platform sandbox

Even a structurally safe language, when it runs on a general-purpose VM, can reach the host through platform capabilities (reflection, class loading, filesystem, network). Run evaluation inside a capability-restricted sandbox:

  • Java Security Manager (SEL's choice). Note that the Security Manager has been deprecated for removal since Java 17 (JEP 411).
  • Go runtime without unsafe / filesystem access (Rego, CEL).
  • V8 isolate without Node APIs (serverless JS runtimes).

"It leverages the Java Security Manager to restrict access, ensuring a secure and controlled environment for code execution." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)

Tradeoffs

Axis | Gain | Cost
Safety | Tenant expressions cannot crash the process | Users can't use JLS features outside the subset
Auditability | Expressions are statically analysable | One-time language + interpreter build cost
Performance | Bounded evaluation = predictable latency | Harder to optimise than native code
Maintenance | Subset is stable; no churn | Maintainers must keep subset current with host-language evolution
Learning | Familiar syntax; minimal friction | Users occasionally hit the subset boundary and must refactor

Structural variants

Variant | Host language / runtime | Example
Orchestrator expressions | JLS subset | SEL
Admission controller rules | Proto-lang subset | Kubernetes CEL
Authorisation policies | Datalog-inspired | Open Policy Agent Rego
Template expressions | Shell / JSON | AWS CloudFormation intrinsics, Jinja sandboxed environment
Serverless plugins | V8 isolate or WebAssembly sandbox | Cloudflare Workers (V8 isolates), Fastly Compute (WebAssembly)

When not to use this pattern

  • Full tenant containers — if tenant logic runs in its own isolated container / VM, you don't need a safe DSL; the container IS the sandbox. Temporal's approach: tenant workflow code runs in the tenant's own worker process.
  • Compile-ahead static configuration — if tenant input can be compiled ahead of time into data (not code), you avoid dynamic evaluation entirely.
  • Small expression surface — if users only need arithmetic / boolean / comparison ops, a tiny custom grammar is simpler than language-subset engineering.
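
For the small-surface case, a hand-written recursive-descent evaluator may be all that is needed. The sketch below implements one hypothetical grammar (numbers, named variables, + - * /, comparisons, and / or, parentheses); the grammar and every name in it are assumptions made for illustration. The grammar has no loops, calls, or attribute access, and each parsing step consumes at least one token, so the work is bounded by the length of the input.

```python
import operator
import re

TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(<=|>=|==|!=|[-+*/()<>]))")
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def tokenize(text):
    """Split `text` into (kind, value) pairs; reject anything outside the grammar."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character near offset {pos}")
        number, name, symbol = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("sym", symbol))
        pos = match.end()
    return tokens


class Parser:
    """Evaluates while parsing: or > and > comparison > sum > term > atom."""

    def __init__(self, tokens, variables):
        self.tokens, self.pos, self.variables = tokens, 0, variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def accept(self, kind, values):
        """Consume and return the next token's value if it matches, else None."""
        if self.peek()[0] == kind and self.peek()[1] in values:
            return self.take()[1]
        return None

    def parse_or(self):
        value = self.parse_and()
        while self.accept("name", {"or"}):
            rhs = self.parse_and()
            value = bool(value) or bool(rhs)
        return value

    def parse_and(self):
        value = self.parse_compare()
        while self.accept("name", {"and"}):
            rhs = self.parse_compare()
            value = bool(value) and bool(rhs)
        return value

    def parse_compare(self):
        value = self.parse_sum()
        op = self.accept("sym", COMPARE.keys())
        return COMPARE[op](value, self.parse_sum()) if op else value

    def parse_sum(self):
        value = self.parse_term()
        while (op := self.accept("sym", {"+", "-"})):
            rhs = self.parse_term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_term(self):
        value = self.parse_atom()
        while (op := self.accept("sym", {"*", "/"})):
            rhs = self.parse_atom()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_atom(self):
        kind, value = self.take()
        if kind == "num":
            return value
        if kind == "name" and value in self.variables:
            return self.variables[value]
        if kind == "sym" and value == "-":
            return -self.parse_atom()
        if kind == "sym" and value == "(":
            inner = self.parse_or()
            if self.accept("sym", {")"}) is None:
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text, variables):
    parser = Parser(tokenize(text), variables)
    result = parser.parse_or()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("trailing input")
    return result


print(evaluate("retries < 3 and size * 2 >= limit",
               {"retries": 1, "size": 10, "limit": 20}))  # → True
```

Unknown names, stray characters, and unbalanced parentheses all raise SyntaxError, so the evaluator accepts nothing it does not explicitly understand.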

Example (SEL)

User-supplied parameter expression inside a Maestro workflow:

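// Compute the next partition to backfill from a signal's timestamp.
// Illustrative pseudocode: the helper names and the days= keyword are placeholders, not verified SEL syntax.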
partition = dateAdd(signal.processed_date, days=-1).format("yyyy-MM-dd")

SEL parses this, validates the syntax tree against the supported JLS subset (for example, no reflection), evaluates it under the runtime limits (loop iterations, array size, memory) inside the Java Security Manager sandbox, and returns partition as a Maestro parameter for downstream steps to consume.
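
The parse, validate, and evaluate flow can be approximated with a small tree-walking evaluator that only understands an allow-listed set of constructs and helpers. The sketch below is in Python rather than SEL and folds validation into evaluation; the helpers date_add and format_date, the processed_date parameter, and the sample date are all hypothetical, and the runtime limits and platform sandbox are omitted.

```python
import ast
from datetime import date, timedelta

# Hypothetical helper allow-list; these names are not SEL built-ins.
HELPERS = {
    "date_add": lambda d, days: d + timedelta(days=days),
    "format_date": lambda d, pattern: d.strftime(pattern),
}


def evaluate(node, params):
    """Walk the tree; any construct not handled below is rejected."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, params)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, str)):
        return node.value
    if isinstance(node, ast.Name) and node.id in params:
        return params[node.id]
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -evaluate(node.operand, params)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in HELPERS and not node.keywords):
        args = [evaluate(arg, params) for arg in node.args]
        return HELPERS[node.func.id](*args)
    raise ValueError(f"construct not allowed: {type(node).__name__}")


def evaluate_expression(source, params):
    """Parse `source` as a single expression and evaluate it against `params`."""
    return evaluate(ast.parse(source, mode="eval"), params)


params = {"processed_date": date(2024, 7, 22)}
partition = evaluate_expression(
    'format_date(date_add(processed_date, -1), "%Y-%m-%d")', params)
print(partition)  # → 2024-07-21
```

The evaluator never calls eval or looks names up in the host environment, so an expression can only reach the parameters and helpers it was explicitly handed.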
