PATTERN Cited by 1 source

Sandboxed domain-specific expression language¶

Summary¶

When users need to inject logic into a shared process (workflow orchestrator, admission controller, policy engine, configuration evaluator), build a domain-specific subset of a familiar language, bound its runtime behaviour with structural limits (loop iterations, array size, memory), and run evaluation inside a platform sandbox that denies dangerous capabilities.

Canonical wiki instance: Netflix SEL (Simple Expression Language), used inside Maestro for user-injected expressions in parameterized workflows (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator).

Problem¶

Multi-tenant control-plane services often need to evaluate tenant-supplied logic inline:

A workflow orchestrator evaluates conditional-branch conditions, foreach ranges, signal-matcher predicates.
An admission controller evaluates per-resource validation rules.
A policy engine evaluates authorisation decisions against policy documents.
A configuration system evaluates template expressions to compute values at deploy time.

The obvious approach, embedding a general-purpose interpreter (Groovy, Python, JavaScript), fails in three ways:

Availability — an infinite loop in one tenant's expression can stall the shared process. An unbounded array allocation OOMs the whole server.
Security — general interpreters expose reflection, filesystem access, class loading, arbitrary syscalls. Each is a potential escape to the host.
Reasoning — general languages are hard to reason about statically; you can't tell from reading an expression whether it'll terminate, what resources it'll consume, or what state it'll touch.

The Maestro post names this threat explicitly:

"Users might unintentionally write an infinite loop that creates an array and appends items to it, eventually crashing the server with out-of-memory (OOM) issues." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)

Solution¶

Three-layer defence:

Layer 1 — Language subset¶

Pick a familiar host language (Java, Go, Python) and define a subset that excludes dangerous constructs (unbounded recursion, reflection, unrestricted loops) while retaining the expressive power needed for the domain. SEL supports a subset of the Java Language Specification (JLS), focused on Maestro parameter types, datetime handling, and predefined utility methods.
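The subset idea can be sketched with an allowlist over the parsed syntax tree. The sketch below uses Python as the host language purely for illustration; it is not how SEL is implemented, and the names `ALLOWED_NODES`, `SubsetViolation`, and `validate_subset` are hypothetical.

```python
import ast

# Hypothetical allowlist: the only syntax node types this toy subset accepts.
# Calls, attribute access, lambdas, and comprehensions are deliberately absent.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.Not,
    ast.And, ast.Or, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


class SubsetViolation(ValueError):
    """Raised when an expression uses syntax outside the allowed subset."""


def validate_subset(source: str) -> ast.Expression:
    """Parse `source` as a single expression and reject non-allowlisted nodes."""
    # mode="eval" accepts exactly one expression; statements such as
    # `while` or `import` fail here with SyntaxError.
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise SubsetViolation(f"construct not allowed: {type(node).__name__}")
    return tree


# An arithmetic/boolean expression passes validation.
validate_subset("retries < 3 and size * 2 >= limit")

# A function call with attribute access is rejected before anything runs.
try:
    validate_subset("__import__('os').system('ls')")
except SubsetViolation as err:
    print(err)
```

An allowlist is used rather than a denylist so that syntax added to the host language later is rejected by default until someone decides it belongs in the subset.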

Layer 2 — Runtime limits¶

Enforce bounds in the language runtime itself, not via caller checks:

Loop-iteration limit — caps the total iterations any single expression can execute.
Array-size limit — caps collection growth.
Object memory limit — caps total evaluation memory.

SEL quote: "additional runtime checks, such as loop iteration limits, array size checks, object memory size limits and so on, to enhance security and reliability."
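A minimal sketch of the first two limits, assuming a hypothetical interpreter that routes every loop pass and every array append through a shared budget object. The names `EvalBudget`, `BoundedArray`, `run_while`, and `LimitExceeded`, and the default limit values, are illustrative and not taken from SEL; the object memory limit is not modelled here.

```python
class LimitExceeded(RuntimeError):
    """Raised by the runtime when an expression exceeds a structural limit."""


class EvalBudget:
    """Per-evaluation counters that the interpreter charges as it runs."""

    def __init__(self, max_iterations=10_000, max_array_size=1_000):
        self.max_iterations = max_iterations
        self.max_array_size = max_array_size
        self.iterations = 0

    def charge_iteration(self):
        # Counts iterations across ALL loops in one evaluation.
        self.iterations += 1
        if self.iterations > self.max_iterations:
            raise LimitExceeded("loop iteration limit exceeded")

    def check_array_size(self, size):
        if size > self.max_array_size:
            raise LimitExceeded("array size limit exceeded")


class BoundedArray:
    """Array type exposed to expressions; growth is checked on every append."""

    def __init__(self, budget):
        self._budget = budget
        self._items = []

    def append(self, item):
        # Check before growing so the array never exceeds the cap.
        self._budget.check_array_size(len(self._items) + 1)
        self._items.append(item)

    def __len__(self):
        return len(self._items)


def run_while(condition, body, budget):
    """How the runtime might execute a `while` node: charge on every pass."""
    while condition():
        budget.charge_iteration()
        body()


# The threat from the Maestro quote: an infinite loop that appends to an array.
budget = EvalBudget(max_iterations=1_000, max_array_size=100)
items = BoundedArray(budget)
try:
    run_while(lambda: True, lambda: items.append(0), budget)
except LimitExceeded as err:
    print(f"evaluation aborted: {err}")
```

Because the checks live in the loop and array primitives themselves, no expression can opt out of them, which is the point of enforcing limits in the runtime rather than in caller code.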

Layer 3 — Platform sandbox¶

Even a structurally safe language running on a general-purpose VM can escape via platform capabilities (reflection, class loading, filesystem, network). Run evaluation inside a capability-restricted sandbox:

Java Security Manager (SEL's choice; note that it has been deprecated for removal since Java 17 under JEP 411).
Go runtime without unsafe / filesystem access (Rego, CEL).
V8 isolate without Node APIs (serverless JS runtimes).

"It leverages the Java Security Manager to restrict access, ensuring a secure and controlled environment for code execution." (Source: sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator)

Tradeoffs¶

Axis	Gain	Cost
Safety	Tenant expressions cannot crash the process	Users can't use Java features outside the subset
Auditability	Expressions are statically analysable	One-time language + interpreter build cost
Performance	Bounded evaluation = predictable latency	Harder to optimise than native code
Maintenance	Subset is stable; no churn	Maintainers must keep subset current with host-language evolution
Learning	Familiar syntax; minimal friction	Users occasionally hit the subset boundary and must refactor

Structural variants¶

Variant	Host language	Example
Orchestrator expressions	JLS subset	SEL
Admission controller rules	C-like expression syntax with protobuf-based types	Kubernetes CEL
Authorisation policies	Datalog-inspired	Open Policy Agent Rego
Template expressions	JSON / YAML, Python	AWS CloudFormation intrinsic functions, Jinja sandboxed environment
Serverless plugins	JavaScript (V8 isolate), WebAssembly	Cloudflare Workers, Fastly Compute

When not to use this pattern¶

Full tenant containers — if tenant logic runs in its own isolated container / VM, you don't need a safe DSL; the container IS the sandbox. Temporal's approach: tenant workflow code runs in the tenant's own worker process.
Compile-ahead static configuration — if tenant input can be compiled ahead of time into data (not code), you avoid dynamic evaluation entirely.
Small expression surface — if users only need arithmetic / boolean / comparison ops, a tiny custom grammar is simpler than language-subset engineering.
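To illustrate the last case, here is a sketch of a tiny custom grammar covering only non-negative integers, arithmetic, comparison, and parentheses. The grammar and the names `tokenize`, `Parser`, and `evaluate` are hypothetical. There are no loops, names, or calls in the language, so evaluation work is bounded by the input length; a production version would presumably also cap input length and nesting depth.

```python
import operator
import re

# One token per match: an integer literal or an operator/parenthesis.
TOKEN = re.compile(r"\s*(?:(\d+)|(<=|>=|==|!=|[-+*/()<>]))")
COMPARATORS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "==": operator.eq, "!=": operator.ne,
}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1) or match.group(2))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent evaluator: comparison > sum > product > atom."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def comparison(self):
        left = self.sum()
        if self.peek() in COMPARATORS:
            return COMPARATORS[self.take()](left, self.sum())
        return left

    def sum(self):
        value = self.product()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.product()
            else:
                value -= self.product()
        return value

    def product(self):
        value = self.atom()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.atom()
            else:
                value /= self.atom()  # division by zero raises ZeroDivisionError
        return value

    def atom(self):
        token = self.take()
        if token == "(":
            value = self.comparison()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise ValueError(f"unexpected token: {token!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.comparison()
    if parser.peek() is not None:
        raise ValueError("trailing input")
    return result
```

With this sketch, `evaluate("(2 + 3) * 4 >= 20")` yields a boolean, while input such as `import os` is rejected by the tokenizer before any parsing happens.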

Example (SEL)¶

User-supplied parameter expression inside a Maestro workflow:

// Compute next partition to backfill from a signal's timestamp
partition = dateAdd(signal.processed_date, days=-1).format("yyyy-MM-dd")

SEL parses this, validates the syntax tree against the JLS subset (no reflection, bounded iteration), runs it inside the Java Security Manager sandbox, and returns partition as a Maestro parameter for downstream steps to consume.

Seen in¶

sources/2024-07-22-netflix-maestro-netflixs-workflow-orchestrator — SEL as canonical instance