Yelp — Exploring CHAOS: Building a Backend for Server-Driven UI
Summary
Yelp Engineering post (2025-07-08) that unpacks the backend of
CHAOS, Yelp's internal SDUI
framework. A companion to their earlier 2024-03 post
introducing CHAOS; this post is the first-party disclosure of
how the CHAOS backend actually constructs the per-request
CHAOS SDUI Configuration (views + layout + components +
actions) that clients render. Three axes of architecture are
disclosed: (1) the GraphQL surface — an
Apollo Federation subgraph
implemented in Python via
Strawberry, fronting multiple per-team REST backends that all
implement the same CHAOS REST API; (2) the per-request build
pipeline — ChaosConfigBuilder → ViewBuilder → LayoutBuilder
→ FeatureProvider — with a deliberately two-loop parallel
async load/resolve structure for latency; (3) advanced
primitives — View Flows (preloaded subsequent views linked
by an OpenSubsequentView action) and View Placeholders
(nested CHAOS views loaded after the parent renders). The post
also documents concrete discipline: element content is carried
as JSON strings under a stable GraphQL schema, client
capability is matched at Register time (platform × required
components/actions × presenter handler), and every
FeatureProvider is wrapped in an error-isolation decorator
so a single feature failure drops that feature rather than the
whole view (unless the feature is flagged essential).
Key takeaways
- CHAOS is a GraphQL surface over many REST backends, not a monolith. Verbatim: "we support multiple CHAOS backends that implement a CHAOS REST API to serve CHAOS content in the form of CHAOS Configurations. This architecture allows different teams to manage their CHAOS content independently on their own services, while the GraphQL layer provides a unified interface for client requests." One federated subgraph; many REST-API-conforming backends behind it. Canonical instance of patterns/federated-graphql-subgraph-per-domain with a per-domain REST API contract that lets teams keep their own services. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- Apollo Federation + Strawberry (Python) is the concrete GraphQL stack. Verbatim: "At Yelp, we have adopted Apollo Federation for our GraphQL architecture, utilizing Strawberry for federated Python subgraphs to leverage type-safe schema definitions and Python's type hints. The CHAOS-specific GraphQL schema resides in its own CHAOS Subgraph, hosted by a Python service." The CHAOS API authenticates requests and routes them to the relevant CHAOS backend service where "most of the build logic is handled"; the post does not say at which layer authentication happens. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- Element content is JSON strings inside a stable schema, on purpose. Verbatim: "Instead of defining individual schemas for each element in the GraphQL layer, we use JSON strings for element content. This approach maintains a stable GraphQL schema and allows for rapid iteration on new elements or versions." A ChaosJsonComponent has three fields the schema cares about: identifier, componentType (e.g. chaos.text.v1, chaos.button.v1, chaos.illustration.v1), and a parameters string that carries the entire element-specific payload as escaped JSON. Python dataclasses type-check the content internally and serialise to JSON. Canonical instance of concepts/json-string-parameters-for-schema-stability. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- The build pipeline is four layers deep, all named. ChaosConfigBuilder selects a ViewBuilder by view_id() from a registered list; the ViewBuilder returns a LayoutBuilder (e.g. SingleColumnLayoutBuilder with sections like main, or mobile layouts with toolbar/footer); the LayoutBuilder holds an ordered list of FeatureProvider classes per section (order = render order on the client); each FeatureProvider produces one feature's components + actions. This is the build contract. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- FeatureProvider lifecycle is six named stages, run in two passes over the providers. Verbatim stages: registers, is_qualified_to_load, load_data, resolve, is_qualified_to_present, result_presenter. The first loop runs registers + qualification + load_data for every feature, which starts the async upstream requests so that they are all in flight at once. The second loop drives resolve + qualification + result_presenter once responses return. Verbatim: "the feature providers are iterated over twice. In the first loop, the build process is initiated, triggering any asynchronous calls to external services … The second loop waits for responses and completes the build process." Parent pattern: patterns/two-loop-parallel-async-build; lifecycle pattern: patterns/feature-provider-lifecycle. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- Client capability is matched via Register declarations, not feature flags. Each Register declares a Condition(platform=[iOS, Android, web, ...], library=[required component and action classes]) plus a presenter handler. At request time, the backend walks the ordered register list and picks the first register whose condition is satisfied by the requesting client, or drops the feature entirely if none match. Verbatim: "If no register qualifies, the feature is omitted from the final response." This is the mechanism that keeps old app versions working when new components ship. Canonical instance of concepts/register-based-client-capability-matching. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- Every feature is error-wrapped; a single failure doesn't sink the view (unless marked essential). Verbatim: "each FeatureProvider is wrapped in an error-handling wrapper during the CHAOS build process. If an exception occurs, the individual feature is dropped, and the rest of the view remains unaffected. Unless developers choose to mark the feature as 'essential,' meaning its failure will affect the entire view." The pseudo-code is an @error_decorator that catches exceptions in final_result_presenter, checks self._is_essential_provider, and returns [] on non-essential failures. Failure telemetry ("feature name, ownership info, exception specifics, and additional request context") is logged for alerting and owner notification "when problems reach a specified threshold." Canonical instance of patterns/error-isolation-per-feature-wrapper. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- View Flows preload a sequence of related views in one response. A ViewBuilder.subsequent_views() method returns additional ViewBuilder classes whose output is packed into the same chaosView.views list. A chaos.open-subsequent-view.v1 action on any component navigates by viewId without a network round-trip. Yelp illustrates with a three-view loop (View 1 → View 2 → View 3 → View 1) and motivates it with the Yelp for Business customer support FAQ menu. Verbatim rationale: "By preloading these views, we eliminate the need for additional network requests for each view configuration, thereby enhancing the user experience by reducing latency." Canonical instance of patterns/preloaded-view-flow-for-predictable-navigation. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- View Placeholders embed nested CHAOS views rendered asynchronously. ViewPlaceholderV1 is a component carrying a viewName, optional featureContext, and a full loading/error/empty/header/footer component-ID suite plus estimatedContentHeight. The client renders the parent immediately with a loading state; the placeholder fetches its own CHAOS configuration in the background and swaps in the resolved content when ready. Yelp's production example: Yelp for Business home screen embeds a Reminders feature (served by a different CHAOS backend) via a placeholder. Canonical instance of patterns/view-placeholder-async-embed. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
- The latest CHAOS backend moves to Python asyncio. Verbatim side-note: "the latest CHAOS backend framework introduces the next generation of builders using Python asyncio, which simplifies the interface. This will be explored in a future blog post." The disclosed two-loop iteration is the pre-asyncio mechanism for parallel async fan-out; asyncio presumably removes the need for the explicit two-loop structure, but the post gives no details. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
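The JSON-string carrier from the takeaways above can be sketched in a few lines. This is a minimal illustration, not Yelp's code: the field names identifier, componentType, and parameters come from the post, while the TextV1 fields (text, style) and the to_json_component helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextV1:
    """Hypothetical typed payload for a chaos.text.v1 element."""
    text: str
    style: str = "body"


@dataclass
class ChaosJsonComponent:
    """Schema-level shape: three fields, with the payload kept as a string."""
    identifier: str
    componentType: str
    parameters: str  # element-specific content, serialised as JSON


def to_json_component(identifier: str, component_type: str, payload) -> ChaosJsonComponent:
    # The backend works with a typed dataclass; the schema only sees a string.
    return ChaosJsonComponent(
        identifier=identifier,
        componentType=component_type,
        parameters=json.dumps(asdict(payload), sort_keys=True),
    )


header = to_json_component(
    "welcome-to-yelp-header", "chaos.text.v1", TextV1(text="Welcome to Yelp")
)
# A client decodes the string using its own knowledge of chaos.text.v1.
decoded = json.loads(header.parameters)
```

Adding a hypothetical TextV2 under this scheme would mean a new dataclass and a new componentType string, with no change to ChaosJsonComponent itself.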
Architectural diagram (post-facto reconstruction)
client (iOS / Android / web)
               │
               │  GraphQL query { chaosConfiguration(name, context) { ... } }
               ▼
┌──────────────────────────────┐
│ Yelp Supergraph              │  (Apollo Federation router)
│ (authn + routing)            │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│ CHAOS Subgraph (Python,      │
│ Strawberry)                  │
│  – CHAOS-specific schema     │
│  – resolves chaosView etc.   │
└──────────────┬───────────────┘
               │  CHAOS REST API
               ▼
┌──────────────────────────────┐
│ CHAOS Backend Service(s)     │  (one per owning team)
│                              │
│ ChaosConfigBuilder           │
│   │                          │
│   ▼                          │
│ ViewBuilder (by view_id)     │
│   │                          │
│   ▼                          │
│ LayoutBuilder                │
│ (SingleColumn, mobile, …)    │
│   │                          │
│   ▼  per section, ordered    │
│ [ FeatureProvider ]+         │
│   │                          │
│   ▼  loop 1 (parallel)       │
│ registers / is_qualified_    │
│ to_load / load_data →        │
│ async upstream requests      │
│   │                          │
│   ▼  loop 2 (await)          │
│ resolve /                    │
│ is_qualified_to_present /    │
│ result_presenter →           │
│ Components + Actions         │
│   │                          │
│   ▼  error-wrapped           │
│ CHAOS Configuration          │
│ (views + layout +            │
│ components + actions, JSON)  │
└──────────────────────────────┘
Build pipeline reference (from the post's code samples)
# Imports used by the fragments below
from functools import wraps
from typing import List, Type

# 1. Request handler
def handle_chaos_request(request):
    context = get_chaos_context(request)
    ChaosConfigBuilder.register_view_builders([
        ConsumerWelcomeViewBuilder,
        # ... others
    ])
    return ChaosConfigBuilder(context).build()
# 2. ViewBuilder selects layout
class ConsumerWelcomeViewBuilder(ViewBuilderBase):
    @classmethod
    def view_id(cls) -> str:
        return "consumer.welcome"

    def subsequent_views(self) -> List[Type[ViewBuilderBase]]:
        return []

    def _get_layout_builder(self) -> LayoutBuilderBase:
        return SingleColumnLayoutBuilder(
            main=[WelcomeFeatureProvider], context=self._context,
        )
# 3. FeatureProvider lifecycle (abbreviated)
class FeatureProviderBase:
    @property
    def registers(self) -> List[Register]: ...
    def is_qualified_to_load(self) -> bool: return True
    def load_data(self) -> None: ...  # fires async requests
    def resolve(self) -> None: ...  # blocks on results
    def is_qualified_to_present(self) -> bool: return True
    def result_presenter(self) -> List[Component]: ...
# 4. Register declares client capability
# (fragment: `self` refers to the enclosing FeatureProvider)
Register(
    condition=Condition(
        platform=[Platform.IOS, Platform.ANDROID],
        library=[TextV1, IllustrationV1, ButtonV1],
    ),
    presenter_handler=self.result_presenter,
)
# 5. Error isolation wraps every feature
def error_decorator(f):
    @wraps(f)
    def wrapper(self, *args, **kwargs):
        try:
            return f(self, *args, **kwargs)
        except Exception as e:
            if self._is_essential_provider:
                raise  # essential feature: the failure propagates to the view
            log_error(exception=e, context=self._context)
            return []  # non-essential feature: drop it, keep the rest
    return wrapper
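The two-loop structure behind load_data and resolve can be illustrated with a small runnable sketch. It rests on assumptions: the post does not say which async mechanism the pre-asyncio builders use, so a thread pool from concurrent.futures stands in for it here, and SketchFeatureProvider, fake_upstream_call, and build_section are hypothetical names.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

_POOL = ThreadPoolExecutor(max_workers=8)


def fake_upstream_call(name: str, delay: float) -> str:
    """Stand-in for a network request to another service."""
    time.sleep(delay)
    return f"{name}-data"


class SketchFeatureProvider:
    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self._delay = delay
        self._future: Optional[Future] = None
        self._data: Optional[str] = None

    def load_data(self) -> None:
        # Loop 1: start the request and return immediately.
        self._future = _POOL.submit(fake_upstream_call, self.name, self._delay)

    def resolve(self) -> None:
        # Loop 2: block until this feature's response is available.
        if self._future is None:
            raise RuntimeError("load_data() must run before resolve()")
        self._data = self._future.result()

    def result_presenter(self) -> List[str]:
        return [f"component({self._data})"]


def build_section(providers: List[SketchFeatureProvider]) -> List[str]:
    for provider in providers:  # loop 1: fan out every request
        provider.load_data()
    components: List[str] = []
    for provider in providers:  # loop 2: wait and present, in list order
        provider.resolve()
        components.extend(provider.result_presenter())
    return components


section = build_section([
    SketchFeatureProvider("header", 0.2),
    SketchFeatureProvider("reviews", 0.2),
    SketchFeatureProvider("photos", 0.2),
])
```

Because every request is started before any result is awaited, the three 0.2 s calls should overlap, so the build takes roughly as long as the slowest call rather than the sum. Output order still follows the provider list, which matches the rule that section order is render order.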
Operational details
- Element content carrier: JSON string in the parameters field of ChaosJsonComponent / ChaosJsonAction, so the GraphQL schema stays stable across element version bumps.
- Element versioning: baked into componentType / actionType strings (e.g. chaos.text.v1, chaos.open-subsequent-view.v1).
- Single-column layout example sections: one main section listing component IDs (welcome-to-yelp-header, welcome-to-yelp-illustration, find-local-businesses-button).
- Mobile layouts: include additional named sections ("toolbar and footer"), not enumerated in full.
- Error telemetry: logs feature name, ownership info, exception details, request context — drives alerting and owner-team notification on threshold breach.
- Parallel build: first loop fans out async upstream calls for all features; second loop awaits and resolves. The asyncio-based framework mentioned in the post will presumably change this structure, but details are deferred to a future post.
- View flows: subsequent views are packed alongside the primary view in the same views array; navigation uses chaos.open-subsequent-view.v1.
- View placeholders: lazily-loaded nested CHAOS views fetched by the client after the parent renders; configurable loading / error / empty / header / footer component IDs.
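To make the view-flow bullet concrete, here is a client-side sketch of navigating between preloaded views without a network call. Only the views array, the viewId parameter, and the chaos.open-subsequent-view.v1 action type come from the post; the other keys (initialViewId, identifier, components, actionType as a dict key) and the view names are hypothetical, and real clients are native apps rather than Python.

```python
import json
from typing import Dict, List, Optional

# Hypothetical, simplified response: the primary view plus preloaded views.
config = {
    "initialViewId": "support.menu",
    "views": [
        {"identifier": "support.menu", "components": ["faq-list"]},
        {"identifier": "support.billing", "components": ["billing-faq"]},
        {"identifier": "support.account", "components": ["account-faq"]},
    ],
}

# Action parameters travel as a JSON string, like component parameters.
open_billing = {
    "actionType": "chaos.open-subsequent-view.v1",
    "parameters": json.dumps({"viewId": "support.billing"}),
}


def index_views(views: List[dict]) -> Dict[str, dict]:
    """Index the preloaded views by identifier for local lookup."""
    return {view["identifier"]: view for view in views}


def handle_action(action: dict, views_by_id: Dict[str, dict]) -> Optional[dict]:
    """Return the preloaded view to show next, or None if not applicable."""
    if action["actionType"] != "chaos.open-subsequent-view.v1":
        return None
    view_id = json.loads(action["parameters"])["viewId"]
    return views_by_id.get(view_id)  # local lookup, no network request


views_by_id = index_views(config["views"])
next_view = handle_action(open_billing, views_by_id)
```

The lookup only works for views the backend chose to preload, which is why the pattern suits predictable sequences such as an FAQ menu.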
Caveats / gotchas
- Spec incomplete for authenticated-context flow. The post says "The CHAOS API authenticates requests and routes them to the relevant backend service" but doesn't detail the auth token format, scope model, or per-backend access control — presumably inherited from the Apollo Federation supergraph but not spelled out.
- Layout repertoire is not enumerated. Only SingleColumnLayoutBuilder is named, plus a passing reference to "a basic mobile layout" with toolbar + footer. The full layout taxonomy is not disclosed.
- No numbers. Zero operational quantities: no latency percentiles, no cache hit rates, no RPS, no view counts. The post is an architecture walkthrough, not a retrospective.
- Two-loop iteration is already being superseded. Yelp flags that the "latest CHAOS backend framework" uses Python asyncio, so the two-loop mechanism documented here is best read as a transitional design. The post does not call it deprecated; treat it as an educational waypoint, not the final state.
- Element schema discipline lives outside GraphQL. Python dataclasses (TextV1, IllustrationV1, ButtonV1, etc.) type-check the JSON payloads at backend build time, but the GraphQL schema itself is just "an opaque string" for elements. Clients that consume the config must know the element vocabulary out-of-band.
- Register qualifications are first-match, order-sensitive. If a feature declares multiple Register entries, the first qualifying one wins, so correctness depends on developers ordering them from most to least specific.
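The ordering caveat for Register entries is easiest to see in a small sketch. It is a simplification under stated assumptions: the names Register, Condition, platform, library, and presenter_handler follow the post, but here library holds element-type strings rather than component classes, and Client, matches, and pick_register are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Client:
    platform: str
    supported: FrozenSet[str]  # element types this app version can render


@dataclass(frozen=True)
class Condition:
    platform: FrozenSet[str]
    library: FrozenSet[str]  # element types the presenter needs

    def matches(self, client: Client) -> bool:
        return client.platform in self.platform and self.library <= client.supported


@dataclass(frozen=True)
class Register:
    condition: Condition
    presenter_handler: Callable[[], List[str]]


def pick_register(registers: List[Register], client: Client) -> Optional[Register]:
    # First match wins; None means the feature is omitted for this client.
    return next((r for r in registers if r.condition.matches(client)), None)


def present_rich() -> List[str]:
    return ["chaos.text.v1", "chaos.illustration.v1", "chaos.button.v1"]


def present_basic() -> List[str]:
    return ["chaos.text.v1", "chaos.button.v1"]


MOBILE = frozenset({"ios", "android"})
rich = Register(Condition(MOBILE, frozenset(present_rich())), present_rich)
basic = Register(Condition(MOBILE, frozenset(present_basic())), present_basic)

new_app = Client("ios", frozenset(present_rich()))
old_app = Client("ios", frozenset(present_basic()))
web = Client("web", frozenset(present_rich()))

good_order = [rich, basic]  # most specific first
bad_order = [basic, rich]   # the fallback shadows the richer register
```

With good_order, the new app gets the rich presenter and the old app falls back to the basic one. With bad_order, every mobile client matches basic first, so the rich register is never selected even for clients that could render it.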
Source
- Original: https://engineeringblog.yelp.com/2025/07/chaos-inside-yelps-sdui-framework.html
- Raw markdown:
raw/yelp/2025-07-08-exploring-chaos-building-a-backend-for-server-driven-ui-1d30116d.md
Related
- systems/yelp-chaos — the system canonicalised
- systems/apollo-federation — the GraphQL federation substrate
- systems/strawberry-graphql — Python GraphQL library used
- concepts/server-driven-ui — upstream concept
- concepts/register-based-client-capability-matching — the gating mechanism for mixed client-version fleets
- concepts/json-string-parameters-for-schema-stability — the element-content-as-JSON trick
- patterns/federated-graphql-subgraph-per-domain — the multi-team API topology
- patterns/feature-provider-lifecycle — the 6-stage build contract
- patterns/two-loop-parallel-async-build — latency-shaping iteration structure
- patterns/error-isolation-per-feature-wrapper — per-feature try/except with essential-feature opt-out
- patterns/preloaded-view-flow-for-predictable-navigation — subsequent_views bundling
- patterns/view-placeholder-async-embed — nested-view lazy loading
- companies/yelp