YELP 2025-07-08 Tier 3

Yelp — Exploring CHAOS: Building a Backend for Server-Driven UI

Summary

Yelp Engineering post (2025-07-08) that unpacks the backend of CHAOS, Yelp's internal SDUI framework. A companion to their earlier 2024-03 post introducing CHAOS, this post is the first-party disclosure of how the CHAOS backend actually constructs the per-request CHAOS SDUI Configuration (views + layout + components + actions) that clients render. Three axes of architecture are disclosed: (1) the GraphQL surface: an Apollo Federation subgraph implemented in Python via Strawberry, fronting multiple per-team REST backends that all implement the same CHAOS REST API; (2) the per-request build pipeline, ChaosConfigBuilder → ViewBuilder → LayoutBuilder → FeatureProvider, with a deliberately two-loop parallel async load/resolve structure for latency; (3) advanced primitives: View Flows (preloaded subsequent views linked by an OpenSubsequentView action) and View Placeholders (nested CHAOS views loaded after the parent renders). The post also documents concrete discipline: element content is carried as JSON strings under a stable GraphQL schema, client capability is matched at Register time (platform × required components/actions × presenter handler), and every FeatureProvider is wrapped in an error-isolation decorator so a single feature failure drops that feature rather than the whole view (unless the feature is flagged essential).

Key takeaways

  1. CHAOS is a GraphQL surface over many REST backends, not a monolith. Verbatim: "we support multiple CHAOS backends that implement a CHAOS REST API to serve CHAOS content in the form of CHAOS Configurations. This architecture allows different teams to manage their CHAOS content independently on their own services, while the GraphQL layer provides a unified interface for client requests." One federated subgraph; many REST-API-conforming backends behind it. Canonical instance of patterns/federated-graphql-subgraph-per-domain with a per-domain REST API contract that lets teams keep their own services. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  2. Apollo Federation + Strawberry (Python) is the concrete GraphQL stack. Verbatim: "At Yelp, we have adopted Apollo Federation for our GraphQL architecture, utilizing Strawberry for federated Python subgraphs to leverage type-safe schema definitions and Python's type hints. The CHAOS-specific GraphQL schema resides in its own CHAOS Subgraph, hosted by a Python service." The CHAOS API authenticates requests at the supergraph layer and routes them to the relevant CHAOS backend service where "most of the build logic is handled." (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  3. Element content is JSON strings inside a stable schema, on purpose. Verbatim: "Instead of defining individual schemas for each element in the GraphQL layer, we use JSON strings for element content. This approach maintains a stable GraphQL schema and allows for rapid iteration on new elements or versions." A ChaosJsonComponent has three fields the schema cares about: identifier, componentType (e.g. chaos.text.v1, chaos.button.v1, chaos.illustration.v1), and a parameters string that carries the entire element-specific payload as escaped JSON. Python dataclasses type-check the content internally and serialise to JSON. Canonical instance of concepts/json-string-parameters-for-schema-stability. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  4. The build pipeline is four layers deep, all named. ChaosConfigBuilder selects a ViewBuilder by view_id() from a registered list; the ViewBuilder returns a LayoutBuilder (e.g. SingleColumnLayoutBuilder with sections like main, or mobile layouts with toolbar/footer); the LayoutBuilder holds an ordered list of FeatureProvider classes per section (order = render order on the client); each FeatureProvider produces one feature's components + actions. This is the build contract. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  5. FeatureProvider lifecycle is six named stages, run in two parallel loops. Verbatim stages: registers, is_qualified_to_load, load_data, resolve, is_qualified_to_present, result_presenter. The first loop fires registers + qualification + load_data across all features in parallel — triggering async upstream requests. The second loop drives resolve + qualification + result_presenter once responses return. Verbatim: "the feature providers are iterated over twice. In the first loop, the build process is initiated, triggering any asynchronous calls to external services … The second loop waits for responses and completes the build process." Parent pattern: patterns/two-loop-parallel-async-build; lifecycle pattern: patterns/feature-provider-lifecycle. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  6. Client capability is matched via Register declarations, not feature flags. Each Register declares a Condition(platform=[iOS, Android, web, ...], library=[required component and action classes]) plus a presenter handler. At request time, the backend walks the ordered register list and picks the first register whose condition is satisfied by the requesting client, or drops the feature entirely if none match. Verbatim: "If no register qualifies, the feature is omitted from the final response." This is the mechanism that keeps old app versions working when new components ship. Canonical instance of concepts/register-based-client-capability-matching. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  7. Every feature is error-wrapped; a single failure doesn't sink the view (unless marked essential). Verbatim: "each FeatureProvider is wrapped in an error-handling wrapper during the CHAOS build process. If an exception occurs, the individual feature is dropped, and the rest of the view remains unaffected. Unless developers choose to mark the feature as 'essential,' meaning its failure will affect the entire view." The pseudo-code is an @error_decorator that catches exceptions in final_result_presenter, checks self._is_essential_provider, and returns [] on non-essential failures. Failure telemetry ("feature name, ownership info, exception specifics, and additional request context") is logged for alerting and owner notification "when problems reach a specified threshold." Canonical instance of patterns/error-isolation-per-feature-wrapper. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  8. View Flows preload a sequence of related views in one response. A ViewBuilder.subsequent_views() method returns additional ViewBuilder classes whose output is packed into the same chaosView.views list. A chaos.open-subsequent-view.v1 action on any component navigates by viewId without a network round-trip. Yelp illustrates with a three-view loop (View 1 → View 2 → View 3 → View 1) and motivates it with the Yelp for Business customer support FAQ menu. Verbatim rationale: "By preloading these views, we eliminate the need for additional network requests for each view configuration, thereby enhancing the user experience by reducing latency." Canonical instance of patterns/preloaded-view-flow-for-predictable-navigation. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  9. View Placeholders embed nested CHAOS views rendered asynchronously. ViewPlaceholderV1 is a component carrying a viewName, optional featureContext, and a full loading/error/empty/header/footer component-ID suite plus estimatedContentHeight. The client renders the parent immediately with a loading state; the placeholder fetches its own CHAOS configuration in the background and swaps in the resolved content when ready. Yelp's production example: Yelp for Business home screen embeds a Reminders feature (served by a different CHAOS backend) via a placeholder. Canonical instance of patterns/view-placeholder-async-embed. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)

  10. The latest CHAOS backend moves to Python asyncio. Verbatim side-note: "the latest CHAOS backend framework introduces the next generation of builders using Python asyncio, which simplifies the interface. This will be explored in a future blog post." The disclosed two-loop iteration is the pre-asyncio mechanism for parallel async fan-out; the post does not describe the asyncio design, though it would presumably replace the explicit two-pass iteration with awaited coroutines. (Source: sources/2025-07-08-yelp-exploring-chaos-building-a-backend-for-server-driven-ui)
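Takeaway 3 (element content carried as a JSON string) can be illustrated with a minimal sketch. The three ChaosJsonComponent fields come from the post; the TextV1 fields, the to_json_component helper, and the use of dataclasses.asdict are assumptions made for illustration, not Yelp's actual implementation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextV1:
    """Hypothetical typed payload for a chaos.text.v1 element."""
    text: str
    text_style: str = "body"


@dataclass
class ChaosJsonComponent:
    """Mirrors the three schema-visible fields named in takeaway 3."""
    identifier: str
    componentType: str
    parameters: str  # element-specific payload, serialised as a JSON string


def to_json_component(identifier: str, component_type: str, element) -> ChaosJsonComponent:
    # The payload is typed in Python; the GraphQL layer only ever sees a string.
    return ChaosJsonComponent(
        identifier=identifier,
        componentType=component_type,
        parameters=json.dumps(asdict(element), sort_keys=True),
    )


header = to_json_component(
    "welcome-to-yelp-header", "chaos.text.v1", TextV1(text="Welcome to Yelp")
)
decoded = json.loads(header.parameters)  # what a client would do on receipt
```

Adding a field to TextV1, or introducing a new element version, changes only the contents of the parameters string; the ChaosJsonComponent shape stays the same, which is the schema-stability property the post describes.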

Architectural diagram (post-facto reconstruction)

  client (iOS / Android / web)
          │  GraphQL query { chaosConfiguration(name, context) { ... } }
          ▼
  ┌─────────────────────────────┐
  │     Yelp Supergraph         │   (Apollo Federation router)
  │  (authn + routing)          │
  └────────────┬────────────────┘
               ▼
  ┌─────────────────────────────┐
  │   CHAOS Subgraph (Python,   │
  │   Strawberry)               │
  │   – CHAOS-specific schema   │
  │   – resolves chaosView etc. │
  └────────────┬────────────────┘
               │  CHAOS REST API
               ▼
  ┌─────────────────────────────┐
  │   CHAOS Backend Service(s)  │   (one per owning team)
  │                             │
  │   ChaosConfigBuilder        │
  │       │                     │
  │       ▼                     │
  │   ViewBuilder (by view_id)  │
  │       │                     │
  │       ▼                     │
  │   LayoutBuilder             │
  │   (SingleColumn, mobile, …) │
  │       │                     │
  │       ▼ per section, ordered│
  │   [ FeatureProvider ]+      │
  │       │                     │
  │       ▼  loop 1 (parallel)  │
  │   registers / is_qualified_ │
  │   to_load / load_data →     │
  │   async upstream requests   │
  │       │                     │
  │       ▼  loop 2 (await)     │
  │   resolve /                 │
  │   is_qualified_to_present / │
  │   result_presenter →        │
  │   Components + Actions      │
  │       │                     │
  │       ▼  error-wrapped      │
  │   CHAOS Configuration       │
  │   (views + layout + compo-  │
  │    nents + actions, JSON)   │
  └─────────────────────────────┘
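The loop 1 / loop 2 split in the diagram can be sketched as follows. This is a simplified illustration, not the post's code: the thread pool stands in for whatever async client CHAOS backends use, SketchFeatureProvider and fake_upstream_call are hypothetical names, and the registers and qualification stages are omitted.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional

_EXECUTOR = ThreadPoolExecutor(max_workers=4)  # stand-in for an async HTTP client


def fake_upstream_call(name: str, delay: float) -> str:
    time.sleep(delay)  # simulates network latency
    return f"data for {name}"


class SketchFeatureProvider:
    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self._delay = delay
        self._future: Optional[Future] = None
        self._data: Optional[str] = None

    def load_data(self) -> None:
        # Loop 1: start the request and return immediately.
        self._future = _EXECUTOR.submit(fake_upstream_call, self.name, self._delay)

    def resolve(self) -> None:
        # Loop 2: block until this feature's response is available.
        assert self._future is not None
        self._data = self._future.result()

    def result_presenter(self) -> List[str]:
        return [f"component({self._data})"]


def build_section(providers: List[SketchFeatureProvider]) -> List[str]:
    for provider in providers:  # loop 1: fan out every upstream request
        provider.load_data()
    components: List[str] = []
    for provider in providers:  # loop 2: wait, then present in section order
        provider.resolve()
        components.extend(provider.result_presenter())
    return components


started = time.perf_counter()
section = build_section(
    [SketchFeatureProvider("header", 0.2), SketchFeatureProvider("reminders", 0.2)]
)
elapsed = time.perf_counter() - started
```

Because every request is started before any result is awaited, the section takes roughly as long as its slowest feature rather than the sum of all features, while the output order still follows the provider order in the section.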

Build pipeline reference (from the post's code samples)

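# Imports needed by the fragments below (CHAOS-specific names are Yelp-internal)
from functools import wraps
from typing import List, Type

# 1. Request handler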
def handle_chaos_request(request):
    context = get_chaos_context(request)
    ChaosConfigBuilder.register_view_builders([
        ConsumerWelcomeViewBuilder,
        # ... others
    ])
    return ChaosConfigBuilder(context).build()

# 2. ViewBuilder selects layout
class ConsumerWelcomeViewBuilder(ViewBuilderBase):
    @classmethod
    def view_id(cls) -> str: return "consumer.welcome"

    def subsequent_views(self) -> List[Type[ViewBuilderBase]]:
        return []

    def _get_layout_builder(self) -> LayoutBuilderBase:
        return SingleColumnLayoutBuilder(
            main=[WelcomeFeatureProvider], context=self._context,
        )

# 3. FeatureProvider lifecycle (abbreviated)
class FeatureProviderBase:
    @property
    def registers(self) -> List[Register]: ...
    def is_qualified_to_load(self) -> bool: return True
    def load_data(self) -> None: ...        # fires async requests
    def resolve(self) -> None: ...          # blocks on results
    def is_qualified_to_present(self) -> bool: return True
    def result_presenter(self) -> List[Component]: ...

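# 4. Register declares client capability
# (fragment: built inside a FeatureProvider's `registers` property, hence `self`)
Register(
    condition=Condition(
        platform=[Platform.IOS, Platform.ANDROID],   # platforms this presenter supports
        library=[TextV1, IllustrationV1, ButtonV1],  # elements the client must support
    ),
    presenter_handler=self.result_presenter,
)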

# 5. Error isolation wraps every feature
def error_decorator(f):
    @wraps(f)
    def wrapper(self, *args, **kwargs):
        try:
            return f(self, *args, **kwargs)
        except Exception as e:
            if self._is_essential_provider:
                raise  # essential feature: the failure propagates to the whole view
            log_error(exception=e, context=self._context)
        return []  # non-essential feature: dropped from the response
    return wrapper
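The effect of the error-isolation wrapper on a whole view can be shown with a small self-contained sketch. The decorator follows the shape of fragment 5 above; the provider classes, build_view, and the use of the standard logging module in place of log_error are assumptions for illustration.

```python
import logging
from functools import wraps
from typing import Callable, List

logger = logging.getLogger("chaos.sketch")


def error_decorator(f: Callable[..., List[str]]) -> Callable[..., List[str]]:
    @wraps(f)
    def wrapper(self, *args, **kwargs) -> List[str]:
        try:
            return f(self, *args, **kwargs)
        except Exception:
            if self._is_essential_provider:
                raise  # essential feature: the failure fails the whole view
            logger.exception("feature %s dropped", type(self).__name__)
            return []  # non-essential feature: contributes nothing
    return wrapper


class HeaderProvider:
    _is_essential_provider = False

    @error_decorator
    def final_result_presenter(self) -> List[str]:
        return ["welcome-to-yelp-header"]


class BrokenBannerProvider:
    _is_essential_provider = False

    @error_decorator
    def final_result_presenter(self) -> List[str]:
        raise RuntimeError("upstream timed out")


class BrokenEssentialProvider(BrokenBannerProvider):
    _is_essential_provider = True  # same failure, but now fatal for the view


def build_view(providers) -> List[str]:
    components: List[str] = []
    for provider in providers:
        components.extend(provider.final_result_presenter())
    return components


view = build_view([HeaderProvider(), BrokenBannerProvider()])
try:
    build_view([HeaderProvider(), BrokenEssentialProvider()])
    essential_failed = False
except RuntimeError:
    essential_failed = True
```

In the first build the broken banner is logged and dropped, so the view still contains the header; in the second, the same exception escapes because the provider is marked essential.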

Operational details

  • Element content carrier: JSON string in parameters field of ChaosJsonComponent / ChaosJsonAction — GraphQL schema stays stable across element version bumps.
  • Element versioning: baked into componentType / actionType strings (e.g. chaos.text.v1, chaos.open-subsequent-view.v1).
  • Single-column layout example sections: one main section listing component IDs (welcome-to-yelp-header, welcome-to-yelp-illustration, find-local-businesses-button).
  • Mobile layouts: include additional named sections ("toolbar and footer") — not enumerated in full.
  • Error telemetry: logs feature name, ownership info, exception details, request context — drives alerting and owner-team notification on threshold breach.
  • Parallel build: first loop fans out async upstream calls for all features; second loop awaits and resolves. The asyncio-based next-generation builders are expected to change this, but the post gives no details.
  • View flows: subsequent views are packed alongside the primary view in the same views array; navigation uses chaos.open-subsequent-view.v1.
  • View placeholders: lazily-loaded nested CHAOS views fetched by the client after the parent renders; configurable loading / error / empty / header / footer component IDs.
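The view-flow bullet above can be sketched from the client's side. The views list and the chaos.open-subsequent-view.v1 action type come from the post; the view IDs, the dictionary shapes (with parameters shown as a plain dict rather than a JSON string, for brevity), and the FlowNavigator class are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified response carrying a view flow: the primary view
# first, then the preloaded subsequent views, all in one `views` list.
response = {
    "views": [
        {"viewId": "support.faq.menu",
         "actions": [{"actionType": "chaos.open-subsequent-view.v1",
                      "parameters": {"viewId": "support.faq.billing"}}]},
        {"viewId": "support.faq.billing",
         "actions": [{"actionType": "chaos.open-subsequent-view.v1",
                      "parameters": {"viewId": "support.faq.menu"}}]},
    ]
}


class FlowNavigator:
    """Client-side sketch: navigates between preloaded views by viewId."""

    def __init__(self, views: List[dict]) -> None:
        self._views: Dict[str, dict] = {v["viewId"]: v for v in views}
        self.current: dict = views[0]  # the primary view is shown first
        self.network_requests = 0      # never incremented: all views are local

    def handle(self, action: dict) -> Optional[dict]:
        if action["actionType"] != "chaos.open-subsequent-view.v1":
            return None
        target = self._views.get(action["parameters"]["viewId"])
        if target is not None:
            self.current = target  # switch views without fetching anything
        return target


nav = FlowNavigator(response["views"])
nav.handle(nav.current["actions"][0])
after_first = nav.current["viewId"]
nav.handle(nav.current["actions"][0])
after_second = nav.current["viewId"]
```

A view placeholder would differ in exactly the part this sketch leaves out: the client would issue a separate request for the nested view's configuration after rendering the parent.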

Caveats / gotchas

  • Spec incomplete for authenticated-context flow. The post says "The CHAOS API authenticates requests and routes them to the relevant backend service" but doesn't detail the auth token format, scope model, or per-backend access control — presumably inherited from the Apollo Federation supergraph but not spelled out.
  • Layout repertoire is not enumerated. Only SingleColumnLayoutBuilder is named, plus a passing reference to "a basic mobile layout" with toolbar + footer. The full layout taxonomy is not disclosed.
  • No numbers. Zero operational quantities: no latency percentiles, no cache hit rates, no RPS, no view counts. The post is an architecture walkthrough, not a retrospective.
  • Two-loop iteration is being superseded. Yelp flags that the "latest CHAOS backend framework" introduces next-generation builders based on Python asyncio. The post does not say the two-loop mechanism has been retired, but treat it as an educational waypoint, not the final state.
  • Element schema discipline lives outside GraphQL. Python dataclasses (TextV1, IllustrationV1, ButtonV1, etc.) type-check the JSON payloads at backend build time, but the GraphQL schema itself is just "an opaque string" for elements. Clients that consume the config must know the element vocabulary out-of-band.
  • Register qualifications are first-match, order-sensitive. If a feature declares multiple Register entries the first qualifying one wins — correctness depends on developers ordering from most-to-least-specific.
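The order sensitivity in the last caveat can be made concrete with a small sketch of first-match register selection. Condition and Register mirror the names in the post, but their fields are reduced to strings here; Client, select_presenter, and the chaos.illustration.v2 element type are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Client:
    platform: str
    supported: FrozenSet[str]  # element types this app version can render


@dataclass
class Condition:
    platform: List[str]
    library: List[str]  # element types the presenter needs

    def is_satisfied_by(self, client: Client) -> bool:
        return client.platform in self.platform and set(self.library) <= client.supported


@dataclass
class Register:
    condition: Condition
    presenter_handler: Callable[[], List[str]]


def select_presenter(registers: List[Register], client: Client) -> Optional[Callable[[], List[str]]]:
    for register in registers:  # ordered: the first qualifying register wins
        if register.condition.is_satisfied_by(client):
            return register.presenter_handler
    return None  # no register qualifies: the feature is omitted


def rich() -> List[str]:
    return ["chaos.text.v1", "chaos.illustration.v2"]


def basic() -> List[str]:
    return ["chaos.text.v1"]


registers = [  # most specific first
    Register(Condition(["ios", "android"], ["chaos.text.v1", "chaos.illustration.v2"]), rich),
    Register(Condition(["ios", "android"], ["chaos.text.v1"]), basic),
]
new_app = Client("ios", frozenset({"chaos.text.v1", "chaos.illustration.v2"}))
old_app = Client("ios", frozenset({"chaos.text.v1"}))
web = Client("web", frozenset({"chaos.text.v1"}))
```

With this ordering the new app gets the rich presenter and the old app falls back to the basic one. Reversing the list would hand the basic presenter to every mobile client, because the less specific condition would match first.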

Source
