
ZALANDO 2023-07-10


Zalando — Rendering Engine Tales: Road to Concurrent React

Summary

Zalando's Rendering Engine (RE), the universal-rendering framework that serves the Fashion Store website, is being migrated to React 18's concurrent rendering. The post walks through four things: (1) why RE, whose architecture already provided partial hydration and streaming primitives before React 18, is a natural fit for Suspense/concurrent APIs; (2) the design trade-off that drove Zalando away from the obvious "Render-As-You-Fetch" hook-based solution toward a custom Application-State layer outside React that externally dictates Suspense state; (3) the measured A/B-test impact of a narrow, early milestone (swapping renderToNodeStream / hydrate for renderToPipeableStream / hydrateRoot) on Core Web Vitals and business metrics; and (4) a detailed taxonomy of the hydration-mismatch errors that React 18's stricter hydration surfaced in a large production React codebase (timers, timezone/locale drift, a Safari-specific Intl.NumberFormat de-AT bug, invalid HTML nesting), along with concrete fixes and debugging tips.

Key takeaways

  1. RE's pre-React-18 architecture already had the concurrent primitives — partial hydration, partial streaming, lazy-loading — so each Renderer is a natural Suspense boundary. "Rendering Engine's own partial hydration/streaming features!" (Source.) This is the architectural reason to migrate: upstream React now provides a supported implementation of capabilities RE had been maintaining in-house.

  2. Zalando rejected the pure Render-As-You-Fetch hook approach. After a PoC, four concrete blockers pushed them off it: SuspenseList is still experimental with limitations (ordered streaming/hydration); useTransition doesn't consider nested Suspense boundaries (bad UX in nested-boundary scenarios); hook-initiated fetches couple fetch timing to render order (performance anti-pattern); and the streaming/caching layer for React data supply was not yet final at PoC time. "We finally decided to go with a mixed solution." (patterns/application-state-layer-outside-react.)

  3. The chosen design: an Application-State layer outside React dictates Suspense state. RE's resolveEntity step writes resolved Renderer data into a central Application State. Each Renderer is wrapped in <Suspense> and reads via a "Connector hook" — described by the post as "Imagine Redux's useSelector hook, but instead of immediately returning selected data you get a Promise that only resolves once a reducer has made the data available." The Connector either returns data or throws a Promise; React suspends until the Promise resolves. This preserves ordering, nested-boundary handling, and fetch-decoupling while still getting streaming, hydration, and rendering from React.

  4. Narrow API swap produced measurable wins. Solely changing RE's internal streaming + hydration APIs to the React 18 equivalents (renderToNodeStream → renderToPipeableStream, hydrate → hydrateRoot), with no other concurrent-feature adoption, and A/B-testing it across all Fashion Store pages produced:

     • INP: −5.69 % overall (−6.76 % Catalog, −6.09 % Product Details, −2.92 % Home).
     • FID: −8.81 % overall (−17.11 % Catalog, −6.06 % PDP, −2.98 % Home).
     • LCP: −2.43 % overall.
     • FCP: −0.23 % overall.
     • Bounce rate: −0.24 % overall.
     • Per-page Exit Rate: −0.43 % Home / −0.06 % Catalog / −0.06 % PDP.

These are A/B-test deltas against the React-17-streaming-API baseline, measured on production traffic. The largest wins are on INP/FID for the Catalog page — consistent with the React 18 hydration API being the key source of interactive-latency improvement.

  5. React 18's stricter hydration surfaced latent mismatches across hundreds of Renderers. "We started receiving a lot more hydration error logs (via Sentry) … after fixing dozens of different types of issues deep inside hundreds of Renderers, we were able to considerably reduce the number of the hydration mismatch errors occurring in the wild." Two meta-lessons from this:
     • Dynamic-content pages are especially hard: reproducing a specific user's rendered tree from a Sentry log is structurally difficult when backend personalisation picks the Entity tree per request and per user.
     • React's production bundle gives poor debug signal (componentStack needs source-map unminification; stacks point to the consequence fiber, not the cause fiber).

  6. The hydration-mismatch taxonomy has four load-bearing categories. Documented with concrete Instead of / Do code pairs (see dedicated pattern pages):

     • Timers / time-deltas: SSR and CSR compute at different instants. Fix: suppressHydrationWarning={true} on the closest wrapping element (patterns/suppress-hydration-warning-for-unavoidable-mismatch).
     • Timezone-localised dates: Intl.DateTimeFormat(locale) without an explicit timeZone falls back to the host's timezone, which differs between the SSR server and the user. Fix: specify a timeZone parameter consistently on both sides, or move localization to the backend. See concepts/locale-host-default-ssr-csr-divergence + patterns/backend-localization-for-hydration-stability.
     • Number locale + a specific Safari bug for de-AT: "Safari … for the de-AT locale, the localisation APIs (like Intl.NumberFormat or toLocaleString) generate values like "2.345" but other browsers including Chrome and Firefox as well as Node.js generate values like "2 345" for the same locale!" Pure SSR-in-Node-vs-CSR-in-Safari mismatch with no application bug. Fix: move number localization to backend.
     • Invalid HTML nesting: React 18 treats structurally invalid HTML (<div> inside <p>, <button> inside <button>) as a hydration mismatch. Fix: use semantically valid nesting, consider eslint-plugin-validate-jsx-nesting.

  7. Mount-gated rendering is the universal escape hatch. For content that genuinely differs between SSR and CSR by product requirement (e.g. a device-specific app download banner), the suggested pattern is a useState(false) + useEffect(() => { setIsMounted(true) }, []) guard that renders a fallback (or null) during SSR and the real content only after client mount. Caveat: watch for layout shift. See patterns/mount-gated-client-only-rendering.

  8. Sentry-side error deduplication is required. React's onRecoverableError callback is called multiple times per single mismatch because once hydration fails on one DOM-node-vs-fiber comparison, subsequent ones also fail due to list-alignment drift. Zalando's mitigation: "we modified our error tracking code to only send the first hydration error log to Sentry" (see patterns/first-error-only-hydration-error-reporting). This is a production-observability pattern, not a React-runtime one, but without it the Sentry signal is too noisy to act on.

  9. Debugging tips for irreproducible mismatches. "In other cases where the cause is not very obvious, what we found helpful was to check the React dev bundle (react-dom/umd/react-dom.development.js) and put debuggers on places which log the hydration errors (usually the checkForUnmatchedText or throwOnHydrationMismatch functions)." The cause fiber is often a different element than the logged fiber: the logged one is whatever comes next after the actual missing node in React's fiber-list-vs-DOM-list walk. Practitioner-level detail worth citing.
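
The Connector behaviour described in takeaway 3 (return data, or throw a Promise that resolves once the data has been written) can be sketched without React. This is a minimal illustration, not Zalando's implementation: the names ApplicationState, resolve and read are hypothetical, and the sketch assumes the convention that a <Suspense> boundary suspends when a component throws a Promise during render.

```typescript
// Hypothetical sketch of an Application-State layer living outside React.
type Listener<T> = (value: T) => void;

class ApplicationState<T> {
  private data = new Map<string, T>();
  private pending = new Map<string, Promise<T>>();
  private resolvers = new Map<string, Listener<T>>();

  // Called by the data layer (the role resolveEntity plays in the post)
  // once a Renderer's data is available.
  resolve(key: string, value: T): void {
    this.data.set(key, value);
    const wake = this.resolvers.get(key);
    if (wake) {
      this.resolvers.delete(key);
      this.pending.delete(key);
      wake(value); // settles the Promise that suspended readers are waiting on
    }
  }

  // Connector-style read: returns the data if present, otherwise throws a
  // Promise that settles when resolve(key, ...) is called.
  read(key: string): T {
    if (this.data.has(key)) {
      return this.data.get(key) as T;
    }
    let promise = this.pending.get(key);
    if (!promise) {
      promise = new Promise<T>((resolveFn) => {
        this.resolvers.set(key, resolveFn);
      });
      this.pending.set(key, promise);
    }
    throw promise; // the same Promise is reused for repeated reads
  }
}
```

In a component, a hook built on read() would be called inside a Renderer wrapped in <Suspense>. Because fetching is driven by whoever calls resolve(), fetch timing stays independent of render order, which is the decoupling the post argues for.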

Systems surfaced

  • Zalando Rendering Engine — the host of this migration. Pre-React-18 it already had partial hydration, partial streaming, lazy-loading, a universal-rendering shape (Node.js SSR + browser CSR), and an Entity-tree-to-Renderer-tree resolution step. Post-migration it keeps all of that but delegates streaming/hydration to React 18 APIs and wraps each Renderer in a <Suspense> boundary.
  • Zalando Interface Framework (IF) — the architecture of which RE is the runtime core. This post extends the 2021-03 Part-1 / 2021-09 Part-2 narrative with a 2023 React-18-migration chapter.
  • React — React 18's concurrent rendering APIs (renderToPipeableStream, hydrateRoot, <Suspense>, useTransition, onRecoverableError, suppressHydrationWarning, unstable_scheduleHydration) named throughout.
  • Sentry — Zalando's error-tracking system, named explicitly as the destination for React's onRecoverableError callbacks and the place the deduplication happens.
  • Node.js — the SSR runtime; called out for having a different default host timezone/locale than the user's browser and for being the exact point where the Intl.NumberFormat de-AT Safari divergence surfaces.

Concepts surfaced

  • Concurrent rendering (React 18) — new page. React's server-driven streaming render combined with client-driven per-boundary hydration and the useTransition/<Suspense> scheduler.
  • Hydration mismatch — new page. The class of errors where SSR and CSR produce divergent markup for the same component; in React 18, much more strictly reported.
  • Render-as-you-fetch — new page. The React-idiomatic pattern of initiating fetch in render via Suspense-compatible hooks. Zalando evaluated, then rejected it.
  • Progressive hydration — new page. Hydrating UI in pieces as each becomes interactive, rather than one top-level hydrate.
  • Locale / host-default SSR/CSR divergence — new page. Intl.* and toLocaleString pick their default timezone/locale from the host; Node and the user's browser are different hosts, so defaults diverge.
  • React hydration — extended with the Zalando evidence: dozens of production-hydration-mismatch types in one codebase after the React 18 upgrade.
  • concepts/streaming-ssr — gains a second instance (Zalando Fashion Store) after Atlassian Confluence.
  • INP / FID / LCP / FCP — Core Web Vitals cited with the A/B-test deltas.
  • concepts/micro-frontends — RE's Renderer-as-Suspense-boundary fit is what makes concurrent React natural for a micro-frontend-composition engine.
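
The locale / host-default divergence listed above is easy to reproduce outside React. The sketch below uses a hypothetical helper, formatDeliveryDate, and an arbitrary example instant. It passes an explicit timeZone to Intl.DateTimeFormat so that the result no longer depends on which host (Node.js server or user's browser) runs it.

```typescript
// Formats an instant with an explicit timeZone, so the output does not depend
// on the host's default timezone.
function formatDeliveryDate(iso: string, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(new Date(iso));
}

// 23:30 UTC on 9 July is already 10 July in Berlin (UTC+2 in summer), so two
// hosts with different default timezones would render different calendar days.
const instant = "2023-07-09T23:30:00Z";
const berlin = formatDeliveryDate(instant, "en-US", "Europe/Berlin");
const utc = formatDeliveryDate(instant, "en-US", "UTC");
```

Fixing the timeZone removes the timezone half of the problem only. Differences in how runtimes implement a locale, such as the de-AT case, still need the backend-localization approach.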

Patterns surfaced

  • Application-State layer outside React — new page. A central state store (Redux-shaped) that sits outside React's component tree, holds resolved render data, and whose reads via a "Connector hook" return either data or a Promise (suspending React). The post's main architectural contribution.
  • Suppress hydration warning for unavoidable mismatch — new page. Using React's suppressHydrationWarning={true} on the closest wrapping element for timer-style content whose SSR and CSR values will always differ. Tightly scoped: one level deep only.
  • Mount-gated client-only rendering — new page. The useState(false) + useEffect(() => setIsMounted(true), []) guard for content that should genuinely only render client-side (device detection, browser-only state).
  • Backend localization for hydration stability — new page. Move locale-dependent conversions (dates, numbers) out of the component and into the backend; the component renders the already-localised string identically on SSR and CSR.
  • First-error-only hydration error reporting — new page. Suppress subsequent onRecoverableError calls from the same hydration pass before they reach the error tracker. Avoids log flood from the post-first-mismatch fiber-list drift.
  • patterns/suspense-boundary — extended with a second instance (Zalando RE Renderers as Suspense boundaries).
  • patterns/entity-to-renderer-mapping — unchanged but cross-referenced: RE's existing Entity→Renderer resolution is the layer that now writes into the Application-State layer.
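
The first-error-only reporting pattern above can be sketched as a small wrapper around whatever function forwards errors to the tracker. The post does not show Zalando's tracking code, so the names firstErrorOnly, onError and reset are hypothetical, and the reset step is an assumption about how a new hydration pass might be started.

```typescript
type Reporter = (error: unknown) => void;

// Wraps a reporter so that only the first error of a hydration pass is forwarded.
function firstErrorOnly(report: Reporter): { onError: Reporter; reset: () => void } {
  let reported = false;
  return {
    onError(error) {
      if (reported) return; // later errors are usually follow-on mismatches
      reported = true;
      report(error);
    },
    // Hypothetical hook: call before hydrating another root.
    reset() {
      reported = false;
    },
  };
}
```

With React 18, onError could be passed as the onRecoverableError option of hydrateRoot, with report forwarding to the error tracker. The trade-off is that a second, unrelated mismatch in the same pass is dropped as well.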

Operational numbers

A/B-test deltas on Fashion Store pages

Narrow milestone: React 18 API upgrade only (renderToPipeableStream, hydrateRoot), no broader concurrent-feature adoption. Rolled out as an A/B test across all e-commerce pages.

Overall:

Metric Δ
INP −5.69 %
FID −8.81 %
LCP −2.43 %
FCP −0.23 %
Bounce rate −0.24 %

Per frequently-visited page:

Metric       Home      Catalog    Product Details
INP          −2.92 %   −6.76 %    −6.09 %
FID          −2.98 %   −17.11 %   −6.06 %
Exit Rate    −0.43 %   −0.06 %    −0.06 %

Read: the API swap wins mostly on interactive-latency metrics (INP, FID), less on paint metrics (LCP, FCP). Catalog — the largest / most Renderer-dense page — sees the biggest interactive improvement, consistent with hydrateRoot's more-efficient hydration over hydrate.

Scale signal (indirect)

  • "hundreds of Renderers" — the Fashion Store codebase size affected by the hydration-mismatch fixes.
  • "dozens of different types of issues" — the breadth of hydration-mismatch root causes found.
  • A/B test covered "all pages of our e-commerce website" — i.e. the full Fashion Store production traffic slice.

Caveats and gaps

  • No absolute latency numbers. Only percentage deltas; no baseline in ms. Without baselines, we can't convert the −5.69 % INP delta to an absolute improvement or compare to web.dev good/needs-improvement/poor thresholds.
  • The "next step" architecture isn't shown here. The Application-State layer + Suspense + Connector-hook design is described at principle level; the post defers the concrete ordered-streaming/hydration mechanism to "another post" ("We will share the details of the technical solution for ordered streaming/hydration in another post") which was not in the 2023-07 raw set ingested here.
  • No disclosure of Renderer count, render latency distribution, or server fleet size for the RE Node.js SSR tier. The 2021-03 Part-1 post claimed ~90% of Fashion Store traffic was served through the Rendering Engine; this post doesn't update that share.
  • No discussion of the streaming/caching layer for React data supply. Zalando cites it as "still not final" (referencing facebook/react#25502), which is part of why they chose their own Application-State layer, but doesn't disclose what they do instead for client-side data de-duplication / cache-rehydration.
  • Safari de-AT Intl.NumberFormat bug is described but not linked to a filed WebKit bug. Author describes it behaviourally; the wiki records it as-observed.
  • No specific infrastructure platform disclosure — no mention of the Node.js version, CDN layer, HTTP/2 push usage, compression / proxy layer, or where chunked transfer encoding is terminated. Standard streaming-SSR gotchas (proxy buffering, compression flush) aren't covered.
  • No observability metrics for the migration beyond Sentry hydration-error counts and the Core Web Vitals A/B deltas. No CPU profile, no TTFB/TTLB split, no per-boundary hydration timing.
  • React Server Components explicitly out of scope. Named as a future direction ("Next up, we're excited to start using React Server Components") but not implemented here.

Why this matters on the wiki

This post gives the wiki three things it didn't have before:

  1. A second fully-documented production instance of React 18 streaming SSR + Suspense, the first being Atlassian Confluence (sources/2026-04-16-atlassian-streaming-ssr-confluence). Zalando adds a non-hook architectural choice (Application-State layer outside React) and real A/B-test deltas against the React-17 baseline.
  2. A production-scale hydration-mismatch taxonomy. Existing coverage (React hydration, Atlassian) names hydration mismatch as a failure class but doesn't enumerate its causes. Zalando documents four categories with concrete fixes surfaced from "hundreds of Renderers".
  3. The Intl.NumberFormat de-AT Safari / Node divergence — a concrete, reproducible example of a locale-layer SSR/CSR split that no amount of application code can paper over, and therefore a concrete argument for patterns/backend-localization-for-hydration-stability.
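
The backend-localization argument in item 3 can be sketched as follows. The payload shape and the function names (PricePayload, localisePrice, renderPrice) are hypothetical. The only point illustrated is that the Intl call runs once, in one runtime, and both server render and client hydration consume the resulting string unchanged.

```typescript
// Hypothetical payload: the backend ships the already-localised string.
interface PricePayload {
  amount: number;    // raw value, kept for logic such as sorting
  formatted: string; // display string produced once, on the backend
}

// Runs only on the backend, so a single runtime decides the formatting.
function localisePrice(amount: number, locale: string): PricePayload {
  return { amount, formatted: new Intl.NumberFormat(locale).format(amount) };
}

// Runs during both server render and client hydration. It makes no Intl call,
// so its output cannot diverge between Node.js and the user's browser.
function renderPrice(payload: PricePayload): string {
  return `<span class="price">${payload.formatted}</span>`;
}

// The payload survives serialisation to the client unchanged.
const payload = localisePrice(2345, "de-AT");
const onServer = renderPrice(payload);
const onClient = renderPrice(JSON.parse(JSON.stringify(payload)) as PricePayload);
```

In a React component the same idea reduces to rendering payload.formatted directly instead of calling toLocaleString inside the component.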

Source
