Zalando — Rendering Engine Tales: Road to Concurrent React
Summary
Zalando's Rendering Engine (RE), the universal-rendering framework
that serves the Fashion Store website, is being migrated to React 18's
concurrent rendering. The post walks through four things: (1) why RE,
whose pre-React-18 architecture already had partial hydration and
streaming primitives, is a natural fit for Suspense/concurrent APIs;
(2) the design trade-off that drove Zalando away from the obvious
"Render-As-You-Fetch" hook-based solution toward a custom
Application-State layer outside React that externally dictates
Suspense state; (3) the measured A/B-test impact of a narrow, early
milestone (swapping `renderToNodeStream` / `hydrate` for
`renderToPipeableStream` / `hydrateRoot`) on Core Web Vitals and
business metrics; and (4) a detailed taxonomy of the hydration-mismatch
errors that React 18's stricter hydration surfaced in a large
production React codebase (timers, timezone/locale drift, a
Safari-specific `Intl.NumberFormat` `de-AT` bug, invalid HTML nesting),
along with concrete fixes and debugging tips.
Key takeaways
- RE's pre-React-18 architecture already had the concurrent primitives — partial hydration, partial streaming, lazy-loading — so each Renderer is a natural Suspense boundary. "Rendering Engine's own partial hydration/streaming features!" (Source.) This is the architectural reason to migrate: upstream React now provides a supported implementation of capabilities RE had been maintaining in-house.
- Zalando rejected the pure Render-As-You-Fetch hook approach. After a PoC, four concrete blockers pushed them off it: `SuspenseList` is still experimental with limitations (ordered streaming/hydration); `useTransition` doesn't consider nested Suspense boundaries (bad UX in nested-boundary scenarios); hook-initiated fetches couple fetch timing to render order (performance anti-pattern); and the streaming/caching layer for React data supply was not yet final at PoC time. "We finally decided to go with a mixed solution." (patterns/application-state-layer-outside-react.)
- The chosen design: an Application-State layer outside React dictates Suspense state. RE's `resolveEntity` step writes resolved Renderer data into a central Application State. Each Renderer is wrapped in `<Suspense>` and reads via a "Connector hook" — described by the post as "Imagine Redux's `useSelector` hook, but instead of immediately returning selected data you get a Promise that only resolves once a reducer has made the data available." The Connector either returns data or throws a Promise; React suspends until the Promise resolves. This preserves ordering, nested-boundary handling, and fetch-decoupling while still getting streaming, hydration, and rendering from React.
- Narrow API swap produced measurable wins. Solely changing RE's internal streaming + hydration APIs to the React 18 equivalents (`renderToNodeStream` → `renderToPipeableStream`, `hydrate` → `hydrateRoot`) — with no other concurrent-feature adoption — and A/B-testing it across all Fashion Store pages produced:
  - INP: −5.69 % overall (−6.76 % Catalog, −6.09 % Product Details, −2.92 % Home).
  - FID: −8.81 % overall (−17.11 % Catalog, −6.06 % PDP, −2.98 % Home).
  - LCP: −2.43 % overall.
  - FCP: −0.23 % overall.
  - Bounce rate: −0.24 % overall.
  - Per-page Exit Rate: −0.43 % Home / −0.06 % Catalog / −0.06 % PDP.

  These are A/B-test deltas against the React-17-streaming-API baseline, measured on production traffic. The largest wins are on INP/FID for the Catalog page — consistent with the React 18 hydration API being the key source of interactive-latency improvement.
- React 18's stricter hydration surfaced latent mismatches across hundreds of Renderers. "We started receiving a lot more hydration error logs (via Sentry) … after fixing dozens of different types of issues deep inside hundreds of Renderers, we were able to considerably reduce the number of the hydration mismatch errors occuring in the wild." Two meta-lessons from this:
  - Dynamic-content pages are especially hard: reproducing a specific user's rendered tree from a Sentry log is structurally difficult when backend personalisation picks the Entity tree per request and per user.
  - React's production bundle gives poor debug signal (`componentStack` needs source-map unminification; stacks point to the consequence fiber, not the cause fiber).
- The hydration-mismatch taxonomy has four load-bearing categories. Documented with concrete Instead of / Do code pairs (see dedicated pattern pages):
  - Timers / time-deltas — SSR vs CSR compute at different instants. Fix: `suppressHydrationWarning={true}` on the closest wrapping element (patterns/suppress-hydration-warning-for-unavoidable-mismatch).
  - Timezone-localised dates — `Intl.DateTimeFormat(locale)` without an explicit `timeZone` falls back to the host's timezone, which differs between the SSR server and the user. Fix: specify a `timeZone` parameter consistently on both sides, or move localization to the backend. See concepts/locale-host-default-ssr-csr-divergence + patterns/backend-localization-for-hydration-stability.
  - Number locale + a specific Safari bug for `de-AT` — "Safari … for the de-AT locale, the localisation APIs (like `Intl.NumberFormat` or `toLocaleString`) generate values like "2.345" but other browsers including Chrome and Firefox as well as Node.js generate values like "2 345" for the same locale!" Pure SSR-in-Node-vs-CSR-in-Safari mismatch with no application bug. Fix: move number localization to backend.
  - Invalid HTML nesting — React 18 treats structurally invalid HTML (`<div>` inside `<p>`, `<button>` inside `<button>`) as a hydration mismatch. Fix: use semantically valid nesting; consider `eslint-plugin-validate-jsx-nesting`.
- Mount-gated rendering is the universal escape hatch. For content that genuinely differs between SSR and CSR by product requirement (e.g. a device-specific app download banner), the suggested pattern is a `useState(false)` + `useEffect(() => { setIsMounted(true) }, [])` guard that renders a fallback (or `null`) during SSR and the real content only after client mount. Caveat: watch for layout shift. See patterns/mount-gated-client-only-rendering.
- Sentry-side error deduplication is required. React's `onRecoverableError` callback is called multiple times per single mismatch because once hydration fails on one DOM-node-vs-fiber comparison, subsequent ones also fail due to list-alignment drift. Zalando's mitigation: "we modified our error tracking code to only send the first hydration error log to Sentry" — see patterns/first-error-only-hydration-error-reporting. This is a production-observability pattern, not a React-runtime one, but without it the Sentry signal is too noisy to act on.
- Debugging tips for irreproducible mismatches. "In other cases where the cause is not very obvious, what we found helpful was to check the React dev bundle (`react-dom/umd/react-dom.development.js`) and put debuggers on places which log the hydration errors (usually the `checkForUnmatchedText` or `throwOnHydrationMismatch` functions)." The cause fiber is often a different element than the logged fiber — the logged one is whatever comes next after the actual missing node in React's fiber-list-vs-DOM-list walk. Practitioner-level detail worth citing.
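
The "return data or throw a Promise" behaviour of the Connector hook in the chosen design can be illustrated without React. The sketch below is a minimal model under stated assumptions: the names `ApplicationState`, `dispatch`, and `readOrSuspend` are hypothetical and are not taken from the post or from Rendering Engine, and a real Connector would be a hook called inside a Renderer wrapped in `<Suspense>`.

```typescript
// Hypothetical sketch: ApplicationState, dispatch and readOrSuspend are
// illustrative names, not Zalando's actual API.
class ApplicationState<T> {
  private data = new Map<string, T>();
  private pending = new Map<string, Promise<void>>();
  private resolvers = new Map<string, () => void>();

  // Called from outside React, e.g. once data for a Renderer is resolved.
  dispatch(key: string, value: T): void {
    this.data.set(key, value);
    const resolve = this.resolvers.get(key);
    if (resolve) {
      resolve(); // wakes up anything waiting on the thrown Promise
      this.resolvers.delete(key);
      this.pending.delete(key);
    }
  }

  // Connector-style read: return the data if it is available, otherwise
  // throw a Promise that resolves once dispatch() has stored the data.
  readOrSuspend(key: string): T {
    if (this.data.has(key)) {
      return this.data.get(key) as T;
    }
    let promise = this.pending.get(key);
    if (!promise) {
      promise = new Promise<void>((resolve) => {
        this.resolvers.set(key, () => resolve());
      });
      this.pending.set(key, promise);
    }
    // Repeated reads of the same key throw the same Promise instance.
    throw promise;
  }
}
```

Because the store lives outside the component tree, whatever fills it (in the post, the `resolveEntity` step) decides when each boundary stops suspending, which is how fetch timing stays decoupled from render order.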
Systems surfaced
- Zalando Rendering Engine — the host of this migration. Pre-React-18 it already had partial hydration, partial streaming, lazy-loading, a universal-rendering shape (Node.js SSR + browser CSR), and an Entity-tree-to-Renderer-tree resolution step. Post-migration it keeps all of that but delegates streaming/hydration to React 18 APIs and wraps each Renderer in a `<Suspense>` boundary.
- Zalando Interface Framework (IF) — the architecture RE is the runtime core of. This post extends the 2021-03 Part-1 / 2021-09 Part-2 narrative with a 2023 React-18-migration chapter.
- React — React 18's concurrent rendering APIs (`renderToPipeableStream`, `hydrateRoot`, `<Suspense>`, `useTransition`, `onRecoverableError`, `suppressHydrationWarning`, `unstable_scheduleHydration`) named throughout.
- Sentry — Zalando's error-tracking system, named explicitly as the destination for React's `onRecoverableError` callbacks and the place the deduplication happens.
- Node.js — the SSR runtime; called out for having a different default host timezone/locale than the user's browser and for being the server side of the `Intl.NumberFormat` `de-AT` divergence against Safari.
Concepts surfaced
- Concurrent rendering (React 18) — new page. React's server-driven streaming render combined with client-driven per-boundary hydration and the `useTransition` / `<Suspense>` scheduler.
- Hydration mismatch — new page. The class of errors where SSR and CSR produce divergent markup for the same component; in React 18, much more strictly reported.
- Render-as-you-fetch — new page. The React-idiomatic pattern of initiating fetch in render via Suspense-compatible hooks. Zalando evaluated, then rejected it.
- Progressive hydration — new page. Hydrating UI in pieces as each becomes interactive, rather than one top-level hydrate.
- Locale / host-default SSR/CSR divergence — new page. `Intl.*` and `toLocaleString` pick their default timezone/locale from the host; Node and the user's browser are different hosts, so defaults diverge.
- React hydration — extended with the Zalando evidence: dozens of production-hydration-mismatch types in one codebase after the React 18 upgrade.
- concepts/streaming-ssr — gains a second instance (Zalando Fashion Store) after Atlassian Confluence.
- INP / FID / LCP / FCP — Core Web Vitals cited with the A/B-test deltas.
- concepts/micro-frontends — RE's Renderer-as-Suspense-boundary fit is what makes concurrent React natural for a micro-frontend-composition engine.
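
The locale / host-default divergence listed above can be made concrete with a small example. This is a sketch only: `formatTime` is a hypothetical helper name, and the locale and time zones are arbitrary illustration values. It shows the "specify a `timeZone` consistently on both sides" fix, where the zone is passed explicitly so the result no longer depends on the machine that runs the code.

```typescript
// Hypothetical helper: the time zone is an explicit argument, so the SSR
// host and the browser produce the same string for the same instant.
function formatTime(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
    timeZone,
  }).format(date);
}

// One fixed instant: 2023-07-10 12:00 UTC.
const instant = new Date(Date.UTC(2023, 6, 10, 12, 0, 0));

// Without timeZone the formatter uses the host default, which is the
// source of the SSR/CSR mismatch; the result varies by machine.
const hostDependent = new Intl.DateTimeFormat("en-GB", {
  hour: "2-digit",
  minute: "2-digit",
  hour12: false,
}).format(instant);

// With an explicit zone the result is stable across hosts.
const stableUtc = formatTime(instant, "en-GB", "UTC");
const stableTokyo = formatTime(instant, "en-GB", "Asia/Tokyo");
```

An explicit time zone removes the host-default dependency only. It does not help with engine-level formatting differences such as the Safari `de-AT` number case, which is why the post's fix there is backend localization.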
Patterns surfaced
- Application-State layer outside React — new page. A central state store (Redux-shaped) that sits outside React's component tree, holds resolved render data, and whose reads via a "Connector hook" return either data or a Promise (suspending React). The post's main architectural contribution.
- Suppress hydration warning for unavoidable mismatch — new page. Using React's `suppressHydrationWarning={true}` on the closest wrapping element for timer-style content whose SSR and CSR values will always differ. Tightly scoped: one level deep only.
- Mount-gated client-only rendering — new page. The `useState(false)` + `useEffect(() => setIsMounted(true), [])` guard for content that should genuinely only render client-side (device detection, browser-only state).
- Backend localization for hydration stability — new page. Move locale-dependent conversions (dates, numbers) out of the component and into the backend; the component renders the already-localised string identically on SSR and CSR.
- First-error-only hydration error reporting — new page. Suppress subsequent `onRecoverableError` calls from the same hydration pass before they reach the error tracker. Avoids log flood from the post-first-mismatch fiber-list drift.
- patterns/suspense-boundary — extended with a second instance (Zalando RE Renderers as Suspense boundaries).
- patterns/entity-to-renderer-mapping — unchanged but cross-referenced: RE's existing Entity→Renderer resolution is the layer that now writes into the Application-State layer.
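
The first-error-only reporting pattern listed above can be sketched as a small wrapper around whatever function forwards errors to the tracker. This is an assumption-laden illustration, not Zalando's code: `createFirstErrorOnlyReporter` is a hypothetical name, and the sketch simplifies by forwarding only the first error of any kind, whereas the post speaks specifically of hydration errors.

```typescript
type Report = (error: unknown) => void;

// Hypothetical helper: returns a callback that forwards the first error it
// receives and silently drops every later one.
function createFirstErrorOnlyReporter(send: Report): Report {
  let alreadyReported = false;
  return (error: unknown): void => {
    if (alreadyReported) {
      return; // follow-up errors caused by the first mismatch are dropped
    }
    alreadyReported = true;
    send(error);
  };
}

// Example wiring with an in-memory sink standing in for the error tracker.
const sent: unknown[] = [];
const report = createFirstErrorOnlyReporter((error) => {
  sent.push(error);
});

report(new Error("first hydration mismatch"));
report(new Error("follow-up mismatch"));
report(new Error("another follow-up mismatch"));
// Only the first error has reached the sink at this point.
```

In a React 18 application the returned callback could be passed as the `onRecoverableError` option of `hydrateRoot`, with `send` forwarding to the error tracker. One reporter per page load matches the "same hydration pass" scope described above.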
Operational numbers
A/B-test deltas on Fashion Store pages
Narrow milestone: React 18 API upgrade only (renderToPipeableStream,
hydrateRoot), no broader concurrent-feature adoption. Rolled out as
an A/B test across all e-commerce pages.
Overall:
| Metric | Δ |
|---|---|
| INP | −5.69 % |
| FID | −8.81 % |
| LCP | −2.43 % |
| FCP | −0.23 % |
| Bounce rate | −0.24 % |
Per frequently-visited page:
| Metric | Home | Catalog | Product Details |
|---|---|---|---|
| INP | −2.92 % | −6.76 % | −6.09 % |
| FID | −2.98 % | −17.11 % | −6.06 % |
| Exit Rate | −0.43 % | −0.06 % | −0.06 % |
Read: the API swap wins mostly on interactive-latency metrics
(INP, FID), less on paint metrics (LCP, FCP). Catalog — the
largest / most Renderer-dense page — sees the biggest interactive
improvement, consistent with hydrateRoot's more-efficient
hydration over hydrate.
Scale signal (indirect)
- "hundreds of Renderers" — the Fashion Store codebase size affected by the hydration-mismatch fixes.
- "dozens of different types of issues" — the breadth of hydration-mismatch root causes found.
- A/B test covered "all pages of our e-commerce website" — i.e. the full Fashion Store production traffic slice.
Caveats and gaps
- No absolute latency numbers. Only percentage deltas; no baseline in ms. Without baselines, we can't convert the `−5.69 %` INP delta to an absolute improvement or compare to web.dev good/needs-improvement/poor thresholds.
- The "next step" architecture isn't shown here. The Application-State layer + Suspense + Connector-hook design is described at principle level; the post defers the concrete ordered-streaming/hydration mechanism to "another post" ("We will share the details of the technical solution for ordered streaming/hydration in another post") which was not in the 2023-07 raw set ingested here.
- No disclosure of Renderer count, render latency distribution, or server fleet size for the RE Node.js SSR tier. The 2021-03 Part-1 post claimed ~90% of Fashion Store traffic was served through the Rendering Engine; this post doesn't update that share.
- No discussion of the streaming/caching layer for React data supply. Zalando cites it as "still not final" (referencing facebook/react#25502), which is part of why they chose their own Application-State layer, but doesn't disclose what they do instead for client-side data de-duplication / cache-rehydration.
- Safari `de-AT` `Intl.NumberFormat` bug is described but not linked to a filed WebKit bug. Author describes it behaviourally; the wiki records it as-observed.
- No specific infrastructure platform disclosure — no mention of the Node.js version, CDN layer, HTTP/2 push usage, compression / proxy layer, or where chunked transfer encoding is terminated. Standard streaming-SSR gotchas (proxy buffering, compression flush) aren't covered.
- No observability metrics for the migration beyond Sentry hydration-error counts and the Core Web Vitals A/B deltas. No CPU profile, no TTFB/TTLB split, no per-boundary hydration timing.
- React Server Components explicitly out of scope. Named as a future direction ("Next up, we're excited to start using React Server Components") but not implemented here.
Why this matters on the wiki
This post gives the wiki three things it didn't have before:
- A second fully-documented production instance of React 18 streaming SSR + Suspense — the first being Atlassian Confluence (sources/2026-04-16-atlassian-streaming-ssr-confluence). Zalando adds a non-hook architectural choice (Application-State layer outside React) and real A/B-test deltas against the React-17 baseline.
- A production-scale hydration-mismatch taxonomy. Existing coverage (React hydration, Atlassian) names hydration mismatch as a failure class but doesn't enumerate its causes. Zalando documents four categories with concrete fixes surfaced from "hundreds of Renderers".
- The `Intl.NumberFormat` `de-AT` Safari / Node divergence — a concrete, reproducible example of a locale-layer SSR/CSR split that no amount of application code can paper over, and therefore a concrete argument for patterns/backend-localization-for-hydration-stability.
Source
- Original: https://engineering.zalando.com/posts/2023/07/rendering-engine-tales-road-to-concurrent-react.html
- Raw markdown: raw/zalando/2023-07-10-rendering-engine-tales-road-to-concurrent-react-48ea9a3a.md
Related
- systems/zalando-rendering-engine · systems/zalando-interface-framework · systems/react · systems/sentry · systems/nodejs
- concepts/concurrent-rendering-react · concepts/hydration-mismatch · concepts/render-as-you-fetch · concepts/progressive-hydration · concepts/locale-host-default-ssr-csr-divergence · concepts/streaming-ssr · concepts/react-hydration · concepts/interaction-to-next-paint
- patterns/application-state-layer-outside-react · patterns/suppress-hydration-warning-for-unavoidable-mismatch · patterns/mount-gated-client-only-rendering · patterns/backend-localization-for-hydration-stability · patterns/first-error-only-hydration-error-reporting · patterns/suspense-boundary · patterns/entity-to-renderer-mapping
- companies/zalando