SYSTEM Cited by 2 sources

Istio

Istio is the reference sidecar-based service mesh: an Envoy proxy (systems/envoy) is injected into every pod and intercepts all traffic, while a central control plane (istiod) pushes configuration to the proxies via xDS. It provides language-agnostic L7 load balancing, mTLS, retries, circuit breaking, and observability. Recent versions also offer Ambient Mesh, a sidecar-free mode that moves L4 mTLS to a per-node ztunnel and L7 features to a per-namespace waypoint proxy, reducing per-pod overhead.
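To make the per-pod overhead difference concrete, the sketch below counts proxy instances under each mode. It is only an illustration under simplifying assumptions: the `Cluster` type, its fields, and the example numbers are hypothetical, and it assumes exactly one ztunnel per node and one waypoint per namespace that needs L7 features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical cluster shape; none of these numbers come from the source."""
    pods: int
    nodes: int
    l7_namespaces: int  # namespaces assumed to run one waypoint proxy each


def sidecar_proxies(c: Cluster) -> int:
    # Sidecar mode: one Envoy instance is injected into every pod.
    return c.pods


def ambient_proxies(c: Cluster) -> int:
    # Ambient mode: one ztunnel per node plus one waypoint per L7 namespace.
    return c.nodes + c.l7_namespaces


example = Cluster(pods=5000, nodes=200, l7_namespaces=30)
print(sidecar_proxies(example))  # 5000
print(ambient_proxies(example))  # 230
```

The proxy count scales with pods in sidecar mode and with nodes and namespaces in ambient mode, which is where the reduced per-pod overhead comes from.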

Stub page. Filed here because the Databricks article (sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing) explicitly evaluates and rejects Istio as an alternative to client-side LB.

Canonical strengths

  • Language-agnostic. Doesn't matter what your services are written in; sidecars intercept traffic for everyone.
  • Centralized resiliency policy. Retries, timeouts, circuit breaking, mTLS applied uniformly without per-service code changes.
  • Rich observability. Every request traverses Envoy, so logs, metrics, and traces are collected at the proxy without per-service instrumentation.
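The "centralized resiliency policy" point can be sketched as a wrapper that applies one shared policy to any upstream call, so the service code itself contains no retry logic. This is a toy model, not Istio's or Envoy's API: `Policy`, `Proxy`, `CircuitOpen`, and `make_flaky` are hypothetical names, and the consecutive-failure breaker is a deliberate simplification of how real proxies eject unhealthy hosts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Policy:
    """Hypothetical mesh-wide policy; field names are illustrative only."""
    max_attempts: int = 3        # total tries per request, must be >= 1
    failure_threshold: int = 5   # consecutive failures before the circuit opens


class CircuitOpen(Exception):
    """Raised when the proxy refuses to call a repeatedly failing upstream."""


class Proxy:
    """Stands in for a sidecar: wraps any upstream call with the shared policy."""

    def __init__(self, upstream: Callable[[str], str], policy: Policy) -> None:
        self.upstream = upstream
        self.policy = policy
        self.consecutive_failures = 0

    def call(self, request: str) -> str:
        if self.consecutive_failures >= self.policy.failure_threshold:
            raise CircuitOpen("upstream rejected after repeated failures")
        last_error: Optional[Exception] = None
        for _ in range(self.policy.max_attempts):
            try:
                response = self.upstream(request)
            except ConnectionError as err:
                self.consecutive_failures += 1
                last_error = err
                continue
            self.consecutive_failures = 0  # any success resets the breaker
            return response
        if last_error is None:
            raise ValueError("max_attempts must be at least 1")
        raise last_error


def make_flaky(failures: int) -> Callable[[str], str]:
    """Build an upstream that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def upstream(request: str) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("transient failure")
        return f"ok:{request}"

    return upstream


proxy = Proxy(make_flaky(2), Policy())
print(proxy.call("GET /health"))  # ok:GET /health
```

The upstream function never sees the policy; changing `Policy` changes behaviour for every wrapped service at once, which is the property the mesh provides without per-service code changes.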

Costs cited in Databricks' evaluation

From sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing:

  • Operational complexity. Managing thousands of sidecars and control-plane components adds overhead, especially during upgrades and large rollouts.
  • Per-pod performance overhead. Sidecars add CPU, memory, and latency on every hop — significant at Databricks' scale (hundreds of services, many pods each).
  • Limited client flexibility for request-aware strategies. Because routing logic lives outside the application, it's hard to drive LB from in-process signals the application cares about.
  • Ambient Mesh also rejected. Databricks already had proprietary systems for functions like certificate distribution, their routing patterns were relatively static, and the team was small enough that mesh ops cost outweighed benefits.
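The "request-aware strategies" cost is easier to see with an example of load balancing driven by a signal only the client process has. The sketch below uses power-of-two-choices over the client's own in-flight request counts. The algorithm is chosen purely as a familiar illustration, not as a description of what Databricks built, and `P2CBalancer` and the endpoint addresses are hypothetical.

```python
import random
from typing import Dict, List, Optional


class P2CBalancer:
    """Client-side power-of-two-choices keyed on in-flight requests per endpoint."""

    def __init__(self, endpoints: List[str], rng: Optional[random.Random] = None) -> None:
        if len(endpoints) < 2:
            raise ValueError("need at least two endpoints to compare")
        # In-process signal: how many requests this client has outstanding.
        self.in_flight: Dict[str, int] = {e: 0 for e in endpoints}
        self.rng = rng or random.Random()

    def acquire(self) -> str:
        # Sample two distinct endpoints and pick the less loaded one.
        a, b = self.rng.sample(list(self.in_flight), 2)
        chosen = a if self.in_flight[a] <= self.in_flight[b] else b
        self.in_flight[chosen] += 1
        return chosen

    def release(self, endpoint: str) -> None:
        # Call when the request to `endpoint` completes.
        self.in_flight[endpoint] -= 1


lb = P2CBalancer(["10.0.0.1", "10.0.0.2"], rng=random.Random(0))
lb.in_flight["10.0.0.1"] = 7  # pretend seven requests are already outstanding
print(lb.acquire())  # 10.0.0.2
```

Because the counters live in the application, the same hook could weigh any other in-process signal (request size, tenant, cache affinity), which is harder to express when routing logic sits in a separate proxy.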

When sidecar-mesh wins anyway

Databricks' own caveat: "One of the biggest advantages of sidecar-based meshes is language-agnosticism: teams can standardize resiliency and routing across polyglot services without maintaining client libraries everywhere." Their proxyless choice was viable because they're predominantly Scala / one shared framework. Polyglot orgs don't have the client-library lever and pay a higher proxyless cost.

Seen in

  • sources/2026-05-05-airbnb-monitoring-reliably-at-scale: Istio named and explicitly excluded from carrying observability traffic at Airbnb. The post names Istio as Airbnb's service mesh substrate for business traffic, then argues it is the wrong fit for telemetry at scale: "Airbnb uses Istio as its service mesh, and while Istio is excellent for many infrastructure benefits, it wasn't the right fit for our observability workloads." Three reasons are enumerated: (1) circular dependency: mesh metrics would be carried by the mesh itself; (2) volume asymmetry: "orders of magnitude more observability traffic than business traffic"; (3) a shared-capacity noisy-neighbour hazard in both directions. Airbnb keeps Istio for business workloads and builds a separate Envoy-based L7 tier for telemetry. Canonicalises Istio's "wrong fit for observability workloads" stance, a distinct framing from the Databricks 2025-10-01 cost-based rejection. See patterns/custom-l7-proxy-for-telemetry-over-service-mesh.
  • sources/2025-10-01-databricks-intelligent-kubernetes-load-balancing: listed and rejected as the "obvious" alternative to Databricks' proxyless design. Good reference for the tradeoff matrix.
Last updated · 542 distilled / 1,571 read