Skipper (Zalando HTTP proxy)¶
Definition¶
Skipper (github.com/zalando/skipper) is an HTTP router and reverse proxy written in Go by Zalando, designed to be composed from small filters and predicates through a routing DSL. At Zalando it is deployed as the default Kubernetes Ingress proxy across 140+ clusters, sitting behind AWS ALBs provisioned by the Kubernetes Ingress Controller for AWS and in front of application pods.
What makes it distinctive¶
- Eskip routing DSL: routes are declarative, composed as `predicate_chain -> filter_chain -> backend`. Filters like `compress()`, `setPath("/...")`, `inlineContent(...)`, `setDynamicBackendUrl(...)` chain with `->` before the final backend token. Example from the Zalando blog-launch post: `* -> compress() -> setDynamicBackendUrl("http://<BUCKET>.s3-website.<REGION>.amazonaws.com") -> <dynamic>;`. This annotation (attached to an `Ingress` as `zalando.org/skipper-routes`) rewrites every request so the ingress proxies to an S3 static-website endpoint with edge `gzip` compression.
- Dynamic backends via the `<dynamic>` sentinel: backends can be resolved from request headers or from filters at request time (`setDynamicBackendUrl`), which is how Skipper can front a non-Kubernetes origin (S3 website, a legacy host, etc.) without needing a Service/Endpoints pair.
- Rich filter library for L7 manipulation: `compress()` adds gzip encoding that upstreams don't provide; `ratelimit(...)`, `oauthTokeninfoAnyScope(...)`, `stripQuery()`, header add/remove/set filters, response-body rewriting. Used as the platform-wide policy enforcement point at Zalando.
- Ingress annotation integration: rather than a custom CRD for routes, Skipper reads routes from Kubernetes `Ingress` objects with the `zalando.org/skipper-routes` annotation containing eskip DSL. This keeps the Kubernetes-native API surface intact while exposing Skipper's full filter capability.
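The route shape described above (predicates, then filters, then a backend, joined with `->`) can be sketched as plain string assembly. This is a minimal illustration in Python, not part of Skipper: the helper names `eskip_route` and `skipper_routes_annotation` are hypothetical, and the returned dictionary is only a metadata fragment, not a complete Ingress manifest.

```python
def eskip_route(predicates, filters, backend):
    """Assemble one eskip route string: predicates -> filters -> backend."""
    # With no predicates, fall back to "*", the catch-all used in the
    # blog-launch example quoted above.
    head = " && ".join(predicates) if predicates else "*"
    return " -> ".join([head, *filters, backend]) + ";"


def skipper_routes_annotation(route):
    """Wrap a route in an Ingress metadata fragment (hypothetical helper)."""
    return {"metadata": {"annotations": {"zalando.org/skipper-routes": route}}}


# Rebuild the S3 static-website route from the text above.
route = eskip_route(
    predicates=[],
    filters=[
        "compress()",
        'setDynamicBackendUrl("http://<BUCKET>.s3-website.<REGION>.amazonaws.com")',
    ],
    backend="<dynamic>",
)
fragment = skipper_routes_annotation(route)
print(route)
```

The `<BUCKET>` and `<REGION>` placeholders are kept exactly as in the quoted route; a real annotation would substitute concrete values.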
Seen in¶
- Reused as the edge proxy for `engineering.zalando.com` instead of standing up CloudFront; a single ingress-annotation route proxies to the S3 website endpoint with `compress()` for gzip.
- Skipper routes landing-page requests into Zalando's Interface Framework with per-platform header enrichment (distinguishing web from Zalando-app traffic) so the same Rendering Engine tier can be routed behind two different ingress paths.
- Skipper's header-based routing is the substrate that lets a single service instance in Zalando Payments' test cluster dynamically switch between the real external dependency and its Hoverfly mock per request, based on whether the request carries a load-test tag. Canonical instance of patterns/header-routed-mock-vs-real-dependency; the tight integration of predicate-filtered eskip routes with existing Ingress deployments is what makes this a zero-new-infrastructure pattern at Zalando.
- Skipper's OAuth token-info filters (`oauthTokeninfoAnyScope(...)`, etc.) are named as the first of three default AuthN/AuthZ options for new Kotlin backend services at Zalando (the other two: Route Groups and Fabric Gateway). Rationale given: Skipper "is designed to handle a large number of requests and is less likely to be misconfigured than for example Spring security". This is a misconfiguration-at-scale argument for the central choke-point gateway over per-service auth libraries.
- sources/2025-02-16-zalando-scaling-beyond-limits-harnessing-route-server-for-a-stable-cluster: control-plane decoupling via Route Server (routesrv). Skipper originally polled the Kubernetes API itself for `Ingress` + `RouteGroup`; at ~180 Skippers per cluster × 200 clusters this fan-out (concepts/control-plane-fan-out-to-kubernetes-api) overwhelmed etcd and throttled the API-server CPU, threatening pod scheduling fleet-wide. Zalando inserted Route Server as a coalescing proxy: one routesrv polls the Kubernetes API every 3 s and serves all Skippers with ETag conditional polling (`304 Not Modified` in steady state, full Eskip on change). Skipper retains the last routing table if routesrv goes away (concepts/last-known-good-routing-table). Rolled out via a three-position config flag (False/Pre/Exec) with a shadow-mode `git diff` of Skipper-computed vs routesrv-computed Eskip before any cutover (patterns/three-mode-rollout-off-shadow-exec). Skipper HPA ceiling raised from ~180 to 300 pods per cluster; zero downtime, zero GMV loss. Kubernetes Informers were considered and rejected because they keep the N× fan-out shape at change events.
- Consistent Hash Load Balancing (CHLB) + bounded-load as the serving-tier ingress primitive. Zalando's PRAPI deploys Skipper as the ingress for its Products component with CHLB: `product-id` hashes to a ring position, the clockwise-nearest pod owns that product, so the hot subset of Zalando's 10M-product catalogue stays in that pod's local Caffeine cache. The post documents two upstream contributions that came out of this deployment: (1) fixed-100-position placement (skipper#1712) assigning each pod to 100 fixed ring locations to reduce scale-event cache invalidations to 1/N where N is the prior pod count; (2) bounded-load (skipper#1769) capping per-pod traffic at 2× the average and spilling excess clockwise to the next non-overloaded pod, so limited-edition Nike drops can't overload their ring owner. The Batch component uses P2C instead of CHLB because it has no per-item stickiness requirement: same ingress, two different load-balancing algorithms per endpoint. Canonical wiki instance of patterns/bounded-load-consistent-hashing.
- sources/2026-04-08-zalando-rejecting-invalid-ingress-routes-at-apply-time: Skipper as its own admission-time validator. Zalando adds a Kubernetes validating admission webhook (`ingress-admitter.teapot.zalan.do`) that reuses Skipper's own filter registry, predicate specifications, route parser, and backend parser to validate `Ingress` / `RouteGroup` objects at `kubectl apply` time (patterns/reuse-runtime-logic-on-admission-path). The webhook answers "would Skipper accept this route?": an `Ingress` with `zalando.org/skipper-predicate: NonExistingPredicate()` is rejected at admission with the literal Skipper error message ("unknown_predicate: predicate 'NonExistingPredicate' not found") instead of silently writing a broken route to etcd that would later fail at request time. Scale framing: 250+ clusters, 15k+ ingresses, ~200k routes, 500k–2M RPS; at that scale even "1% invalid routes" is ~2,000 broken routes and is treated as real production risk. Rollout: `-enable-advanced-validation` feature flag (concepts/feature-flag-rollback-for-validator) with the `skipper_route_invalid{route_id, reason}` metric as the per-tier gate (concepts/invalid-route-observability-metric); tier-by-tier enablement such that teams writing valid manifests saw no change (the canonical invisible rollout). Shipped in open-source Skipper v0.24.18+. Blast-radius framing is specifically control-plane-on-writes, not data-plane-on-traffic: a bad webhook would freeze CI/CD fleet-wide without affecting live customer requests (concepts/control-plane-change-blast-radius).
- Skipper as the embedding host for OPA-as-a-library delivering concepts/authorization-as-a-service across 15,000 Ingresses / 5,000 routegroups / up-to-2M-rps. Skipper exposes a new filter, `opaAuthorizeRequest("<bundle-name>")`; adding this annotation to an Ingress is the entire opt-in for a team. Structural implications:
  - The filter hosts a virtual OPA instance per referenced application. One Skipper process multiplexes many tenants; each has its own Rego bundle (named after the app ID), labels, decision log, and status-report stream. Route churn doesn't thrash instances; a grace period absorbs it before GC.
  - Bundles are fetched from AWS S3, not from the control plane, via patterns/s3-as-policy-bundle-source-for-availability: a Styra DAS outage leaves enforcement running.
  - Skipper's filter input schema aligns with the upstream OPA Envoy plugin (patterns/align-with-upstream-plugin-input-schema). Every Rego example / doc / training from the community is reusable unchanged.
  - Telemetry goes to Lightstep via OTel as two span paths (policy decision + control-plane round-trips); decision IDs link back into Styra DAS for forensics.
  - Bounded memory discipline across bundle size, request-body parsing, decision / status log buffers (patterns/bounded-telemetry-data-structures-for-policy-engine), because OPA's OOM fate = Skipper's OOM fate.
  - On-demand bootstrap: OPA is only wired in when at least one app in the cluster opts in + the cluster is OPA-enabled; Skipper replica counts are scaled up to absorb the CPU cost.
  - Differentiated from the vanilla OPA Envoy plugin on two axes (multi-tenant virtual instances in one process; OPA can serve HTTP responses standalone for SPA / legacy-IAM migration use cases). This is the canonical wiki instance of patterns/ingress-layer-authorization-offload.
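The CHLB entry in the list above describes three mechanics: a key hashes to a ring position owned by the clockwise-nearest pod, each pod sits at 100 fixed positions, and a pod that exceeds 2× the average load spills the request clockwise to the next non-overloaded pod. The following is a minimal Python sketch of that idea, not Skipper's Go implementation. The SHA-256 hash, the `pod#i` position naming, the exact cap formula, and the `BoundedLoadRing` class are all assumptions made for illustration.

```python
import bisect
import hashlib
import math


def _hash(text):
    """Map a string to a 64-bit ring position (hash choice is an assumption)."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")


class BoundedLoadRing:
    def __init__(self, pods, positions=100, factor=2.0):
        self.factor = factor                 # cap as a multiple of average load
        self.load = {pod: 0 for pod in pods}  # in-flight requests per pod
        # Each pod is placed at a fixed set of ring positions.
        self.ring = sorted(
            (_hash(f"{pod}#{i}"), pod) for pod in pods for i in range(positions)
        )
        self.points = [point for point, _ in self.ring]

    def _start(self, key):
        # First ring position clockwise from the key's hash (wraps around).
        return bisect.bisect_right(self.points, _hash(key)) % len(self.ring)

    def owner(self, key):
        """Pod that owns the key when no load bound applies."""
        return self.ring[self._start(key)][1]

    def pick(self, key):
        """Choose a pod for the key, spilling clockwise past overloaded pods."""
        total = sum(self.load.values())
        cap = math.ceil(self.factor * (total + 1) / len(self.load))
        start = self._start(key)
        for step in range(len(self.ring)):
            pod = self.ring[(start + step) % len(self.ring)][1]
            if self.load[pod] < cap:
                self.load[pod] += 1
                return pod
        raise RuntimeError("no pod under the cap; requires factor >= 1")

    def done(self, pod):
        """Release one in-flight request from a pod."""
        self.load[pod] -= 1


# Thirty concurrent requests for one hot key, none released yet.
ring = BoundedLoadRing(["pod-a", "pod-b", "pod-c"])
first = ring.pick("product-42")
for _ in range(29):
    ring.pick("product-42")
print(ring.load)
```

In this sketch the ring owner still receives the largest share of the hot key, so its cache stays warm, while the cap keeps it from absorbing every request.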
Comparable systems¶
- nginx / Envoy / HAProxy: general-purpose L7 proxies. Envoy is the dominant K8s ingress in service-mesh contexts; Skipper's differentiator is the eskip DSL and the tight integration with Ingress annotations at Zalando scale.
- Pingora (Cloudflare): Rust; built at Cloudflare and open-sourced in 2024.
- Fly Proxy: Rust, runs Fly.io's anycast edge.
Related¶
- systems/kubernetes · systems/external-dns · systems/kube-ingress-aws-controller · systems/fabric-gateway-zalando · systems/zalando-route-server · systems/zalando-prapi · companies/zalando
- concepts/control-plane-fan-out-to-kubernetes-api · concepts/etag-conditional-polling · concepts/last-known-good-routing-table · concepts/consistent-hashing · concepts/validating-admission-webhook · concepts/shift-left-validation · concepts/control-plane-change-blast-radius · concepts/feature-flag-rollback-for-validator · concepts/invalid-route-observability-metric
- patterns/control-plane-proxy-with-etag-cache · patterns/three-mode-rollout-off-shadow-exec · patterns/bounded-load-consistent-hashing · patterns/power-of-two-choices · patterns/reuse-runtime-logic-on-admission-path · patterns/invisible-rollout-via-default-on-validation · patterns/feature-flagged-dual-implementation
Seen in: load-shedding as HTTP-only capability¶
- Zalando Communication Platform load-shedding investigation (2024-04-22): canonical named reference to Skipper's own load-shedding feature (skipper#2004) as the pre-existing HTTP-side primitive that does not apply to event-driven ingress. Zalando's customer-communications team surveyed existing load-shedding options in the company stack; Skipper's HTTP-layer mechanism was the incumbent, but the Communication Platform's ingress is Nakadi events (not HTTP requests), so Skipper's mechanism did not transfer. The team adopted the underlying AIMD idea from Skipper's design space but re-implemented it at the Nakadi → RabbitMQ boundary inside Stream Consumer (see patterns/aimd-ingestion-rate-control). Canonical wiki instance of Skipper-as-design-inspiration-not-runtime-dependency for ingress boundaries Skipper doesn't cover, a complement to the many entries above where Skipper is the actual substrate.
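The AIMD (additive increase, multiplicative decrease) idea mentioned above can be sketched in a few lines: raise the ingestion rate by a fixed step while the downstream is healthy, and cut it by a fraction when overload is signalled. This is a generic illustration in Python, not the Stream Consumer or Skipper code; the class name `AimdRate`, its parameters, and the sample values are hypothetical.

```python
class AimdRate:
    """Additive-increase / multiplicative-decrease rate controller."""

    def __init__(self, rate, step, backoff, floor, ceiling):
        self.rate = rate        # current allowed rate, e.g. events per second
        self.step = step        # additive increase applied when healthy
        self.backoff = backoff  # multiplicative factor (0 < backoff < 1) on overload
        self.floor = floor      # never drop below this rate
        self.ceiling = ceiling  # never rise above this rate

    def update(self, overloaded):
        """Adjust the rate from one overload observation and return it."""
        if overloaded:
            self.rate = max(self.floor, self.rate * self.backoff)
        else:
            self.rate = min(self.ceiling, self.rate + self.step)
        return self.rate


# Two healthy observations, one overload signal, then recovery.
ctl = AimdRate(rate=100.0, step=10.0, backoff=0.5, floor=5.0, ceiling=200.0)
history = [ctl.update(signal) for signal in (False, False, True, False)]
print(history)
```

What counts as an overload signal (queue depth, consumer lag, error rate) is left open here; the source only states that the AIMD idea was applied at the Nakadi → RabbitMQ boundary.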
Seen in: parallel-run per-endpoint cutover¶
- Zalando Returns-service extraction (2021-11-03): Skipper as the cutover lever for a parallel-run migration. The Returns team moved one `operation_id` at a time from the legacy monolith to the new Returns microservice by editing the Skipper route for that endpoint, with no redeploy of either application. Rollback was the inverse rule change. Canonical instance of gradual per-endpoint cutover with Skipper as the traffic-shaping primitive, complementing the admission-time / serving-time cases elsewhere on this page.
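The per-endpoint cutover above amounts to a routing table with a default backend and per-operation overrides, where rollback removes the override. The sketch below models that in Python under stated assumptions: the backend URLs, the operation IDs, and the `CutoverTable` class are hypothetical, and the real mechanism is an edit to Skipper routes rather than application code.

```python
LEGACY = "http://monolith.example.internal"   # hypothetical legacy backend
RETURNS = "http://returns.example.internal"   # hypothetical new microservice


class CutoverTable:
    """Default backend plus per-operation overrides, one entry per endpoint."""

    def __init__(self, default_backend):
        self.default_backend = default_backend
        self.overrides = {}

    def cut_over(self, operation_id, backend):
        # Moving one endpoint is a single rule change.
        self.overrides[operation_id] = backend

    def roll_back(self, operation_id):
        # Rollback is the inverse rule change.
        self.overrides.pop(operation_id, None)

    def backend_for(self, operation_id):
        return self.overrides.get(operation_id, self.default_backend)


table = CutoverTable(LEGACY)
table.cut_over("create_return", RETURNS)
after_cutover = table.backend_for("create_return")
untouched = table.backend_for("get_return")
table.roll_back("create_return")
after_rollback = table.backend_for("create_return")
print(after_cutover, untouched, after_rollback)
```

Keeping the monolith as the default means an endpoint that has not been explicitly moved keeps its existing behaviour.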