
PLANETSCALE 2023-11-21 Tier 3


PlanetScale — Webhook security: a hands-on guide

Summary

Mike Coutermarsh (PlanetScale application-tier engineer; post originally published 2023-11-21, re-fetched 2026-04-21) wrote the post that this wiki treats as its canonical field manual for building a webhook-sending service securely. The post frames webhook-sender services as uniquely exposed to server-side request forgery (SSRF) because the entire premise of the surface is "the user provides a URL, the service dispatches a request to it." Worked threat model: "if a web server is running an internal metrics endpoint that responds to HTTP POST requests, an attacker could direct the webhook service to send a request to the service. If the webhook service displays the response in the UI, the attacker has now gained access to your internal metrics data."

The post canonicalises a two-tier defense-in-depth (patterns/defense-in-depth-webhook-abuse-mitigation) architecture against two orthogonal abuse axes:

  1. SSRF — mitigated by URL validation at submission time + an isolated egress-proxy tier (Envoy) that re-enforces rules at send time because DNS can be changed between check and send.

  2. Abuse amplification (high-volume traffic generation) — mitigated by (a) API rate limiting, (b) Sidekiq unique-job enqueue dedup, (c) isolated-worker infrastructure capping blast radius, (d) strict per-webhook send timeouts, (e) per-database webhook count cap (5).

Load-bearing architectural inversion: "No matter how rigorous your URL validations are, you cannot fully trust any URL provided by a user. Because of this, it's critical to isolate and limit where the webhooks service can send HTTP requests." URL validation is necessary but not sufficient — the real boundary is the egress-proxy tier.

Key takeaways

  1. SSRF is the primary threat for webhook-sender services. The user-supplied URL is the attack surface: an attacker targets a victim service's internal endpoints (metrics, admin consoles, cloud-metadata services at 169.254.169.254, etc.) via the webhook sender. The response either leaks data back to the attacker (if surfaced in UI/logs) or triggers a side effect via POST. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  2. URL validation is four orthogonal checks, not one. PlanetScale validates (a) HTTPS required; (b) private / loopback IPs blocked via Ruby's IPAddr#private? / IPAddr#loopback?; (c) PlanetScale's own public domains blocklisted (prevents cross-product SSRF against PlanetScale's other services); (d) post-DNS-resolution re-check: call Resolv.getaddresses(host) and block if any returned IP is private or loopback. The fourth check catches a hostname such as user-supplied-hostname.attacker.com whose authoritative DNS records already point into 10.0.0.0/8 when the URL is resolved at submission time. Canonicalised as concepts/webhook-url-validation. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  3. DNS can be changed between check and send — canonical DNS rebinding axiom verbatim: "Remember, the user can always update the host's DNS after this check has passed. This alone is not enough to protect from SSRFs." The check-time IP and the send-time IP are not guaranteed to be the same host. This is the structural reason URL validation alone is insufficient and the egress-proxy tier is load-bearing. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  4. Isolated-egress-proxy tier (Envoy) is the real boundary. PlanetScale deploys webhooks as a dedicated Kubernetes service that sends every HTTP request via an Envoy proxy which only allows HTTP requests outside of [PlanetScale's] network. Two rules at the proxy tier: (1) block any connections to internal/private IPs; (2) limit traffic to HTTPS ports. These mirror the URL-validation rules but execute at send time, after DNS has been resolved for the actual request — closing the check-to-send gap. Canonicalised as patterns/isolated-egress-proxy-for-user-urls. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  5. Abuse-amplification defence stacks five layers. (a) Global API rate limit on all endpoints; (b) stricter per-endpoint limit on the test endpoint: 1 request per 20 seconds; (c) Sidekiq job uniqueness on the webhook queue, which collapses high-frequency duplicate webhooks into a single send; (d) isolated machines for the webhook queue so abuse can be paused/disabled without touching other PlanetScale services, a canonical application of the blast-radius-capping discipline at the queue-worker tier; (e) strict send timeouts so that attacks which tie up workers waiting on slow endpoints ("queueing many webhooks that resolve very slowly") can't exhaust them. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  6. Per-database webhook quota is a conservative admission-control gate. PlanetScale caps webhooks at 5 per database as an initial limit. Canonical framing verbatim: "Adding more later is always easier than taking it away." The default is deliberately tight because raising a limit later is easy while lowering one is not; same design discipline as patterns/instant-deploy-opt-in (safe default + explicit opt-in for looser policy). (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)

  7. Validation-in-both-places is the composition. URL-validation rules and egress-proxy rules are near-identical on purpose: "It has similar rules as the URL validations above, but are executed when the webhook is being sent." Two-phase enforcement — submit-time feedback for UX (immediate invalid-URL error) + send-time enforcement for correctness (after DNS resolution, against the actual send target) — is the canonical shape of defense-in-depth for user-supplied-URL surfaces. (Source: sources/2026-04-21-planetscale-webhook-security-a-hands-on-guide)
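
The four submission-time checks from takeaway 2 can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not PlanetScale's code: the method names, the error strings, the BLOCKED_DOMAINS entry, and the injectable resolver are all hypothetical, and the link-local check is an addition beyond the two predicates the post names.

```ruby
require "ipaddr"
require "resolv"
require "uri"

# Hypothetical blocklist: the post does not enumerate the blocked domains.
BLOCKED_DOMAINS = ["internal.example.com"].freeze

# True when the string parses as an IPv4 or IPv6 address.
def ip_literal?(host)
  IPAddr.new(host)
  true
rescue IPAddr::Error
  false
end

# True when a webhook must never be sent to this address.
def forbidden_ip?(address)
  ip = IPAddr.new(address)
  # link_local? is an addition to the post's two checks; it covers 169.254.0.0/16.
  ip.private? || ip.loopback? || ip.link_local?
rescue IPAddr::Error
  true # an address that cannot be parsed is rejected, not trusted
end

# Returns a list of problems; an empty list means the URL passed every check.
# `resolver` is injectable so the check can be exercised without real DNS.
def webhook_url_errors(url, resolver: ->(host) { Resolv.getaddresses(host) })
  uri = URI.parse(url)
  return ["must use HTTPS"] unless uri.is_a?(URI::HTTPS) && uri.hostname

  host = uri.hostname.downcase
  errors = []
  if BLOCKED_DOMAINS.any? { |d| host == d || host.end_with?(".#{d}") }
    errors << "domain is blocklisted"
  end

  # A literal IP is checked directly; a hostname is resolved first.
  addresses = ip_literal?(host) ? [host] : resolver.call(host)
  errors << "host does not resolve" if addresses.empty?
  errors << "resolves to a forbidden address" if addresses.any? { |a| forbidden_ip?(a) }
  errors
rescue URI::InvalidURIError
  ["not a valid URL"]
end
```

As takeaway 3 stresses, a pass here only describes DNS at submission time; the same rules still have to be enforced again when the request is sent.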

Systems extracted

  • systems/envoy — the load-bearing egress-proxy tier; first canonical wiki disclosure of the Envoy-as-egress-SSRF-guard role (prior wiki disclosures canonicalise Envoy as sidecar-mesh data plane / ingress gateway / JWT-validation point — this post adds the outbound-egress-filter role).
  • systems/sidekiq — Ruby background-job framework hosting the webhook send queue; Sidekiq's unique_for uniqueness check is reused here as the webhook-dedup layer.
  • systems/kubernetes — PlanetScale's application platform; webhook sender runs as an isolated K8s service separate from other PlanetScale services, so abuse of the webhook tier cannot impact other PlanetScale product surface.
  • systems/ruby-on-rails + Ruby IPAddr + Resolv.getaddresses — the URL-validation substrate at the application tier.
  • systems/planetscale — host product for the webhooks feature.
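
The dedup semantics that the Sidekiq entry above relies on can be illustrated without Sidekiq itself. The sketch below is a plain-Ruby, in-memory stand-in and is not Sidekiq Enterprise's implementation: the UniqueEnqueuer class, its method names, and the injectable clock are hypothetical, and a real deployment would keep the locks in shared storage rather than in a process-local hash.

```ruby
require "digest"
require "json"

# In-memory stand-in for a unique-job lock store. It only illustrates the
# idea of "identical job + arguments within a window collapses to one send".
class UniqueEnqueuer
  attr_reader :queue

  def initialize(unique_for:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @unique_for = unique_for # uniqueness window in seconds
    @clock = clock
    @locks = {}              # digest of (job, args) => lock expiry time
    @queue = []
  end

  # Returns true if the job was enqueued, false if an identical job is
  # already inside its uniqueness window.
  def enqueue(job_name, *args)
    key = Digest::SHA256.hexdigest(JSON.generate([job_name, args]))
    now = @clock.call
    return false if @locks.key?(key) && @locks[key] > now

    @locks[key] = now + @unique_for
    @queue << [job_name, args]
    true
  end
end
```

With a 60-second window, a burst of identical enqueue calls for the same webhook and event produces one queued job, while a different webhook id is unaffected.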

Concepts extracted

Patterns extracted

  • patterns/isolated-egress-proxy-for-user-urls — new canonical pattern: deploy user-URL-touching work as an isolated service whose egress goes through a dedicated proxy (Envoy here) that re-enforces private-IP-block + HTTPS-port-only rules at send time. Closes the DNS-rebinding check-to-send gap that URL-validation-only architectures leave open.
  • patterns/defense-in-depth-webhook-abuse-mitigation — new canonical pattern: compose five orthogonal abuse-amplification defences (global API rate limit, per-endpoint tighter limit, queue dedup, isolated infrastructure, strict timeouts) plus a conservative quota (5-per-database initial limit) rather than rely on any single layer. Same structural shape as patterns/pluggable-durability-rules but on the abuse-mitigation axis.
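
The check-to-send gap that the egress-proxy pattern closes can be demonstrated with a fake resolver. This is an application-level illustration only, not Envoy configuration (the post does not show any): RebindingResolver, public_ip?, and egress_allowed? are hypothetical names, and treating "HTTPS ports" as port 443 alone is an assumption.

```ruby
require "ipaddr"

# Fake resolver that returns its answers in order and then repeats the last
# one, imitating a DNS record that is changed after validation has passed.
class RebindingResolver
  def initialize(*answers)
    @answers = answers
  end

  def call(_host)
    @answers.length > 1 ? @answers.shift : @answers.first
  end
end

def public_ip?(address)
  ip = IPAddr.new(address)
  !(ip.private? || ip.loopback? || ip.link_local?)
end

# Send-time policy mirroring the two proxy rules described in the post:
# no internal/private destinations, HTTPS port only (443 assumed here).
def egress_allowed?(address, port)
  port == 443 && public_ip?(address)
end

resolver = RebindingResolver.new(["203.0.113.10"], ["10.0.0.5"])

# Submission time: the record points at a public address, so validation passes.
submit_ok = resolver.call("hooks.attacker.example").all? { |a| public_ip?(a) }

# Send time: the record now points inside the network; only a check made
# against the address actually being connected to can catch this.
send_ok = resolver.call("hooks.attacker.example").all? { |a| egress_allowed?(a, 443) }

puts "submit-time check passed: #{submit_ok}" # true
puts "send-time check passed:   #{send_ok}"   # false
```

The second lookup is what a validation-only design never performs, which is why the enforcement point has to sit on the egress path itself.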

Operational numbers

  • 1 request per 20 seconds rate limit on the webhook test endpoint (tightest per-endpoint limit named in the post).
  • 5 webhooks per database — initial quota.
  • "Short timeout" on webhook requests — numerical value not disclosed in post.
  • "Single unique webhook" semantics: duplicate enqueues within the uniqueness window collapse to one send via Sidekiq's unique_for: option.
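
The two disclosed numbers translate directly into small admission checks. The sketch below is a minimal in-process illustration, not PlanetScale's implementation: IntervalLimiter, can_add_webhook?, and the injectable clock are hypothetical, and the choice of what to key the limiter on (user, database, or webhook) is an assumption the post does not settle.

```ruby
# Allows at most one request per `interval` seconds for each key.
class IntervalLimiter
  def initialize(interval:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval
    @clock = clock
    @last_allowed = {} # key => time of the most recent allowed request
  end

  def allow?(key)
    now = @clock.call
    last = @last_allowed[key]
    return false if last && now - last < @interval

    @last_allowed[key] = now
    true
  end
end

MAX_WEBHOOKS_PER_DATABASE = 5 # initial quota named in the post

# True when the database is still below its webhook quota.
def can_add_webhook?(existing_count)
  existing_count < MAX_WEBHOOKS_PER_DATABASE
end

# The test endpoint's limit from the post: 1 request per 20 seconds.
TEST_ENDPOINT_LIMITER = IntervalLimiter.new(interval: 20)
```

A rejected request does not reset the interval, so a client that keeps retrying is still allowed through once 20 seconds have passed since its last accepted request.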

Caveats

  • Exact timeout duration for webhook sends is not disclosed.
  • Envoy configuration (listener + cluster + filter-chain YAML) is not shown — reader must cross-reference general Envoy docs for concrete allow-list / outlier-detection / retry config.
  • Post frames DNS-rebinding risk but does not characterise the specific attack (169.254.169.254 cloud-metadata service, AWS IMDS, short-TTL DNS flipping). Reader must bring SSRF-field-knowledge.
  • Sidekiq uniqueness is Sidekiq-Enterprise-only (unique_for: / unique_until: options) — not disclosed in post; cross-referenced via concepts/sidekiq-unique-jobs.
  • Mention of domain-blocklisting PlanetScale's own services is a specific hygiene check often omitted in public SSRF guidance; the post doesn't enumerate which domains.
  • No coverage of signed-webhook-payload integrity (HMAC signature with per-endpoint secret), nor of replay-attack protection. Those are receiver-side concerns PlanetScale surfaces in separate documentation; this post is scoped to sender-side abuse + SSRF prevention.
  • No production numbers on attacker activity (attempts blocked, abuse incidents observed) — practitioner field manual, not incident retrospective.
  • 2023-11-21 publication pre-dates PlanetScale Metal (2025-03) and PlanetScale Postgres (2025-07) — the architecture described is Vitess/MySQL-era application-tier.
  • The post's Ruby code sample for loopback-check has a typo (host_up vs host_ip) — minor and unrelated to architecture.
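
Since the timeout value is not disclosed, the sketch below only shows where such limits would be set on Ruby's standard Net::HTTP client. The constant names, the build_webhook_client helper, and the 2-second and 5-second values are placeholders chosen for illustration, not figures from the post.

```ruby
require "net/http"
require "uri"

# Placeholder values: the post says only that the timeout is short.
OPEN_TIMEOUT_SECONDS = 2
IO_TIMEOUT_SECONDS = 5

# Builds (but does not open) an HTTP client with explicit timeouts.
def build_webhook_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = OPEN_TIMEOUT_SECONDS # establishing the connection
  http.read_timeout = IO_TIMEOUT_SECONDS   # each blocking read of the response
  http.write_timeout = IO_TIMEOUT_SECONDS  # each blocking write of the request
  http
end
```

Note that read_timeout bounds each individual read rather than the whole response, so a receiver that trickles bytes slowly can still hold a worker for longer; an overall deadline around the request would be a separate measure.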

Cross-source continuity

  • Companion to sources/2026-04-21-planetscale-how-to-kill-sidekiq-jobs-in-ruby-on-rails (Elom Gomez 2022-08-15) — that post canonicalises PlanetScale's application-tier Sidekiq kill-switch middleware via Flipper feature flags. Mike Coutermarsh's webhook-security post is a sibling disclosure on the same Rails backend: both use Sidekiq as the async-job substrate, both canonicalise operational-lever primitives on top of Sidekiq, both rely on isolation discipline to bound blast radius. Together they bracket PlanetScale's 2022–2023 Rails application-tier operational posture.
  • Companion to sources/2026-04-21-planetscale-how-we-made-planetscales-background-jobs-self-healing (Mike Coutermarsh 2022-02-17) — same author, same Rails application tier, same Sidekiq substrate. The 2022 post canonicalises paired-scheduler-reconciler + Sidekiq Enterprise uniqueness for self-healing async jobs; this 2023 post reuses the uniqueness primitive as the webhook-dedup layer — a different failure-mode application of the same framework mechanism. Coutermarsh's fourth wiki ingest (after 2022-01-18 Rails-CI + 2022-02-17 self-healing-jobs + 2024-04-04 schema-change workflow).
  • Complements systems/envoy's existing Seen-in entries — Databricks uses Envoy as ingress gateway, Dropbox uses Envoy-via-EDS for PID-driven weights, AWS App Mesh / ECS Service Connect use Envoy as sidecar, Pinterest uses Envoy for JWT-validation. This post adds the egress-SSRF-guard role — a fifth canonical Envoy-role disclosure on the wiki, first one at the outbound-filter altitude.
  • Complements concepts/blast-radius — the existing blast-radius canon covers AWS account / region / AZ / tenant / shard boundaries; this post adds the isolated-service-per-failure-mode boundary at the Kubernetes-service tier. PlanetScale runs webhooks as an isolated K8s service so webhook abuse cannot impact other PlanetScale services — a deployment-altitude application of the same blast-radius-capping discipline canonicalised on prior PlanetScale ingests (Englander principles-of-extreme-fault-tolerance + Morrison II three-surprising-benefits-of-sharding).
  • No existing-claim contradictions — strictly additive. Extends the wiki's SSRF / webhook-security / egress-filtering coverage from zero canonical pages (prior coverage: concepts/egress-sni-filtering at the TLS-ClientHello altitude) to a full field manual at the application-tier altitude.

Source
