PLANETSCALE 2022-08-15 Tier 3

PlanetScale — How to kill Sidekiq jobs in Ruby on Rails

Narrow how-to post from PlanetScale's application-tier Ruby-on-Rails backend (not the MySQL/Vitess data-plane product). Elom Gomez canonicalises a small operational primitive PlanetScale runs in production: a SidekiqMiddleware::SidekiqJobsFlipper client middleware that short-circuits any Sidekiq worker enqueue if a correspondingly-named Flipper feature flag is set, letting an operator disable a specific async job class in production from a Rails console without a deploy.

Summary

Operators regularly need to disable a misbehaving Sidekiq job in production. The classical options are (a) flush the queue for that job class or (b) push a hot-fix deploy that no-ops the job. Option (a) is destructive, and option (b) is slow because it needs a full deploy cycle. PlanetScale's approach: use the already-deployed flipper gem to gate the enqueue step via a Sidekiq client middleware that inspects the worker class name, converts it with Rails' .underscore convention, and checks whether disable_<klass_underscored> is enabled in Flipper. If the flag is on, the middleware returns false and the job never reaches Redis.

To disable InvoiceJob in production, an operator opens a Rails console and runs Flipper.enable("disable_invoice_job"). The next enqueue attempt — and every subsequent one — short-circuits at the client middleware before the job is written to Redis. Re-enabling is Flipper.disable("disable_invoice_job").

The pattern is a killswitch specialised to async-job enqueue, with the feature-flag system as the fast-propagation config channel and the Sidekiq client middleware as the enforcement point.
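
The mechanism described in this summary can be sketched roughly as follows. This is a minimal illustration, not PlanetScale's actual code. Assumptions: `FakeFlipper` is a hypothetical in-memory stand-in for the flipper gem so the example runs without dependencies, the private `underscore` helper only approximates ActiveSupport's `String#underscore`, and the `call(worker_class, job, queue, redis_pool)` signature follows Sidekiq's client-middleware convention.

```ruby
require "set"

# Hypothetical in-memory stand-in for the flipper gem, so the sketch runs
# without external dependencies. Real code would call Flipper directly.
module FakeFlipper
  @enabled = Set.new

  def self.enable(name)
    @enabled << name.to_s
  end

  def self.disable(name)
    @enabled.delete(name.to_s)
  end

  def self.enabled?(name)
    @enabled.include?(name.to_s)
  end
end

module SidekiqMiddleware
  class SidekiqJobsFlipper
    # The injectable flag store is an assumption made for testability.
    def initialize(flags = FakeFlipper)
      @flags = flags
    end

    # Client-middleware convention: yield to continue the chain,
    # return false/nil to stop the job from going to Redis.
    def call(worker_class, job, queue, redis_pool)
      klass = worker_class.to_s
      return false if @flags.enabled?("disable_#{underscore(klass)}")

      yield
    end

    private

    # Simplified approximation of ActiveSupport's String#underscore.
    def underscore(name)
      name.gsub("::", "/")
          .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
          .gsub(/([a-z\d])([A-Z])/, '\1_\2')
          .downcase
    end
  end
end

middleware = SidekiqMiddleware::SidekiqJobsFlipper.new
job = { "class" => "InvoiceJob", "args" => [42] }

# Operator flips the killswitch: the enqueue is short-circuited.
FakeFlipper.enable("disable_invoice_job")
blocked = middleware.call("InvoiceJob", job, "default", nil) { job }

# Operator turns the killswitch off: the chain continues as usual.
FakeFlipper.disable("disable_invoice_job")
allowed = middleware.call("InvoiceJob", job, "default", nil) { job }

puts "blocked: #{blocked.inspect}"
puts "allowed: #{allowed.inspect}"
```

In a real application the two `FakeFlipper` calls would be `Flipper.enable` and `Flipper.disable` typed into a Rails console, and the block passed to `call` would be the rest of Sidekiq's client chain.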

Key takeaways

  1. Client-middleware interception is the enforcement point. Sidekiq's client middleware runs at enqueue time (when code calls InvoiceJob.perform_async(...)), not at worker-execution time. Returning false (or nil) from call prevents the job from being pushed to Redis at all. Verbatim comment in the code: "return false/nil to stop the job from going to redis". This gates the producer side — already-enqueued jobs still run on the worker side, but no new ones get added.
  2. Feature-flag name convention ties the two systems together without a registry. The middleware derives the expected flag name from the worker class: Flipper.enabled?("disable_#{klass.underscore.to_sym}"). InvoiceJob → disable_invoice_job. No catalog, no registration; the flag name is discoverable from the Rails class name. This is a naming-convention composition, not a configured binding.
  3. Rails console is the invocation surface. Flipper.enable("disable_invoice_job") run from a Rails console flips the flag immediately. Flipper's storage is typically a shared database or Redis instance that every application process reads, so propagation should take seconds at most (the post gives no measurement). No deploy required.
  4. Orthogonal to the job queue's native admin UI. Sidekiq's admin UI can pause queues or kill in-flight jobs, but that's queue-level, not job-class-level. Flipper-middleware works at the class granularity — you can disable exactly InvoiceJob while every other job class in the same queue keeps running.
  5. Producer-side short-circuit, not consumer-side skip. The middleware prevents enqueue; it does not drain already-queued jobs from Redis. If the problem is "this job is destructive and I need to stop ALL instances NOW," the operator must additionally flush the queue — but this middleware prevents new jobs from piling up while that happens, which is the common operational need.
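
Takeaways 1, 4 and 5 can be illustrated with a toy model. In this sketch an Array stands in for a Redis queue and a Set stands in for the enabled Flipper flags. `ToyClient` and its methods are hypothetical names invented for the illustration, not Sidekiq APIs.

```ruby
require "set"

# Toy model: an Array stands in for a Redis queue and a Set for the
# enabled Flipper flags. All names here are hypothetical illustrations.
class ToyClient
  attr_reader :queue

  def initialize(disabled_flags)
    @disabled_flags = disabled_flags
    @queue = []
  end

  # Mirrors the enqueue path: the gate runs before anything is stored.
  # Returns a job ID on success, nil when the gate drops the job.
  def push(job_class, *args)
    return nil if @disabled_flags.include?(flag_for(job_class))

    jid = "jid-#{@queue.size + 1}"
    @queue << { "class" => job_class, "args" => args, "jid" => jid }
    jid
  end

  private

  # Only handles simple CamelCase names; enough for this illustration.
  def flag_for(job_class)
    "disable_" + job_class.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
end

flags = Set.new
client = ToyClient.new(flags)

client.push("InvoiceJob", 1)            # enqueued before the flag is flipped
flags << "disable_invoice_job"          # operator flips the killswitch
dropped = client.push("InvoiceJob", 2)  # rejected at enqueue time
kept = client.push("EmailJob", 3)       # other class, same queue, unaffected

queued_classes = client.queue.map { |j| j["class"] }
puts queued_classes.inspect # prints ["InvoiceJob", "EmailJob"]
```

The first InvoiceJob is still sitting in the queue after the flag is flipped, which is the producer-side limitation: clearing it needs a separate action.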

Systems

  • Sidekiq — Ruby background-job framework using Redis as the job store. Supports client middleware (runs at enqueue) and server middleware (runs at worker dequeue). PlanetScale installs the SidekiqJobsFlipper on the client chain.
  • Flipper — Ruby feature-flag library by @jnunemaker. Backed by a pluggable store (DB, Redis, in-memory, etc.); Flipper.enabled?(name) is the evaluation call.
  • Redis — Sidekiq's job substrate (not Flipper's, unless configured that way). Jobs live here as JSON-serialised hashes under per-queue list keys.
  • Ruby on Rails — ActiveSupport's String#underscore (called as klass.underscore.to_sym in the middleware) turns InvoiceJob into :invoice_job, giving a stable flag-name mapping.
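
The class-name-to-flag mapping can be sketched without Rails. The `underscore` function below is a simplified approximation of ActiveSupport's `String#underscore` (it ignores custom acronym inflections), and `disable_flag_for` is a hypothetical helper name. The namespaced example is an assumption about how such a class would map: ActiveSupport turns `::` into `/`, so the flag name would contain a slash.

```ruby
# Simplified approximation of ActiveSupport's String#underscore
# (no acronym inflections), so the sketch runs without Rails.
def underscore(name)
  name.gsub("::", "/")
      .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
      .gsub(/([a-z\d])([A-Z])/, '\1_\2')
      .tr("-", "_")
      .downcase
end

# Hypothetical helper: the flag an operator would toggle for a job class.
def disable_flag_for(job_class_name)
  "disable_#{underscore(job_class_name)}"
end

%w[InvoiceJob Billing::InvoiceJob HTTPSyncJob].each do |name|
  puts "#{name} -> #{disable_flag_for(name)}"
end
```

Working the flag name out this way before an incident gives the runbook the exact string to pass to Flipper.enable.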

Concepts

  • Feature flag — canonical runtime switch; here used not to gate a user-facing code path but to gate an operational action (async-job enqueue).
  • Client-middleware interception — producer-side middleware as the enforcement point for cross-cutting operational gates. Runs in the caller's process at the moment of enqueue, not in the worker process at the moment of execution.
  • Deploy-less operational lever — a production control whose invocation is a config flip (flag toggle, parameter change) rather than a code change. Orthogonal to the deploy pipeline.
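
Because the gate runs in whichever process enqueues the job, the middleware has to be on the client chain of every such process, including Sidekiq workers that enqueue further jobs. The fragment below is a hedged sketch of a typical registration, not the post's own initializer. The middleware class is a pass-through placeholder, and the `defined?(Sidekiq)` guard is only there so the fragment loads where the gem is absent.

```ruby
# Placeholder for the middleware class; the real one would hold the
# Flipper check. Defined here only so this fragment is self-contained.
module SidekiqMiddleware
  class SidekiqJobsFlipper
    def call(worker_class, job, queue, redis_pool)
      yield
    end
  end
end

# Guarded so the fragment also loads where the sidekiq gem is absent.
# In a Rails app this would typically live in config/initializers/sidekiq.rb.
if defined?(Sidekiq)
  # Web and console processes that enqueue jobs.
  Sidekiq.configure_client do |config|
    config.client_middleware do |chain|
      chain.add SidekiqMiddleware::SidekiqJobsFlipper
    end
  end

  # Worker processes can enqueue jobs too, so they get the same client chain.
  Sidekiq.configure_server do |config|
    config.client_middleware do |chain|
      chain.add SidekiqMiddleware::SidekiqJobsFlipper
    end
  end
end
```

Registering only in `configure_client` would leave jobs enqueued from inside other jobs ungated.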

Patterns

  • Feature-flagged job enqueue rejection — canonical new wiki pattern introduced by this ingest. Install a client middleware on the async-job framework that checks a per-job-class feature flag (derived from the class name via a naming convention) on every enqueue and returns falsy to short-circuit the enqueue if the flag is on.
  • Global feature killswitch — this post is a narrow application at the async-job altitude (contrast with Cloudflare's module-level killswitch at the request-hot-path altitude). Same fundamental shape: orthogonal fast-off lever driven by a fast-propagation config channel.

Operational numbers

None disclosed. No incident retrospective, no frequency-of-use statistics, no mention of which PlanetScale job classes have ever been disabled via this mechanism, no propagation-latency measurement for Flipper's store.

Caveats

  • Producer-side only. Does not drain in-flight or already-queued jobs. The operator still needs a separate action to clear them from Redis if destructive jobs are already queued or running.
  • Per-class granularity. Cannot disable specific arguments to a job class (e.g. "kill InvoiceJob for account_id=42 only"). All-or-nothing per class.
  • No built-in audit trail. Flipper.enable(...) from a Rails console leaves no structured record beyond what Flipper's storage backend records (typically just current state, not a history log) and Rails production-console audit trails (if they exist).
  • Discovery depends on naming convention adherence. If a job class is renamed, the flag name changes silently — operator runbooks need to be updated.
  • Silent drop vs retry. A dropped enqueue raises no error, so the perform_async call looks successful from the caller's perspective; it simply returns a falsy value (nil in current Sidekiq versions) instead of the usual job ID. Callers that expect a job ID may need to handle this case; callers that assume fire-and-forget won't notice the drop.
  • 2022-era post. Sidekiq and Flipper APIs have evolved; the middleware shape is unchanged but the default middleware-registration syntax may have shifted.
  • Application-tier detail, not PlanetScale product internals. This is how PlanetScale's Rails application operates internally — orthogonal to the MySQL/Vitess database product.
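
For the silent-drop caveat, a caller that cares about the job ID could check the return value of the enqueue call. The sketch below uses a hypothetical wrapper, `enqueue_with_drop_check`, and stand-in blocks in place of a real `InvoiceJob.perform_async(42)` call. It assumes only that the enqueue returns a job ID on success and a falsy value when a client middleware drops the job.

```ruby
require "stringio"

# Hypothetical wrapper: the block performs the enqueue and is expected to
# return a job ID on success or nil/false when a client middleware drops
# the job. Dropped enqueues are logged instead of passing silently.
def enqueue_with_drop_check(job_name, logger: $stderr)
  jid = yield
  return jid if jid

  logger.puts("#{job_name} was not enqueued (possibly disabled by a killswitch flag)")
  nil
end

log = StringIO.new

# Stand-ins for InvoiceJob.perform_async(42) with the flag off, then on.
ok = enqueue_with_drop_check("InvoiceJob", logger: log) { "b4a577edbccf1d805744efa9" }
dropped = enqueue_with_drop_check("InvoiceJob", logger: log) { nil }

puts "ok: #{ok.inspect}, dropped: #{dropped.inspect}"
puts log.string
```

Whether a drop should be logged, retried later, or surfaced to the user depends on the job; the wrapper only makes the drop visible.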
