
PLANETSCALE 2022-08-01 Tier 3


PlanetScale — Ruby on Rails: 3 tips for deleting data at scale

Narrow application-tier how-to in PlanetScale's Rails-backend-engineer voice. Mike Coutermarsh — same author as the 2022-02-17 self-healing-background-jobs post + 2022-01-18 Rails-CI post + 2024-04-04 schema-change workflow post — canonicalises three load-bearing operational moves for deleting data in a Rails application at scale: (1) dependent: :destroy_async instead of :destroy or DB-level ON DELETE CASCADE for parent-child cascade deletion; (2) delete vs destroy + delete_all vs destroy_all — callback-skipping primitives for bulk ops where callback side-effects are undesirable; (3) batched Sidekiq job with self-requeue for continuous low-impact deletion of aged data. Each move addresses a distinct failure mode of naive SQL-level deletion on a high-traffic MySQL cluster: row-lock contention, request-path timeouts, replication lag amplification, and cascading fan-out to associated tables.

Summary

Deletion is deceptively dangerous in Rails at scale. The default dependent: :destroy association option looks harmless — delete the parent, ActiveRecord cascades through children — but under load it produces "thousands of deletions within a single transaction" that compound into request timeouts, InnoDB lock contention, and replication lag spikes. The post canonicalises three alternative primitives shipped with modern Rails + Sidekiq that make deletion safe at scale:

  1. dependent: :destroy_async (Rails 6.1+) — parent deletion triggers a Sidekiq background job that performs the child-row deletion outside the request transaction. Keeps the user-visible request fast; keeps the delete transaction small.
  2. destroy vs delete (and destroy_all vs delete_all) — destroy invokes ActiveRecord callbacks (validations, before_destroy, after_destroy, dependent: cascades); delete skips callbacks and issues raw SQL. Canonical use: cron-scheduled delete_all for cleanup that doesn't need callbacks; destroy in user-facing flows that do.
  3. Batched-Sidekiq-with-requeue pattern — a cron-scheduled job that deletes up to limit rows older than a threshold using delete_all, then re-enqueues itself if the delete count equalled the limit (implying more work remains). Paired with Sidekiq Enterprise unique_for to prevent concurrent instances causing deadlocks.
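The destroy_async shape can be sketched outside Rails entirely. A minimal plain-Ruby simulation — the Author/Book names and the array-backed job queue are illustrative stand-ins for ActiveRecord models and Sidekiq, not the post's code:

```ruby
JOB_QUEUE = []   # stand-in for the Sidekiq queue
AUTHORS   = []
BOOKS     = []

Book = Struct.new(:id, :author_id)

Author = Struct.new(:id) do
  # destroy_async shape: deleting the parent enqueues the child cascade
  # instead of running it inside the request transaction
  def destroy
    AUTHORS.delete(self)
    aid = id
    JOB_QUEUE << -> { BOOKS.reject! { |b| b.author_id == aid } }
  end
end

AUTHORS << (author = Author.new(1))
3.times { |i| BOOKS << Book.new(i, 1) }

author.destroy             # "request" returns: parent gone, children remain
puts BOOKS.size            # => 3: cascade has not run yet
JOB_QUEUE.shift.call       # a worker drains the queue later
puts BOOKS.size            # => 0
```

The gap between the two `puts` lines is the async window the post's silent-error caveat is about: the parent is gone while the children still exist.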

The post also surfaces PlanetScale's product-level safety features as the catch-all for mistakes: instant schema-change revert (concepts/schema-revert) and the warn-on-drop recently-queried dashboard gate. Deletion-at-scale discipline isn't a substitute for operational safety nets, and the post frames the three techniques alongside those nets.

Key takeaways

  1. dependent: :destroy + large child fan-out is the canonical latency bomb. "As a Rails application grows, it can be very easy to unintentionally delete a parent record and trigger a cascade of thousands of deletions. Having all of this happen within a request can lead to timeouts and added strain on your database." The trap shape: a small UI action (user deletes an Author) produces an O(children) SQL transaction that locks rows across multiple tables, blocking the request thread, the DB, and downstream replicas until it completes. The canonicalised concept: concepts/dependent-destroy-cascade-risk.
  2. dependent: :destroy_async decouples cascade from request path. Rails 6.1+ ships dependent: :destroy_async as a first-class association option: "It works similarly to dependent: :destroy, except that it will run the deletion via a background job rather than happening in request." The parent deletion returns immediately; a Sidekiq job (ActiveJob-backed) picks up the children-deletion work. Canonicalised as concepts/destroy-async.
  3. Prefer destroy_async over DB-level foreign-key ON DELETE CASCADE at scale. Post explicitly frames this: "We recommend replacing any usage of foreign key constraints with :destroy_async for safer deletes at scale." FK cascades execute in the same DB transaction as the parent delete — the same unbounded-work problem as synchronous dependent: :destroy, at a different layer. Both layers have the same failure mode; only destroy_async puts the cascade work on a separate execution path. Canonical pattern: patterns/foreign-key-cascade-vs-dependent-destroy-async.
  4. destroy_async has a silent-error failure mode. Because child deletion happens asynchronously, a validation failure or before_destroy callback abort on a child happens after the user sees the parent deletion succeed. "the user will be unaware of any errors and the job will end up in your error queue. If any child records have validations on delete, we recommend running them from the parent model." Mitigation: run critical child-side validations at the parent altitude so they fail the request before the async cascade fires.
  5. delete vs destroy is a callback-invocation choice. "destroy — Deletes the record while also triggering the models callbacks. delete — Skips the callbacks and deletes the record directly from the database." The delete_all / destroy_all pair applies the same split to bulk ops. Canonical rule: destroy/destroy_all in user-facing flows where callbacks matter; delete/delete_all in maintenance jobs where callbacks are overhead. A cron cleanup job that destroys 10,000 aged records via destroy_all invokes 10,000 callback chains — usually wasted work. Canonicalised as concepts/destroy-vs-delete-activerecord.
  6. Batched deletion with self-requeue is the maintenance-cron shape. The canonical code block:
class DeleteOldDataJob < BaseJob
  sidekiq_options unique_for: 1.hour,
                  unique_until: :start,
                  queue: :background,
                  retry: false

  def perform(limit: 500)
    deleted = Model.where("created_at < ?", 3.months.ago)
                   .limit(limit).delete_all
    self.class.perform_async if deleted == limit
  end
end

Structural moves: (a) delete_all (callback-skip) for throughput, (b) LIMIT 500 to cap transaction size, (c) self-requeue only when deleted == limit (implying more work remains), (d) Sidekiq Enterprise unique_for: 1.hour + unique_until: :start to prevent concurrent instances producing deadlocks, (e) retry: false because the cron will re-fire on its own schedule. Canonicalised as patterns/batched-background-deletion-with-requeue.
  7. unique_for is load-bearing for batched deletion. "This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there." Without uniqueness, the self.class.perform_async re-enqueue could overlap with the cron's next tick + a retried instance, producing multiple concurrent DELETE ... LIMIT 500 transactions on the same table. Each acquires row + gap locks on overlapping ranges → deadlock. Cross-reference to concepts/sidekiq-unique-jobs (canonicalised via the sibling 2022-02-17 self-healing-jobs post) + concepts/gap-locking for the InnoDB-specific mechanism.
  8. PlanetScale-specific safety nets named at the close. The post ends by naming two product-level backstops for when deletion goes wrong: "instantly revert a bad schema change without losing any data" (concepts/schema-revert) and "throw a warning in the dashboard if you're dropping a table that was recently queried" (patterns/warn-on-drop-recently-queried). Framing: deletion discipline at the application altitude is necessary but not sufficient; platform-level safeguards exist for the cases where discipline fails.
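The requeue mechanics can be exercised without Sidekiq or a database. A plain-Ruby simulation — ROWS and QUEUE are in-memory stand-ins for the table and the job queue, and the row counts are illustrative assumptions, not the post's numbers:

```ruby
QUEUE  = []
CUTOFF = Time.now - (90 * 24 * 3600)   # stand-in for 3.months.ago
ROWS   = Array.new(1_234) { |i| { id: i, created_at: CUTOFF - 60 } } +
         Array.new(10)    { |i| { id: 10_000 + i, created_at: Time.now } }

# One "job run": delete up to limit aged rows, then re-enqueue only if
# the batch was full -- the post's deleted == limit requeue condition.
delete_old_data_job = lambda do |limit = 500|
  old_ids = ROWS.select { |r| r[:created_at] < CUTOFF }
                .first(limit).map { |r| r[:id] }
  ROWS.reject! { |r| old_ids.include?(r[:id]) }
  QUEUE << delete_old_data_job if old_ids.size == limit
end

QUEUE << delete_old_data_job   # the cron's initial enqueue
batches = 0
until QUEUE.empty?
  QUEUE.shift.call             # a single worker drains sequentially,
  batches += 1                 # mimicking unique_for's one-at-a-time guarantee
end
puts batches                   # => 3 (500 + 500 + 234 rows)
puts ROWS.size                 # => 10 recent rows survive
```

The sequential `until` loop is the property unique_for buys: at most one batch holds locks at a time, so overlapping DELETE ranges never contend.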

Systems

  • Ruby on Rails — the framework substrate. All three tips are Rails-API shapes: dependent: option on association declarations, destroy/delete/destroy_all/delete_all on ActiveRecord::Relation, ActiveJob-backed BaseJob superclass.
  • Sidekiq — the async-job substrate that backs dependent: :destroy_async (via ActiveJob's adapter) and the batched DeleteOldDataJob. Sidekiq Enterprise's unique_for + unique_until options are named as load-bearing.
  • MySQL — the database engine. The post's framing (locking, replication lag, large transactions) assumes MySQL / InnoDB semantics; most applies equivalently to Postgres.
  • PlanetScale — the managed database platform. Named in the closing paragraph as provider of schema-revert + warn-on-drop safety nets.
  • ActiveJob — Rails' abstraction over async-job frameworks. dependent: :destroy_async emits ActiveJob jobs that dispatch to whatever adapter the app configures (Sidekiq, Delayed::Job, Que, etc.). The post assumes Sidekiq.

Concepts

  • Dependent-destroy cascade risk — canonical new wiki concept. ActiveRecord's dependent: :destroy option as the default parent-child cascade mechanism, and the unbounded-work failure mode it produces when child-row count grows with application age.
  • Destroy-async — canonical new wiki concept. Rails 6.1's background-job variant of dependent: :destroy that decouples cascade work from the request transaction via ActiveJob.
  • Destroy vs delete in ActiveRecord — canonical new wiki concept. The callback-invoking vs callback-skipping split at the single-record and collection altitudes (destroy/delete, destroy_all/delete_all).
  • Sidekiq unique jobs — already canonical via the 2022-02-17 self-healing-jobs post; this post extends its Seen-in with a second canonical use case beyond duplicate-enqueue prevention: deadlock prevention in batched maintenance jobs by ensuring at most one instance of a delete-batch job holds locks at a time.
  • Replication lag — already canonical; this post frames excessive large transactions as a contributor: a cascade-delete transaction that takes 10s on the primary produces 10s+ of replication lag on every replica as the binlog event replays.
  • Row-level lock contention — already canonical; this post names large unbounded cascade-delete transactions as a contention source (they hold X-locks on every affected row for the transaction duration, blocking concurrent reads via FOR UPDATE/FOR SHARE and writes).
  • Soft-delete vs hard-delete — already canonical; this post is implicitly hard-delete territory (all three tips address physical row removal, not tombstone-flag updates). The post doesn't name the alternative but the techniques apply to hard-delete workflows.

Patterns

  • Batched background deletion with requeue — canonical new wiki pattern. Self-requeuing Sidekiq job that deletes up to limit rows per invocation, re-enqueues itself if deletion count == limit, guarded by unique_for to prevent concurrent instances causing deadlocks. The canonical shape for continuous low-impact deletion of aged data on large tables.
  • Foreign-key cascade vs dependent destroy-async — canonical new wiki pattern. The trade-off between pushing cascade deletion into the database layer (ON DELETE CASCADE foreign key constraint) and keeping it in the application layer via background jobs (dependent: :destroy_async). Framing: DB-layer cascade is atomic + fast-on-small-cascades but produces the same unbounded-work problem as synchronous dependent: :destroy once child-row counts grow; async cascade is always bounded but loses atomicity.
  • Paired scheduler–reconciler — already canonical via sibling post; this post's DeleteOldDataJob is a deletion-specialisation of the scheduler pattern. The scheduler tick finds aged rows; the worker deletes a batch and re-enqueues. Same shape, same framework, same substrate.
  • Jittered job scheduling — already canonical; potentially applicable to DeleteOldDataJob when the cron cadence is faster than per-batch deletion time, to avoid overlapping deletion bursts. The post doesn't name it but the pair composes naturally.
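The jitter composition named above can be sketched in a few lines. This is an illustrative assumption, not the post's code — with Sidekiq, the re-enqueue would become roughly self.class.perform_in(jittered_delay) instead of perform_async, and the base/jitter values here are arbitrary:

```ruby
BASE_DELAY = 60   # seconds between deletion batches (assumed cadence)
MAX_JITTER = 30   # random spread to avoid synchronised deletion bursts

def jittered_delay(base = BASE_DELAY, jitter = MAX_JITTER)
  base + rand(0..jitter)   # uniform jitter on top of the base delay
end

delays = Array.new(5) { jittered_delay }
puts delays.inspect        # e.g. [82, 64, 90, 71, 66]
```

The effect is that successive batches (and independent cron ticks) land spread across a window rather than stacking on the same instant.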

Operational numbers

  • LIMIT 500 — the canonical batch size in the DeleteOldDataJob code block. No benchmark justification; conventional value in the Rails-plus-MySQL community as a transaction-size sweet spot (small enough to avoid long lock holds; large enough to amortise per-batch overhead).
  • unique_for: 1.hour — the uniqueness window on the DeleteOldDataJob. Chosen such that if the job runs for less than an hour (common case), a second enqueue is rejected; if it runs for longer, the window expires and a new instance can start. unique_until: :start releases the lock when the current instance begins execution, letting a queued re-enqueue wait until then.
  • 3.months.ago — the canonical age threshold in the code example. Arbitrary; chosen to illustrate the shape of the WHERE clause rather than a specific retention policy.
  • No quantified before/after latency or throughput. The post is a pattern catalogue, not a retrospective. No PlanetScale production numbers on cascade-delete latency distributions, how often dependent: :destroy bit them, adoption rate of destroy_async, or observed deadlock frequency on the batched-delete pattern. Pedagogical voice throughout.

Caveats

  • Application-tier Rails-specific post, not PlanetScale data-plane internals. This is operational advice for the Rails application substrate; orthogonal to how PlanetScale's MySQL/Vitess cluster handles deletion internally. Sibling of the 2022-02-17 self-healing-jobs post and 2022-08-15 Sidekiq-kill-switch post in that regard — all three are PlanetScale-Rails-backend-engineer voice narrowing to specific operational primitives.
  • destroy_async's silent-failure caveat under-emphasised. The post names the issue ("job will end up in your error queue") and the mitigation (run validations at parent altitude) but doesn't walk through the observability implications: how you detect that child-cascade deletion is failing silently, how you distinguish it from other Sidekiq failures, how you alert on it. In practice this is where destroy_async adoption bites teams.
  • No discussion of dependent: :restrict_with_error or :restrict_with_exception. The association options restrict_with_error + restrict_with_exception reject parent deletion when children exist, pushing the cleanup responsibility to the application. Valid alternative to cascade for data-integrity-first workflows; elided by the post's three-tip focus.
  • delete_all skips callbacks, which is the point, but also skips counter_cache updates, touch propagation to parent records, destroyed? flag setting, and any custom before_destroy/after_destroy hooks. The post lists the mechanism (skips callbacks) but not the side-effect catalogue.
  • No mention of DELETE ... WHERE ... LIMIT N on unindexed predicates — the DELETE-LIMIT asymptote canonicalised via the 2023-04-20 Hazen Insights post. The example code Model.where("created_at < ?", 3.months.ago).limit(limit).delete_all relies on a created_at index being present; without one, the job itself becomes the antipattern it's trying to prevent. Pre-requisite for the pattern to work; elided by the post.
  • Foreign-key-constraint framing oversimplified. The "replace foreign keys with :destroy_async" advice is an application-level recommendation, not a data-integrity one. FK constraints also prevent orphan rows (child row referencing deleted parent) — a consistency guarantee :destroy_async doesn't provide during the async window (parent is deleted; children exist until the async job runs). For most PlanetScale workloads this window is short and observability-manageable, but applications that read children without filtering by parent existence can see stale references.
  • 2022-era post (originally 2022-08-01, re-fetched 2026-04-21). Rails APIs have evolved since: dependent: :destroy_async has been extended (per-batch batch_size option, explicit queue selection); ActiveRecord's batching APIs continue to mature (in_batches, find_each, delete_by). The fundamental three-tip framing remains valid; specific API details may have drifted.
  • No Postgres-specific variations. Postgres has its own batched-delete ergonomics (CTE-based delete, LIMIT inside DELETE requires subquery wrapping); the post is MySQL-flavoured and doesn't discuss the Postgres differences. Applications that run both engines need to maintain two variants.
  • Cron orchestration deferred. The post assumes a cron system ("scheduled by a cron to run once per hour") wires up the initial DeleteOldDataJob invocation. Which cron system, what cadence, how to prevent cron-lag from compounding into eternal queue backlog — all elided.
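The delete_all side-effect caveat above (skipped counter_cache updates, skipped touch propagation) can be made concrete with a plain-Ruby sketch. The hash-based "models" and the three-book counts are illustrative stand-ins, not Rails objects:

```ruby
author_a = { books_count: 3, updated_at: nil }   # destroy_all path
author_b = { books_count: 3, updated_at: nil }   # delete_all path
books_a  = Array.new(3) { {} }
books_b  = Array.new(3) { {} }

# destroy_all shape: one callback chain per record
destroy_all = lambda do |books, author|
  books.size.times do
    author[:books_count] -= 1         # counter_cache update
    author[:updated_at]   = Time.now  # touch propagation
  end
  books.clear
end

# delete_all shape: one bulk statement, no callbacks
delete_all = ->(books) { books.clear }

destroy_all.call(books_a, author_a)
delete_all.call(books_b)
puts author_a[:books_count]   # => 0 (counter kept in sync)
puts author_b[:books_count]   # => 3 (stale: callbacks never ran)
```

Both paths empty the child collection; only the destroy_all path keeps the parent-side bookkeeping consistent, which is exactly the work a maintenance job trades away for throughput.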

Source
