
PATTERN

Batched background deletion with requeue

A maintenance-cron Sidekiq job that deletes up to limit aged rows per invocation via delete_all, then re-enqueues itself if the deletion count equalled the limit (implying more work remains). Guarded by Sidekiq Enterprise unique_for to prevent concurrent instances producing lock contention or deadlocks. The canonical shape for continuous low-impact deletion of large aged-data backlogs in a Rails application.

The pattern

class DeleteOldDataJob < BaseJob
  sidekiq_options unique_for: 1.hour,
                  unique_until: :start,
                  queue: :background,
                  retry: false

  def perform(limit: 500)
    deleted = Model.where("created_at < ?", 3.months.ago)
                    .limit(limit)
                    .delete_all

    # If more records to delete, requeue itself
    self.class.perform_async if deleted == limit
  end
end

Paired with a cron trigger (sidekiq-cron or equivalent) that fires the job once per hour as the entry point.
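
As an illustration, the hourly entry point might be registered with sidekiq-cron like this (a sketch; the job name and cron expression are assumptions, not from the source):

```ruby
# config/initializers/sidekiq_cron.rb — a sketch of the hourly trigger
# using sidekiq-cron (name and schedule are illustrative).
Sidekiq::Cron::Job.create(
  name:  "DeleteOldDataJob - hourly",
  cron:  "0 * * * *",          # top of every hour
  class: "DeleteOldDataJob",
  queue: "background"
)
```

The cron tick enqueues with no arguments, so perform runs with its default limit of 500.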

The four moves

  1. delete_all over destroy_all — the callback-skipping primitive issues a single DELETE ... WHERE ... LIMIT ? statement. Orders of magnitude faster than per-row destroy; appropriate for cleanup jobs where callbacks are pure overhead.
  2. LIMIT N for bounded transaction size — caps the work per invocation. At 500 rows the DELETE transaction completes in milliseconds, holds row locks briefly, produces a small binlog event, doesn't spike replication lag. The canonical value 500 is a Rails-MySQL community convention; adjust per workload.
  3. Self-requeue only when deleted == limit — signals "we hit the batch cap, probably more work to do". When deleted < limit, the query found fewer matches than the cap — either we're done or we're at the asymptote. Natural termination without a separate coordination mechanism.
  4. unique_for: 1.hour + unique_until: :start — prevents the cron's next tick, the self-requeue, and any retry from overlapping. Critical for deadlock prevention (see below).

Why unique_for is load-bearing

From sources/2026-04-21-planetscale-ruby-on-rails-3-tips-for-deleting-data-at-scale:

"This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there."

Without uniqueness, three independent paths can produce concurrent DeleteOldDataJob instances:

  • Cron tick fires hourly regardless.
  • Self-requeue fires when the previous instance hits the batch cap.
  • Sidekiq retry (if retry: false is not set) fires on any exception.

Each concurrent instance issues DELETE ... LIMIT 500 against the same table. MySQL's optimizer may choose different scan ranges per instance; the instances acquire row + gap locks on overlapping ranges; lock-wait graphs form cycles; InnoDB detects deadlock and rolls back one. The rolled-back instance retries (or re-fires on the next cron tick), producing cascading deadlocks.

unique_for: 1.hour rejects the concurrent enqueue at the Sidekiq client middleware — only one instance runs at a time. The fallback "queue with concurrency of 1" achieves the same property at the queue-dispatch layer.
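
For job systems without uniqueness, the post's fallback — a queue with a concurrency of 1 — can be sketched with Sidekiq 7 capsules (capsule and queue names here are illustrative, not from the source):

```ruby
# config/initializers/sidekiq.rb — concurrency-1 fallback via a Sidekiq 7
# capsule: one worker thread serves the cleanup queue, so two
# DeleteOldDataJob instances can never run concurrently in this process.
Sidekiq.configure_server do |config|
  config.capsule("cleanup") do |cap|
    cap.concurrency = 1
    cap.queues = %w[cleanup]
  end
end
```

Note this bounds concurrency per process; with multiple Sidekiq processes, the cleanup queue must be served by only one of them for the guarantee to hold.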

unique_until: :start semantics

The uniqueness lock releases when the job begins executing, not when it completes. This means:

  • While instance A is running, any re-enqueue attempt is rejected.
  • Once A begins its perform, the lock releases.
  • The self-requeue at A's end (via self.class.perform_async) is accepted, because A's lock is gone.
  • Instance B enqueues and runs after A's worker thread is available.

Contrast with unique_until: :success, which would hold the lock through execution and prevent the self-requeue from succeeding. The :start semantic is deliberately chosen so self-requeue works.
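
The timeline above can be simulated with a toy in-memory lock (a hypothetical UniqueQueue class for illustration, not Sidekiq Enterprise's implementation):

```ruby
# Toy model of unique_until: :start — the uniqueness lock is taken at
# enqueue and released when the job begins executing.
class UniqueQueue
  def initialize
    @locked = {}   # job class => lock held?
    @queue  = []
  end

  # Enqueue is rejected while the uniqueness lock is held.
  def push(job_class)
    return false if @locked[job_class]
    @locked[job_class] = true
    @queue << job_class
    true
  end

  # :start semantics — the lock releases as the job begins.
  def start_next
    job_class = @queue.shift
    @locked[job_class] = false if job_class
    job_class
  end
end

q = UniqueQueue.new
q.push(:delete_old_data)   # => true   (cron tick accepted)
q.push(:delete_old_data)   # => false  (duplicate rejected while A is pending)
q.start_next               # A begins; lock releases
q.push(:delete_old_data)   # => true   (A's self-requeue accepted mid-run)
```

Under :success semantics, the third push would still be rejected, which is exactly why :start is the right choice here.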

Why retry: false

Sidekiq defaults to 25 retries with exponential backoff. For this job, that's wrong:

  • The cron will re-fire on the next tick regardless.
  • A retry during the same hour could collide with the next cron tick's instance.
  • The job is idempotent — repeated invocations just delete more aged rows — so retries add no correctness value.

retry: false keeps the failure surface simple: if the job fails, the next cron tick picks up the work.

The termination condition

self.class.perform_async if deleted == limit

The predicate deleted == limit is a heuristic: "we asked for 500, got exactly 500 → probably more to find". It over-estimates in exactly one edge case: precisely 500 rows matched the predicate, so the requeue fires anyway. The next invocation finds 0, does no work, and terminates.

Exact termination would require a separate exists? check after deletion, which adds a query per invocation for no practical benefit. The heuristic is cheap and self-correcting.
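
The drain loop the heuristic produces can be sketched in pure Ruby (an in-memory array stands in for the table; not the actual job):

```ruby
# Simulate the requeue loop: each "invocation" deletes up to `limit` rows
# and requeues only when the batch cap was hit.
def drain(rows, limit)
  invocations = 0
  loop do
    invocations += 1
    deleted = rows.shift(limit).size   # stands in for delete_all with LIMIT
    break unless deleted == limit      # requeue only on a full batch
  end
  invocations
end

drain(Array.new(1_250), 500)  # => 3 — batches of 500, 500, 250; under cap, stop
drain(Array.new(1_000), 500)  # => 3 — 500, 500, then the empty edge-case pass
```

The second call shows the over-estimate edge case: an exact multiple of the limit costs one extra zero-work invocation, which is the entire price of skipping the exists? check.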

Batch-size trade-off

  • Smaller (e.g. 100): each transaction commits faster, less lock hold time, less replication-lag impact — but more Sidekiq round-trips to drain a backlog.
  • Larger (e.g. 5,000): fewer round-trips, more work per transaction — but longer lock holds, bigger binlog events, potential to trip DELETE-LIMIT asymptote if the predicate is unindexed.

500 is the canonical PlanetScale choice, matching the 1,000 for enqueue in the bulk-enqueue pattern. The difference (500 vs 1,000) reflects that deletion holds locks while enqueueing doesn't — deletion's transaction cost per row is structurally higher.
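
The round-trip side of the trade-off is easy to quantify. Assuming an illustrative one-million-row backlog (a number not from the source):

```ruby
# Invocations needed to drain a backlog at each batch size (ignoring the
# one extra zero-row invocation the heuristic adds on exact multiples).
backlog = 1_000_000
trips = [100, 500, 5_000].map { |limit| [limit, (backlog.to_f / limit).ceil] }
# => [[100, 10000], [500, 2000], [5000, 200]]
```

Fifty times fewer Sidekiq round-trips between the extremes, paid for in lock hold time and binlog event size per transaction.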

Required index

The WHERE predicate must be indexed. The canonical example .where("created_at < ?", 3.months.ago) assumes a created_at index. Without one, the query becomes a full table scan, and the DELETE-LIMIT asymptote kicks in: early batches fast, later batches approach table-scan cost as matching rows deplete.

The pattern only works when the predicate can be served from an index. Before shipping a DeleteOldDataJob, verify the index exists (EXPLAIN on the inner SELECT).
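
A sketch of the supporting migration (the table name models is a placeholder):

```ruby
# db/migrate/xxxx_add_created_at_index_to_models.rb — placeholder names.
class AddCreatedAtIndexToModels < ActiveRecord::Migration[7.1]
  def change
    # Lets the DELETE's range predicate (created_at < ?) resolve to an
    # index scan instead of a full table scan.
    add_index :models, :created_at
  end
end
```

To verify before shipping, run the inner SELECT through ActiveRecord's explain — e.g. Model.where("created_at < ?", 3.months.ago).limit(500).explain — and confirm the plan shows an index range scan.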

When to use it

  • Large aged-data cleanup. Event logs, audit trails, session records, soft-deleted rows awaiting physical reap — any table with millions of rows where the oldest fraction can be pruned continuously.
  • Predicate-indexed deletion. The WHERE clause must resolve to an index range for the pattern to stay cheap.
  • Callback-free maintenance. The delete_all primitive skips ActiveRecord callbacks; the pattern is inappropriate if the deletion needs after_destroy side effects (audit logging, counter-cache updates, dependent cascades). Use paired scheduler-reconciler with destroy_all instead.

When not to use it

  • Cascade-heavy parents. Deletion cascades across associations can't be expressed in delete_all (which skips cascades). Use dependent: :destroy_async or a per-parent cleanup job.
  • Strict all-or-nothing atomicity. The pattern drains the backlog over many transactions. If a regulatory requirement demands "all aged records deleted atomically", this pattern doesn't provide that guarantee.
  • Small data volumes. Under a few thousand rows, a single non-batched delete_all is simpler. Reserve the pattern for scales where transaction size matters.

Composition with other patterns

  • Paired scheduler-reconciler — architectural cousin. Both use cron-triggered Sidekiq jobs to reconcile DB state. The reconciler enqueues work jobs for individual records; this pattern executes the deletion directly in the scheduler-tick body. When the work per record is small (a row delete), direct execution avoids the enqueue-dispatch overhead.
  • Jittered scheduling — composable when multiple DeleteOldDataJob classes are scheduled together (one per table) and you want to spread their cron triggers. Not usually needed within a single deletion job (the unique_for guard handles concurrency).
  • Sidekiq unique jobs — the deadlock-prevention substrate. This pattern is a second canonical use case for unique_for beyond duplicate-enqueue prevention in the paired-scheduler-reconciler pattern.

Testing approach

The source post includes a test template:

class DeleteOldDataJobTest < ActiveJob::TestCase
  test "deletes data over 3 months old" do
    expired = create(:data, minute: 3.months.ago - 1.hour)
    retained = create(:data, minute: 3.months.ago + 1.hour)
    DeleteOldDataJob.new.perform
    assert Data.where(id: expired.id).empty?
    assert Data.where(id: retained.id).exists?
  end

  test "requeues if more to delete" do
    create(:data, minute: 3.months.ago - 1.hour)
    create(:data, minute: 3.months.ago - 1.hour)
    assert_enqueued_sidekiq_jobs(1, only: DeleteOldDataJob) do
      DeleteOldDataJob.new.perform(limit: 1)
    end
  end
end

Two cases: predicate correctness (expired deleted, retained preserved) + requeue correctness (hit the limit → enqueue follow-up).
