
PATTERN

Batched background deletion with requeue

A maintenance-cron Sidekiq job that deletes up to limit aged rows per invocation via delete_all, then re-enqueues itself if the deletion count equalled the limit (implying more work remains). Guarded by Sidekiq Enterprise unique_for to prevent concurrent instances producing lock contention or deadlocks. The canonical shape for continuous low-impact deletion of large aged-data backlogs in a Rails application.

The pattern

class DeleteOldDataJob < BaseJob
  sidekiq_options unique_for: 1.hour,
                  unique_until: :start,
                  queue: :background,
                  retry: false

  def perform(limit: 500)
    deleted = Model.where("created_at < ?", 3.months.ago)
                    .limit(limit)
                    .delete_all

    # If more records to delete, requeue itself
    self.class.perform_async if deleted == limit
  end
end

Paired with a cron trigger (sidekiq-cron or equivalent) that fires the job once per hour as the entry point.
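
As an illustration, the hourly entry point might be registered with sidekiq-cron like this (a sketch; the job name and cron expression are assumptions, not from the source):

```ruby
# config/initializers/sidekiq_cron.rb — a sketch of the hourly trigger
# using sidekiq-cron (name and schedule are illustrative).
Sidekiq::Cron::Job.create(
  name:  "DeleteOldDataJob - hourly",
  cron:  "0 * * * *",          # top of every hour
  class: "DeleteOldDataJob",
  queue: "background"
)
```

The cron tick enqueues with no arguments, so perform runs with its default limit of 500.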

The four moves

  1. delete_all over destroy_all — the callback-skipping primitive issues a single DELETE ... WHERE ... LIMIT ? statement. Orders of magnitude faster than per-row destroy; appropriate for cleanup jobs where callbacks are pure overhead.
  2. LIMIT N for bounded transaction size — caps the work per invocation. At 500 rows the DELETE transaction completes in milliseconds, holds row locks briefly, produces a small binlog event, doesn't spike replication lag. The canonical value 500 is a Rails-MySQL community convention; adjust per workload.
  3. Self-requeue only when deleted == limit — signals "we hit the batch cap, probably more work to do". When deleted < limit, the query found fewer matches than the cap — either we're done or we're at the asymptote. Natural termination without a separate coordination mechanism.
  4. unique_for: 1.hour + unique_until: :start — prevents the cron's next tick, the self-requeue, and any retry from overlapping. Critical for deadlock prevention (see below).

Why unique_for is load-bearing

From sources/2026-04-21-planetscale-ruby-on-rails-3-tips-for-deleting-data-at-scale:

"This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there."

Without uniqueness, three independent paths can produce concurrent DeleteOldDataJob instances:

  • Cron tick fires hourly regardless.
  • Self-requeue fires when the previous instance hits the batch cap.
  • Sidekiq retry (if retry: false is not set) fires on any exception.

Each concurrent instance issues DELETE ... LIMIT 500 against the same table. MySQL's optimizer may choose different scan ranges per instance; the instances acquire row + gap locks on overlapping ranges; lock-wait graphs form cycles; InnoDB detects deadlock and rolls back one. The rolled-back instance retries (or re-fires on the next cron tick), producing cascading deadlocks.

unique_for: 1.hour rejects the concurrent enqueue at the Sidekiq client middleware — only one instance runs at a time. The fallback "queue with concurrency of 1" achieves the same property at the queue-dispatch layer.
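
For job systems without uniqueness, the post's fallback — a queue with a concurrency of 1 — can be sketched with Sidekiq 7 capsules (capsule and queue names here are illustrative, not from the source):

```ruby
# config/initializers/sidekiq.rb — concurrency-1 fallback via a Sidekiq 7
# capsule: one worker thread serves the cleanup queue, so two
# DeleteOldDataJob instances can never run concurrently in this process.
Sidekiq.configure_server do |config|
  config.capsule("cleanup") do |cap|
    cap.concurrency = 1
    cap.queues = %w[cleanup]
  end
end
```

Note this bounds concurrency per process; with multiple Sidekiq processes, the cleanup queue must be served by only one of them for the guarantee to hold.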

unique_until: :start semantics

The uniqueness lock releases when the job begins executing, not when it completes. This means:

  • While instance A is running, any re-enqueue attempt is rejected.
  • Once A begins its perform, the lock releases.
  • The self-requeue at A's end (via self.class.perform_async) is accepted, because A's lock is gone.
  • Instance B enqueues and runs after A's worker thread is available.

Contrast with unique_until: :success, which would hold the lock through execution and prevent the self-requeue from succeeding. The :start semantic is deliberately chosen so self-requeue works.
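
The timeline above can be simulated with a toy in-memory lock (a hypothetical UniqueQueue class for illustration, not Sidekiq Enterprise's implementation):

```ruby
# Toy model of unique_until: :start — the uniqueness lock is taken at
# enqueue and released when the job begins executing.
class UniqueQueue
  def initialize
    @locked = {}   # job class => lock held?
    @queue  = []
  end

  # Enqueue is rejected while the uniqueness lock is held.
  def push(job_class)
    return false if @locked[job_class]
    @locked[job_class] = true
    @queue << job_class
    true
  end

  # :start semantics — the lock releases as the job begins.
  def start_next
    job_class = @queue.shift
    @locked[job_class] = false if job_class
    job_class
  end
end

q = UniqueQueue.new
q.push(:delete_old_data)   # => true   (cron tick accepted)
q.push(:delete_old_data)   # => false  (duplicate rejected while A is pending)
q.start_next               # A begins; lock releases
q.push(:delete_old_data)   # => true   (A's self-requeue accepted mid-run)
```

Under :success semantics, the third push would still be rejected, which is exactly why :start is the right choice here.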

Why retry: false

Sidekiq defaults to 25 retries with exponential backoff. For this job, that's wrong:

  • The cron will re-fire on the next tick regardless.
  • A retry during the same hour could collide with the next cron tick's instance.
  • The job is idempotent — repeated invocations just delete more aged rows — so retries add no correctness value.

retry: false keeps the failure surface simple: if the job fails, the next cron tick picks up the work.

The termination condition

self.class.perform_async if deleted == limit

The predicate deleted == limit is a heuristic: "we asked for 500, got exactly 500 → probably more to find". It over-estimates in exactly one edge case: precisely 500 rows matched the predicate, so the requeue fires anyway. The next invocation finds 0, does no work, and terminates.

Exact termination would require a separate exists? check after deletion, which adds a query per invocation for no practical benefit. The heuristic is cheap and self-correcting.
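
The drain loop the heuristic produces can be sketched in pure Ruby (an in-memory array stands in for the table; not the actual job):

```ruby
# Simulate the requeue loop: each "invocation" deletes up to `limit` rows
# and requeues only when the batch cap was hit.
def drain(rows, limit)
  invocations = 0
  loop do
    invocations += 1
    deleted = rows.shift(limit).size   # stands in for delete_all with LIMIT
    break unless deleted == limit      # requeue only on a full batch
  end
  invocations
end

drain(Array.new(1_250), 500)  # => 3 — batches of 500, 500, 250; under cap, stop
drain(Array.new(1_000), 500)  # => 3 — 500, 500, then the empty edge-case pass
```

The second call shows the over-estimate edge case: an exact multiple of the limit costs one extra zero-work invocation, which is the entire price of skipping the exists? check.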

Batch-size trade-off

  • Smaller (e.g. 100): each transaction commits faster, less lock hold time, less replication-lag impact — but more Sidekiq round-trips to drain a backlog.
  • Larger (e.g. 5,000): fewer round-trips, more work per transaction — but longer lock holds, bigger binlog events, potential to trip DELETE-LIMIT asymptote if the predicate is unindexed.

500 is the canonical PlanetScale choice, matching the 1,000 for enqueue in the bulk-enqueue pattern. The difference (500 vs 1,000) reflects that deletion holds locks while enqueueing doesn't — deletion's transaction cost per row is structurally higher.
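
The round-trip side of the trade-off is easy to quantify. Assuming an illustrative one-million-row backlog (a number not from the source):

```ruby
# Invocations needed to drain a backlog at each batch size (ignoring the
# one extra zero-row invocation the heuristic adds on exact multiples).
backlog = 1_000_000
trips = [100, 500, 5_000].map { |limit| [limit, (backlog.to_f / limit).ceil] }
# => [[100, 10000], [500, 2000], [5000, 200]]
```

Fifty times fewer Sidekiq round-trips between the extremes, paid for in lock hold time and binlog event size per transaction.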

Required index

The WHERE predicate must be indexed. The canonical example .where("created_at < ?", 3.months.ago) assumes a created_at index. Without one, the query becomes a full table scan, and the DELETE-LIMIT asymptote kicks in: early batches fast, later batches approach table-scan cost as matching rows deplete.

The pattern only works when the predicate can be served from an index. Before shipping a DeleteOldDataJob, verify the index exists (EXPLAIN on the inner SELECT).
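
A sketch of the supporting migration (the table name models is a placeholder):

```ruby
# db/migrate/xxxx_add_created_at_index_to_models.rb — placeholder names.
class AddCreatedAtIndexToModels < ActiveRecord::Migration[7.1]
  def change
    # Lets the DELETE's range predicate (created_at < ?) resolve to an
    # index scan instead of a full table scan.
    add_index :models, :created_at
  end
end
```

To verify before shipping, run the inner SELECT through ActiveRecord's explain — e.g. Model.where("created_at < ?", 3.months.ago).limit(500).explain — and confirm the plan shows an index range scan.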

When to use it

  • Large aged-data cleanup. Event logs, audit trails, session records, soft-deleted rows awaiting physical reap — any table with millions of rows where the oldest fraction can be pruned continuously.
  • Predicate-indexed deletion. The WHERE clause must resolve to an index range for the pattern to stay cheap.
  • Callback-free maintenance. The delete_all primitive skips ActiveRecord callbacks; the pattern is inappropriate if the deletion needs after_destroy side effects (audit logging, counter-cache updates, dependent cascades). Use paired scheduler-reconciler with destroy_all instead.

When not to use it

  • Cascade-heavy parents. Deletion cascades across associations can't be expressed in delete_all (which skips cascades). Use dependent: :destroy_async or a per-parent cleanup job.
  • Strict all-or-nothing atomicity. The pattern drains the backlog over many transactions. If a regulatory requirement demands "all aged records deleted atomically", this pattern doesn't provide that guarantee.
  • Small data volumes. Under a few thousand rows, a single non-batched delete_all is simpler. Reserve the pattern for scales where transaction size matters.

Composition with other patterns

  • Paired scheduler-reconciler — architectural cousin. Both use cron-triggered Sidekiq jobs to reconcile DB state. The reconciler enqueues work jobs for individual records; this pattern executes the deletion directly in the scheduler-tick body. When the work per record is small (a row delete), direct execution avoids the enqueue-dispatch overhead.
  • Jittered scheduling — composable when multiple DeleteOldDataJob classes are scheduled together (one per table) and you want to spread their cron triggers. Not usually needed within a single deletion job (the unique_for guard handles concurrency).
  • Sidekiq unique jobs — the deadlock-prevention substrate. This pattern is a second canonical use case for unique_for beyond duplicate-enqueue prevention in the paired-scheduler-reconciler pattern.

Testing approach

The source post includes a test template:

class DeleteOldDataJobTest < ActiveJob::TestCase
  test "deletes data over 3 months old" do
    expired = create(:data, minute: 3.months.ago - 1.hour)
    retained = create(:data, minute: 3.months.ago + 1.hour)
    DeleteOldDataJob.new.perform
    assert Data.where(id: expired.id).empty?
    assert Data.where(id: retained.id).exists?
  end

  test "requeues if more to delete" do
    create(:data, minute: 3.months.ago - 1.hour)
    create(:data, minute: 3.months.ago - 1.hour)
    assert_enqueued_sidekiq_jobs(1, only: DeleteOldDataJob) do
      DeleteOldDataJob.new.perform(limit: 1)
    end
  end
end

Two cases: predicate correctness (expired deleted, retained preserved) + requeue correctness (hit the limit → enqueue follow-up).
