Batched background deletion with requeue¶
A cron-triggered Sidekiq maintenance job that deletes up to limit
aged rows per invocation via delete_all, then re-enqueues itself
if the deletion count equals the limit (implying more work
remains). Guarded by Sidekiq Enterprise's unique_for to prevent
concurrent instances from producing lock contention or deadlocks.
The canonical shape for continuous, low-impact deletion of large
aged-data backlogs in a Rails application.
The pattern¶
class DeleteOldDataJob < BaseJob
  sidekiq_options unique_for: 1.hour,
                  unique_until: :start,
                  queue: :background,
                  retry: false

  def perform(limit: 500)
    deleted = Model.where("created_at < ?", 3.months.ago)
                   .limit(limit)
                   .delete_all

    # If we hit the batch cap, more records likely remain — requeue
    self.class.perform_async if deleted == limit
  end
end
Paired with a cron trigger (sidekiq-cron or equivalent) that fires the job once per hour as the entry point.
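A sketch of that entry point, assuming the sidekiq-cron gem (the job name and the hourly cron expression are illustrative):

```ruby
# Hypothetical schedule registration via sidekiq-cron,
# typically placed in a Sidekiq server initializer.
Sidekiq::Cron::Job.create(
  name:  "Delete old data - hourly",
  cron:  "0 * * * *",          # top of every hour
  class: "DeleteOldDataJob"
)
```

The cron tick is only the entry point; the self-requeue carries the work forward within the hour when a backlog exists.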
The four moves¶
- delete_all over destroy_all — the callback-skipping primitive issues a single DELETE ... WHERE ... LIMIT ? statement. Orders of magnitude faster; appropriate for cleanup jobs where callbacks are overhead.
- LIMIT N for bounded transaction size — caps the work per invocation. At 500 rows the DELETE transaction completes in milliseconds, holds row locks briefly, produces a small binlog event, and doesn't spike replication lag. The canonical value 500 is a Rails-MySQL community convention; adjust per workload.
- Self-requeue only when deleted == limit — signals "we hit the batch cap, probably more work to do". When deleted < limit, the query found fewer matches than the cap — either we're done or we're at the asymptote. Natural termination without a separate coordination mechanism.
- unique_for: 1.hour + unique_until: :start — prevents the cron's next tick, the self-requeue, and any retry from overlapping. Critical for deadlock prevention (see below).
Why unique_for is load-bearing¶
From sources/2026-04-21-planetscale-ruby-on-rails-3-tips-for-deleting-data-at-scale:
"This example is making use of Sidekiq's unique jobs. This protects us from having several of these jobs running concurrently (which could result in deadlocks). If you are using a job system without uniqueness, an alternative is setting up a queue with a concurrency of 1 and running the cleanup job there."
Without uniqueness, three independent paths can produce
concurrent DeleteOldDataJob instances:
- Cron tick fires hourly regardless.
- Self-requeue fires when the previous instance hits the batch cap.
- Sidekiq retry (if retry: false is not set) fires on any exception.
Each concurrent instance issues DELETE ... LIMIT 500
against the same table. MySQL's optimizer may choose
different scan ranges per instance; the instances acquire
row + gap locks on overlapping ranges; lock-wait graphs
form cycles; InnoDB detects deadlock and rolls back one.
The rolled-back instance retries (or re-fires on the next
cron tick), producing cascading deadlocks.
unique_for: 1.hour rejects the concurrent enqueue at the
Sidekiq client middleware — only one instance runs at a
time. The fallback "queue with concurrency of 1" achieves
the same property at the queue-dispatch layer.
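The fallback can be sketched with Sidekiq 7's capsule API (this assumes Sidekiq 7+; the capsule and queue names are illustrative):

```ruby
# Sketch of the "queue with concurrency of 1" fallback.
# A dedicated capsule runs the :cleanup queue with a single worker
# thread, so two cleanup job instances can never execute concurrently.
Sidekiq.configure_server do |config|
  config.capsule("cleanup") do |cap|
    cap.concurrency = 1
    cap.queues = %w[cleanup]
  end
end
```

This serializes execution rather than rejecting duplicate enqueues, so duplicates may pile up in the queue, but the deadlock-producing overlap is still prevented.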
unique_until: :start semantics¶
The uniqueness lock releases when the job begins executing, not when it completes. This means:
- While instance A sits in the queue, any duplicate enqueue attempt is rejected.
- Once A begins its perform, the lock releases.
- The self-requeue at the end of A's run (via self.class.perform_async) is accepted, because A's lock is gone.
- Instance B enqueues and runs once A's worker thread is available.
Contrast with unique_until: :success, which would hold
the lock through execution and prevent the self-requeue
from succeeding. The :start semantic is deliberately
chosen so self-requeue works.
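A toy in-memory model makes the lifecycle concrete (ToyUniqueQueue is invented for illustration; no Sidekiq involved):

```ruby
# Toy model of unique_until: :start — the lock is held from enqueue
# until execution begins, then released, so a self-requeue succeeds.
class ToyUniqueQueue
  def initialize
    @locks = {}   # job class name => locked while queued
    @queue = []
  end

  # Returns true if accepted, false if rejected as a duplicate.
  def enqueue(job_name)
    return false if @locks[job_name]
    @locks[job_name] = true
    @queue << job_name
    true
  end

  def run_next
    job_name = @queue.shift or return
    @locks.delete(job_name)   # :start semantics — release before work
    yield job_name
  end
end

q = ToyUniqueQueue.new
q.enqueue("DeleteOldDataJob")         # accepted
q.enqueue("DeleteOldDataJob")         # rejected — duplicate while queued
q.run_next { |job| q.enqueue(job) }   # self-requeue accepted mid-run
```

Under :success semantics, the `@locks.delete` would happen after the yield instead, and the mid-run self-requeue would be rejected.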
Why retry: false¶
Sidekiq defaults to 25 retries with exponential backoff. For this job, that's wrong:
- The cron will re-fire on the next tick regardless.
- A retry during the same hour could collide with the next cron tick's instance.
- The job is idempotent — repeated invocations just delete more aged rows — so retries add no correctness value.
retry: false keeps the failure surface simple: if the
job fails, the next cron tick picks up the work.
The termination condition¶
The predicate deleted == limit is a heuristic: "we
asked for 500, got exactly 500 → probably more to
find". In practice, this over-estimates in one edge
case: exactly 500 rows matched the predicate. The next
invocation finds 0, does no work, terminates.
Exact termination would require a separate exists?
check after deletion, which adds a query per invocation
for no practical benefit. The heuristic is cheap and
self-correcting.
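The drain behaviour, including the exact-multiple edge case above, can be simulated without a database (the drain helper and row counts are illustrative):

```ruby
# Simulates draining a backlog: each invocation deletes up to
# `limit` rows and "requeues" (loops) only when it hit the cap.
def drain(backlog, limit: 500)
  invocations = 0
  loop do
    invocations += 1
    deleted = [backlog, limit].min
    backlog -= deleted
    break unless deleted == limit   # the requeue predicate
  end
  invocations
end

drain(1_250)   # => 3: 500 + 500 + 250 (last batch under the cap)
drain(1_000)   # => 3: 500 + 500 + one final no-op pass
```

The exact-multiple case costs one extra invocation that finds nothing, which is the "cheap and self-correcting" trade the prose describes.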
Batch-size trade-off¶
- Smaller (e.g. 100): each transaction commits faster, less lock hold time, less replication-lag impact — but more Sidekiq round-trips to drain a backlog.
- Larger (e.g. 5,000): fewer round-trips, more work per transaction — but longer lock holds, bigger binlog events, potential to trip DELETE-LIMIT asymptote if the predicate is unindexed.
500 is the canonical PlanetScale choice; the bulk-enqueue pattern uses 1,000. The difference reflects that deletion holds row locks while enqueueing doesn't: deletion's per-row transaction cost is structurally higher.
Required index¶
The WHERE predicate must be indexed. The canonical
example .where("created_at < ?", 3.months.ago) assumes
a created_at index. Without one, the query becomes a
full table scan, and the
DELETE-LIMIT asymptote
kicks in: early batches fast, later batches approach
table-scan cost as matching rows deplete.
The pattern only works when the predicate can be served
from an index. Before shipping a DeleteOldDataJob,
verify the index exists (EXPLAIN on the inner
SELECT).
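A migration sketch for the assumed index (the table and column names are placeholders):

```ruby
# Hypothetical migration adding the index the predicate relies on.
class AddCreatedAtIndexToModels < ActiveRecord::Migration[7.1]
  def change
    add_index :models, :created_at
  end
end

# One way to verify the plan from a console (ActiveRecord's relation
# #explain runs EXPLAIN on the generated SELECT):
#   Model.where("created_at < ?", 3.months.ago).limit(500).explain
```
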
When to use it¶
- Large aged-data cleanup. Event logs, audit trails, session records, soft-deleted rows awaiting physical reap — any table with millions of rows where the oldest fraction can be pruned continuously.
- Predicate-indexed deletion. The WHERE clause must resolve to an index range for the pattern to stay cheap.
- Callback-free maintenance. The delete_all primitive skips ActiveRecord callbacks; the pattern is inappropriate if the deletion needs after_destroy side effects (audit logging, counter-cache updates, dependent cascades). Use the paired scheduler-reconciler with destroy_all instead.
When not to use it¶
- Cascade-heavy parents. Deletion cascades across associations can't be expressed in delete_all (which skips cascades). Use dependent: :destroy_async or a per-parent cleanup job.
- Strict all-or-nothing atomicity. The pattern drains the backlog over many transactions. If a regulatory requirement demands "all aged records deleted atomically", this pattern doesn't provide that guarantee.
- Small data volumes. Under a few thousand rows, a single non-batched delete_all is simpler. Reserve the pattern for scales where transaction size matters.
Composition with other patterns¶
- Paired scheduler-reconciler — architectural cousin. Both use cron-triggered Sidekiq jobs to reconcile DB state. The reconciler enqueues work jobs for individual records; this pattern executes the deletion directly in the scheduler-tick body. When the work per record is small (a row delete), direct execution avoids the enqueue-dispatch overhead.
- Jittered scheduling — composable when multiple DeleteOldDataJob classes are scheduled together (one per table) and you want to spread their cron triggers. Not usually needed within a single deletion job (the unique_for guard handles concurrency).
- Sidekiq unique jobs — the deadlock-prevention substrate. This pattern is a second canonical use case for unique_for beyond duplicate-enqueue prevention in the paired-scheduler-reconciler pattern.
Testing approach¶
The source post includes a test template:
class DeleteOldDataJobTest < ActiveJob::TestCase
  test "deletes data over 3 months old" do
    expired  = create(:data, minute: 3.months.ago - 1.hour)
    retained = create(:data, minute: 3.months.ago + 1.hour)

    DeleteOldDataJob.new.perform

    assert Data.where(id: expired.id).empty?
    assert Data.where(id: retained.id).exists?
  end

  test "requeues if more to delete" do
    create(:data, minute: 3.months.ago - 1.hour)
    create(:data, minute: 3.months.ago - 1.hour)

    assert_enqueued_sidekiq_jobs(1, only: DeleteOldDataJob) do
      DeleteOldDataJob.new.perform(limit: 1)
    end
  end
end
Two cases: predicate correctness (expired deleted, retained preserved) + requeue correctness (hit the limit → enqueue follow-up).
Seen in¶
- sources/2026-04-21-planetscale-ruby-on-rails-3-tips-for-deleting-data-at-scale — canonical wiki introduction. Mike Coutermarsh (PlanetScale, 2022-08-01) canonicalises the pattern with the DeleteOldDataJob code block, the test template, and the explicit callout of unique_for as deadlock prevention.