PATTERN

Downsample / recode for long-term archive

Intent

Before moving media into long-term retention (cold storage, Glacier, tape), re-encode or downsample it to a lower-fidelity format that preserves just enough quality for compliance and occasional-lookup access, cutting storage cost by 5–10x without losing the regulatory-required signal.

Motivating example: Amazon Connect call recordings

From the 2022-07-11 roundup (Source: sources/2022-07-11-highscalability-stuff-the-internet-says-on-scalability-for-july-11th-2022), via an AWS architecture-blog post titled Serverless architecture for optimizing Amazon Connect call-recording archival costs:

"Long term retention requirements cost a lot in storage. But you don't probably have to keep data stored in its original high fidelity encoding. Recode and downsample it to save big bucks."

Classic case: a 7-year call-recording retention requirement for financial / regulatory compliance. Calls are recorded as high-fidelity WAV; 99.9% of them will never be accessed again. Downsample to 8-kHz narrowband MP3 before the 30-day transition to cold storage: still intelligible for compliance pulls, at 1/10 the storage footprint.
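
A minimal sketch of the downsample step, assuming ffmpeg is available on the PATH; the 8-kHz mono, 16-kbps settings are one plausible narrowband target for speech, and the file names are placeholders:

```python
import subprocess

def downsample_for_archive(src_wav: str, dst_mp3: str) -> None:
    """Re-encode a high-fidelity WAV into an 8-kHz narrowband MP3."""
    subprocess.run(
        [
            "ffmpeg",
            "-y",                      # overwrite the output if it exists
            "-i", src_wav,             # original high-fidelity recording
            "-ac", "1",                # mono is enough for a voice call
            "-ar", "8000",             # 8-kHz narrowband sample rate
            "-codec:a", "libmp3lame",  # MP3 encoder
            "-b:a", "16k",             # low bitrate tuned for speech
            dst_mp3,
        ],
        check=True,                    # raise if ffmpeg fails
    )

downsample_for_archive("call-recording.wav", "call-recording-archive.mp3")
```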

Generalized pattern

Stage                           | Fidelity                                 | Cost tier
Live / hot                      | Original-quality WAV / FLAC / raw video  | S3 Standard
Recent-review window (~30 days) | Original                                 | S3 Standard / IA
Archive (~30 days – years)      | Downsampled + re-encoded                 | S3 Glacier / Deep Archive
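
The archive-tier transition itself can be a plain S3 lifecycle rule. A sketch with boto3, assuming the downsampled copies land under an archive/ prefix in a hypothetical call-recordings bucket:

```python
import boto3

s3 = boto3.client("s3")

# Move downsampled copies to Glacier Deep Archive once the ~30-day
# recent-review window has passed. Bucket name and prefix are
# placeholders for illustration.
s3.put_bucket_lifecycle_configuration(
    Bucket="call-recordings",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-downsampled-recordings",
                "Status": "Enabled",
                "Filter": {"Prefix": "archive/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```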

The re-encode step is cheap compared to the retention cost; at hot-tier storage rates the break-even is typically a few days of retained storage.
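
A back-of-envelope check of that claim, using published us-east-1 list prices at the time of writing and an assumed one-time transcode cost (the real figure depends on codec and compute):

```python
# Storage prices are us-east-1 list prices at the time of writing;
# the per-GB transcode cost is an assumption.
S3_STANDARD = 0.023     # $/GB-month, hot tier
DEEP_ARCHIVE = 0.00099  # $/GB-month, coldest tier
TRANSCODE = 0.005       # $/GB, assumed one-time re-encode compute
SHRINK = 0.10           # archive copy is ~1/10 the original size

for tier, price in [("S3 Standard", S3_STANDARD), ("Deep Archive", DEEP_ARCHIVE)]:
    saved_per_month = price * (1 - SHRINK)    # $/GB-month saved by shrinking
    days = 30 * TRANSCODE / saved_per_month   # days of storage to recoup compute
    print(f"{tier}: break-even after ~{days:.0f} days of retention")
```

Even in the worst case (data already priced at Deep Archive rates), the break-even lands well inside a seven-year retention window.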

When not to apply

  • Forensic / legal-hold data: some regulations require preservation of the original bitstream.
  • Training data for ML that may benefit from high-fidelity features (speech recognition, video understanding) — keep a high-fidelity path for model-retraining use cases and a downsampled path for compliance.