
Sequential vs random I/O on SSD/EBS

Definition

The same byte-level I/O demand has very different IOPS cost depending on whether the reads/writes are sequential (contiguous in the logical block address space) or random (scattered). On Amazon EBS — and many other shared network-attached block-storage systems — sequential I/O is bundled into fewer, larger operations; random I/O pays the full per-IOP cost regardless of how many bytes were actually requested.

This is the counterintuitive bit: the sequential vs random distinction survives the HDD → SSD transition on EBS, even though the underlying SSD hardware has no head-seek penalty that would naturally punish random access.

Dicken's framing

"Computing the total amount of bytes we can move to and from disk per second is not as simple as calculating number_of_iops * 64 KiB. This is due to the difference between how EBS handles sequential and random reads. For sequential reads, EBS will bundle requests together, allowing you to maximize IOPS. For random reads, each read counts as a full IOP, even if it is less than 64k. For example, a single random read of a 4k block from disk will count as a full 64k IOP. Some think that once you move to SSDs, your sequential vs random read patterns no longer matter. However, this applies to workloads on both HDDs and SSDs on EBS."

(Source: sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding)

Why sequential is cheaper (on EBS / shared network storage)

The EBS client driver and the storage-fabric protocol can:

  1. Coalesce contiguous requests into a single, larger operation (e.g. sixteen adjacent 4 KiB reads → one 64 KiB read).
  2. Pipeline the network round-trip for a range scan (one request, many bytes returned).
  3. Exploit the backing SSD's native block sizes — SSDs read/write at the 4 KiB or 16 KiB page level internally; sequential access lets the driver align on those boundaries.

The net effect: a sequential workload moving M bytes issues roughly M / 64 KiB operations, while a random workload touching the same M bytes at 4 KiB granularity issues M / 4 KiB operations, 16× as many.
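The accounting rule can be sketched as a toy model (a simplification of EBS behaviour, not an AWS API; the `iops_consumed` function and its perfect-coalescing assumption are illustrative):

```python
# Toy model of EBS IOP accounting: contiguous requests coalesce into
# 64 KiB operations; each random request bills as one full IOP.

EBS_IO_UNIT = 64 * 1024  # 64 KiB protocol-level accounting unit


def iops_consumed(total_bytes: int, request_size: int, sequential: bool) -> int:
    """Return how many IOPS-budget units a workload consumes."""
    if sequential:
        # Adjacent requests bundle: ceil(total_bytes / 64 KiB) operations.
        return -(-total_bytes // EBS_IO_UNIT)
    # Random: one full IOP per request, regardless of request size.
    return -(-total_bytes // request_size)


one_mib = 1024 * 1024
print(iops_consumed(one_mib, 4 * 1024, sequential=True))   # 16 operations
print(iops_consumed(one_mib, 4 * 1024, sequential=False))  # 256 operations, 16× as many
```

The 16× ratio is just the gap between the 4 KiB request size and the 64 KiB accounting unit.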

The 4 KiB random read costs a full 64 KiB IOP

The architecturally important detail Dicken flags verbatim:

"each read counts as a full IOP, even if it is less than 64k. For example, a single random read of a 4k block from disk will count as a full 64k IOP."

A workload requesting scattered 4 KiB blocks on gp3:

  • At the 3,000 IOPS default baseline, can serve 3,000 × 4 KiB ≈ 11.7 MiB/s of useful data.
  • Burns the IOPS budget at the rate of a full 64 KiB operation per 4 KiB requested.
  • Hits the IOPS cap at ~12 MiB/s rather than the 187.5 MiB/s (3,000 × 64 KiB) the naive formula suggests.

Random I/O with small block sizes burns IOPS 16× faster than it burns throughput.
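The gp3 arithmetic above can be checked directly (the 3,000 baseline IOPS and 64 KiB accounting unit come from the source; the variable names are illustrative):

```python
# Usable throughput at a fixed IOPS budget depends on access pattern:
# sequential workloads get the full 64 KiB per operation, random 4 KiB
# workloads get only 4 KiB of useful data per operation.

IOPS_BUDGET = 3000        # gp3 default baseline
IO_UNIT = 64 * 1024       # 64 KiB accounting unit
RANDOM_READ = 4 * 1024    # 4 KiB random reads

sequential_mibps = IOPS_BUDGET * IO_UNIT / (1024 * 1024)
random_mibps = IOPS_BUDGET * RANDOM_READ / (1024 * 1024)

print(f"sequential ceiling: {sequential_mibps:.1f} MiB/s")  # 187.5 MiB/s
print(f"random 4 KiB ceiling: {random_mibps:.1f} MiB/s")    # 11.7 MiB/s
```

Both workloads exhaust the same 3,000-IOPS budget; only the useful bytes per operation differ.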

Why the distinction survives the HDD → SSD transition on EBS

On local disks, sequential vs random was an HDD-physics phenomenon (head-seek cost amortised across contiguous reads; see concepts/hdd-sequential-io-optimization). On raw local SSD, random and sequential are roughly equivalent at the hardware level — the drive has no seek.

On EBS (network-attached SSD), the distinction returns, but for a different reason: EBS accounts I/O in fixed-size protocol blocks, and sequential bundling is a protocol-layer optimisation implemented by the client driver and the storage fabric. Hence "this applies to workloads on both HDDs and SSDs on EBS": the accounting unit is the protocol-layer IOP, not the SSD's internal page.

Architectural implications for databases

  • Range scans on a clustered index and full table scans are cheap — sequential access, bundle-friendly, throughput-bound not IOPS-bound.
  • Scattered point lookups are expensive — random access, one IOP per row, burns IOPS fast.
  • Random secondary-index access is expensive — jumping between index and heap pages is random I/O.
  • WAL / binlog writes are sequential — append-only, stream-friendly, mostly throughput-bound.
  • Checkpoint flushes are random — dirty pages scattered across the buffer pool, one IOP per page.

Dicken's operational summary:

"The more random reads you have, the less efficiently you'll use your IOPS. The more sequential reads, the better."

"In practice, database workloads often have a mix of sequential and random disk IO operations. We rarely will be operating at maximum IOPS efficiency, so 3000 IOPS will pair acceptably with 125 MiB/s in many situations."

Mitigations

  • B+tree clustered indexes localise scans on the primary key — range scans become sequential.
  • Buffer pool absorbs random hits — cache hit serves the request without any EBS IOP.
  • Group commit + large write batches convert many small random writes into fewer larger sequential writes.
  • SSTable / LSM-tree architectures (e.g. RocksDB) deliberately re-arrange writes into large sequential flushes, trading write amplification for IOPS efficiency.
  • Direct-attached NVMe (Metal, systems/nvme-ssd) bypasses the EBS IOP-accounting protocol — random I/O at 4 KiB granularity is native and cheap, with no 64 KiB IOP-accounting tax to pay.
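The batching mitigations above share one trick, which this toy calculation illustrates (the functions are illustrative, not RocksDB's or any engine's API): pay for writes in coalesced sequential operations rather than one IOP per scattered page.

```python
# Compare the IOPS-budget cost of writing N dirty 4 KiB pages in place
# (random, one full IOP each) vs buffering them and flushing one large
# sorted run (sequential, coalesced into 64 KiB operations).

IO_UNIT = 64 * 1024  # 64 KiB accounting unit
PAGE = 4 * 1024      # 4 KiB page


def random_write_iops(n_pages: int) -> int:
    # One full IOP per scattered 4 KiB page write.
    return n_pages


def batched_flush_iops(n_pages: int) -> int:
    # Pages written contiguously as one run: coalesced into 64 KiB ops.
    total_bytes = n_pages * PAGE
    return -(-total_bytes // IO_UNIT)


print(random_write_iops(1024))   # 1024 IOPS-budget units
print(batched_flush_iops(1024))  # 64 units, 16× fewer
```

This is the trade LSM-tree engines make explicitly: extra write amplification in exchange for IOPS efficiency on the flush path.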

Seen in

  • sources/2026-04-21-planetscale-increase-iops-and-throughput-with-sharding — Ben Dicken's canonical formulation on EBS: "For sequential reads, EBS will bundle requests together … For random reads, each read counts as a full IOP, even if it is less than 64k." The 4 KiB-random-read-burns-64 KiB-IOP worked example is the definitional datum; the "this applies to workloads on both HDDs and SSDs on EBS" caveat breaks the common intuition that moving to SSDs eliminates the sequential-vs-random distinction.