Locality-aware scheduling¶
Locality-aware scheduling is the scheduling policy of placing a unit of work on the node that already holds its input data, minimising network transfer. It is a necessary condition for zero-copy intranode sharing (concepts/zero-copy-sharing) to pay off: if a task executes on a different node from its inputs, those inputs must be shipped over the network first.
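The cost argument can be made concrete with a minimal sketch. The function and the node names below are hypothetical, not from any real scheduler: it just counts the bytes that must cross the network for a given placement, given where a task's input fragments already live.

```python
def network_bytes(task_node, input_sizes_by_node):
    """Bytes that must cross the network for a given placement.

    input_sizes_by_node: hypothetical map of node id -> bytes of this
    task's inputs already resident on that node.
    """
    return sum(size for node, size in input_sizes_by_node.items()
               if node != task_node)

# Inputs live on n1 (1 GiB) and n2 (64 MiB).
inputs = {"n1": 1 << 30, "n2": 64 << 20}

print(network_bytes("n1", inputs))  # place on n1: only n2's fragment moves
print(network_bytes("n3", inputs))  # place elsewhere: everything moves
```

Placing the task on the node holding the largest input fragment minimises this quantity, which is exactly the bias a locality-aware scheduler applies.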
Why it matters at scale¶
- Shuffles, when they happen, dominate distributed compute cost. Locality-aware scheduling keeps most of the work non-shuffling, so only the rebalancing steps cross nodes.
- Columnar merges (patterns/streaming-k-way-merge) typically group input by a partition key, then merge; locality-aware placement pushes each partition's workers to the node that holds its already-materialised columnar fragments.
- Object-store-backed systems (Ray's distributed object store, Spark's block manager, Flink's state backend) all benefit: when the store references are local, the pull is a memory-to-memory copy rather than a network transfer.
Mechanism¶
Classic shapes:
- Preferred-location hints — each task declares one or more nodes that hold its inputs; scheduler tries those first, falling back to any node after a timeout.
- Gang scheduling — place a set of tasks that share data on the same node or rack.
- Sticky assignment — once an actor / worker has cached state, subsequent work goes to the same worker.
- Two-level scheduler — a global scheduler hands coarse-grained work out to node-local schedulers which optimise placement locally.
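The first shape, preferred-location hints with a timeout fallback, can be sketched as follows. This is an illustrative toy, not any real scheduler's API; the `Task` fields, `place` function, and timeout value are all assumptions.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    preferred_nodes: list  # hypothetical: nodes that hold this task's inputs
    submitted_at: float = field(default_factory=time.monotonic)

def place(task, free_nodes, locality_timeout_s=0.5):
    """Preferred-location placement with timeout fallback.

    Try the nodes that hold the task's inputs first; once the task has
    waited longer than the locality timeout, accept any free node.
    """
    for node in task.preferred_nodes:
        if node in free_nodes:
            return node  # local placement: inputs need no network transfer
    if time.monotonic() - task.submitted_at > locality_timeout_s:
        return next(iter(free_nodes), None)  # give up on locality
    return None  # keep waiting for a preferred node to free up
```

The timeout is the key tuning knob: too short and tasks spill off their data for no reason; too long and a hot node becomes a convoy point while the rest of the cluster idles.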
Relation to task/actor frameworks¶
In a pure dataflow engine the optimiser can infer locality from the query plan. In a task-and-actor framework (concepts/task-and-actor-model) the scheduler must see the object references each task takes, map them to the nodes storing them, and bias placement toward those nodes. Ray is the canonical modern instance: its scheduler consults the distributed object store's location map on every task placement.
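The refs-to-nodes-to-placement step can be sketched in a few lines. The location map and capacity map below are hypothetical stand-ins for what a distributed object store and cluster state would expose; this is not Ray's actual scheduler code.

```python
from collections import Counter

def bias_placement(task_input_refs, object_locations, node_free_slots):
    """Pick the node holding the largest share of a task's inputs.

    object_locations: hypothetical location map, obj_ref -> set of node ids,
    analogous to what a distributed object store exposes to the scheduler.
    node_free_slots: hypothetical map of node id -> free worker slots.
    """
    votes = Counter()
    for ref in task_input_refs:
        for node in object_locations.get(ref, ()):
            votes[node] += 1
    # rank nodes by how many of the task's inputs they already hold,
    # but only take one that actually has free capacity
    for node, _ in votes.most_common():
        if node_free_slots.get(node, 0) > 0:
            return node
    # no input-holding node has capacity: fall back to any free node
    return next((n for n, s in node_free_slots.items() if s > 0), None)
```

Counting inputs per node is the crudest possible bias; a real scheduler would weight by object size and account for replicas, but the shape is the same: consult the location map, rank candidates, fall back when locality is unavailable.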
Seen in¶
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — Ray's "efficient locality-aware scheduler" named as one of the five load-bearing Ray features Amazon BDT bought into; specifically powers "zero-copy intranode object exchange during locality-aware shuffles" in the Ray compactor.