
Task and actor model

The task and actor model is a low-level distributed-compute programming model that exposes two orthogonal primitives:

  • Task — a stateless, serialisable unit of remote function invocation. Inputs are values or object references; the output is a future (an object reference). The runtime may schedule it anywhere in the cluster with capacity.
  • Actor — a stateful, long-lived remote class instance with its own addressable identity. It holds internal mutable state; methods invoked on it run serially on the actor's home worker (or with explicit concurrency control).

The model is deliberately closer to a distributed runtime than to a dataflow DAG. It is what systems/ray exposes; it is also the shape Amazon's in-house distributed compute service offered (though with usability and efficiency issues that ruled it out for BDT's compactor workload).

Why "low-level" is worth it

Higher-level engines (Spark dataflows, SQL planners, Beam pipelines) compose operators over structured data and give the runtime optimisation freedom. That's a great deal for general-purpose ETL and SQL-shaped work.

But for specialist workloads — large-scale compaction, ML training loops, per-request micro-pipelines, locality-sensitive shuffles — the dataflow abstraction hides the mechanisms you need to hand-craft the distributed algorithm. Tasks + actors give back:

  • Direct control over placement (same node, same rack, or anywhere).
  • Direct control over object locality and what's shuffled vs referenced in place.
  • Direct expression of stateful patterns — caches, checkpoints, coordinators — without trying to squeeze them into a stateless operator graph.
  • Direct handling of heterogeneous workers and resource demands (e.g. memory-aware scheduling, GPU affinity).

Amazon BDT's framing: "BDT didn't need a generalist for the compaction problem, but a specialist, and Ray let them narrow their focus down to optimising the specific problem at hand." That's the task-and-actor sell.

(Source: sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2)

Downsides to name

  • Application complexity is yours. There is no free query optimiser, no automatic operator placement, and no free retry logic.
  • Debugging distributed state is harder than debugging a dataflow DAG — actors have history.
  • Portability is worse than SQL. Dataflow engines translate across clusters; a task/actor program is a specific distributed algorithm.

Lineage

The pattern predates Ray — Erlang/OTP, Akka, Orleans, Dask, and internal cluster frameworks at the hyperscalers all expose some form of actors or tasks. Ray's articulation — tasks + actors + object store + locality scheduler — is the current canonical framing for general-purpose distributed compute outside the dataflow world.
