Ray¶
Ray is an open-source distributed compute framework out of UC Berkeley's RISELab (arXiv:1712.05889). It exposes a low-level tasks + actors programming model on top of a distributed object store with zero-copy intranode sharing and an autoscaling + locality-aware scheduler. Originally pitched as a framework for scaling ML workloads, it has since been shown — decisively at exabyte scale — to also be a world-class specialist data-processing substrate.
Programming model¶
Ray's primitives are deliberately lower-level than Spark's dataflow abstractions (concepts/task-and-actor-model):
- Tasks — stateless remote function invocations.
- Actors — stateful remote classes.
- Object store — a horizontally scalable store of immutable serialized objects, keyed by object-ref. One per node, pooled across the cluster.
- Zero-copy intranode sharing — references to objects in the local node's store are handed to co-located tasks/actors without re-serialization.
- Locality-aware scheduler — prefers to place a task on the node that already holds its input references, minimising cross-node shuffle (concepts/locality-aware-scheduling).
- Autoscaling clusters — fine-grained add/remove of workers matching load; supports heterogeneous instance types.
- Memory-aware scheduling — tasks can declare expected memory usage at invocation time, letting the scheduler pack by the real bottleneck resource rather than CPU alone (concepts/memory-aware-scheduling).
Positioning vs. Apache Spark¶
- Spark offers rich, safe, abstract dataflow operators; great for generalist data processing; tuned primarily for RAM+disk clusters.
- Ray offers a lower-level escape hatch: you hand-craft the distributed program. Reward: for specialist workloads you can beat Spark by large factors on both throughput and cost. Cost: no free optimiser, no SQL, application complexity is yours.
In Amazon BDT's case the ability to copy untouched S3 files by reference during a distributed merge (patterns/reference-based-copy-optimization) and to locality-schedule shuffles with zero-copy intranode sharing produced an 82% better cost efficiency per GiB of S3 input vs the prior Spark compactor (sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2).
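The reference-based copy idea can be sketched in plain Python (the names `FileRef` and `plan_merge` are hypothetical illustrations, not DeltaCAT's API): a file whose key range is untouched by the incoming delta is carried forward as a metadata reference, so its S3 bytes are never read or rewritten.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileRef:
    path: str     # e.g. an S3 key
    min_key: int  # min/max primary key in the file, from footer stats
    max_key: int

def plan_merge(existing, delta_keys):
    """Split existing files into by-reference copies vs. real rewrites.

    A file is rewritten only if some incoming key falls inside its
    [min_key, max_key] range; everything else is copied by reference,
    i.e. only its metadata moves into the new table version."""
    copy_by_ref, rewrite = [], []
    for f in existing:
        if any(f.min_key <= k <= f.max_key for k in delta_keys):
            rewrite.append(f)
        else:
            copy_by_ref.append(f)
    return copy_by_ref, rewrite

files = [FileRef("s3://t/a.parquet", 0, 99),
         FileRef("s3://t/b.parquet", 100, 199),
         FileRef("s3://t/c.parquet", 200, 299)]
kept, rewritten = plan_merge(files, delta_keys={150, 160})
print([f.path for f in kept])       # a.parquet, c.parquet: move by reference
print([f.path for f in rewritten])  # b.parquet: the only file read + rewritten
```

The win compounds with locality: the few files that do need rewriting can be scheduled onto nodes already holding the relevant delta objects.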
Production operational realities (from the BDT migration)¶
- Out-of-memory errors were the most common failure mode at exabyte scale; resolved by memory-aware scheduling using past workload memory profiles to size tasks.
- EC2 instance-management pain — Ray clusters wanted more EC2 instances than a single instance type could reliably supply; resolved by heterogeneous cluster provisioning (patterns/heterogeneous-cluster-provisioning): pick a set of instance types that can meet the resource shape, then provision whichever are most readily available across AZs. Trade-off: applications must not assume a fixed CPU arch / disk type.
- Reliability gap vs Spark is real: BDT reports 2023 Ray first-time success rate up to 15% behind Spark; 2024 average 99.15% vs Spark's 99.91% (0.76 pp). Closing that gap took years.
- Memory utilisation caps the efficiency win — BDT averaged 54.6% cluster-memory utilisation in Q1 2024; raising that to ~90% would widen Ray's cost-efficiency advantage over Spark from 82% to more than 90%.
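The profile-driven sizing in the first bullet can be sketched as follows (the percentile, headroom factor, and function name are illustrative assumptions, not BDT's published method): request roughly the high-percentile historical peak plus headroom, then pass that to the scheduler as the task's memory demand.

```python
import math

def memory_request(past_peaks_mib, percentile=0.99, headroom=1.2):
    """Estimate a task's memory request from observed peak usage.

    past_peaks_mib: peak memory (MiB) from prior runs of this workload.
    Returns a request sized at the given percentile of history plus
    headroom, suitable for e.g. task.options(memory=...) in Ray."""
    peaks = sorted(past_peaks_mib)
    idx = min(len(peaks) - 1, math.ceil(percentile * len(peaks)) - 1)
    return int(peaks[idx] * headroom)

# Ten prior runs mostly near 1 GiB, with one 4 GiB outlier; at p99 the
# outlier dominates, so the request protects against the worst case.
history = [950, 980, 1000, 1010, 990, 1020, 970, 1005, 995, 4096]
print(memory_request(history))  # 4915 (MiB)
```

Oversized requests waste cluster memory (the utilisation problem in the last bullet), while undersized ones reproduce the OOM failures, so the percentile/headroom choice is exactly the knob that trades the two off.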
Ray's reach¶
- ML training/serving/hyperparameter-search was its original positioning.
- Data processing at scale: Exoshuffle (arXiv:2203.05072) holds the 2022 CloudSort record (lowest cost to sort 100 TB). Amazon BDT runs exabyte-scale production compaction on it.
- Open-source data-catalog tooling: systems/deltacat (Ray project) targets managed open table formats — systems/apache-iceberg, systems/apache-hudi, systems/delta-lake.
- Serverless Ray: systems/anyscale-platform (commercial Ray company) and systems/aws-glue-for-ray (AWS managed) mean users no longer need to build their own serverless-Ray job management.
Seen in¶
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — Ray as the successor to Spark for Amazon Retail BDT's exabyte-scale compactor. Q1 2024: 1.5 EiB input Parquet, 4 EiB in-memory Arrow, >10,000 vCPU-years, 26,846-vCPU / 210-TiB RAM peak cluster; 82% better cost efficiency per GiB vs Spark; 100% on-time delivery across Q4 2023 + Q1 2024; 99.15% first-time success (0.76 pp behind Spark).
- Ray Summit 2021 talk — petabyte-scale datalake slides.
Related¶
- systems/apache-spark — the generalist Ray is displacing in specialist Amazon-internal workloads.
- systems/deltacat — open-source Ray project for catalogued table-format compaction; Amazon contributed The Flash Compactor.
- systems/anyscale-platform, systems/aws-glue-for-ray — managed-Ray runtimes.
- systems/apache-arrow — in-memory columnar format Ray's DeltaCAT workloads materialise into.
- concepts/task-and-actor-model, concepts/locality-aware-scheduling, concepts/memory-aware-scheduling — the mechanism-level concepts.
- patterns/heterogeneous-cluster-provisioning, patterns/reference-based-copy-optimization — deployment and optimisation patterns seen in production Ray use.