SYSTEM Cited by 1 source

Amazon EMR

Amazon EMR (Elastic MapReduce) is AWS's managed Hadoop-ecosystem runtime: it hosts systems/apache-spark, systems/apache-hive, Presto, Flink, HBase, and other OSS big-data engines on systems/aws-ec2 (and, more recently, on EKS and serverless). It is the canonical "big data cluster as a service" on AWS and the substrate behind much of the post-Hadoop data-lake workload on systems/aws-s3.
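To make "Spark on an EC2-backed EMR cluster" concrete, here is a minimal sketch of the request you might pass to boto3's `run_job_flow`. The helper names (`build_spark_cluster_request`, `launch`), the release label, the instance types, and the S3 URIs are illustrative assumptions rather than anything taken from a source; the role names are the EMR defaults and may differ in your account.

```python
from typing import Any


def build_spark_cluster_request(
    name: str,
    log_uri: str,
    script_uri: str,
    core_nodes: int = 2,
) -> dict[str, Any]:
    """Build keyword arguments for boto3's EMR ``run_job_flow`` call.

    Hypothetical helper: every concrete value below is an assumption.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; use one available to you
        "Applications": [{"Name": "Spark"}],
        "LogUri": log_uri,  # cluster logs land in S3
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": core_nodes,
                },
            ],
            # Transient cluster: shut down once the step list is exhausted.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar lets a step invoke spark-submit.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance profile name
        "ServiceRole": "EMR_DefaultRole",  # default EMR service role name
    }


def launch(request: dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster id (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the API call means the cluster shape can be inspected or unit-tested without touching AWS; only `launch` requires credentials and network access.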

Role for this wiki

EMR typically shows up as the thing you were running Spark on before something changed (a scale-out, a cost crunch, a move to a managed warehouse or a specialist engine like systems/ray). In the Amazon BDT Spark → Ray story, the Spark compactor ran on EMR clusters, whereas the Ray clusters run directly on EC2, managed by the serverless job-management substrate BDT built on top of systems/dynamodb + systems/aws-sns + systems/aws-sqs + systems/aws-s3.

Seen in
