SYSTEM Cited by 2 sources
AWS Glue
AWS Glue is AWS's serverless ETL and data-catalog offering. It bundles a Hive-Metastore-compatible catalog (the "Glue Data Catalog") with a serverless Spark runtime (and, more recently, a serverless Ray runtime; see systems/aws-glue-for-ray). It sits alongside systems/amazon-emr as the serverless option for running Spark jobs on AWS, and the Glue Data Catalog is commonly used as the metadata catalog for data-lake engines (Athena, Redshift Spectrum, EMR, Spark, Databricks).
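To make the Spark-versus-Ray runtime split concrete, here is a minimal sketch that builds the keyword arguments one might pass to boto3's Glue `create_job` call for each runtime. It assumes the `glueetl` / `glueray` command names, the `Ray2.4` runtime string, and the `G.1X` / `Z.2X` worker types as documented for Glue 4.0 at the time of writing; verify those values against current AWS documentation. The helper name, job names, role ARN, and S3 paths are all hypothetical placeholders.

```python
"""Sketch: create_job parameters for Glue's Spark and Ray runtimes.

Assumption: the returned dict is meant for boto3.client("glue").create_job(**params).
The helper name, job names, role ARN, and S3 paths are hypothetical; version and
worker-type strings should be checked against current AWS Glue docs.
"""


def glue_job_params(name: str, role_arn: str, script_s3: str,
                    runtime: str = "spark", workers: int = 2) -> dict:
    """Build create_job kwargs for a Spark ("glueetl") or Ray ("glueray") job."""
    if runtime == "spark":
        # Classic Glue ETL job running on the managed Spark runtime.
        command = {"Name": "glueetl", "ScriptLocation": script_s3,
                   "PythonVersion": "3"}
        worker_type = "G.1X"
    elif runtime == "ray":
        # Glue for Ray job; the Runtime key selects the Ray version.
        command = {"Name": "glueray", "ScriptLocation": script_s3,
                   "PythonVersion": "3.9", "Runtime": "Ray2.4"}
        worker_type = "Z.2X"
    else:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": command,
        "GlueVersion": "4.0",
        "WorkerType": worker_type,
        "NumberOfWorkers": workers,
    }


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/example-glue-role"  # hypothetical
    spark_job = glue_job_params("example-spark-job", role,
                                "s3://example-bucket/scripts/etl.py")
    ray_job = glue_job_params("example-ray-job", role,
                              "s3://example-bucket/scripts/ray_job.py",
                              runtime="ray")
    print(spark_job["Command"]["Name"], ray_job["Command"]["Name"])
    # With AWS credentials configured, one could then call:
    # boto3.client("glue").create_job(**spark_job)
```

The point of the sketch is that both runtimes share one job-management surface (the same API, IAM role, and script-in-S3 model) and differ mainly in the command name and worker type, which is what makes Glue a drop-in serverless alternative to self-managed Spark or Ray clusters.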
Role for this wiki
Glue appears in two roles:
- Serverless Spark / Ray runtime — a managed alternative to running Ray or Spark on raw systems/aws-ec2.
- Catalog substrate — Iceberg tables on AWS Glue are the canonical "tables on S3 with a catalog" shape outside Databricks.
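The "tables on S3 with a catalog" shape can be sketched from the Spark side: Iceberg's `SparkCatalog` is pointed at Iceberg's `GlueCatalog` implementation, data and metadata files live under an S3 warehouse path, and each Glue table entry stores a pointer (`metadata_location`, alongside `table_type=ICEBERG`) to the table's current Iceberg metadata file. The sketch below assumes the iceberg-spark-runtime and iceberg-aws bundles are on the classpath; the catalog name `glue_catalog`, the helper names, the warehouse path, and the sample `get_table` response are hypothetical.

```python
"""Sketch: registering the Glue Data Catalog as an Iceberg catalog for Spark,
and reading the Iceberg metadata pointer out of a Glue table entry.

Assumptions: catalog name, helper names, and S3 paths are hypothetical; the
iceberg-spark-runtime and iceberg-aws bundles must be on the Spark classpath.
"""
from typing import Optional


def iceberg_glue_spark_conf(catalog: str, warehouse_s3: str) -> dict[str, str]:
    """Spark settings that expose the Glue Data Catalog as an Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse_s3,
    }


def iceberg_metadata_location(get_table_response: dict) -> Optional[str]:
    """Return the current Iceberg metadata file from a Glue get_table response,
    or None if the Glue table is not marked as an Iceberg table."""
    params = get_table_response.get("Table", {}).get("Parameters", {})
    if params.get("table_type", "").upper() != "ICEBERG":
        return None
    return params.get("metadata_location")


if __name__ == "__main__":
    conf = iceberg_glue_spark_conf("glue_catalog", "s3://example-bucket/warehouse/")
    for key, value in conf.items():
        print(f"--conf {key}={value}")

    # Hypothetical response shape, as returned by boto3 glue.get_table(...).
    sample = {"Table": {"Name": "orders", "DatabaseName": "sales", "Parameters": {
        "table_type": "ICEBERG",
        "metadata_location":
            "s3://example-bucket/warehouse/sales.db/orders/metadata/00003.metadata.json",
    }}}
    print(iceberg_metadata_location(sample))
```

With PySpark, each key/value pair would be passed to `SparkSession.builder.config(key, value)`, after which tables are addressed as `glue_catalog.<database>.<table>`, with a Glue database acting as the Iceberg namespace. Because the catalog only holds the pointer, any engine that can read the Glue entry and the referenced metadata file sees the same table state.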
Seen in
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — named as one of the compute-framework choices available to Amazon BDT table subscribers; systems/aws-glue-for-ray is also called out, alongside systems/anyscale-platform, as one of the two managed Ray runtimes that spare users from building their own serverless Ray job management.
- sources/2026-04-20-databricks-mercedes-benz-cross-cloud-data-mesh — Mercedes-Benz's ~60 TB after-sales dataset is stored as Iceberg-on-AWS-Glue on the producer side, then federated into systems/unity-catalog for cross-cloud sharing via Delta Sharing.
Related
- systems/aws-glue-for-ray — managed Ray subflavor.
- systems/apache-spark — Glue's longer-established runtime.
- systems/apache-iceberg — typical catalog-resident table format on Glue.
- systems/apache-hive — Glue Data Catalog is Hive-Metastore-compatible.
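Hive Metastore compatibility is typically wired up on the client side: on Amazon EMR, Hive and Spark SQL are pointed at the Glue Data Catalog by swapping in a Glue-backed metastore client factory through configuration classifications, rather than by running a separate metastore service. The sketch below builds that configuration list; it assumes the `hive-site` / `spark-hive-site` classifications and the `AWSGlueDataCatalogHiveClientFactory` class as documented for EMR, and the helper name is hypothetical.

```python
"""Sketch: EMR configuration classifications that point Hive and Spark SQL at
the Glue Data Catalog instead of a self-managed Hive Metastore.

Assumption: the list would be passed as the Configurations argument when
launching an EMR cluster (e.g. boto3 emr.run_job_flow(Configurations=...)).
The helper name is hypothetical.
"""
import json

GLUE_HIVE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_metastore_configurations() -> list[dict]:
    """One classification for Hive itself, one for Spark's embedded Hive client."""
    props = {"hive.metastore.client.factory.class": GLUE_HIVE_CLIENT_FACTORY}
    return [
        {"Classification": "hive-site", "Properties": dict(props)},
        {"Classification": "spark-hive-site", "Properties": dict(props)},
    ]


if __name__ == "__main__":
    print(json.dumps(glue_metastore_configurations(), indent=2))
```

Because the swap happens at the metastore-client layer, existing Hive DDL and Spark SQL `saveAsTable` workflows keep working while table definitions land in Glue, where Athena and Redshift Spectrum can also read them.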