
SYSTEM Cited by 2 sources

AWS Glue

AWS Glue is AWS's serverless ETL and data-catalog offering. It bundles a Hive-Metastore-compatible catalog (the "Glue Data Catalog") with a serverless Spark runtime (and, more recently, a serverless Ray runtime; see systems/aws-glue-for-ray). It sits alongside systems/amazon-emr as a serverless option for running Spark jobs on AWS, and its Data Catalog is commonly used as the metadata catalog for data-lake engines such as Athena, Redshift Spectrum, EMR, Spark, and Databricks.
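
As an illustration of the catalog role, Amazon EMR documents a `spark-hive-site` configuration classification that points Spark SQL's Hive client at the Glue Data Catalog instead of a self-managed Hive Metastore. The sketch below only builds that configuration as data and does not call AWS. The factory class name follows the EMR documentation; the helper name `emr_glue_metastore_config` is hypothetical.

```python
import json

# Hive client factory that makes Spark SQL use the Glue Data Catalog
# as its metastore (class name as given in the Amazon EMR docs).
GLUE_HIVE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def emr_glue_metastore_config() -> list[dict]:
    """Hypothetical helper: build an EMR 'Configurations' list that
    swaps Spark's Hive Metastore for the Glue Data Catalog."""
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_HIVE_CLIENT_FACTORY,
            },
        }
    ]


if __name__ == "__main__":
    # Emit JSON suitable for pasting into a cluster's configuration.
    print(json.dumps(emr_glue_metastore_config(), indent=2))
```

Because the Glue catalog speaks the Hive Metastore interface, engines that already know how to talk to a metastore only need this client swap rather than a new catalog integration.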

Role for this wiki

Glue appears in two roles:

  1. Serverless Spark / Ray runtime: a managed alternative to running Spark or Ray on raw systems/aws-ec2.
  2. Catalog substrate: Iceberg tables registered in the Glue Data Catalog are the canonical "tables on S3 with a catalog" shape outside Databricks.
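
A minimal sketch of the Iceberg-on-Glue shape from Spark, assuming the Iceberg Spark runtime and Iceberg AWS jars are on the classpath: Iceberg's `GlueCatalog` is registered as a Spark catalog, and table data and metadata are written under an S3 warehouse path. The catalog name `glue_catalog`, the bucket `s3://example-bucket/warehouse`, and both helper functions are hypothetical; the property keys and class names follow the Apache Iceberg AWS integration docs.

```python
def iceberg_glue_spark_conf(catalog: str, warehouse: str) -> dict[str, str]:
    """Hypothetical helper: Spark properties registering an Iceberg
    catalog backed by the Glue Data Catalog, storing files on S3."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg's SQL extensions (MERGE INTO, CALL procedures, ...).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Register the catalog name with Spark.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Table metadata pointers live in Glue ...
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # ... while data and metadata files are read/written via S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def spark_submit_conf_args(conf: dict[str, str]) -> list[str]:
    """Hypothetical helper: flatten properties into spark-submit --conf args."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = iceberg_glue_spark_conf("glue_catalog", "s3://example-bucket/warehouse")
    print(" ".join(spark_submit_conf_args(conf)))
```

Tables are then addressed as `glue_catalog.<database>.<table>`, with Glue databases acting as Iceberg namespaces. The same key/value pairs could instead be passed one by one to `SparkSession.builder.config(...)`.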

Seen in

Last updated · 200 distilled / 1,178 read