SYSTEM Cited by 1 source
Apache Hive¶
Apache Hive is the Hadoop-era SQL-on-HDFS query engine + metastore — the system that popularised the "tables over a data lake" idea at Facebook in the late 2000s and became the default metadata catalog for early cloud data lakes. The Hive Metastore (HMS) protocol is still the catalog interface many newer engines implement (or federate with).
Role for this wiki¶
Hive appears in two guises:
- As a legacy SQL engine over HDFS / S3 — commonly superseded by newer engines (systems/apache-spark, systems/amazon-athena, systems/amazon-redshift, Trino) but still running in many enterprise stacks.
- As a catalog lineage ancestor — the "table over data files" concept that open table formats like systems/apache-iceberg generalised and made transactional.
Seen in¶
- sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2 — Amazon BDT's post-Oracle BI stack started with "a mix of Amazon Redshift, Amazon RDS, and Apache Hive on systems/amazon-emr" for compute over S3. Hive is named as one of the compute frameworks a table subscriber could pick from.
Related¶
- systems/apache-spark — the successor generalist engine.
- systems/amazon-emr — typical AWS managed substrate.
- systems/apache-iceberg — Iceberg modernises what Hive's metastore + partitioned-directory layout did.
- systems/aws-glue — AWS's managed Hive-Metastore-compatible catalog service.