
SYSTEM Cited by 1 source

Apache Hive

Apache Hive is the Hadoop-era SQL-on-HDFS query engine and metastore. It popularised the "tables over a data lake" idea at Facebook in the late 2000s and became the default metadata catalog for early cloud data lakes. The Hive Metastore (HMS) protocol is still the catalog interface that many newer engines implement or federate with.

Role for this wiki

Hive appears in two guises:

  1. As a legacy SQL engine over HDFS / S3: commonly superseded by newer engines (systems/apache-spark, systems/amazon-athena, systems/amazon-redshift, Trino) but still running in many enterprise stacks.
  2. As the ancestor of later catalogs and table formats: the origin of the "table over data files" concept that open table formats like systems/apache-iceberg generalised and made transactional.
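
To make the "table over data files" idea concrete, the sketch below recovers partition values from a Hive-style directory layout, where each partition column appears as a `key=value` path segment under the table root. The function name `parse_partition_path` and the example paths are hypothetical, chosen only for illustration. This is a minimal sketch of the naming convention, not of how Hive itself resolves partitions: the metastore records partition locations explicitly rather than inferring them from paths.

```python
from pathlib import PurePosixPath


def parse_partition_path(table_root: str, file_path: str) -> dict[str, str]:
    """Return partition column values encoded in a Hive-style file path.

    Hypothetical helper: reads the key=value directory segments that sit
    between the table root and the data file.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    partitions: dict[str, str] = {}
    # The last part is the data file itself, so only directories are scanned.
    for segment in relative.parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep:  # ignore directories that are not key=value segments
            partitions[key] = value
    return partitions


# Example paths are assumptions, not taken from any real deployment.
example = parse_partition_path(
    "/warehouse/events",
    "/warehouse/events/dt=2024-01-15/country=US/part-00000.parquet",
)
print(example)
```

Because the table is defined only by a root location plus this directory convention, any engine that can list files can read it, but nothing in the layout makes a multi-file change atomic. That is the gap the transactional table formats mentioned above address.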

Seen in

Last updated · 200 distilled / 1,178 read