SYSTEM Cited by 3 sources

Amazon Athena

Amazon Athena is AWS's serverless SQL query engine over systems/aws-s3: a managed Presto/Trino deployment that reads systems/apache-parquet (and other open formats) directly from S3, with no cluster to provision. It is the canonical example of the concepts/compute-storage-separation pattern at the engine level: storage is S3, and compute spins up on demand for each query.
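Because there is no cluster to provision or size, running a query reduces to one API call to submit SQL plus polling for completion. The sketch below assumes a boto3 Athena client (in practice `boto3.client("athena")`) is passed in as `client` and uses its `start_query_execution` and `get_query_execution` calls; the helper name `run_query`, the polling interval, and the timeout are illustrative choices, not part of Athena.

```python
import time
from typing import Any, Protocol


class AthenaClient(Protocol):
    """The subset of the boto3 Athena client used below."""

    def start_query_execution(self, **kwargs: Any) -> dict: ...
    def get_query_execution(self, **kwargs: Any) -> dict: ...


# States after which Athena will not change the query's status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client: AthenaClient, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0, timeout_seconds: float = 600.0) -> tuple[str, str]:
    """Submit one query and poll until it reaches a terminal state.

    Returns (query_execution_id, final_state). The only inputs are SQL,
    a catalog database, and an S3 prefix for results: no cluster settings.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        time.sleep(poll_seconds)
```

A `FAILED` result is returned rather than raised so that callers can decide whether a retry is safe, which matters on a shared engine where queries can be killed under load.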

Role for this wiki

Athena is one of the interchangeable SQL compute engines over the shared S3 data lake, alongside systems/amazon-redshift, AWS Glue, systems/apache-spark on systems/amazon-emr, and Apache Hive. It is commonly used as the ad-hoc engine when a dedicated warehouse is overkill.

Seen in

sources/2025-09-26-yelp-s3-server-access-logs-at-scale: the canonical wiki reference for Athena's operational envelope at fleet scale, along three named axes. (1) Raw-to-Parquet log compaction: Yelp runs daily Athena INSERT queries that convert TiBs/day of raw S3 Server Access Logs into Parquet, reducing storage by 85 % and object count by 99.99 %. (2) The load-bearing case for Glue partition projection (enum for bucket_name, date-granular timestamp) over managed partitions, canonicalised as patterns/projection-partitioning-over-managed-partitions. (3) The canonical wiki disclosure of Athena's shared-resource contention: "Athena is a shared resource so a query may be killed any time due to the cluster being overloaded, or occasionally hitting S3 API limits". This forces an idempotent INSERT, made retry-safe by a self-LEFT-JOIN on requestid. The source also surfaces Athena's $path pseudo-column (for extracting the S3 locations of source objects) and the GetQueryRuntimeStatistics API (for post-query row-count verification that avoids an expensive COUNT(*) over newly written Parquet). This is the first fleet-scale operational Seen-in on this page.
sources/2024-07-29-aws-amazons-exabyte-scale-migration-from-apache-spark-to-ray-on-ec2: Athena is one of the compute frameworks available to Amazon BDT table subscribers, and one of the three query engines (Spark, Redshift, Athena) in BDT's Data Reconciliation Service for the Ray migration, used to verify across multiple frameworks that Spark-compacted and Ray-compacted tables produced equivalent results.
sources/2026-04-01-aws-automate-safety-monitoring-with-computer-vision-and-generative-ai: canonical wiki reference for patterns/data-driven-annotation-curation at ML-ops scale. Athena queries inference results and customer-feedback data in S3 to aggregate false-positive rates across camera types and deployment conditions, so that retraining is prioritised on image sources with elevated error rates. It also surfaces below-confidence-threshold inferences for targeted annotation. This replaces blanket daily per-site annotation jobs, which were untenable across hundreds of sites.
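Partition projection replaces catalog-managed partitions with rules that Athena evaluates at query time. The helper below is a sketch that renders table properties for the two partition keys named in the Yelp entry, an enum `bucket_name` and a daily `timestamp`. The property keys (`projection.enabled`, `projection.<column>.type`, `.values`, `.range`, `.format`, `.interval`, `.interval.unit`, `storage.location.template`) follow Athena's documented projection settings; the bucket list, start date, `yyyy/MM/dd` format, and S3 location template are hypothetical placeholders, not Yelp's configuration.

```python
def projection_properties(bucket_names: list[str], start_date: str,
                          location_template: str) -> dict[str, str]:
    """Partition-projection properties for bucket_name (enum) and timestamp (one per day).

    start_date must use the same yyyy/MM/dd format declared below (hypothetical layout).
    """
    if not bucket_names:
        raise ValueError("enum projection needs at least one value")
    return {
        "projection.enabled": "true",
        # Enum: every known bucket is listed explicitly.
        "projection.bucket_name.type": "enum",
        "projection.bucket_name.values": ",".join(bucket_names),
        # Date: one partition per day from start_date up to the current time.
        "projection.timestamp.type": "date",
        "projection.timestamp.format": "yyyy/MM/dd",
        "projection.timestamp.range": f"{start_date},NOW",
        "projection.timestamp.interval": "1",
        "projection.timestamp.interval.unit": "DAYS",
        # Maps each projected (bucket_name, timestamp) pair to an S3 prefix.
        "storage.location.template": location_template,
    }


def tblproperties_clause(props: dict[str, str]) -> str:
    """Render properties as a TBLPROPERTIES clause for a CREATE TABLE statement."""
    body = ",\n".join(f"  '{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES (\n{body}\n)"
```

Because partitions are derived from these rules, a new day becomes queryable without `ALTER TABLE ADD PARTITION` or `MSCK REPAIR TABLE`, and the catalog does not grow with the number of partitions.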
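The retry-safe INSERT works as an anti-join: each attempt inserts only the rows whose `requestid` is not already present in the destination, so rerunning after a killed attempt does not duplicate rows an earlier attempt already committed. A minimal sketch, assuming the raw and Parquet tables share a column layout and a hypothetical partition column `dt`; table names are placeholders interpolated without escaping, so they must come from trusted configuration. A pure-Python model of the same logic is included to make the retry behaviour checkable.

```python
def idempotent_insert_sql(target: str, source: str, dt: str) -> str:
    """INSERT ... SELECT that skips requestids already written to target for one partition.

    `dt` is a hypothetical partition column; identifiers are not escaped.
    """
    return f"""
INSERT INTO {target}
SELECT src.*
FROM {source} AS src
LEFT JOIN (
  SELECT requestid FROM {target} WHERE dt = '{dt}'
) AS done
  ON src.requestid = done.requestid
WHERE src.dt = '{dt}'
  AND done.requestid IS NULL
""".strip()


def apply_insert(target_rows: list[dict], source_rows: list[dict]) -> int:
    """In-memory model of the statement for a single partition.

    Appends source rows whose requestid is absent from the target and
    returns how many rows were added.
    """
    done = {row["requestid"] for row in target_rows}
    new_rows = [row for row in source_rows if row["requestid"] not in done]
    target_rows.extend(new_rows)
    return len(new_rows)
```

Running the model twice against a target that already holds part of the data adds only the missing rows on the first call and nothing on the second.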
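Row-count verification can read Athena's own execution statistics instead of scanning freshly written Parquet with `COUNT(*)`. A sketch using boto3's `get_query_runtime_statistics`, whose response carries a `QueryRuntimeStatistics.Rows` section with `InputRows` and `OutputRows`; it assumes `OutputRows` reflects the rows written by the INSERT, and `expected_rows` is a hypothetical caller-supplied count (for example, from the source partition).

```python
from typing import Any, Optional, Protocol


class StatsClient(Protocol):
    """The subset of the boto3 Athena client used below."""

    def get_query_runtime_statistics(self, **kwargs: Any) -> dict: ...


def rows_written(client: StatsClient, query_id: str) -> Optional[int]:
    """Return OutputRows from the query's runtime statistics, or None if not reported."""
    response = client.get_query_runtime_statistics(QueryExecutionId=query_id)
    rows = response.get("QueryRuntimeStatistics", {}).get("Rows", {})
    return rows.get("OutputRows")


def verify_insert(client: StatsClient, query_id: str, expected_rows: int) -> bool:
    """Compare the engine-reported count with the expected count (no table scan)."""
    written = rows_written(client, query_id)
    return written is not None and written == expected_rows
```

Treating a missing statistic as a failed check, rather than a pass, keeps the verification conservative.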
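The BDT entry does not say how results from the three engines were compared. One simple, hypothetical approach is to run the same query on each engine and compare the outputs as multisets, which ignores row order but still detects missing or duplicated rows. The engine names below are illustrative dictionary keys; real comparisons usually also need type normalisation (for example floating-point rounding or timestamp precision) before rows from different engines compare equal.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def result_multiset(rows: Iterable[Sequence[Hashable]]) -> Counter:
    """Order-insensitive, duplicate-aware view of a query result."""
    return Counter(tuple(row) for row in rows)


def reconcile(results_by_engine: dict[str, Iterable[Sequence[Hashable]]]) -> dict[str, bool]:
    """Report, per engine, whether its result matches the first engine's result."""
    if not results_by_engine:
        raise ValueError("need at least one engine result")
    # Materialise each result once, so one-shot iterators are not consumed twice.
    multisets = {name: result_multiset(rows) for name, rows in results_by_engine.items()}
    baseline = next(iter(multisets.values()))
    return {name: ms == baseline for name, ms in multisets.items()}
```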
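The curation step in the last entry is a grouped aggregation followed by a threshold filter. A sketch assuming a hypothetical `inference_feedback` table with `camera_type`, `deployment_condition`, `confidence`, and a boolean `false_positive` derived from customer feedback; none of these names come from the source. The SQL string shows the Athena-side shape, and the Python functions reproduce the same computation in memory so that the ranking and the filter can be checked.

```python
from collections import defaultdict
from dataclasses import dataclass

# Athena-side shape of the aggregation (hypothetical table and column names).
FP_RATE_SQL = """
SELECT camera_type, deployment_condition,
       avg(CASE WHEN false_positive THEN 1.0 ELSE 0.0 END) AS fp_rate
FROM inference_feedback
GROUP BY camera_type, deployment_condition
ORDER BY fp_rate DESC
"""


@dataclass(frozen=True)
class Feedback:
    camera_type: str
    deployment_condition: str
    confidence: float
    false_positive: bool  # customer feedback flagged this detection as wrong


def false_positive_rates(records: list[Feedback]) -> list[tuple[tuple[str, str], float]]:
    """Per (camera_type, deployment_condition) false-positive rate, highest first."""
    totals: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for r in records:
        key = (r.camera_type, r.deployment_condition)
        totals[key][0] += int(r.false_positive)  # false positives
        totals[key][1] += 1                      # all inferences
    rates = [(key, fp / n) for key, (fp, n) in totals.items()]
    return sorted(rates, key=lambda kv: kv[1], reverse=True)


def below_threshold(records: list[Feedback], threshold: float) -> list[Feedback]:
    """Low-confidence inferences to route to targeted annotation."""
    return [r for r in records if r.confidence < threshold]
```

Groups at the top of the ranking are the candidates for retraining data, while the threshold filter picks individual inferences for annotation instead of labelling every site every day.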

Related

systems/amazon-redshift: the dedicated-cluster warehouse peer.
systems/aws-s3: the shared storage layer.
systems/apache-iceberg: Athena speaks the Iceberg REST catalog API directly.
concepts/compute-storage-separation: Athena is the canonical serverless form of this pattern on AWS.
