
SYSTEM Cited by 2 sources

Google BigQuery

Google BigQuery is Google Cloud's serverless enterprise data warehouse: a column-store OLAP engine with compute-storage separation, a SQL-native query surface, and integrations with GCS and other GCP data sources. On this wiki, BigQuery surfaces as an Iceberg-reading query engine via its CREATE EXTERNAL TABLE ... format = 'ICEBERG' primitive, run against Iceberg tables written to GCS by a streaming or batch producer.

Product docs: cloud.google.com/bigquery.

Architectural role on this wiki

BigQuery is a canonical consumer of the "external table over Iceberg metadata pointer" pattern, in which a query engine registers an Iceberg table by pointing at a specific vN.metadata.json file in object storage rather than going through a REST catalog service.

Worked DDL (verbatim from the source demo):

CREATE EXTERNAL TABLE YOUR_PROJECT_ID.YOUR_BIGQUERY_DATASET.YOUR_TABLE_NAME
WITH CONNECTION 'YOUR_FULL_CONNECTION_ID'
OPTIONS (
  format = 'ICEBERG',
  metadata_file_paths = ['gs://your-bucket-name/path/to/your/iceberg/table/metadata/vX.metadata.json']
);

YOUR_FULL_CONNECTION_ID identifies a BigQuery Cloud Resource Connection: the IAM-layer construct that lets BigQuery read the external GCS bucket with appropriately scoped permissions.
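
When the same statement has to be issued for several tables, or re-issued against a newer metadata file, it can help to render it from parameters. The Python sketch below is illustrative only: render_external_table_ddl and its parameters are hypothetical names, and the option names and quoting are copied from the DDL above rather than checked against BigQuery. The replace flag switches to CREATE OR REPLACE EXTERNAL TABLE, the BigQuery DDL form that overwrites an existing external table definition.

```python
def render_external_table_ddl(project, dataset, table, connection_id,
                              metadata_uri, replace=False):
    """Render DDL that registers an Iceberg table by its metadata file.

    Hypothetical helper: it only formats text and never contacts BigQuery.
    """
    # Guard against pointing at a data file or a directory by mistake.
    if not (metadata_uri.startswith("gs://")
            and metadata_uri.endswith(".metadata.json")):
        raise ValueError(f"not a GCS Iceberg metadata file: {metadata_uri!r}")
    verb = "CREATE OR REPLACE EXTERNAL TABLE" if replace else "CREATE EXTERNAL TABLE"
    # Assumption: option names and quoting mirror the source DDL on this page.
    return "\n".join([
        f"{verb} {project}.{dataset}.{table}",
        f"WITH CONNECTION '{connection_id}'",
        "OPTIONS (",
        "  format = 'ICEBERG',",
        f"  metadata_file_paths = ['{metadata_uri}']",
        ");",
    ])
```

Calling it with replace=True and a newer metadata path yields the statement a scheduled job would run to move the table to a later snapshot.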

Why it shows up in this corpus

BigQuery was named in Redpanda's 25.1 GA post as one of the Iceberg-compatible query engines (alongside ClickHouse, Snowflake, Databricks, Dremio, Spark SQL, Flink, Trino) but was not the featured integration for the GA walkthrough. The BYOC tutorial (2025-05-13) is where BigQuery becomes this wiki's canonical worked example of a catalog-protocol-agnostic Iceberg reader: one that uses a file-based metadata pointer rather than a REST catalog to find the table.

This is the read-side surface of the Redpanda BYOC → GCS → Iceberg pipeline walked through in the source: a Redpanda Iceberg topic projects row-oriented records into Parquet files on GCS and writes the Iceberg metadata; BigQuery's CREATE EXTERNAL TABLE opens the metadata JSON and serves SQL queries against the Parquet files without a Redpanda REST catalog in the query path.

Minimal-viable stub

This is a stub page anchored by a single Iceberg-external-table source. BigQuery's internals (Dremel execution engine, Capacitor column store, Borg scheduling, slot-based billing model) aren't canonicalised here; a deeper treatment waits on a dedicated BigQuery-internals source. The pattern's system role on this wiki is narrow: "the query engine that reads Iceberg via file-based catalog."

Caveats

  • External tables hold a static snapshot pointer. Per the source: "update the external table definition in BigQuery if the location of the latest metadata file changes or you want to query a newer snapshot of the table data." New producer snapshots don't propagate without a refresh mechanism, such as a scheduled DDL re-run or an event-driven refresh that re-issues the definition with CREATE OR REPLACE EXTERNAL TABLE.
  • Cost-per-query on external tables. Under on-demand pricing, BigQuery charges for the data scanned; a query against an Iceberg external table can scan the full table if its filters don't allow partition / manifest pruning. Partition-pruning hygiene is a cost lever.
  • Limited compared to BigLake / BigLake Iceberg tables. BigLake offers richer managed-table integration than the basic external-table path (including Iceberg REST catalog integration); the source's tutorial uses the simpler external-table shape.
  • Cloud-bound. BigQuery reads data out of GCS efficiently, but reading Iceberg tables from S3 (cross-cloud) requires network egress and a different connection-credential shape.
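
One way to implement the refresh from the first caveat is to list the table's metadata directory and pick the highest-numbered file before re-issuing the external table definition. The Python sketch below assumes the vN.metadata.json naming used in the DDL on this page. latest_metadata_uri is a hypothetical helper, the object listing is passed in as plain strings (no GCS client is involved), and the highest-numbered file is only assumed to be the current snapshot.

```python
import re

# Assumption: metadata files are named v<integer>.metadata.json.
_METADATA_RE = re.compile(r"/v(\d+)\.metadata\.json$")


def latest_metadata_uri(object_uris):
    """Return the URI with the highest vN version, or None if none match."""
    best_version, best_uri = -1, None
    for uri in object_uris:
        match = _METADATA_RE.search(uri)
        if match is None:
            continue  # skip manifests, data files, and other objects
        version = int(match.group(1))  # numeric compare, so v10 beats v9
        if version > best_version:
            best_version, best_uri = version, uri
    return best_uri


# Hypothetical listing of one table's metadata directory.
listing = [
    "gs://your-bucket-name/tbl/metadata/v9.metadata.json",
    "gs://your-bucket-name/tbl/metadata/v10.metadata.json",
    "gs://your-bucket-name/tbl/metadata/snap-123.avro",
]
print(latest_metadata_uri(listing))
# → gs://your-bucket-name/tbl/metadata/v10.metadata.json
```

The numeric comparison matters: sorted as strings, v9 would rank after v10 and the refresh would pin an older snapshot.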

BigLake metastore as REST-catalog alternative

As of 2025-11-06 (Redpanda 25.3), Redpanda Iceberg Topics adds support for Google BigLake metastore as a REST-catalog option alongside the file-based pointer shape canonicalised above. The two shapes coexist:

  • File-based catalog (2025-05-13 BYOC beta path above) — BigQuery points at a specific vN.metadata.json; new producer snapshots require an external-table refresh.
  • REST catalog via BigLake (25.3 path) — BigQuery discovers tables via BigLake metastore + Dataplex governance; producer snapshots auto-propagate to reader catalog views.
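
The practical difference between the two shapes is when the metadata pointer gets resolved. The toy Python model below is not BigQuery or BigLake code, and every class and name in it is hypothetical. It only illustrates that a reader holding a fixed path keeps seeing the old snapshot after the producer commits, while a reader that asks a catalog at query time sees the new one.

```python
class ToyCatalog:
    """Stand-in for a catalog service: table name -> current metadata file."""

    def __init__(self):
        self._current = {}

    def commit(self, table, metadata_uri):
        # The producer publishes a new snapshot by swapping the pointer.
        self._current[table] = metadata_uri

    def resolve(self, table):
        return self._current[table]


class PinnedReader:
    """File-based shape: the pointer is fixed when the table is registered."""

    def __init__(self, metadata_uri):
        self.metadata_uri = metadata_uri

    def metadata_for_query(self):
        return self.metadata_uri


class CatalogReader:
    """Catalog shape: the pointer is looked up each time a query runs."""

    def __init__(self, catalog, table):
        self.catalog, self.table = catalog, table

    def metadata_for_query(self):
        return self.catalog.resolve(self.table)


catalog = ToyCatalog()
catalog.commit("events", "gs://bucket/events/metadata/v1.metadata.json")
pinned = PinnedReader(catalog.resolve("events"))  # registered against v1
live = CatalogReader(catalog, "events")

catalog.commit("events", "gs://bucket/events/metadata/v2.metadata.json")
print(pinned.metadata_for_query())  # still the v1 file
print(live.metadata_for_query())    # now the v2 file
```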

See systems/google-biglake for the metastore system page.

