CONCEPT Cited by 1 source

Iceberg file-based catalog

An Iceberg file-based catalog is a catalog-integration shape in which an Apache Iceberg-writing producer publishes its table's current snapshot pointer as a metadata file directly in object storage, and Iceberg-aware readers open the table by pointing at that metadata file's object key, bypassing the REST catalog protocol entirely.

Source: sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc.

The shape

Where REST catalog sync stores the table's current-snapshot pointer in a managed HTTP service (Snowflake Open Catalog, Databricks Unity, AWS Glue) that readers authenticate against, a file-based catalog stores the pointer as an ordinary JSON file in the same object store that holds the data. A reader with read access to the bucket can discover and query the table by referencing the latest vN.metadata.json file directly.
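Choosing "the latest vN.metadata.json" can be sketched in Python. This is a minimal illustration, not how any particular engine resolves the pointer. It assumes the reader already has a listing of object keys and that metadata files follow the vN.metadata.json naming used on this page. The function name latest_metadata_key and the example keys are hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches keys ending in ".../metadata/v<N>.metadata.json" (naming assumed from this page).
_METADATA_KEY = re.compile(r"(?:^|/)metadata/v(\d+)\.metadata\.json$")


def latest_metadata_key(keys: Iterable[str]) -> Optional[str]:
    """Return the key with the highest numeric version, or None if none match."""
    best_version = -1
    best_key = None
    for key in keys:
        match = _METADATA_KEY.search(key)
        if match is None:
            continue  # data files, manifests, and other objects are ignored
        version = int(match.group(1))
        if version > best_version:
            best_version, best_key = version, key
    return best_key


# Hypothetical bucket listing.
listing = [
    "tables/orders/metadata/v2.metadata.json",
    "tables/orders/metadata/v10.metadata.json",
    "tables/orders/metadata/v9.metadata.json",
    "tables/orders/data/part-0001.parquet",
]
print(latest_metadata_key(listing))  # tables/orders/metadata/v10.metadata.json
```

Versions are compared as integers because a plain string sort would rank v9 above v10.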

The canonical reader-side integration is Google BigQuery's CREATE EXTERNAL TABLE statement (verbatim from the source):

CREATE EXTERNAL TABLE YOUR_PROJECT_ID.YOUR_BIGQUERY_DATASET.YOUR_TABLE_NAME
WITH CONNECTION 'YOUR_FULL_CONNECTION_ID'
OPTIONS (
  format = 'ICEBERG',
  metadata_file_paths = ['gs://your-bucket-name/path/to/your/iceberg/table/metadata/vX.metadata.json']
);

The pattern also applies to Amazon Athena external Iceberg tables, Snowflake external-stage Iceberg reads, and standalone Spark / Trino / DuckDB sessions with iceberg.catalog-impl = org.apache.iceberg.hadoop.HadoopCatalog or equivalent "point at a metadata JSON" configuration.
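Because the reader-side integration pins one metadata file, re-pointing a reader amounts to re-issuing the statement with a new path. The following is a minimal Python sketch that renders the BigQuery statement quoted above from its three placeholders. It only builds text and does not talk to BigQuery. The function name external_table_ddl and its parameters are hypothetical, and the option names are copied from the quoted statement, so they should be checked against current BigQuery documentation before use.

```python
def external_table_ddl(table: str, connection_id: str, metadata_uri: str,
                       replace: bool = False) -> str:
    """Render the external-table statement for one Iceberg metadata file."""
    # Refuse values that could break out of the quoted literals.
    for value in (table, connection_id, metadata_uri):
        if "'" in value or "\n" in value:
            raise ValueError(f"unsafe characters in {value!r}")
    # CREATE OR REPLACE lets an existing definition be re-pointed in one statement.
    verb = "CREATE OR REPLACE EXTERNAL TABLE" if replace else "CREATE EXTERNAL TABLE"
    return (
        f"{verb} {table}\n"
        f"WITH CONNECTION '{connection_id}'\n"
        "OPTIONS (\n"
        "  format = 'ICEBERG',\n"
        # Option name mirrors the statement quoted from the source (an assumption).
        f"  metadata_file_paths = ['{metadata_uri}']\n"
        ");"
    )


print(external_table_ddl(
    "YOUR_PROJECT_ID.YOUR_BIGQUERY_DATASET.YOUR_TABLE_NAME",
    "YOUR_FULL_CONNECTION_ID",
    "gs://your-bucket-name/path/to/your/iceberg/table/metadata/vX.metadata.json",
    replace=True,
))
```

Passing replace=True is one way to move an existing external table to a newer metadata file; whether to re-create or update the definition is left to the operator.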

Relationship to the broker-owned object-store catalog fallback

Redpanda 25.1 GA canonicalised a "built-in object-store-based catalog" as a fallback when no REST catalog is configured — "suitable for ad hoc access by data engineers when no REST catalog is available." The 2025-05-13 BYOC tutorial reframes this as a primary integration option for engines (BigQuery) that read Iceberg directly from a metadata pointer rather than speak the REST-catalog protocol:

"Direct integration with popular REST catalogs like Snowflake Open Catalog, or with Iceberg clients like Google BigQuery via a file-based catalog."

Whether the object-store fallback and the file-based catalog are the same broker-owned mechanism or distinct shapes is not clarified by the BYOC post. The wiki treats them as aspects of the same underlying property: Iceberg metadata lives in object storage as a JSON file, and any reader with bucket access can walk it.
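What "walking" the metadata JSON involves can be sketched as follows. The sketch assumes the metadata file has already been downloaded as a string, and it uses field names from the Iceberg table-metadata format (current-snapshot-id, snapshots, snapshot-id, manifest-list). The function name and the example paths are hypothetical. It resolves only the first hop, from the metadata file to the current snapshot's manifest list; a real reader continues into the manifests and data files.

```python
import json
from typing import Optional


def current_manifest_list(metadata_json: str) -> Optional[str]:
    """Return the manifest-list path of the table's current snapshot, if any."""
    metadata = json.loads(metadata_json)
    current_id = metadata.get("current-snapshot-id")
    # A missing, null, or -1 value is treated here as "no current snapshot".
    if current_id is None or current_id == -1:
        return None
    for snapshot in metadata.get("snapshots", []):
        if snapshot["snapshot-id"] == current_id:
            return snapshot.get("manifest-list")
    return None


# Hypothetical, heavily trimmed metadata file with two snapshots.
example = json.dumps({
    "format-version": 2,
    "location": "gs://example-bucket/tables/orders",
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1,
         "manifest-list": "gs://example-bucket/tables/orders/metadata/snap-1.avro"},
        {"snapshot-id": 2,
         "manifest-list": "gs://example-bucket/tables/orders/metadata/snap-2.avro"},
    ],
})
print(current_manifest_list(example))
# gs://example-bucket/tables/orders/metadata/snap-2.avro
```

Nothing in this walk requires a catalog service: every hop is an object read, which is why bucket access alone is enough.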

Trade-offs vs REST catalog

Axis	File-based catalog	REST catalog
Snapshot discovery	Reader sees a static pointer; must re-point on new snapshot	Reader re-queries catalog; always sees latest
Auth	Object-store IAM (bucket-level)	OIDC token against catalog service
ACL granularity	Object-key-scoped	Table-scoped
Cross-engine consistency	Each engine maintains its own metadata-file pointer	Single source of truth
Catalog availability	No separate service to fail	Catalog is a write-path dependency
Auto-registration	Reader must know the metadata path	Auto-register works

When file-based catalog wins

Reader doesn't speak REST catalog protocol. BigQuery, Athena, Spark HadoopCatalog configs — pointing at a metadata JSON is native; paying for a REST catalog just to query the table is overhead.
Single-reader workload. When only one engine reads the table, the cross-engine-consistency property of a REST catalog isn't load-bearing.
Customer owns the bucket and wants direct-read access without a middleman. The BYOC-data-ownership framing: customer-owned bucket + customer-owned query engine = no need for a Redpanda-operated catalog endpoint in the query path.
Catalog availability is a write-path liability the operator wants to avoid. REST catalogs couple producer availability to catalog availability; the file-based shape decouples them (at the cost of losing cross-engine consistency).

When REST catalog wins

Multi-engine / multi-writer workload where snapshot consistency across readers matters.
Organisations with table-level ACL requirements — object-scoped IAM can't express "Alice can read the Orders table but not the PII-Orders table".
Discovery: REST catalogs list tables; file-based catalogs don't — readers have to know the metadata-path convention.
Schema federation across cloud accounts / clouds — Unity / Polaris / Glue are designed for this; file-based catalogs aren't.

Costs / caveats

No auto-refresh on new snapshots. A BigQuery external table has to be re-created (for example with CREATE OR REPLACE EXTERNAL TABLE) or have its definition updated before it sees a newer vN.metadata.json. Verbatim from the source: "update the external table definition in BigQuery if the location of the latest metadata file changes or you want to query a newer snapshot of the table data." This makes the file-based shape semi-static; it's not a live-streaming reader.
No table-level ACL. Object-key-scoped IAM is the policy surface. Fine-grained per-column, per-row, or time-travel policies aren't expressible.
No cross-writer serialisation. Multiple producers writing concurrently to the same Iceberg table without a REST-catalog-mediated commit protocol can overwrite each other's metadata pointers; safe concurrent multi-writer access requires the REST-catalog path with optimistic-concurrency commits.
Schema-evolution visibility across engines. Each reader holds its own metadata-file pointer, so each reader must be explicitly re-pointed before it sees a producer's schema change.
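The cross-writer caveat can be illustrated with a small in-memory simulation. This is a sketch of the general optimistic-concurrency idea only, not Redpanda's or any catalog's implementation. The PointerStore class, its method names, and the file names are hypothetical. It contrasts a blind overwrite of the pointer with a compare-and-swap commit that is rejected when another writer got there first.

```python
class PointerStore:
    """In-memory stand-in for wherever the current-metadata pointer lives."""

    def __init__(self, initial: str) -> None:
        self.current = initial

    def blind_put(self, new: str) -> None:
        # File-style publish: last writer wins, with no check against what was read.
        self.current = new

    def compare_and_swap(self, expected: str, new: str) -> bool:
        # Catalog-style commit: succeeds only if the pointer still equals `expected`.
        if self.current != expected:
            return False
        self.current = new
        return True


# Blind overwrite: both writers start from v1 and both "succeed".
store = PointerStore("v1.metadata.json")
base_a = base_b = store.current
store.blind_put("v2-from-a.metadata.json")
store.blind_put("v2-from-b.metadata.json")
print(store.current)  # v2-from-b.metadata.json (writer A's commit is silently lost)

# Compare-and-swap: the second writer is told to retry.
store = PointerStore("v1.metadata.json")
base_a = base_b = store.current
ok_a = store.compare_and_swap(base_a, "v2-from-a.metadata.json")
ok_b = store.compare_and_swap(base_b, "v2-from-b.metadata.json")
print(ok_a, ok_b, store.current)  # True False v2-from-a.metadata.json
```

In the second run writer B learns that its base is stale and can rebuild its commit on top of writer A's metadata, which is the property a single-pointer file without a conditional write does not provide.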

Seen in

sources/2025-05-13-redpanda-getting-started-with-iceberg-topics-on-redpanda-byoc — canonical wiki disclosure. Redpanda 25.1 BYOC-beta Iceberg Topics walkthrough uses the file-based catalog via BigQuery CREATE EXTERNAL TABLE on a GCS-hosted Iceberg metadata JSON. Framed as an alternative to REST-catalog sync for Iceberg clients like BigQuery that read directly from a metadata pointer.

Related

concepts/iceberg-catalog-rest-sync: the sibling catalog shape (REST catalog protocol) that a file-based catalog replaces or complements.
concepts/iceberg-topic · systems/redpanda-iceberg-topics: the broker-native producer that can target either catalog shape.
systems/apache-iceberg: the table format.
systems/google-bigquery: the canonical file-based-catalog reader the source demos.
systems/google-cloud-storage: the object store hosting the metadata JSON in the demo.
concepts/byoc-data-ownership-for-iceberg: the BYOC context in which file-based catalog is most often preferred.
patterns/external-table-over-iceberg-metadata-pointer: the pattern that consumes a file-based catalog from the query-engine side.
concepts/open-table-format: broader architectural context.