Databricks Auto Loader

Databricks Auto Loader is a high-throughput Spark Structured Streaming source that incrementally discovers and processes new files landing in cloud object storage (S3 / ADLS / GCS) without requiring manual file listing or state management.

Official docs.
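
As a rough sketch of what using Auto Loader as a Structured Streaming source can look like in PySpark: the `cloudFiles` format name and the `cloudFiles.format`, `cloudFiles.schemaLocation`, and `checkpointLocation` options belong to the Auto Loader and Structured Streaming APIs, while the function names, the JSON default, and the one-minute trigger are hypothetical choices made for illustration. It assumes a Databricks runtime that provides a `spark` session.

```python
def build_auto_loader_stream(spark, source_path, schema_path, file_format="json"):
    """Return a streaming DataFrame that picks up new files under source_path.

    `spark` is an existing SparkSession; all paths are supplied by the caller.
    """
    return (
        spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", file_format)          # format of the incoming files
        .option("cloudFiles.schemaLocation", schema_path)  # where the inferred schema is tracked
        .load(source_path)
    )


def start_ingest(stream_df, checkpoint_path, target_table, interval="1 minute"):
    """Start writing the stream to a table in periodic micro-batches."""
    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_path)  # where stream progress is persisted
        .trigger(processingTime=interval)               # micro-batch cadence (assumed value)
        .toTable(target_table)
    )
```

On Databricks, a call such as `start_ingest(build_auto_loader_stream(spark, "s3://example-bucket/landing/", "s3://example-bucket/_schemas/landing"), "s3://example-bucket/_checkpoints/landing", "bronze.events")` would start the stream; every path and table name here is a placeholder.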

Core mechanics

File discovery: Auto Loader detects new files through object-storage notifications (where available) or directory listings, and records which files it has already processed in a state store.
Incremental processing: new files are processed in micro-batches as they arrive, feeding downstream Structured Streaming transformations.
Automatic state: Auto Loader persists metadata about discovered files itself, so users don't implement their own watermark or checkpoint logic for file discovery.
Near-real-time arrival patterns: designed for steady streams of new files (cloud-native logs, metric exports, CDC outputs).
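
The mechanics above can be pictured with a small toy model in plain Python. This is not Auto Loader's implementation: it only mimics the directory-listing path, and the `ToyFileTracker` class, the JSON state file, and `run_micro_batch` are all hypothetical names invented for the illustration. It shows the idea that discovery state is persisted, so each micro-batch (even after a restart) contains only files that have not been processed yet.

```python
import json
import tempfile
from pathlib import Path


class ToyFileTracker:
    """Toy stand-in for incremental file discovery (not Auto Loader's code)."""

    def __init__(self, source_dir, state_file):
        self.source_dir = Path(source_dir)
        self.state_file = Path(state_file)
        # Reload previously processed file names so a restart skips them.
        if self.state_file.exists():
            self.seen = set(json.loads(self.state_file.read_text()))
        else:
            self.seen = set()

    def discover(self):
        """List the source directory and return unseen file names, sorted."""
        listed = sorted(p.name for p in self.source_dir.iterdir() if p.is_file())
        return [name for name in listed if name not in self.seen]

    def commit(self, names):
        """Record a processed micro-batch in the persisted state file."""
        self.seen.update(names)
        self.state_file.write_text(json.dumps(sorted(self.seen)))


def run_micro_batch(tracker):
    """Discover new files, 'process' them, and persist the updated state."""
    batch = tracker.discover()
    # A real pipeline would transform and write the batch contents here.
    tracker.commit(batch)
    return batch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as st:
        state = Path(st) / "state.json"  # kept outside the source directory
        (Path(src) / "a.json").write_text("{}")
        tracker = ToyFileTracker(src, state)
        print(run_micro_batch(tracker))  # first batch: the one existing file
        print(run_micro_batch(tracker))  # nothing new has arrived
        (Path(src) / "b.json").write_text("{}")
        restarted = ToyFileTracker(src, state)  # simulate a restart
        print(run_micro_batch(restarted))  # only the file added since
```

The state file plays the role that the list item on automatic state describes: the caller never tracks which files were handled, it only asks for the next batch.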

Seen in

sources/2026-05-05-databricks-10-trillion-samples-a-day-scaling-beyond-traditional-monitoring: canonical observability ingestion use case. Hydra uses Auto Loader as its Structured Streaming source to "efficiently discover and ingest millions of object storage files" at the 20-billion-active-timeseries scale. Auto Loader "automatically persists metadata about discovered files and scales to handle near-real-time arrival patterns," making it viable as the front door for a lakehouse-native observability platform.

Related

systems/apache-spark: Structured Streaming host
systems/delta-lake: common sink
systems/hydra: canonical observability use case
systems/databricks
companies/databricks
concepts/lakehouse-native-observability
