SYSTEM Cited by 3 sources

Scuba (Meta)

Scuba is Meta's warm / real-time data store, optimised for interactive slice-and-dice analytics on recent operational data. Stub page on this wiki; cited by name in the 2024-12-02 cryptographic monitoring post. The public Scuba reference is the 2013 paper "Scuba: Diving into Data at Facebook".

Role for this wiki

  • Warm tier downstream of Scribe: ingested events land in Scuba for real-time analysis and dashboarding.
  • Warm half of a warm + cold pairing: Scuba holds short-term data and Hive holds long-term data, together forming Meta's two-tier telemetry storage pattern.
  • Not a warehouse. Per the 2024-12-02 post, Scuba is "optimized to be performant for real-time data (i.e., warm storage) and can be inefficient if used for larger datasets" — the cost motivation for tiering large long-retention datasets into Hive.
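
The tiering rule in the bullets above can be sketched as a small routing function. This is a minimal illustration, not Meta's implementation: `pick_tier`, the tier labels, and the 30-day `WARM_RETENTION` value are all hypothetical, since the cited post gives no retention figure.

```python
from datetime import datetime, timedelta

# Hypothetical warm-tier retention window; the source states no number.
WARM_RETENTION = timedelta(days=30)


def pick_tier(query_start: datetime, now: datetime,
              warm_retention: timedelta = WARM_RETENTION) -> str:
    """Return "warm" (Scuba-like) if the query window starts inside
    the warm retention window, otherwise "cold" (Hive-like)."""
    if query_start > now:
        raise ValueError("query_start is in the future")
    return "warm" if now - query_start <= warm_retention else "cold"
```

A dashboard looking at the last day would resolve to the warm tier under these assumptions, while a 90-day retrospective would fall through to the cold tier.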

Seen in

  • sources/2024-12-02-meta-built-large-scale-cryptographic-monitoring — Scuba is the warm destination for FBCrypto's aggregated counts; post discloses "occasionally put an increased load on Scuba" alongside Scribe as a named capacity-management challenge.
  • sources/2025-03-07-meta-strobelight-a-profiling-service-built-on-open-source-technology — Scuba is the primary tool Strobelight customers use — the query language + database + UI for all Strobelight-captured profile data. "If someone runs an on-demand profile, it's just a few seconds before they can visualize this data in the Scuba UI (and send people links to it)." Second canonical Meta upstream producer on the wiki (after FBCrypto); cements Scuba as the warm-query surface for both aggregated-count telemetry and symbolized profile samples — two different upstream shapes feeding the same store. Every flame-graph, time-series, and distribution view in Strobelight is a Scuba query.
  • sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Third canonical Meta upstream producer: data-quality mismatch logs from the data-ingestion-system migration's quality-analysis tool. The hourly augmented-log-stream pattern lives on Scuba: partition-level mismatches are logged on detection; an hourly tool re-reads the mismatches, runs targeted queries to find example offending rows, and re-logs the augmented analysis to Scuba. Operators query the augmented stream rather than the source data. In addition, Scuba is the substrate for the migration's per-job lifecycle signals, which are consumed by the automated promotion loop that drives tens of thousands of jobs through phase transitions. Reinforces Scuba's role as Meta's universal warm-query substrate for fleet-wide telemetry of any structured event type.
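
The hourly augmented-log-stream pattern from the last entry can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the tool described in the post: `Mismatch`, `AugmentedMismatch`, `augment_hour`, and the `find_examples` callback are hypothetical names, and plain Python lists stand in for the Scuba tables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Mismatch:
    """A partition-level mismatch, logged at detection time (hypothetical shape)."""
    job: str
    partition: str
    detected_at_hour: int


@dataclass(frozen=True)
class AugmentedMismatch:
    """The original mismatch plus a few example offending rows."""
    mismatch: Mismatch
    example_rows: tuple


def augment_hour(mismatches: Iterable[Mismatch], hour: int,
                 find_examples: Callable[[Mismatch], Sequence[dict]],
                 max_examples: int = 3) -> List[AugmentedMismatch]:
    """Re-read the mismatches detected in `hour`, run a targeted lookup
    for each one, and return the augmented records to be re-logged."""
    augmented = []
    for m in mismatches:
        if m.detected_at_hour != hour:
            continue  # only this hour's detections are processed
        rows = tuple(list(find_examples(m))[:max_examples])
        augmented.append(AugmentedMismatch(m, rows))
    return augmented
```

In this sketch, operators would read the list returned by `augment_hour` rather than the source data, mirroring the split between the raw mismatch log and the augmented stream.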