CONCEPT Cited by 1 source
Partition quality marking
Definition
Partition quality marking is the practice of annotating a data partition's metadata with a quality flag (good / bad / unknown) so that downstream consumers and the ingestion system itself can branch on the flag rather than treating all landed data as authoritative. Crucially, the data in the partition is left in place: the marking is a metadata-level signal, not an in-place correction or deletion.
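A minimal sketch of what such a metadata record could look like. The names `PartitionMeta`, `Quality`, `mark` and `is_usable` are hypothetical and not taken from the source, and the choice to treat unknown partitions as usable is an assumption the source does not specify.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    BAD = "bad"
    UNKNOWN = "unknown"


@dataclass
class PartitionMeta:
    """Hypothetical metadata record for one partition."""
    partition_id: str
    data_path: str                       # where the data lives; never modified by marking
    quality: Quality = Quality.UNKNOWN   # nothing has been checked yet


def mark(meta: PartitionMeta, quality: Quality) -> None:
    # Only the flag changes. The data at data_path is left in place.
    meta.quality = quality


def is_usable(meta: PartitionMeta) -> bool:
    # Consumers branch on the flag instead of trusting everything that landed.
    # Assumption: only an explicit BAD mark blocks use.
    return meta.quality is not Quality.BAD
```

Because `mark` touches nothing but the flag, reversing a marking is the same operation with a different value.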
"During the reverse shadow phase, if any data quality issues were detected in a specific partition, that partition would be marked in its metadata as having bad data quality." — Source: sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale
Two operational behaviours follow from the marking
The mark is interpreted by partition role:
| Partition role | Bad-quality behaviour |
|---|---|
| Delta partition | New data stops landing and an alert is sent to the operator |
| Target partition | System selects an older partition and merges it with more deltas to produce a substitute, transparent to consumers |
The two behaviours are not symmetric — and they shouldn't be. A bad delta would corrupt every downstream computation if allowed to merge forward, so halting CDC consumption is the correct response. A bad target partition is a snapshot of state at a moment — it can be substituted with an older known-good state plus catch-up deltas, producing a fresh target partition that bypasses the corruption without consumer-visible disruption.
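The role-dependent branching can be sketched as follows. This is an illustration under stated assumptions, not Meta's implementation: the `Partition` record, the integer `seq` ordering key and the returned plan dictionaries are all hypothetical, and the sketch assumes a substitute can only be built when every delta in the catch-up window is itself unmarked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    name: str
    role: str          # "delta" or "target"
    seq: int           # hypothetical ordering key, e.g. landing order
    bad: bool = False  # the quality mark


def handle_bad_partition(bad: Partition, catalog: List[Partition]) -> dict:
    """Return a plan describing how to react to a partition marked bad."""
    if bad.role == "delta":
        # A bad delta must not merge forward: stop landing and alert.
        return {"action": "halt_landing", "alert": f"bad delta partition {bad.name}"}

    # Bad target: find the newest older target that is not marked bad.
    older_good = [p for p in catalog
                  if p.role == "target" and not p.bad and p.seq < bad.seq]
    if not older_good:
        return {"action": "alert_only", "alert": f"no good base for {bad.name}"}
    base = max(older_good, key=lambda p: p.seq)

    # Catch-up deltas: everything after the base, up to the bad target.
    window = sorted((p for p in catalog
                     if p.role == "delta" and base.seq < p.seq <= bad.seq),
                    key=lambda p: p.seq)
    if any(d.bad for d in window):
        return {"action": "alert_only", "alert": f"bad delta blocks rebuild of {bad.name}"}

    return {"action": "substitute",
            "base": base.name,
            "deltas": [d.name for d in window]}
```

For a catalog holding a good target at `seq=1`, good deltas at `seq=2` and `seq=3`, and a bad target at `seq=3`, the plan is to rebuild from the first target plus both deltas.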
Why mark, not delete
Three reasons to leave the data and mark it:
- Reversibility. A bad-quality determination might be wrong (false positive); leaving the data lets you un-mark without redoing the computation.
- Rollback substrate. "For rollback, we could quickly query the metadata to find all partitions that were marked with bad data quality and fix them with backfill." The marks form an index of partitions that need backfill — without the marks, finding them post-hoc would require comparing against an external reference.
- Forensic value. Even after remediation, the original bad data is the evidence for understanding why the bug existed — deleting it loses that evidence.
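The reversibility and rollback points can be illustrated with a small sketch. It assumes a hypothetical in-memory metadata store mapping partition names to quality flags; the function names are invented for illustration, and a real system would query its own metadata service instead.

```python
from typing import Dict, List


def partitions_needing_backfill(metadata: Dict[str, str]) -> List[str]:
    # The marks act as an index: one metadata scan finds every partition
    # to backfill, with no comparison against an external reference.
    return sorted(name for name, quality in metadata.items() if quality == "bad")


def unmark(metadata: Dict[str, str], name: str) -> None:
    # False positive: flip the flag back. The data was never removed,
    # so nothing has to be recomputed.
    if metadata.get(name) == "bad":
        metadata[name] = "good"
```

Usage, with made-up partition names:

```python
meta = {"ds=01": "good", "ds=02": "bad", "ds=03": "bad", "ds=04": "unknown"}
to_fix = partitions_needing_backfill(meta)   # ["ds=02", "ds=03"]
unmark(meta, "ds=03")                        # ds=03 turned out to be a false positive
```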
Containment of CDC bad-data propagation
This is the operational mechanism for stopping CDC bad-data propagation: the damage from a single bad partition stays bounded inside that partition rather than feeding forward into every subsequent target-table state. "In this way we could stop bad data propagation quickly."
See patterns/partition-marking-stops-cdc-bleeding for the abstracted operational pattern.
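To make the containment concrete, here is a toy forward-merge loop that refuses to apply a delta marked bad. It is a sketch under assumptions: deltas are modelled as plain dictionaries with a `quality` flag and a `rows` mapping of upserts, and `merge_forward` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def merge_forward(target_state: dict, deltas: List[dict]) -> Tuple[dict, Optional[int]]:
    """Apply deltas in order, stopping at the first one marked bad.

    Returns the merged state and the index of the blocking delta
    (None if every delta was applied).
    """
    state = dict(target_state)  # do not mutate the caller's snapshot
    for i, delta in enumerate(deltas):
        if delta["quality"] == "bad":
            # Stop here: the bad rows never reach the target state,
            # and later deltas wait until the bad one is fixed.
            return state, i
        state.update(delta["rows"])
    return state, None
```

With a good delta followed by a bad one, only the good delta's rows appear in the result, and the returned index tells the operator which partition is blocking.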
Distinguishing from related primitives
- vs soft-delete (e.g. `is_deleted=true`): soft-delete is at the row grain; partition quality marking is at the partition grain.
- vs partition tombstone: tombstones permanently mark a partition as removed; quality marks are reversible.
- vs DLQ (dead-letter queue): DLQs hold messages that failed processing; partition quality marking holds partitions that succeeded processing but produced suspect output.
- vs error events: error events are emitted on detection but don't change downstream behaviour; quality marks drive downstream behaviour (stop landing / substitute partition).
Seen in
- sources/2026-05-12-meta-migrating-data-ingestion-systems-at-meta-scale — Meta's data-ingestion-system migration; canonical wiki instance with both delta-partition + target-partition behaviours specified.
Related
- concepts/cdc-bad-data-propagation — the hazard this contains
- concepts/data-quality-checksum-comparison — the detection mechanism that triggers marking
- concepts/blast-radius — the broader containment concept
- patterns/partition-marking-stops-cdc-bleeding — the operational pattern wrapping this
- systems/meta-data-ingestion-system — canonical wiki instance
- companies/meta — company hub