CONCEPT Cited by 1 source
Data Gap Analysis¶
Definition¶
Data Gap Analysis is the exercise of reconciling a source system's native data shape against a target system's required template, producing a structured two-axis output:
- Immediate solutions — approximations, composite fields, pre-processing — that unblock integration today without requiring structural source-system changes.
- Long-term solutions — foundational changes (unified schemas, new dedicated subsystems, product-catalog services) that eventually eliminate the need for approximation.
Yelp's 2025-02-19 post canonicalises the two-axis output format as the deliverable of a data gap analysis between their custom order-to-cash system and a third-party revenue-recognition SaaS ("REVREC service").
The two gap classes from Yelp's case study¶
Gap class 1: Data Gaps (missing mapping)¶
Issue: No direct mapping between fields in Yelp's system and the 3rd Party System.
Immediate solutions:
- Approximation — use gross billing amounts as list prices where the true standalone price isn't recorded.
- Composite data — create unique monthly contract identifiers by combining product unique IDs with revenue periods.
Long-term solution: Develop a dedicated Product Catalog system for common product attributes.
Gap class 2: Inconsistent Product Implementation¶
Issue: Data attributes scattered across various databases for different products.
Immediate solutions:
- Pre-process data from different tables into a unified schema before feeding the pipeline.
- Categorise and pre-process data by event type (subscription, billing, fulfillment).
Long-term solution: Propose unification of billing data models into a centralised schema for future needs.
Why the two-axis format matters¶
Most data-mapping analyses produce a flat list of "mismatches" that implicitly demand either all-at-once structural fixes (too expensive to ship in reasonable time) or pure hacks (no path to ever improve). The two-axis format forces the analyst to answer both questions:
- What approximation unblocks us today?
- What's the right fix we'd like to converge on?
The value of answering both at once:
- Unblocks integration work — engineering can start building against the approximations without waiting for the ideal schema.
- Creates a roadmap — the long-term column becomes the backlog for structural simplification work. Yelp's list of "Future Improvements" in the same post (unified billing data models, Product Catalog system) is the direct output of this column.
- Makes the debt visible — approximations that aren't paid down live forever. Naming the long-term fix per approximation creates the contract that someone will do the work.
When to use¶
- Third-party integrations where a SaaS / external system requires a specific template you must map to (revenue recognition, billing, tax, fraud scoring, shipping, analytics tools).
- Heterogeneous source consolidation where multiple internal systems produce the same logical data with different shapes (multi-product pipelines, multi-region billing).
- Legacy-to-modern migrations where gradual shape evolution is required without a big-bang rewrite.
Related methodology¶
Complements Glossary Dictionary Requirement Translation:
- Glossary Dictionary aligns vocabulary — "what does 'invoice amount' mean?"
- Data Gap Analysis aligns data shape — "which field in which table has invoice amount, does it exist in all products, and how do we reshape it for the target template?"
Together they cover the cross-functional + technical halves of third-party-integration project scoping.
Yelp's outcome attribution¶
"The outcome of data gap analysis not only clarified the immediate solutions to support automated revenue recognition with status quo data, but also gave us better direction on long term investment that would make the custom order-to-cash system closer to industry standards so that the data mapping would be more straightforward."
This is the load-bearing claim: the analysis output is both a tactical unblock and a strategic roadmap, and naming both explicitly is what makes the deliverable valuable beyond the immediate integration.
Seen in¶
- sources/2025-02-19-yelp-revenue-automation-series-building-revenue-data-pipeline — canonical wiki instance. Two gap classes (Data Gaps, Inconsistent Product Implementation) each documented with (a) immediate approximation, (b) composite-data fallback, (c) long-term structural fix.
Related¶
- concepts/glossary-dictionary-requirement-translation — the vocabulary-alignment complement.
- concepts/revenue-recognition-automation — the domain motivating the Yelp example.
- concepts/schema-evolution — the long-term fix typically requires schema evolution across multiple services.
- patterns/business-to-engineering-requirement-translation — the portable pattern.
- companies/yelp