SYSTEM Cited by 1 source
Yelp Redshift Connector
Definition
Yelp Redshift Connector is Yelp's internal Data Connector that loads records from Yelp's Data Pipeline streams into Amazon Redshift for warehouse-resident analytics. It is part of Yelp's broader Data Pipeline ecosystem for streaming data between internal services.
It was originally documented in a 2016-10 Yelp Engineering post (not yet ingested on this wiki) and resurfaces in the 2025-05-27 Revenue Automation testing post as the component whose latency drove the design of the Staging Pipeline.
Role on this wiki
The 2025-05-27 post cites the connector's behaviour as the motivating constraint behind the Yelp Staging Pipeline architecture. Specifically:
"Publishing the data generated by the pipeline to the Redshift database experiences latency of more than 10 hours before the data becomes available."
Consequence: any verification or integrity check that depends on data landing in Redshift must wait roughly 10 hours or more after the pipeline completes. Given the Revenue Data Pipeline's daily cadence, this makes same-day bug diagnosis infeasible through the production data path. That constraint motivated the parallel Staging Pipeline, which writes to systems/aws-glue catalog tables that can be queried immediately via systems/amazon-redshift-spectrum.
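To make the timing concrete, here is a minimal Python sketch of the arithmetic behind this consequence. It treats the reported 10-hour figure as a lower bound, assumes the staging path is readable with negligible delay (per "readable immediately"), and uses a hypothetical 09:00 completion time and 18:00 same-day cutoff. The names `earliest_check`, `same_day`, and the cutoff are illustrative only and do not come from Yelp's code.

```python
from datetime import datetime, timedelta

# Reported connector latency ("more than 10 hours"); treated here as a lower bound.
CONNECTOR_LATENCY = timedelta(hours=10)
# Assumption: Glue catalog tables are queryable via Spectrum with negligible delay.
STAGING_LATENCY = timedelta(0)


def earliest_check(pipeline_done: datetime, latency: timedelta) -> datetime:
    """Earliest time a verification query can see the pipeline's output."""
    return pipeline_done + latency


def same_day(pipeline_done: datetime, latency: timedelta, cutoff_hour: int = 18) -> bool:
    """True if output is visible by a hypothetical same-day cutoff (default 18:00)."""
    visible = earliest_check(pipeline_done, latency)
    cutoff = pipeline_done.replace(hour=cutoff_hour, minute=0, second=0, microsecond=0)
    return visible <= cutoff


if __name__ == "__main__":
    done = datetime(2025, 5, 27, 9, 0)  # hypothetical 09:00 pipeline completion
    print(earliest_check(done, CONNECTOR_LATENCY))  # 2025-05-27 19:00:00
    print(same_day(done, CONNECTOR_LATENCY))  # False: production path misses the cutoff
    print(same_day(done, STAGING_LATENCY))  # True: staging path is checkable right away
```

Under these assumptions, any completion later than 08:00 pushes production-path visibility past the cutoff, and since the real latency is reported as more than 10 hours, the usable window is narrower still. The staging path removes that dependency entirely.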
Known characteristics
- Input: Yelp Data Pipeline streams.
- Output: Amazon Redshift tables (data warehouse).
- Latency: "approximately 10 hours" for Revenue Data Pipeline output, as reported in the 2025-05-27 post (which also describes it as "more than 10 hours").
- Scope: the connector is one of several Data Connectors in Yelp's Data Pipeline ecosystem.
Caveats
- Stub page. Yelp's original 2016-10 post on the Redshift Connector has not yet been ingested on the wiki; expand this page once it is.
- Latency is dataset-specific. The 10-hour number is for Revenue Data Pipeline output as of 2025-05-27; other Data Pipeline streams may have different latencies.
Seen in
- sources/2025-05-27-yelp-revenue-automation-series-testing-an-integration-with-third-party-system — cites ~10-hour latency as the motivating constraint for the staging pipeline's Glue+Spectrum design.
External references
- Yelp Engineering — Redshift Connector (2016-10): https://engineeringblog.yelp.com/2016/10/redshift-connector.html
Related
- systems/amazon-redshift — the destination warehouse
- systems/amazon-redshift-spectrum — the bypass query path
- systems/aws-glue — the bypass output catalog
- systems/yelp-staging-pipeline — built specifically to route around this connector's latency
- companies/yelp
- concepts/redshift-connector-latency