SYSTEM Cited by 1 source

Yelp Redshift Connector

Definition

Yelp Redshift Connector is Yelp's internal Data Connector that loads records from Yelp's Data Pipeline streams into Amazon Redshift for warehouse-resident analytics. It is part of Yelp's broader Data Pipeline ecosystem for streaming data across internal services.

The connector was originally documented in a 2016-10 Yelp Engineering post (not yet on the wiki). It surfaces again in the 2025-05-27 Revenue Automation testing post as the component whose latency drove the Staging Pipeline design.

Role on this wiki

The 2025-05-27 post cites the connector's behaviour as the motivating constraint behind the Yelp Staging Pipeline architecture. Specifically:

"Publishing the data generated by the pipeline to the Redshift database experiences latency of more than 10 hours before the data becomes available."

Consequence: any verification or integrity check that depends on data landing in Redshift must wait roughly 10 hours or more after the pipeline completes. For the Revenue Data Pipeline's daily cadence, this makes same-day bug diagnosis infeasible through the production data path. That constraint motivated the parallel Staging Pipeline, which writes to AWS Glue catalog tables that are readable immediately via Amazon Redshift Spectrum.
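
The timing argument can be made concrete with a small sketch. It assumes a flat 10-hour connector latency (the lower bound implied by the quote) and a hypothetical pipeline completion time of 09:00; neither the function names nor the completion time come from the Yelp post.

```python
from datetime import datetime, timedelta

# Assumption: 10 hours is used as a lower bound on connector latency.
CONNECTOR_LATENCY = timedelta(hours=10)


def earliest_redshift_availability(pipeline_finished, latency=CONNECTOR_LATENCY):
    """Earliest time a run's output could be queryable in Redshift."""
    return pipeline_finished + latency


def remaining_same_day_window(pipeline_finished, latency=CONNECTOR_LATENCY):
    """Time left on the same calendar day once the data has landed.

    Returns a zero timedelta when the data only lands on the next day.
    """
    available = earliest_redshift_availability(pipeline_finished, latency)
    next_midnight = datetime.combine(
        pipeline_finished.date() + timedelta(days=1), datetime.min.time()
    )
    return max(next_midnight - available, timedelta(0))


# Hypothetical run that finishes at 09:00.
finished = datetime(2025, 5, 27, 9, 0)
print(earliest_redshift_availability(finished))  # 2025-05-27 19:00:00
print(remaining_same_day_window(finished))       # 5:00:00
```

Under these assumptions, a run that finishes at 09:00 cannot be checked in Redshift before 19:00, and a run that finishes in the afternoon cannot be checked until the next calendar day.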

Known characteristics

  • Input: Yelp Data Pipeline streams.
  • Output: Amazon Redshift tables (data warehouse).
  • Latency: more than 10 hours, per the 2025-05-27 post's statement quoted above, for Revenue Data Pipeline output.
  • Scope: the connector is one of several Data Connectors in Yelp's data-pipeline ecosystem.

Caveats

  • Stub page. Yelp's 2016-10 original post on the Redshift Connector is not yet ingested on the wiki. Expand when ingested.
  • Latency is dataset-specific. The 10-hour number is for Revenue Data Pipeline output as of 2025-05-27; other Data Pipeline streams may have different latencies.
