SYSTEM Cited by 1 source

Yelp content-fetching engine¶

Definition¶

The Yelp content-fetching engine is the single internal API that returns business content from "all" or selected sources in an LLM-friendly shape at <100 ms p95. It is the load-bearing abstraction that lets Biz Ask Anything (and any future business-centric LLM application at Yelp) consume Yelp's curated content without each consumer re-implementing index coordination, Cassandra EAV access, and shape translation. Canonical reference: 2026-03-27 Yelp Engineering post (sources/2026-03-27-yelp-building-biz-ask-anything-from-prototype-to-product).

Architectural position¶

 caller (BAA question-analysis → retrieval step)
      │
      ▼
 ┌────────────────────────────────────────────────┐
 │     Content-fetching engine                    │
 │   - input: business_id + sources + keywords    │
 │   - output: LLM-friendly string bundle         │
 │   - SLO: p95 < 100 ms                          │
 └───┬──────────┬──────────┬────────────┬─────────┘
     │          │          │            │
     ▼          ▼          ▼            ▼
 ┌────────┐┌────────┐┌──────────┐┌─────────────┐
 │Reviews ││Photos  ││Website / ││ Cassandra   │
 │NRT     ││NRT +   ││menus /   ││ structured  │
 │index   ││embed + ││AtC NRT   ││ facts       │
 │        ││caption ││index     ││ (EAV)       │
 └────────┘└────────┘└──────────┘└─────────────┘

The caller specifies the business ID, the subset of sources to consult (chosen by the BAA Content Source Selection step), and keywords (when applicable, emitted by the Keyword Generation step). The engine returns a bundled LLM-friendly response in under 100 ms at p95.
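The call contract described above can be sketched as a pair of plain data types. This is an illustration only: the post does not publish the API, so `Source`, `FetchRequest`, `FetchResponse`, and every field and method name below are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, Optional, Tuple


class Source(Enum):
    # Hypothetical names for the four backing layers in the diagram.
    REVIEWS = "reviews"
    PHOTOS = "photos"
    WEBSITE_MENU_ATC = "website_menu_atc"
    STRUCTURED_FACTS = "structured_facts"


@dataclass(frozen=True)
class FetchRequest:
    business_id: str
    # None means "all sources"; otherwise only the listed subset is consulted.
    sources: Optional[FrozenSet[Source]] = None
    # Keywords are optional; the Keyword Generation step may emit none.
    keywords: Tuple[str, ...] = ()

    def selected_sources(self) -> FrozenSet[Source]:
        return frozenset(Source) if self.sources is None else self.sources


@dataclass
class FetchResponse:
    # One LLM-friendly text block per consulted source.
    sections: Dict[Source, str] = field(default_factory=dict)

    def as_prompt_text(self) -> str:
        # Join the sections under plain-text headers, in enum declaration order.
        parts = [
            f"## {source.value}\n{self.sections[source]}"
            for source in Source
            if source in self.sections
        ]
        return "\n\n".join(parts)
```

A caller that wants only reviews would pass `sources=frozenset({Source.REVIEWS})`; leaving `sources` unset models the "all sources" case.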

Data backing layers¶

Three NRT indices¶

From the post:

"Three near-real-time indices for a) reviews, b) photos and photo embeddings, and c) website/menu content and Ask the Community feature on the business page. Reviews and photos are large and metadata-rich, so each review and each photo is a document."

Index	Content	Freshness	Retrieval
Reviews	Each review is a document	<10 min (streamed)	Keyword
Photos	Photo metadata + embeddings + captions	<10 min (streamed)	Embedding + caption-text hybrid
Website / menu / AtC	Owner-authored + community content	Weekly batch	Keyword

Cassandra EAV structured-facts store¶

From the post:

"A Cassandra store with an EAV (Entity-Attribute-Value) schema for business structured information. In particular a (business_id, field_name, field_group, value, update_ts) table."

Freshness: <10 min via streaming. The deliberately non-normalised EAV shape is viable because the downstream consumer is an LLM: the LLM accepts unstructured strings, so the engine doesn't need to validate structured types against a rigid schema. See concepts/eav-schema-for-llm-consumption.
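To make the "unstructured strings" point concrete, here is a minimal sketch of turning EAV rows into prompt text. The column names come from the quoted table; the `FactRow` type, the `render_facts` helper, the output layout, and the assumption that `update_ts` is an integer timestamp are all illustrative rather than taken from the post.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class FactRow(NamedTuple):
    # Columns follow the table quoted from the post.
    business_id: str
    field_name: str
    field_group: str
    value: str
    update_ts: int  # assumed to be an integer timestamp; the post does not say


def render_facts(rows: Iterable[FactRow]) -> str:
    """Group rows by field_group and emit 'field_name: value' lines."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.field_group].append(row)

    lines = []
    for group in sorted(grouped):
        lines.append(f"[{group}]")
        for row in sorted(grouped[group], key=lambda r: r.field_name):
            # No per-attribute type handling: unknown field_names pass through as text.
            lines.append(f"{row.field_name}: {row.value}")
    return "\n".join(lines)
```

Because every value is treated as a string, a newly introduced attribute shows up in the output without any change to this code.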

Ingestion¶

Streaming from source-of-truth databases → joins / transforms → data pipeline → Cassandra + NRT indexers.
Weekly batches for websites, menus, AtC.
Replayability + idempotent upserts required — "some datasets are derived from chains of 3-4 streams, so replayability and idempotent upserts are required" (Yelp's framing applies the concepts/stream-replayability-for-iterative-pipelines discipline at the business-content layer).
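One common way to get idempotent upserts is to key each write and keep the version with the newest timestamp. The sketch below uses an in-memory dict as a stand-in for the store. The key choice `(business_id, field_name)` and the `upsert` helper are assumptions for illustration, not Yelp's implementation.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]      # (business_id, field_name), an assumed key
Record = Tuple[str, int]   # (value, update_ts)


def upsert(store: Dict[Key, Record], business_id: str, field_name: str,
           value: str, update_ts: int) -> None:
    """Write the event unless the store already holds a strictly newer version."""
    key = (business_id, field_name)
    current = store.get(key)
    # ">=" lets a replayed event with the same timestamp overwrite itself,
    # which also allows a corrected value to replace a bad one on replay.
    if current is None or update_ts >= current[1]:
        store[key] = (value, update_ts)
```

Applying the same batch of events twice leaves the store unchanged after the first pass, which is the property replay depends on.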

Why "one content-fetch API" is load-bearing¶

From the post's Key Learnings:

"One content-fetch API helps everyone. A single interface that returns 'all' or selected sources in an LLM-friendly shape reduces coupling. It's an abstraction that helps other business-centric LLM applications besides ours."

The API is an LLM-facing primitive, not a general-purpose read API. Three consequences:

Shape translation is the API's job, not the caller's. Cassandra rows, Lucene hits, photo embeddings — all collapsed to text suitable for prompt injection.
Source selection is a caller parameter. The caller decides which subset of sources to consult before the fetch; the fetch honours that subset.
Sub-100 ms p95 is what makes serial post-processing affordable. Downstream steps (Aho-Corasick snippet extraction, prompt composition, token generation) can each take a share of the latency budget knowing retrieval stays under 100 ms at p95.
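The second and third consequences can be illustrated together: query only the selected sources, concurrently, under one shared deadline. The post does not say how the engine enforces its latency target, so the deadline-and-drop behaviour here is an assumption, and `fetch_selected`, `Fetcher`, and `budget_s` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

Fetcher = Callable[[str], Awaitable[str]]


async def fetch_selected(business_id: str,
                         fetchers: Dict[str, Fetcher],
                         selected: Iterable[str],
                         budget_s: float = 0.1) -> Dict[str, str]:
    """Query only the selected sources concurrently, within one shared deadline."""
    tasks = {
        name: asyncio.ensure_future(fetchers[name](business_id))
        for name in selected
    }
    if not tasks:
        return {}
    done, pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    for task in pending:
        task.cancel()  # assumed policy: drop sources that missed the deadline
    return {
        name: task.result()
        for name, task in tasks.items()
        if task in done and task.exception() is None
    }
```

Sources that are not in `selected` are never called, so the cost of a request scales with the subset the caller asked for rather than with the number of backing stores.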

Reads & writes¶

Reads: <100 ms p95, "all or selected sources" shape.
Writes: out of scope for this API — writes go through the streaming + weekly-batch ingestion path.
Idempotency: upserts into Cassandra + NRT indices are idempotent so replay of a stream from a failure point doesn't corrupt state.

Recovery & correctness¶

From the post:

"When ingestion issues surface (e.g., bad joins or transforms), we replay streams from the point of failure; some datasets are derived from chains of 3-4 streams, so replayability and idempotent upserts are required."

This is the log-as-truth / database-as-cache stance applied at the business-content layer — see concepts/log-as-truth-database-as-cache.
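A toy version of "replay from the point of failure" follows, with a Python list standing in for the stream and a dict standing in for the derived store. The offset model, the `replay` and `apply_event` helpers, and the buggy-transform scenario are illustrative assumptions; the post does not describe the mechanics.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, int]  # (key, value, update_ts)


def apply_event(state: Dict[str, Tuple[str, int]], event: Event) -> None:
    """Idempotent write: keep the event unless a strictly newer one is stored."""
    key, value, ts = event
    current = state.get(key)
    if current is None or ts >= current[1]:
        state[key] = (value, ts)


def replay(log: List[Event], state: Dict[str, Tuple[str, int]],
           from_offset: int,
           transform: Callable[[Event], Event]) -> int:
    """Re-apply log[from_offset:] through the (fixed) transform; return the next offset."""
    for event in log[from_offset:]:
        apply_event(state, transform(event))
    return len(log)
```

The log is the source of truth and the store is rebuilt from it: once the transform is fixed, replaying from the first bad offset overwrites the damaged entries and leaves earlier, correct entries untouched.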

Design Choices¶

Stream where staleness is noticed; batch the rest. Yelp limits streaming to reviews + photos + business properties "where freshness is key, and kept websites/Ask the Community business page feature on weekly batches." Stream debugging is cited as cumbersome (and so used sparingly).
Store data in the shape you read it. Putting data together before reading avoids large fan-outs, hot-path joins, and other reliability issues. The engine composes once at ingest time so the read path is cheap.
Keyword-first was the right v1. With existing IR/ranking the team shipped faster with keyword search + an LLM expansion step; embeddings-based retrieval for reviews + website is the named follow-up.
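As a rough illustration of keyword search with an expansion step, the sketch below ranks documents by how many distinct query terms they contain. The `expansions` mapping stands in for whatever the LLM expansion step produces, and the scoring rule is a deliberately naive assumption; it is not Yelp's ranking and it only handles single-word terms.

```python
import re
from typing import Dict, List, Sequence, Set, Tuple


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_search(docs: Sequence[str], keywords: Sequence[str],
                   expansions: Dict[str, List[str]],
                   top_k: int = 3) -> List[Tuple[int, str]]:
    """Rank docs by how many distinct query terms (keywords plus expansions) they contain."""
    terms = set()
    for kw in keywords:
        terms.add(kw.lower())
        terms.update(t.lower() for t in expansions.get(kw, []))
    scored = []
    for doc in docs:
        score = len(terms & tokenize(doc))
        if score > 0:
            scored.append((score, doc))
    # Stable sort keeps original document order among equal scores.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the keyword "patio" alone, a review that only mentions a "terrace" is missed; adding "terrace" as an expansion term recovers it, which is the recall gap the expansion step is meant to narrow.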

Downstream consumers¶

BAA — canonical consumer. The engine is the retrieval step of BAA's life-of-a-question.
Other business-centric LLM applications at Yelp — explicitly named as future consumers (not detailed in the post).

Tradeoffs / gotchas¶

<100 ms p95 envelope says nothing about tail beyond p95 — the post doesn't quote a p99 / p99.9 number.
Keyword-first recall limit — generic-intent questions ("What's the ambiance here?") are known to have weaker recall; the engine emits the signal but the caller's RAG quality inherits the limitation.
Schema drift caught late in EAV — the EAV shape buys schema flexibility at the cost of read-time type checking; new attributes appear without schema changes, and callers must defensively handle unknown field_names.
Replay is expected, not exceptional — the 3-4-stream dependency chain means a single ingestion bug can require replay across the whole chain. The post treats this as design, not a failure mode.
Consumer-API authorisation / rate-limiting not disclosed — how Yelp governs access across consumers isn't described.
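The first gotcha is easy to see with a small calculation. The latencies in the usage note are synthetic, chosen only to show that a p95 target can be met while p99 is far worse; they are not measurements from the post. The `percentile` helper uses the nearest-rank definition.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For 100 synthetic requests where 96 take 40 ms and 4 take 900 ms, `percentile(latencies, 95)` is 40.0 while `percentile(latencies, 99)` is 900.0, so a sub-100 ms p95 by itself does not bound what the slowest few percent of callers experience.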

Seen in¶

sources/2026-03-27-yelp-building-biz-ask-anything-from-prototype-to-product — canonical first-party reference.

Related¶

systems/yelp-biz-ask-anything — the canonical consumer.
systems/apache-cassandra — the structured-facts store.
systems/yelp-assistant — parent brand of the consumer.
concepts/eav-schema-for-llm-consumption — the load-bearing schema choice.
concepts/retrieval-augmented-generation — the downstream architectural shape.
concepts/stream-replayability-for-iterative-pipelines — the ingestion discipline.
concepts/log-as-truth-database-as-cache — the stance applied at the business-content layer.
companies/yelp
