Yelp

Yelp Engineering blog (engineeringblog.yelp.com) is a Tier-3 source on the sysdesign-wiki. Yelp operates the local-business-discovery platform (reviews, ratings, search, photos) for US / Canada / parts of Europe; the platform combines a curated business-graph (millions of SKUs in the catalog sense), user-generated reviews, and a search stack that routes the raw query through a query-understanding layer before retrieval and ranking.

Per AGENTS.md Tier-3 guidance, Yelp posts are ingested selectively — only where they explicitly cover distributed-systems internals, scaling trade-offs, infrastructure architecture, production incidents, storage/networking/streaming design, or — as with the 2025-02-04 LLM post — a concrete production serving-infrastructure architecture built around LLMs (distinct from pure ML research).

Wiki anchor

Eight on-scope Yelp ingests establish the wiki's Yelp coverage across seven distinct stack altitudes (the seventh axis added with the 2026-04-07 Cassandra 4.x upgrade ingest — datastore platform / database upgrade):

  • 2025-02-04 — search query understanding with LLMs (LLM / serving-infra axis). Yelp's first-party disclosure of the production serving architecture for three query-understanding tasks (segmentation, spell correction, review-highlight phrase expansion). Canonicalises Yelp's reusable three-phase productionisation playbook (Formulation → Proof of Concept → Scaling Up) and a three-tier serving cascade (pre-computed head cache → offline fine-tuned GPT-4o-mini for 95%+ of traffic → BERT/T5 realtime tail). As a serving architecture, it is the earliest wiki canonical instance of head-cache-plus-tail applied to LLM-driven search query understanding — pre-dating Instacart's 2025-11-13 Intent Engine canonicalisation by nine months.
  • 2025-02-19 — Revenue Automation Series: Building Revenue Data Pipeline (financial-systems / data-platform axis). Yelp's second Revenue Automation Series post (after the 2024-12 billing-system modernisation) on how Yelp built a batch Revenue Data Pipeline feeding a third-party Revenue Recognition SaaS ("REVREC service") to close the books ~50% faster. Documents the methodology stack (Glossary Dictionary → Data Gap Analysis → system-design evaluation across four architectures), architectural selection (Data Lake + Spark ETL wins), and concrete Spark implementation (internal spark-etl package with feature-DAG abstraction, YAML topology-inferred DAG, checkpoint-to-scratch debugging, PySpark UDFs for complex business logic). Three other architectures (MySQL+Python batch, Warehouse+dbt, Event Streams) are explicitly rejected with load-bearing reasons.
  • 2025-04-15 — Journey to Zero Trust Access (corporate-security / networking axis). Yelp's Corporate Systems + Client Platform Engineering teams retire Ivanti Pulse Secure as the employee VPN in favour of Netbird, an open-source WireGuard-based ZTA platform. Five named selection pillars: Okta/OIDC, simple UI, open source, throughput/latency, fault tolerance. Load-bearing architectural disclosures: WireGuard mesh topology with router peers provides <2s transparent failover; OIDC+device-posture access gate replaces the SAML-via-Pulse flow; open source provides both response agency and realised upstream contribution ("multiple changes ... pushed upstream to Netbird's main branch from Yelpers"). Canonical instance of concepts/vpn-to-zta-migration as a motion rather than a flip — Netbird coexists with Yelp's MTLS-based Edge Gateway, with VPN utilisation "reducing ... to more granular use cases in the future". Opens Yelp's corporate-security-and-networking axis on the wiki, distinct from the 2025-02-04 search-ML axis and the 2025-02-19 financial-systems axis.
  • 2025-05-27 — Revenue Automation Series: Testing an Integration with Third-Party System (financial-systems / integration-testing deepening). Third Revenue Automation Series post — documents how Yelp verified the pipeline built in 2025-02-19 rather than building anything new. Six-step strategy: (1) a parallel Staging Pipeline consuming production data but publishing to Glue catalog tables on S3, queryable immediately via Redshift Spectrum — bypassing the ~10-hour Redshift Connector latency that makes same-day verification through the production path infeasible; (2) manual test-data backport from production edge cases to dev fixtures; (3) dual-cadence integrity checks (99.99% contract-invoice match threshold for the monthly check against billing-system truth; daily lightweight SQL against staging for fast iteration); (4) Schema Validation Batch polling the REVREC mapping API before each upload to guard against partner-side schema drift; (5) SFTP standardised over REST after testing both (reliability + file-size ceiling: 500k-700k records/file SFTP vs 50k/file REST → 4-5 files/day vs 15 files/day); (6) documented escalation for third-party SFTP server / upload-job / storage failures. Deepens the 2025-02-19 axis rather than opening a new one.
  • 2025-07-08 — Exploring CHAOS: Building a Backend for Server-Driven UI (SDUI / client-platform-framework axis; opens a fourth Yelp axis on the wiki). First-party unpacking of the CHAOS backend — the server-driven-UI framework that authors per-request view configurations (views + layouts + components + actions) that Yelp's iOS, Android, and web clients render. Canonical instance of concepts/server-driven-ui. Three architectural layers disclosed: (1) GraphQL surface — a Yelp-internal Apollo Federation subgraph implemented in Python via Strawberry, fronting multiple team-owned REST backends all conforming to a common CHAOS REST API. Canonical instance of patterns/federated-graphql-subgraph-per-domain. (2) Build pipeline — ChaosConfigBuilder → ViewBuilder → LayoutBuilder → FeatureProvider composition with a six-stage feature-provider lifecycle (registers → is_qualified_to_load → load_data → resolve → is_qualified_to_present → result_presenter) executed as a two-loop parallel async build (loop 1 fans out upstream calls, loop 2 awaits + composes — max-latency not sum-latency). (3) Advanced primitives — preloaded view flows (subsequent_views() + the chaos.open-subsequent-view.v1 action) for predictable sequential navigation without extra round-trips (customer-support FAQ menu example) and view placeholders (ViewPlaceholderV1) for lazy-loaded nested views served by different CHAOS backends (Reminders embedded in Yelp for Business home screen).
Three load-bearing correctness mechanisms canonicalised: (a) Register-based client capability matching — first-match Condition(platform=[...], library=[required components and actions]) decides whether a feature is included for this client or dropped, the mechanism that keeps old app versions rendering while new components ship; (b) JSON-string parameters for schema stability — element content carried as opaque JSON inside a stable GraphQL schema so new elements / versions ship without schema churn, with backend Python dataclasses type-checking the payload; (c) error isolation per feature wrapper — an @error_decorator wraps every FeatureProvider; exceptions drop the feature (not the view), unless marked IS_ESSENTIAL_PROVIDER = True; telemetry logs feature name + owner + exception + request context for threshold-based alerting. Post flags that "the latest CHAOS backend framework introduces the next generation of builders using Python asyncio" — the two-loop iteration is a transitional design. No operational numbers (latency, RPS, cache hit rates); walkthrough not retrospective.
  • 2025-09-26 — S3 server access logs at scale (storage / data-engineering axis; opens a fifth Yelp axis on the wiki). First-party retrospective on operationalising S3 Server Access Logs (SAL) at fleet scale ("TiBs of S3 server access logs per day"). Canonicalises the Yelp S3 SAL pipeline: daily Tron batch that converts raw-text SAL objects to Parquet via Athena INSERTs (patterns/raw-to-columnar-log-compaction) achieving 85% storage + 99.99% object-count reduction; weekly access-based table joining S3 Inventory with a week of SAL for prefix-granularity retention; tag-based lifecycle expiration via S3 Batch Operations ("the only scalable way to delete per object" — patterns/object-tagging-for-lifecycle-expiration). Load-bearing architectural disclosures: Glue partition projection with enum over managed partitions (patterns/projection-partitioning-over-managed-partitions); idempotent Athena INSERT via self-LEFT-JOIN on requestid for shared-resource retry-safety; SAL's best-effort delivery measured at <0.001% of records arriving more than 2 days late. Parsing hazards canonicalised: user-controlled log fields (unescaped quotes / SQLi / shellshock in request_uri / referrer / user_agent) with Yelp's optional non-capturing tail regex fix; URL-encoding idiosyncrasy on S3 keys (most ops double-encode; BATCH.DELETE.OBJECT / S3.EXPIRE.OBJECT single-encode). Preferred over AWS's CloudTrail Data Events on cost: "$1 per million data events — that could be orders of magnitude higher!" First Yelp storage-axis ingest; opens an axis distinct from the 2025-02-04 LLM / search-serving-infra axis, the 2025-02-19 / 2025-05-27 financial-systems axis, and the 2025-04-15 corporate-security axis.
  • 2026-02-02 — Back-Testing Engine for Ad Budget Allocation (ad-tech / experimentation axis; opens a sixth Yelp axis on the wiki). First-party disclosure of the Back-Testing Engine — an eight-component hybrid back-testing + simulation system that evaluates proposed changes to the Ad Budget Allocation algorithms against historical campaign data before committing to A/B tests. Canonical instance of concepts/filter-before-ab-test (back-testing filters the candidate space; A/B validates the survivors) and concepts/hybrid-backtesting-with-ml-counterfactual (historical inputs + alternative code path + ML-predicted counterfactual outcomes). Five load-bearing architectural moves canonicalised: (1) Production code as Git Submodules (patterns/production-code-as-submodule-for-simulation) — Budgeting and Billing repos included as submodules pointing at branches under test; "blurs the line between prototyping and production"; (2) CatBoost regressors as counterfactual-outcome predictor (systems/catboost, concepts/counterfactual-outcome-prediction) — same models for all candidates so cross-candidate deltas are attributable to the algorithm, not predictor noise; (3) Poisson sampling over expected values (concepts/poisson-sampling-for-integer-outcomes) — converts the regressor's smooth averages into realistic integer counts; (4) Scikit-Optimize Bayesian search over a YAML-declared search space — 25 max_evals budget, learns from prior candidates; grid + listed search also supported but "not really an optimizer, just a wrapper that yields the next candidate"; (5) MLflow as experiment store + visualization substrate — first non-LLM MLflow Seen-in on the wiki. Overfitting-to-historical-data named explicitly as a limitation (concepts/overfitting-to-historical-data), mitigated organisationally by keeping A/B tests in the loop. Opens ad-tech / experimentation axis distinct from the five prior axes.
  • 2026-04-07 — Zero downtime Upgrade: Yelp's Cassandra 4.x Upgrade Story (datastore-platform / database-upgrade axis; opens a seventh Yelp axis on the wiki). Yelp Database Reliability Engineering upgraded >1,000 Cassandra nodes from 3.11 to 4.1 on Kubernetes with zero downtime + zero incidents + zero client-code changes. Canonical wiki instance of in-place over new-DC at fleet scale (rejected on time + cost + EACH_QUORUM-preservation grounds). Five upgrade patterns canonicalised from this post: (1) version-specific images per Git branch (3.11 and 4.1 images published from dedicated branches, env-var selected at bootstrap); (2) pre-flight / flight / post-flight three-stage upgrade (schema-agreement gate + anti-entropy-repair pause → one-node-at-a-time rolling with dual-Stargate → repairs + schema-changes re-enabled); (3) dual-run version-specific proxies for Stargate around Cassandra 4.1's MigrationCoordinator schema-fetch change, with service-mesh alias so clients see one endpoint; (4) benchmark in your own environment (own-env measurement of 4% p99 + 11% mean + 11% throughput aligned with the DataStax whitepaper direction of travel, built confidence that later unlocked the 58% p99 reduction observed in production); (5) production qualification criteria upfront (six-criterion list — performance / functional / security / rollback / observability / component-health — evaluated per cluster). Three Cassandra-specific concepts canonicalised: init-container IP-gossip pre-migration sequencing IP + version changes into two distinct gossip events (CASSANDRA-19244); CDC commit-log write-point change (flush → mutation, CASSANDRA-12148) that breaks CDC consumers at the major-version boundary; schema disagreement on CDC-enabled clusters remediated by dummy multi-node schema changes. Two general-purpose concepts canonicalised: mixed-version cluster as a named operational state; transient mid-upgrade regression vs genuine regression distinction. First wiki first-party operator retrospective on a Cassandra major upgrade — every prior Cassandra Seen-in was third-party explainer or downstream user. Also surfaces Yelp's Cassandra Source Connector (CDC → Kafka) and the Stargate proxy as distinct wiki systems.

Key systems

LLM / search-serving-infra axis (2025-02-04)

  • systems/yelp-query-understanding — the named LLM-powered query-understanding pipeline. Multiple tasks (segmentation + spell correction, review-highlight phrase expansion) with RAG side-inputs (business names viewed for query; top business categories from in-house predictive model) feeding into a cascaded serve path.
  • systems/yelp-search — the parent production context; consumer of the query-understanding outputs (location rewrite into geobox; token-probability of name tag into ranking).
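The three-tier cascade disclosed in the 2025-02-04 post can be sketched as a minimal lookup chain. This is a hypothetical illustration only — the class name, the dict-backed tiers, and the fallback lambda are all invented; Yelp's actual interfaces are not disclosed.

```python
# Sketch of the three-tier serving cascade: pre-computed head cache ->
# offline fine-tuned LLM outputs (95%+ of traffic) -> realtime tail model
# (BERT/T5-class) for never-seen-before queries. All names are hypothetical.
from typing import Callable


class QueryUnderstandingCascade:
    def __init__(self, head_cache: dict, offline_store: dict,
                 realtime_model: Callable[[str], str]):
        self.head_cache = head_cache          # tier 1: pre-computed head queries
        self.offline_store = offline_store    # tier 2: batch fine-tuned LLM outputs
        self.realtime_model = realtime_model  # tier 3: realtime tail model

    def understand(self, query: str) -> str:
        if query in self.head_cache:      # cheapest path first
            return self.head_cache[query]
        if query in self.offline_store:   # precomputed offline
            return self.offline_store[query]
        return self.realtime_model(query)  # tail query misses both stores


cascade = QueryUnderstandingCascade(
    head_cache={"plumbers sf": "category:plumbers loc:san-francisco"},
    offline_store={"vegan brunch oakland": "category:vegan-brunch loc:oakland"},
    realtime_model=lambda q: f"fallback:{q}",
)
print(cascade.understand("plumbers sf"))    # tier-1 hit
print(cascade.understand("zxq new query"))  # tier-3 fallback
```

The design point the cascade encodes: per-query LLM cost is paid offline for the head and torso of the distribution, and only the cheap realtime model ever runs in the request path.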

Financial-systems / data-platform axis (2025-02-19)

  • systems/yelp-revenue-data-pipeline — the named batch data pipeline that feeds a third-party Revenue Recognition SaaS. Data Lake + Spark ETL architecture; daily MySQL snapshots → S3 → spark-etl-orchestrated feature DAG → REVREC-template output to S3 → REVREC service → 50% faster book-close.
  • systems/yelp-spark-etl — Yelp's internal PySpark orchestration package. Feature-based DAG abstraction (web-API-shaped source + transformation sub-types), YAML topology-inferred DAG declaration, checkpoint-to-scratch debugging via --checkpoint flag + systems/jupyterhub.
  • systems/yelp-billing-system — Yelp's custom order-to-cash system; stub page pending ingest of the 2024-12 billing-system modernisation post. Upstream source-of-truth for revenue contracts / invoices / fulfillment events.
  • systems/yelp-staging-pipeline — the parallel verification pipeline (2025-05-27). Runs the same code path on production data, publishes to AWS Glue tables on S3, queryable immediately via Redshift Spectrum. Bypasses the ~10-hour Redshift Connector latency for same-day verification loops.
  • systems/yelp-schema-validation-batch — the pre-upload guard (2025-05-27). Polls the third-party REVREC mapping API (REST) before each upload and aborts on schema mismatch on any of three axes (date format, column name, column data type).
  • systems/yelp-redshift-connector — Yelp's named Data Connector that publishes from Data Pipeline streams to Redshift. The ~10-hour latency of this connector is the specific constraint that motivates the staging-pipeline + Glue + Spectrum bypass.
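The topology-inferred DAG idea behind spark-etl — declare each feature's upstreams and let the framework derive execution order — can be sketched with the standard library's topological sorter. The config shape and feature names below are invented; the real YAML schema is not public.

```python
# Sketch of a topology-inferred feature DAG in the spirit of Yelp's spark-etl
# package: features declare their upstream features, and a valid execution
# order is inferred rather than hand-maintained. Names are illustrative.
from graphlib import TopologicalSorter

# Stand-in for a parsed YAML topology: feature -> list of upstream features.
topology = {
    "raw_invoices": [],
    "raw_contracts": [],
    "invoice_lines": ["raw_invoices"],
    "revrec_rows": ["invoice_lines", "raw_contracts"],
}


def build_order(topo: dict) -> list:
    """Infer an execution order where every upstream precedes its dependants."""
    return list(TopologicalSorter(topo).static_order())


order = build_order(topology)
print(order)
```

A real implementation would also detect cycles (`TopologicalSorter` raises `CycleError`) and fan independent branches out to parallel Spark jobs; the ordering guarantee is the load-bearing part.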

Corporate-security / networking axis (2025-04-15)

  • systems/netbird — the open-source WireGuard-based Zero Trust Access platform Yelp chose; five named selection pillars (Okta/OIDC, simple UI, open source, high throughput, fault tolerance). Canonical instance of mesh topology with router peers for transparent <2s failover. Yelp has contributed upstream fixes.
  • systems/wireguard — the data-plane protocol under Netbird; Yelp's deployment contributed a new corporate-ZTA altitude Seen-in on the wiki's WireGuard page (distinct from Fly.io's gateway-mesh altitude).
  • systems/okta — Yelp's OIDC identity provider for the ZTA substrate; enforces device-posture policies.
  • systems/pulse-secure — the retired predecessor VPN ("a more reliable solution" was needed). Only-instance wiki page.

Storage / data-engineering axis (2025-09-26)

  • systems/yelp-s3-sal-pipeline — the named Yelp system for operationalising S3 Server Access Logs (SAL) at fleet scale. Daily Tron batch compacting TiBs/day of raw-text SAL into Parquet via Athena; weekly access-based retention table; tag-based lifecycle expiration via S3 Batch Operations or direct per-object tagging. 85% storage + 99.99% object-count reduction from compaction.
  • systems/tron — Yelp's in-house batch processing system (open-source: github.com/Yelp/Tron); orchestrates the daily SAL compaction and the weekly access-based-retention build. First wiki page.
  • systems/s3-batch-operations — AWS's per-bucket batch-job primitive for PutObjectTagging fanout. Flat $0.25 per bucket per job — the biggest cost contributor for low-volume buckets, driving Yelp's two-scale dispatch rule (direct tag for low-volume, Batch Ops for high-volume).
  • systems/s3-inventory — the daily object-listing used as one side of the access-based retention join with a week of SAL.
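The retry-safe Athena INSERT described for this pipeline — self-LEFT-JOIN on the target's unique column so a retried job only inserts missing rows — can be sketched as SQL generation. Table and column names other than requestid are illustrative; note the partition filter appears in both ON and WHERE, as the post describes, so the planner can prune both sides.

```python
# Sketch of patterns/idempotent-athena-insertion-via-left-join: the INSERT's
# SELECT anti-joins the target on requestid, so rows already present from a
# partially-succeeded earlier attempt are skipped on retry.
def idempotent_insert_sql(day: str) -> str:
    return f"""
INSERT INTO sal_parquet
SELECT src.*
FROM sal_raw AS src
LEFT JOIN sal_parquet AS tgt
  ON src.requestid = tgt.requestid
 AND tgt.day = '{day}'          -- partition filter duplicated in ON ...
WHERE src.day = '{day}'         -- ... and in WHERE, for planner pruning
  AND tgt.requestid IS NULL     -- anti-join: keep only not-yet-inserted rows
""".strip()


sql = idempotent_insert_sql("2025-09-26")
print(sql)
```

This matters on Athena specifically because the shared-cluster model (TooManyRequestsException, per-account DML quotas) makes blind retries routine; the anti-join makes those retries safe.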

SDUI / client-platform-framework axis (2025-07-08)

  • systems/yelp-chaos — the named server-driven-UI framework. Per-request composition of views + layouts + components + actions, delivered as a JSON-stable GraphQL configuration. Six-stage FeatureProvider lifecycle runs features in a two-loop parallel async build; per-feature error isolation keeps bad features from taking down the whole view. Advanced primitives: preloaded view flows for predictable navigation, view placeholders for lazy nested views served by different backends.
  • systems/apollo-federation — the GraphQL substrate. Yelp's Supergraph router composes the CHAOS Subgraph with every other per-domain subgraph. Canonical wiki instance of patterns/federated-graphql-subgraph-per-domain; CHAOS adds the twist that the subgraph itself fronts multiple team-owned REST backends, not a single data store.
  • systems/strawberry-graphql — Python GraphQL library Yelp picked for the CHAOS Subgraph to "leverage type-safe schema definitions and Python's type hints." First wiki instance.
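The per-feature error-isolation mechanism described for CHAOS can be sketched as a decorator. The decorator name and the IS_ESSENTIAL_PROVIDER flag follow the post; the provider class, its attributes, and the logging shape are invented for illustration.

```python
# Sketch of CHAOS-style error isolation: an @error_decorator wraps each
# FeatureProvider so an exception drops the feature (not the view), unless
# the provider is marked essential. Names beyond those quoted in the post
# are hypothetical.
import functools
import logging


def error_decorator(func):
    @functools.wraps(func)
    def wrapper(provider, *args, **kwargs):
        try:
            return func(provider, *args, **kwargs)
        except Exception as exc:
            if getattr(provider, "IS_ESSENTIAL_PROVIDER", False):
                raise  # essential feature: fail the whole view build
            # telemetry: feature name + owner + exception for alerting
            logging.error("feature=%s owner=%s error=%r",
                          provider.name, provider.owner, exc)
            return None  # drop just this feature from the view
    return wrapper


class ReviewsProvider:
    name, owner = "reviews", "reviews-team"
    IS_ESSENTIAL_PROVIDER = False

    @error_decorator
    def resolve(self):
        raise RuntimeError("upstream timeout")


# A failing non-essential feature vanishes; the rest of the view renders.
view_features = [f for f in [ReviewsProvider().resolve()] if f is not None]
print(view_features)
```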

Ad-tech / experimentation axis (2026-02-02)

  • systems/yelp-back-testing-engine — the named eight-component system that simulates proposed ad-budget- allocation algorithm changes against historical campaign data. Canonicalises the hybrid back-testing + simulation methodology (historical inputs + alternative code path + ML-predicted counterfactual outcomes) and positions back-testing as the discovery phase upstream of A/B validation.
  • systems/yelp-ad-budget-allocation — the parent system the Back-Testing Engine simulates; splits each campaign's monthly budget between on-platform Yelp inventory and the off-platform Yelp Ad Network, with day-by-day budget decisions that depend on previous days' outcomes (the feedback loop that makes naive aggregate-math simulation wrong).
  • systems/scikit-optimize — the Bayesian-optimization library Yelp uses as the Engine's default optimizer; the other search types (grid, listed) are "just wrappers".
  • systems/catboost — the gradient-boosted regressors predicting impressions / clicks / leads from budget + campaign features. Non-parametric by design to capture diminishing returns; shared across candidates for fair comparison.
  • systems/mlflow — the experiment store + visualization substrate; first non-LLM-evaluation MLflow Seen-in on the wiki, reinforcing MLflow as domain-general experiment database rather than LLM-specific.
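The Poisson-sampling step in the Engine — converting a regressor's smooth expected count into realistic integers — can be sketched with a stub predictor. The diminishing-returns curve and all numbers here are invented; Yelp's actual CatBoost features and outputs are not disclosed.

```python
# Sketch of concepts/poisson-sampling-for-integer-outcomes: the counterfactual
# predictor yields a smooth mean (e.g. 3.7 clicks/day); drawing from a Poisson
# with that mean restores live-system stochasticity as integer counts.
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded for reproducible simulation runs


def predicted_mean_clicks(budget: float) -> float:
    """Stub counterfactual predictor with diminishing returns on budget."""
    return 10.0 * np.log1p(budget / 100.0)


def simulate_day(budget: float) -> int:
    mean = predicted_mean_clicks(budget)
    return int(rng.poisson(mean))  # integer outcome, not a smooth average


daily_clicks = [simulate_day(250.0) for _ in range(7)]
print(daily_clicks)
```

Sharing one predictor (and one seed policy) across all candidates is what makes cross-candidate deltas attributable to the algorithm under test rather than to predictor or sampling noise.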

Datastore-platform / database-upgrade axis (2026-04-07)

  • systems/apache-cassandra — the target datastore. Yelp's >1,000-node Cassandra fleet runs on Kubernetes via a Cassandra operator; this ingest is the wiki's first first-party operator retrospective on a Cassandra major-version upgrade.

  • systems/stargate-cassandra-proxy — the DataStax open-source Cassandra data-proxy; runs as two version- specific fleets in parallel during Yelp's upgrade window under a single service-mesh alias.
  • systems/cassandra-source-connector — Yelp's in-house CDC → Kafka bridge; two components split on rollout (DataPipeline Materializer upgraded fleet-wide pre-upgrade; CDC Publisher upgraded in lockstep per pod).
  • systems/kubernetes-init-containers — the Kubernetes primitive Yelp uses to sequence the simultaneous new IP + new Cassandra version change into two distinct gossip-observable events (CASSANDRA-19244).
  • systems/spark-cassandra-connector — listed as a component that had to be made 4.1-compatible; Yelp's use documented in an earlier 2024-09 post on direct Spark→Cassandra ingestion for ML pipelines.
  • systems/yelp-pushplan-automation — Yelp's declarative Cassandra schema-change system; user-initiated schema changes were disabled for the duration of each cluster upgrade.
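The pre-flight / flight / post-flight shape of the upgrade can be reduced to its control flow. The gate and action functions below are stubs; the real orchestration (schema-agreement checks, repair pausing, dual-Stargate routing) lives in Yelp's Kubernetes tooling and is not public.

```python
# Schematic of the three-stage rolling upgrade: gate before touching anything,
# upgrade one node at a time (the cluster runs mixed-version but live), then
# re-enable what was paused. All callables are hypothetical stubs.
from typing import Callable, List


def rolling_upgrade(nodes: List[str],
                    schema_agreed: Callable[[], bool],
                    upgrade_node: Callable[[str], None]):
    # pre-flight: schema-agreement gate + pause anti-entropy repairs
    if not schema_agreed():
        raise RuntimeError("pre-flight failed: schema disagreement")
    repairs_paused = True

    # flight: one-node-at-a-time rolling upgrade
    upgraded = []
    for node in nodes:
        upgrade_node(node)
        upgraded.append(node)

    # post-flight: repairs + schema changes re-enabled
    repairs_paused = False
    return upgraded, repairs_paused


done, paused = rolling_upgrade(
    nodes=["cassandra-0", "cassandra-1", "cassandra-2"],
    schema_agreed=lambda: True,
    upgrade_node=lambda n: None,
)
print(done, paused)
```

The gate-before-flight ordering is the load-bearing part: a rolling upgrade started against a schema-disagreeing cluster is the failure mode the pre-flight stage exists to prevent.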

Key concepts and patterns

LLM / search-serving-infra axis (2025-02-04)

Corporate-security / networking axis (2025-04-15)

Financial-systems / data-platform axis (2025-02-19)

Financial-systems / integration-testing axis (2025-05-27)

Storage / data-engineering axis (2025-09-26)

  • concepts/s3-server-access-logs — the AWS primitive Yelp operationalises at fleet scale. Best-effort delivery, 25+-field line format, SimplePrefix vs PartitionedPrefix + EventTime delivery options. New concept canonicalised on this ingest.
  • concepts/partition-projection — Glue/Athena partitioning primitive that avoids MSCK REPAIR / metastore-lookup overhead; enum vs injected types; 1M partition cap on enum if unconstrained. New concept canonicalised on this ingest.
  • concepts/best-effort-log-delivery — the delivery-semantics tier Yelp accepts on SAL; measured at <0.001% of records arriving more than 2 days late. Load-bearing for why prefix-granularity retention is safe. New concept canonicalised on this ingest.
  • concepts/athena-shared-resource-contention — Athena's shared-cluster model with TooManyRequestsException + per- account / per-region DML concurrency quotas; retry-safe job design is mandatory. New concept canonicalised on this ingest.
  • concepts/user-controlled-log-fields — the general hazard Yelp documents on SAL (request_uri, referrer, user_agent carry unescaped arbitrary bytes). New concept canonicalised on this ingest.
  • concepts/url-encoding-idiosyncrasy-s3-keys — most SAL operations double-encode key; BATCH.DELETE.OBJECT / S3.EXPIRE.OBJECT single-encode. Naive url_decode(url_decode(key)) unsafe. New concept canonicalised on this ingest.
  • patterns/raw-to-columnar-log-compaction — daily compact-to-Parquet pattern; Yelp's 85% storage + 99.99% object-count reduction is the canonical datapoint. New pattern canonicalised on this ingest.
  • patterns/object-tagging-for-lifecycle-expiration — tag each object + tag-based lifecycle policy; the only scalable per-object deletion primitive at fleet scale; composes with S3 Batch Operations PutObjectTagging (Delete is not a supported action). New pattern canonicalised on this ingest.
  • patterns/idempotent-athena-insertion-via-left-join — self-LEFT-JOIN on target's unique column to make INSERT INTO ... SELECT retry-safe; partition filters duplicated in ON and WHERE for planner-pruning. New pattern canonicalised on this ingest.
  • patterns/projection-partitioning-over-managed-partitions — choose partition projection when prefix template is known; avoids MSCK REPAIR churn + metastore-lookup planning latency. New pattern canonicalised on this ingest.
  • patterns/s3-access-based-retention — inventory ⋈ SAL at prefix granularity; equality-join beats LIKE-join (70k rows: 5 min → 2 sec); prefix granularity is what makes best-effort SAL delivery acceptable as access signal. New pattern canonicalised on this ingest.
  • patterns/optional-non-capturing-tail-regex — wrap user-controlled tail fields of a log regex in (?:<rest>)? so the non-user-controlled prefix always parses; empty parsed rows are the failure signal. New pattern canonicalised on this ingest.
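The optional non-capturing tail regex admits a compact worked example. The three-field line format below is a simplified stand-in for the real 25+-field SAL format: the trusted prefix always parses, and a tail that does not conform to the quoted format yields an empty group rather than a parse failure.

```python
# Worked example of patterns/optional-non-capturing-tail-regex: the
# user-controlled tail is wrapped in (?: ... )? so hostile or malformed
# content produces request_uri=None (the failure signal) while the trusted
# prefix fields still parse. Line format is illustrative, not real SAL.
import re

LINE_RE = re.compile(
    r'^(?P<bucket>\S+) (?P<requestid>\S+) (?P<operation>\S+)'
    r'(?: "(?P<request_uri>[^"]*)")?'  # optional, user-controlled tail
)

good = 'my-bucket ABC123 REST.GET.OBJECT "GET /key HTTP/1.1"'
bad = 'my-bucket DEF456 REST.GET.OBJECT GET /key no quotes at all'

m_good = LINE_RE.match(good)
m_bad = LINE_RE.match(bad)
print(m_good.group("request_uri"))  # full URI captured
print(m_bad.group("operation"), m_bad.group("request_uri"))  # prefix survives
```

Counting rows whose tail groups came back empty is then the monitoring signal the pattern calls for: a spike means the regex (or an attacker) has outrun the assumed format.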

Ad-tech / experimentation axis (2026-02-02)

  • concepts/filter-before-ab-test — the experimentation- workflow position Yelp occupies with the Back-Testing Engine: cheap pre-filter (back-testing) before expensive validation (A/B testing); A/B is preserved for final validation rather than discovery. New concept canonicalised on this ingest.
  • concepts/hybrid-backtesting-with-ml-counterfactual — the methodology Yelp named explicitly as "not a pure back-testing approach, but rather a hybrid that combines elements of both simulation and back-testing". Historical inputs + alternative code path + ML-predicted counterfactual outcomes. New concept canonicalised on this ingest.
  • concepts/counterfactual-outcome-prediction — the sub-concept: CatBoost regressors predict outcomes that never actually happened under the alternative treatment; non-parametric so they capture diminishing returns on budget. New concept canonicalised on this ingest.
  • concepts/poisson-sampling-for-integer-outcomes — the trick that converts the regressor's smooth averages into realistic integer counts, restoring live-system stochasticity to the simulation. New concept canonicalised on this ingest.
  • concepts/bayesian-optimization-over-parameter-space — sequential model-based optimization; Yelp's default via Scikit-Optimize. Grid + listed are "just wrappers that yield the next candidate". New concept canonicalised on this ingest.
  • concepts/overfitting-to-historical-data — the named risk. Yelp's mitigation is organisational (keep A/B tests in the loop), not technical. New concept canonicalised on this ingest.
  • patterns/production-code-as-submodule-for-simulation — the fidelity primitive. Budgeting and Billing repos as Git Submodules pointing at branches under test; "blurs the line between prototyping and production". New pattern canonicalised on this ingest.
  • patterns/historical-replay-with-ml-outcome-predictor — the full simulation-loop shape (historical inputs + alternative code via submodule + ML outcome predictor + Poisson sampling); generalises to dynamic pricing, recommendation-ranking, bandit-policy domains. New pattern canonicalised on this ingest.
  • patterns/yaml-declared-experiment-config — the configuration surface (date range, search space, search strategy, metric, max_evals) Yelp's Back-Testing Engine consumes. New pattern canonicalised on this ingest; sibling of the 2025-02-19 yaml-declared-feature-dag (same Yelp YAML-config discipline applied to different problem).
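The "just a wrapper that yields the next candidate" framing quoted above suggests a uniform next-candidate interface that grid and listed search satisfy trivially. The sketch below is a pure-Python illustration of that framing; the interface, parameter names, and search space are all invented.

```python
# Sketch of the non-optimizer search strategies behind a next-candidate
# interface: grid search enumerates a declared space, listed search replays
# an explicit candidate list. A Bayesian optimizer would expose the same
# interface but condition each yield on prior results.
import itertools


def grid_search(space: dict):
    """Yield every combination of a declared search space, in order."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def listed_search(candidates: list):
    """Yield an explicit, hand-listed set of candidates."""
    yield from candidates


space = {"daily_cap_pct": [0.8, 1.0], "pacing": ["even", "front"]}
candidates = list(grid_search(space))
print(len(candidates))   # 2 x 2 combinations
print(candidates[0])
```

A max_evals budget (25 in Yelp's default) then just truncates whichever generator is plugged in, which is why the strategies are interchangeable from the Engine's point of view.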

Datastore-platform / database-upgrade axis (2026-04-07)

Key model zoo (named by the 2025-02-04 post)

  • systems/gpt-4 — the formulation-phase LLM; also used to generate golden datasets for distillation.
  • systems/o1-preview / systems/o1-mini — reserved for "newer and more complex tasks that require logical reasoning".
  • systems/gpt-4o-mini — the fine-tuned offline student; ~100× cost reduction vs. direct GPT-4 prompt at equivalent quality on query-understanding tasks.
  • systems/bert / systems/t5 — the realtime tail-query models; production serving tier for never-seen-before queries that miss the cache.

Last updated · 476 distilled / 1,218 read