PATTERN Cited by 1 source
SFTP for bulk daily upload¶
Summary¶
For bulk, daily uploads of hundreds of thousands of records to a third-party system, prefer SFTP over a REST API. Yelp's 2025-05-27 write-up names three axes behind its decision: reliability, file-size ceiling, and setup complexity. The choice is counterintuitive because REST is the default first-reach option for modern integrations; production reality at Yelp overrode that default.
When to use¶
Applies when all hold:
- Target system supports both REST API upload and SFTP.
- Upload volumes are large (hundreds of thousands of records per file, multiple files per day).
- Cadence is periodic (daily / hourly), not real-time.
- Both options have been exercised in test, so their reliability and ease of use can be compared directly.
Does not apply when:
- Real-time or near-real-time upload is required (SFTP adds polling latency on the consuming side).
- Small record volumes where REST's 50k-per-file ceiling is fine.
- Target system doesn't offer SFTP.
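The applicability checks above can be sketched as a small decision helper. This is only an illustration: `UploadProfile`, its field names, and `choose_transport` are hypothetical, and the 50,000-record threshold is the ceiling Yelp reports for its vendor, not a general constant.

```python
from dataclasses import dataclass

# Ceiling Yelp reports for its vendor's REST upload; other vendors will differ.
REST_RECORD_CEILING = 50_000


@dataclass
class UploadProfile:
    """Hypothetical description of an upload workload (illustrative fields)."""
    target_supports_sftp: bool
    target_supports_rest: bool
    records_per_file: int
    periodic_batch: bool  # True for daily/hourly batches, False for real-time


def choose_transport(profile: UploadProfile) -> str:
    """Return "sftp" or "rest" following the checklist above."""
    if not (profile.target_supports_sftp or profile.target_supports_rest):
        raise ValueError("target offers neither SFTP nor REST upload")
    if not profile.target_supports_sftp:
        return "rest"  # no choice to make
    if not profile.target_supports_rest:
        return "sftp"  # no choice to make
    if not profile.periodic_batch:
        return "rest"  # real-time uploads are outside this pattern
    if profile.records_per_file <= REST_RECORD_CEILING:
        return "rest"  # small files fit under the REST ceiling
    return "sftp"
```

The helper deliberately leaves out the "tested in both modes" criterion, since that is an empirical step rather than something a function can decide.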
Empirical basis (Yelp, 2025-05-27)¶
Yelp's Revenue Recognition Team built both integrations, tested them, then picked SFTP for production. Specific numbers:
| Axis | REST API | SFTP |
|---|---|---|
| File size ceiling | 50,000 records/file | 500,000-700,000 records/file |
| Daily file count | ~15 | 4-5 |
| Month-end file count | ~50 | few dozen |
| Reliability in test | "flaky", multiple retries | stable |
| Setup complexity | higher | lower |
Three quotes from the post:
- "Testing revealed that the availability and performance of REST APIs posed reliability concerns. API responses were found to be flaky, which lead to inconsistent uploads and multiple retries."
- "REST APIs have a predefined limit of 50,000 records per file, which resulted in approximately 15 files being generated on a daily basis and approximately 50 files during the month-end closing period."
- "The SFTP upload setup process was found to be less complex when compared to the API set up."
Yelp standardises on SFTP across "multiple pipelines at Yelp, such as the revenue contract and invoice data pipelines."
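The link between a per-file record ceiling and the number of files is plain ceiling division. A minimal sketch, with hypothetical function names and an illustrative record count that is not taken from Yelp's post:

```python
from typing import Iterator, List, Sequence


def files_needed(total_records: int, records_per_file: int) -> int:
    """Number of files required to hold total_records under a per-file ceiling."""
    if records_per_file <= 0:
        raise ValueError("records_per_file must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_records + records_per_file - 1) // records_per_file


def split_into_files(records: Sequence[str], records_per_file: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most records_per_file records."""
    if records_per_file <= 0:
        raise ValueError("records_per_file must be positive")
    for start in range(0, len(records), records_per_file):
        yield list(records[start:start + records_per_file])


# Illustrative only: 120,000 records under each ceiling.
print(files_needed(120_000, 50_000))   # 3
print(files_needed(120_000, 500_000))  # 1
```

Each extra file is another transfer that can fail and need a retry, which is why the ceiling matters operationally and not just as a size limit.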
Why SFTP wins for this shape¶
Three underlying reasons for the three axes:
- Reliability: SFTP is protocol-simple — file upload over a persistent SSH channel. REST APIs involve more indirection: request parsing, application-layer handlers, middleware, database writes — more layers means more failure modes. Vendor REST integration APIs are often less hardened than their core product.
- File size ceiling: REST upload endpoints are often memory-bounded server-side because the receiving handler buffers the request body, whereas an SFTP server streams the file to disk. Larger files per transfer mean fewer transfers and a smaller operational surface.
- Setup complexity: SFTP needs a username, an SSH key, and a host. REST APIs involve auth tokens, header shape, a multipart/form-data body, content-type negotiation, and template-name headers. More knobs means more things to misconfigure.
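To make the "username, SSH key, host" point concrete, here is a minimal sketch that prepares a batch-mode invocation of the OpenSSH `sftp` client. It is not Yelp's method (the post shows curl commands). The helper names, paths, and host are hypothetical, and the upload-then-rename step is an added assumption so that a consumer polling the directory never sees a half-written file.

```python
from typing import List

_FORBIDDEN = set(" \t\n'\"\\")


def _check_path(path: str) -> str:
    """Reject paths that would need quoting in an sftp batch file."""
    if not path or any(ch in _FORBIDDEN for ch in path):
        raise ValueError(f"unsupported characters in path: {path!r}")
    return path


def sftp_batch_script(local_path: str, remote_dir: str, remote_name: str) -> str:
    """Build batch commands: upload under a temporary name, then rename."""
    local = _check_path(local_path)
    final = _check_path(f"{remote_dir}/{remote_name}")
    tmp = final + ".part"  # temporary suffix is an assumption, not a vendor rule
    return f"put {local} {tmp}\nrename {tmp} {final}\n"


def sftp_argv(user: str, host: str, key_path: str, batch_path: str) -> List[str]:
    """Build the sftp command line; nothing is executed here."""
    return [
        "sftp",
        "-b", batch_path,                   # batch mode: aborts if put or rename fails
        "-i", key_path,                     # private key used for authentication
        "-o", "StrictHostKeyChecking=yes",  # refuse hosts with unknown keys
        f"{user}@{host}",
    ]
```

In use, the script text would be written to the batch file and the argument list passed to something like `subprocess.run(argv, check=True)`. Whether the rename replaces an existing remote file depends on the server, so treat that step as something to verify against the vendor's SFTP endpoint.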
Counterintuitive default¶
REST is the obvious first choice for a modern third-party integration. Yelp's post is explicit about this: "It is common practice to opt for standard HTTP methods, which are easy to use and scalable. This option was initially explored and used for uploads during the development phase."
Production experience then overrode the default. The pattern is worth documenting precisely because it goes against the grain: the modern reflex is "REST everything", but for bulk daily file uploads to a third-party SaaS, the older protocol wins.
Caveats¶
- Not a universal recommendation. For real-time event ingestion, REST / gRPC / messaging protocols beat SFTP soundly.
- Vendor-specific. Some third parties don't offer SFTP; some only offer REST. The pattern applies where both exist and you're choosing.
- SFTP adds its own ops burden. SSH key rotation, host-key pinning, and channel allocation are real work. Yelp describes the initial setup as "less complex", but long-term SFTP operations are not free.
- Encryption-in-transit equivalence is assumed. SFTP (via SSH) and REST-over-HTTPS both encrypt; neither has a security advantage for in-transit. Compliance requirements may still specify one or the other.
Related patterns¶
- patterns/schema-validation-pre-upload-via-mapping-api — pairs naturally with SFTP: both support bulk, batched uploads to a third-party.
- patterns/parallel-staging-pipeline-for-prod-verification — the verification layer before the bulk upload.
Seen in¶
- sources/2025-05-27-yelp-revenue-automation-series-testing-an-integration-with-third-party-system — canonical instance. Three named axes (reliability, file size, setup); concrete numbers (50k vs 500k-700k records per file); curl examples for both protocols; Yelp's production standardisation on SFTP.