Amazon Data Firehose
Amazon Data Firehose (formerly Amazon Kinesis Data Firehose; rebranded February 2024) is AWS's fully managed capture-buffer-transform-deliver streaming service for real-time data. Producers (applications, AWS service integrations, CloudWatch Metric Streams, a Kinesis Data Streams source, etc.) write records into a Firehose stream (formerly called a delivery stream); Firehose buffers records by size or time, optionally invokes a transformation Lambda synchronously per record batch, and writes the result to a configured destination. The customer never provisions or manages shards, brokers, or compute: Firehose is the canonical "push streaming records somewhere" managed primitive on AWS.
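The buffer-then-transform-then-deliver shape described above can be sketched as the configuration a stream is created with. This is a minimal sketch, not the source's setup: the helper name `firehose_stream_config`, the default buffer values, and any ARNs passed in are assumptions for illustration. The dictionary keys follow the request shape of boto3's `create_delivery_stream` for an S3 destination with a Lambda processor.

```python
def firehose_stream_config(name, role_arn, bucket_arn, lambda_arn,
                           buffer_mb=1, buffer_seconds=60):
    """Build keyword arguments for boto3's firehose.create_delivery_stream.

    Hypothetical helper: the buffer defaults are illustrative, not recommendations.
    """
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "DirectPut",  # producers write directly to the stream
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Firehose flushes the buffer when either threshold is reached first.
            "BufferingHints": {
                "SizeInMBs": buffer_mb,
                "IntervalInSeconds": buffer_seconds,
            },
            # The transform Lambda is invoked on each buffered batch.
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [
                        {"ParameterName": "LambdaArn",
                         "ParameterValue": lambda_arn},
                    ],
                }],
            },
        },
    }
```

Under these assumptions, passing the result as `boto3.client("firehose").create_delivery_stream(**cfg)` would create the stream; the helper itself makes no AWS calls, so it can be inspected or unit-tested offline.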
Destinations and the public-HTTP constraint
Firehose's destination set covers AWS-native sinks (S3, Redshift, OpenSearch, Apache Iceberg tables), third-party sinks such as Splunk, and a generic HTTP endpoint destination for partner integrations. The HTTP endpoint destination has one critical constraint, quoted verbatim from the 2026-05-13 source:
"Amazon Data Firehose natively supports data delivery to HTTP endpoints, but these endpoints must be public — they cannot be private endpoints inside a VPC." (Source)
This is the architectural constraint that motivates the Lambda-as-VPC-bridge pattern: when the destination must be VPC-internal (for data-privacy, regulatory, or network-topology reasons), the Firehose HTTP destination is unusable, and the Lambda transform becomes the bridge. It runs attached to the VPC, receives the Firehose-buffered records synchronously, and pushes them to an internal HTTP endpoint inside the VPC.
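A minimal sketch of the bridge, assuming a VPC-attached Lambda and a hypothetical internal URL supplied through an `INTERNAL_ENDPOINT` environment variable. The function names, the JSON-array payload format, and the choice to mark forwarded records as `Dropped` (so nothing is written to the stream's configured destination) are illustrative assumptions, not the source's implementation.

```python
import base64
import json
import os
import urllib.request

# Hypothetical internal endpoint, e.g. an internal load balancer in the VPC.
ENDPOINT = os.environ.get("INTERNAL_ENDPOINT",
                          "http://collector.internal.example/ingest")


def post_batch(payloads, url=ENDPOINT, timeout=5):
    """POST the decoded payloads as one JSON array.

    urlopen raises on network errors and on HTTP 4xx/5xx responses.
    """
    body = json.dumps(payloads).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def handler(event, context, send=post_batch):
    """Forward a Firehose batch to the internal endpoint.

    `send` is injectable so the forwarding step can be replaced in tests.
    """
    records = event["records"]
    payloads = [base64.b64decode(r["data"]).decode("utf-8") for r in records]
    try:
        send(payloads)
        result = "Dropped"           # forwarded; nothing left for Firehose to deliver
    except Exception:
        result = "ProcessingFailed"  # let Firehose apply its failure handling
    return {"records": [
        {"recordId": r["recordId"], "result": result, "data": r["data"]}
        for r in records
    ]}
```

Because the whole batch is forwarded in one request, a failure marks every record in the batch as failed; a per-record strategy would trade more requests for finer-grained failure handling.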
Data transformation: synchronous Lambda invocation
Firehose's data-transformation feature invokes a Lambda function synchronously on each buffered record batch. From the 2026-05-13 source: "Amazon Data Firehose buffers incoming data before synchronously invoking the Lambda function that streams the metrics to the internal HTTP endpoint." Architectural implications:
- Return value is the delivery payload, not just success/fail. This is structurally different from the event-source-mapping pattern Lambda has with Kinesis / SQS / DynamoDB Streams, where the function's result only tells the poller whether the batch succeeded or failed and any onward delivery is the function's own job. With Firehose transforms, the Lambda's return value is the data Firehose then attempts to deliver to the destination.
- Synchronous-invocation latency is on the critical path. Cold-start latency, function execution time, and retry-on-failure all back-pressure into Firehose's buffer.
- At-least-once semantics with retries. Firehose retries failed transform invocations and failed destination deliveries; because a retried invocation can re-emit records the transform already pushed, the destination must tolerate duplicates.
- S3 as fallback sink. Every Firehose stream is configured with an S3 bucket, either as the destination or as a backup location. When delivery to the primary destination succeeds, Firehose can be configured to skip the S3 write entirely, but S3 remains the escape hatch when the primary fails repeatedly.
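The "return value is the payload" contract in the list above can be illustrated with a small transform. Firehose passes each record with a `recordId` and base64-encoded `data`, and expects one output entry per input record carrying the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the (re-encoded) `data`. The `debug` filter and the `processed` field below are hypothetical, chosen only to exercise all three outcomes.

```python
import base64
import json


def _entry(rec, result, data=None):
    """Build one output entry, echoing the input data unless replaced."""
    return {"recordId": rec["recordId"], "result": result,
            "data": rec["data"] if data is None else data}


def transform(event, context=None):
    """Return one output entry per input record, keyed by the same recordId."""
    out = []
    for rec in event["records"]:
        try:
            doc = json.loads(base64.b64decode(rec["data"]))
        except ValueError:  # bad base64, bad UTF-8, or invalid JSON
            out.append(_entry(rec, "ProcessingFailed"))
            continue
        if not isinstance(doc, dict):
            out.append(_entry(rec, "ProcessingFailed"))
            continue
        if doc.get("debug"):  # hypothetical filter: drop debug records
            out.append(_entry(rec, "Dropped"))
            continue
        doc["processed"] = True  # hypothetical enrichment
        encoded = base64.b64encode(
            (json.dumps(doc) + "\n").encode("utf-8")).decode("ascii")
        out.append(_entry(rec, "Ok", encoded))
    return {"records": out}
```

Records returned as `Ok` are what Firehose goes on to deliver; because invocations can be retried, a transform like this should be safe to run more than once on the same batch.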
Role-altitude vs Kinesis Data Streams
Firehose and Kinesis Data Streams are siblings in the AWS streaming portfolio but at different altitudes:
| Property | Kinesis Data Streams | Amazon Data Firehose |
|---|---|---|
| Role | Durable streaming buffer with consumer-driven reads | Managed capture-transform-deliver pipe |
| Retention | 24 h–365 d configurable; replay on consumer fault | Buffer-only (no retention); records are forwarded |
| Consumer model | Multiple independent consumers, shard + sequence | Single delivery destination per stream |
| Scaling unit | Shard (provisioned or on-demand) | None — fully managed throughput |
| Code required | Producer + consumer code | None — fully declarative pipe |
| Lambda use | Event-source mapping (result = success/fail) | Sync transform invocation (return = payload) |
Use Kinesis Data Streams when you need multi-consumer fan-out, replay, or shard-scoped ordering; use Firehose when you need to push records to one destination with an optional transform.
Seen in
- sources/2026-05-13-aws-streaming-cloudwatch-metrics-to-vpc-based-opentelemetry-collectors-using-lambda — first canonical wiki naming. Firehose is the central delivery substrate in the architecture: receives CloudWatch Metric Streams records in JSON format, buffers them, invokes a Lambda transform synchronously, and the transform pushes the metrics through an internal NLB to the OpenTelemetry collector fleet on EC2 inside the customer's VPC. The post canonicalises three architectural properties: (1) HTTP endpoint destination is public-only — "these endpoints must be public — they cannot be private endpoints inside a VPC" — driving the Lambda-as-VPC-bridge pattern; (2) synchronous Lambda transform invocation — "Amazon Data Firehose buffers incoming data before synchronously invoking the Lambda function" — making the transform's latency load-bearing; (3) S3 as zero-cost fallback destination — "because our Lambda transform function sends the data directly to OpenTelemetry endpoint, no metrics are sent to the S3 destination, and it does not incur any cost."
Related
- systems/amazon-kinesis-data-streams — sibling streaming primitive at a different altitude.
- systems/amazon-cloudwatch-metric-streams — common producer source.
- systems/aws-lambda — the transform-function substrate.
- systems/aws-s3 — the canonical fallback destination.
- patterns/firehose-lambda-transform-as-vpc-bridge — the load-bearing pattern this page documents.