
SYSTEM Cited by 1 source

Amazon Data Firehose

Amazon Data Firehose (formerly Amazon Kinesis Data Firehose; rebranded February 2024) is AWS's fully managed capture-buffer-transform-deliver streaming service for real-time data. Producers (applications, AWS-service streams, CloudWatch Metric Streams, Kinesis Data Streams consumers, etc.) write records into a delivery stream; Firehose buffers records by size or time, optionally invokes a transformation Lambda synchronously per record batch, and writes the result to a configured destination. The customer never provisions or manages shards, brokers, or compute — Firehose is the canonical "push streaming records somewhere" managed primitive on AWS.
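The "buffers records by size or time" behaviour can be pictured with a toy model. The sketch below is not Firehose's implementation; the class name, thresholds, and callback are all hypothetical and only illustrate the rule that a batch is released when either the size limit or the age limit is reached, whichever comes first.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SizeOrTimeBuffer:
    """Toy model of size-or-time buffering (illustrative only)."""
    max_bytes: int
    max_age_seconds: float
    deliver: Callable[[List[bytes]], None]
    clock: Callable[[], float] = time.monotonic
    _records: List[bytes] = field(default_factory=list, init=False)
    _bytes: int = field(default=0, init=False)
    _opened_at: float = field(default=0.0, init=False)

    def put(self, record: bytes) -> None:
        if not self._records:
            # The age of a batch is measured from its first record.
            self._opened_at = self.clock()
        self._records.append(record)
        self._bytes += len(record)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Release the batch if it is large enough or old enough."""
        if not self._records:
            return
        too_big = self._bytes >= self.max_bytes
        too_old = self.clock() - self._opened_at >= self.max_age_seconds
        if too_big or too_old:
            batch, self._records, self._bytes = self._records, [], 0
            self.deliver(batch)
```

In this model a quiet stream still delivers once the age limit passes (something has to call `flush_if_due` periodically), while a busy stream delivers as soon as the size limit is hit.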

Destinations and the public-HTTP constraint

Firehose's destination set covers AWS-native sinks (S3, Redshift, OpenSearch, Iceberg tables), third-party sinks such as Splunk, and a generic HTTP endpoint destination for partner integrations. The HTTP endpoint destination has one critical constraint, surfaced verbatim in the 2026-05-13 source:

"Amazon Data Firehose natively supports data delivery to HTTP endpoints, but these endpoints must be public — they cannot be private endpoints inside a VPC." (Source)

This is the architectural constraint that motivates the Lambda-as-VPC-bridge pattern: when the destination must be VPC-internal (for data-privacy, regulatory, or network-topology reasons), the Firehose HTTP destination is unusable, and the Lambda transform becomes the bridge. It runs attached to the VPC, receives the Firehose-buffered records synchronously, and pushes them to an internal HTTP endpoint inside the VPC.
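A minimal sketch of such a bridge function follows. It assumes the standard Firehose transformation event shape (a `records` list whose items carry `recordId` and base64-encoded `data`); the endpoint URL, the `Content-Type` header, and the `make_handler` helper are hypothetical. Marking forwarded records as `Dropped` is one possible way to keep them out of the stream's own destination, not something the source confirms.

```python
import base64
import urllib.request
from typing import Callable, Dict, List

# Hypothetical address that only resolves inside the VPC (e.g. an internal NLB).
INTERNAL_URL = "http://collector.internal.example/ingest"


def post_to_internal(body: bytes, url: str = INTERNAL_URL) -> None:
    """POST one payload to the internal endpoint; raises on HTTP or network errors."""
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},  # assumed payload type
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


def make_handler(send: Callable[[bytes], None] = post_to_internal):
    """Build a Lambda handler; `send` is injectable so the logic can be tested offline."""
    def handler(event: Dict, context=None) -> Dict:
        out: List[Dict] = []
        for rec in event["records"]:
            try:
                send(base64.b64decode(rec["data"]))
                # Assumption: already forwarded, so nothing is left for Firehose to deliver.
                result = "Dropped"
            except Exception:
                result = "ProcessingFailed"
            out.append({"recordId": rec["recordId"], "result": result, "data": rec["data"]})
        return {"records": out}
    return handler


handler = make_handler()
```

Because the POST is a side effect inside the transform, a retried invocation can re-send records that were already forwarded, so the internal endpoint should tolerate duplicates.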

Data transformation: synchronous Lambda invocation

Firehose's data-transformation feature invokes a Lambda function synchronously on each buffered record batch. From the 2026-05-13 source: "Amazon Data Firehose buffers incoming data before synchronously invoking the Lambda function that streams the metrics to the internal HTTP endpoint." Architectural implications:

  • Return value is the delivery payload, not just success/fail. This is structurally different from the event-source-mapping pattern Lambda has with Kinesis / SQS / DynamoDB Streams, where the function's outcome only tells the poller whether the batch succeeded or must be retried, and any onward delivery is the function's own responsibility. With Firehose transforms, the Lambda's return value is the data Firehose then attempts to deliver to the destination.
  • Synchronous-invocation latency is on the critical path. Cold-start latency, function execution time, and retry-on-failure all back-pressure into Firehose's buffer.
  • At-least-once semantics with retries. Firehose retries failed transform invocations and failed destination deliveries; combined with the Lambda transform's potential to re-emit, the downstream destination must tolerate duplicate records.
  • S3 as fallback sink. Every Firehose stream is configured with S3 as a destination. Firehose can be configured to skip the S3 write entirely when delivery to the primary destination succeeds, which leaves S3 as the fallback when the primary fails repeatedly.
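The "return value is the delivery payload" point can be made concrete with a small transform. This is a sketch of the transformation contract: each input record is answered with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64-encoded `data`. The `debug` filter and the `transformed` field are hypothetical and exist only to exercise all three outcomes.

```python
import base64
import json
from typing import Dict


def transform_record(rec: Dict) -> Dict:
    """Map one Firehose input record to one output record with the same recordId."""
    try:
        doc = json.loads(base64.b64decode(rec["data"]))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        doc = None
    if not isinstance(doc, dict):
        # Unparseable input: hand the original bytes back marked as failed.
        return {"recordId": rec["recordId"], "result": "ProcessingFailed", "data": rec["data"]}
    if doc.get("debug"):  # hypothetical filter rule
        return {"recordId": rec["recordId"], "result": "Dropped", "data": rec["data"]}
    doc["transformed"] = True  # hypothetical enrichment
    payload = (json.dumps(doc) + "\n").encode("utf-8")
    return {
        "recordId": rec["recordId"],
        "result": "Ok",
        "data": base64.b64encode(payload).decode("ascii"),
    }


def handler(event: Dict, context=None) -> Dict:
    # One output record per input record, in the same batch.
    return {"records": [transform_record(r) for r in event["records"]]}
```

Only the `Ok` records carry data onward to the destination; in this sketch the handler itself never writes anywhere, which is the opposite design choice from a bridge function that forwards records as a side effect.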

Role-altitude vs Kinesis Data Streams

Firehose and Kinesis Data Streams are siblings in the AWS streaming portfolio but sit at different altitudes:

| Property | Kinesis Data Streams | Amazon Data Firehose |
| --- | --- | --- |
| Role | Durable streaming buffer with consumer-driven reads | Managed capture-transform-deliver pipe |
| Retention | 24 h–365 d configurable; replay on consumer fault | Buffer-only (no retention); records are forwarded |
| Consumer model | Multiple independent consumers, shard + sequence | Single delivery destination per stream |
| Scaling unit | Shard (provisioned or on-demand) | None (fully managed throughput) |
| Code required | Producer + consumer code | None (fully declarative pipe) |
| Lambda use | Event-source mapping (success/fail) | Sync transform invocation (return = payload) |

Use Kinesis Data Streams when you need multi-consumer fan-out, replay, or shard-scoped ordering; use Firehose when you need to push records to one destination with an optional transform.
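To illustrate what "declarative pipe" means in practice, the sketch below builds the kind of request a stream definition might use: a destination, buffering hints, and a Lambda processor, with no consumer code. The field names follow the Firehose `CreateDeliveryStream` API as commonly used through boto3 and should be checked against the current API reference; the function name, ARNs, and buffering values are hypothetical.

```python
from typing import Any, Dict


def firehose_stream_request(
    name: str, role_arn: str, bucket_arn: str, lambda_arn: str
) -> Dict[str, Any]:
    """Build keyword arguments for a stream with an S3 destination and a Lambda transform."""
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "DirectPut",  # producers write to the stream directly
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Deliver when either threshold is reached (illustrative values).
            "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [
                    {
                        "Type": "Lambda",
                        "Parameters": [
                            {"ParameterName": "LambdaArn", "ParameterValue": lambda_arn}
                        ],
                    }
                ],
            },
        },
    }
```

With AWS credentials available, the resulting dictionary would typically be passed as `boto3.client("firehose").create_delivery_stream(**request)`; the sketch stops short of that call so it can run without an AWS account.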

Seen in

  • sources/2026-05-13-aws-streaming-cloudwatch-metrics-to-vpc-based-opentelemetry-collectors-using-lambda: first canonical wiki naming. Firehose is the central delivery substrate in the architecture: it receives CloudWatch Metric Streams records in JSON format, buffers them, and invokes a Lambda transform synchronously; the transform pushes the metrics through an internal NLB to the OpenTelemetry collector fleet on EC2 inside the customer's VPC. The post canonicalises three architectural properties: (1) HTTP endpoint destination is public-only: "these endpoints must be public — they cannot be private endpoints inside a VPC", driving the Lambda-as-VPC-bridge pattern; (2) synchronous Lambda transform invocation: "Amazon Data Firehose buffers incoming data before synchronously invoking the Lambda function", making the transform's latency load-bearing; (3) S3 as zero-cost fallback destination: "because our Lambda transform function sends the data directly to OpenTelemetry endpoint, no metrics are sent to the S3 destination, and it does not incur any cost."