
Redpanda — Introducing multi-language dynamic plugins for Redpanda Connect

Summary

Launch post for dynamic plugins in Redpanda Connect v4.56.0 (Beta, Apache 2.0). The feature lifts the previous Go-only, compile-into-the-binary plugin constraint by running plugins as separate OS subprocesses that communicate with the main Redpanda Connect process over gRPC on a Unix domain socket. Language agnosticism is the headline property: Go and Python SDKs ship at launch, and anything with gRPC support works in principle. Process isolation is the reliability property: "plugins run in separate processes, so crashes won't take down the main Redpanda Connect engine". The performance cost is IPC serialization, mitigated by restricting plugins to batch input / batch processor / batch output component types so the cross-process hop is amortized over a batch of messages rather than paid per message.
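The subprocess-plus-socket shape described above can be sketched with the Python standard library alone. This is a minimal illustration, not the Redpanda Connect protocol: length-prefixed JSON frames stand in for gRPC/protobuf, and the names `PLUGIN_SRC`, `call_plugin`, and `run_demo` are hypothetical. The host binds a Unix domain socket, spawns the plugin as a separate OS process, and sends one whole batch per round trip.

```python
import json
import os
import socket
import struct
import subprocess
import sys
import tempfile

# Hypothetical plugin body, run as a separate OS process. It connects back to
# the host's socket and uppercases every message in each batch it receives.
PLUGIN_SRC = r'''
import json, socket, struct, sys

def recv_exact(conn, n):
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None  # host closed the connection
        buf += chunk
    return buf

conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
conn.connect(sys.argv[1])
while True:
    header = recv_exact(conn, 4)
    if header is None:
        break
    (size,) = struct.unpack(">I", header)
    batch = json.loads(recv_exact(conn, size))
    reply = json.dumps([m.upper() for m in batch]).encode()
    conn.sendall(struct.pack(">I", len(reply)) + reply)
conn.close()
'''


def recv_exact(conn, n):
    """Read exactly n bytes or raise if the peer goes away."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("plugin closed the socket")
        buf += chunk
    return buf


def call_plugin(conn, batch):
    """Send one batch as a single frame and return the processed batch."""
    payload = json.dumps(batch).encode()
    conn.sendall(struct.pack(">I", len(payload)) + payload)
    (size,) = struct.unpack(">I", recv_exact(conn, 4))
    return json.loads(recv_exact(conn, size))


def run_demo(batch):
    """Host side: bind the socket, spawn the plugin, process one batch."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plugin.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            server.settimeout(10)
            child = subprocess.Popen([sys.executable, "-c", PLUGIN_SRC, path])
            try:
                conn, _ = server.accept()
                with conn:
                    return call_plugin(conn, batch)
            finally:
                # The connection is closed by now, so the plugin sees EOF.
                try:
                    child.wait(timeout=10)
                except subprocess.TimeoutExpired:
                    child.kill()


if __name__ == "__main__":
    print(run_demo(["hello", "world"]))
```

Because the plugin lives in its own process, an unhandled exception there surfaces in the host as a closed socket rather than a crash of the host itself, which is the isolation property the post describes.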

Key takeaways

  1. Dynamic plugins = subprocesses over gRPC/Unix-socket. "Redpanda Connect orchestrates different plugins as subprocesses and uses gRPC over a Unix socket to pass data back and forth between processes. The protocol definition for these plugins closely mirrors the existing interfaces defined for plugins within Redpanda Connect's core engine, Benthos." Each plugin maps to a single subprocess, keeping things "modular and isolated". Canonicalized on the wiki as patterns/grpc-over-unix-socket-language-agnostic-plugin.

  2. Three new built-in wrapper plugins act as the dispatch shim. Redpanda Connect ships three new compiled plugins — one per component type: BatchInput, BatchProcessor, BatchOutput — which act as "wrappers that can load and communicate with external plugin executables". The host process speaks the Benthos-mirrored gRPC service interface; the external executable implements the service. No new plugin component type is exposed to the user — the shim is transparent.

  3. Batch-only component types amortize IPC cost. "We use batch components exclusively to amortize the cost of cross-process communication." Canonicalized as concepts/batch-only-component-for-ipc-amortization. The per-call overhead of protobuf serialization + Unix socket traversal + context switch is spread over the N messages in a batch; non-batch (single-message) component types would pay the plumbing cost on every record.

  4. Compiled plugins remain the performance-critical path. Explicit architectural guidance: "for performance-critical workloads where every microsecond counts, the best approach remains using native Go plugins compiled directly into the Redpanda Connect binary. Dynamic plugins shine for flexibility and language choice, while compiled plugins offer maximum performance." Canonicalized as patterns/compiled-vs-dynamic-plugin-tradeoff. Not a deprecation of the in-process model — a split by use case.

  5. Subprocess isolation is the fault-containment property. "Plugins run in separate processes, so crashes won't take down the main Redpanda Connect engine." This is the reliability argument, distinct from the language-agnosticism argument. A buggy Python plugin that segfaults its interpreter, or a C extension that crashes, is confined to the subprocess; the host process survives. Canonicalized as concepts/subprocess-plugin-isolation.

  6. The Python SDK is the headline language target. "The Python SDK opens up Redpanda Connect to one of the most popular languages for data processing and AI/ML workloads." Explicit target use cases: PyTorch / TensorFlow / Hugging Face for real-time ML inference; LangChain / LlamaIndex for LLM orchestration; Pandas / NumPy / SciPy for statistical transforms. The motivating example is "a Python plugin that performs sentiment analysis on customer feedback streams using a pre-trained BERT model from Hugging Face".

  7. Each plugin is a standalone executable + YAML descriptor + long-running command. The plugin manifest (plugin.yaml) declares name, type (input / processor / output), command (an argv array like ["uv", "run", "main.py"]), and optional fields. The pipeline YAML references the plugin by name. The pipeline is started with rpk connect run --rpc-plugins=plugin.yaml connect.yaml; the CLI is responsible for process spawning and socket path setup.
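As a rough illustration of the manifest and launch command in the last takeaway, the sketch below renders a manifest from the three fields the post names (name, type, command) and builds the `rpk` argv. The exact YAML layout, the plugin name `sentiment`, and the helper names `render_manifest` and `launch_argv` are assumptions for illustration; the real schema has optional fields that the post does not enumerate.

```python
import json

# Component types named in the post; anything else is rejected here.
ALLOWED_TYPES = {"input", "processor", "output"}


def render_manifest(name, plugin_type, command):
    """Render a minimal plugin.yaml body (assumed layout, required fields only)."""
    if plugin_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if not command or not all(isinstance(arg, str) for arg in command):
        raise ValueError("command must be a non-empty list of strings")
    # A JSON array is also a valid YAML flow sequence, so json.dumps is enough.
    lines = [
        f"name: {name}",
        f"type: {plugin_type}",
        f"command: {json.dumps(command)}",
    ]
    return "\n".join(lines) + "\n"


def launch_argv(manifest_path, pipeline_path):
    """Build the launch command quoted in the post as an argv list."""
    return ["rpk", "connect", "run", f"--rpc-plugins={manifest_path}", pipeline_path]


if __name__ == "__main__":
    print(render_manifest("sentiment", "processor", ["uv", "run", "main.py"]))
    print(" ".join(launch_argv("plugin.yaml", "connect.yaml")))
```

Keeping `command` as an argv array rather than a shell string avoids quoting problems when the host spawns the plugin process.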

Systems / concepts / patterns extracted

Operational numbers

None disclosed. No benchmarks on the serialization overhead, no throughput comparison between compiled plugin and equivalent dynamic plugin, no latency p99 on the cross-process hop. The post makes an architectural claim about batch amortization but does not quantify it. Also absent: max plugin count per host, memory cost per plugin subprocess, socket path hygiene / cleanup semantics on crash. All left to reader experience or future documentation.
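In the absence of published numbers, the amortization claim can be made concrete with a toy cost model. The sketch assumes each cross-process hop has a fixed cost and each message adds a per-message cost; the function name and the figures in the demo loop are hypothetical placeholders, not measurements from the post.

```python
def amortized_cost_us(hop_cost_us: float, per_message_us: float, batch_size: int) -> float:
    """Effective cost per message when one cross-process hop carries a whole batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The fixed hop cost is shared by every message in the batch.
    return hop_cost_us / batch_size + per_message_us


if __name__ == "__main__":
    # Hypothetical costs: 50 us per hop, 2 us of work per message.
    for n in (1, 10, 100, 1000):
        print(n, amortized_cost_us(50.0, 2.0, n))
```

Under this model the per-message cost approaches the per-message term as the batch grows, which is why a single-message component type would pay the full hop cost on every record.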

Caveats

  • Launch / marketing voice. The post announces a Beta feature with "We're particularly excited..." / "We're excited to see..." positioning. Architecture content is real but thin — the core technical disclosure is ~8 sentences on the subprocess + gRPC + Unix-socket + batch-only design. Borderline-Tier-3 include.
  • No gRPC protocol definition published inline. The post says the protocol "closely mirrors" the existing Benthos interfaces but does not show the .proto. For anyone wanting to implement a third-language SDK, the reference is the Go + Python SDK source rather than the post.
  • Beta stability only. v4.56.0 Beta. No SLA on protocol stability across minor versions; the Benthos-mirrored interface could evolve.
  • Apache 2.0 is the declared license — an explicit contrast with the Enterprise-licensed CDC input connectors covered in the 2025-03-18 post. Plugin framework itself is open-source; the connectors built on top may carry different licenses.
  • Single-subprocess-per-plugin assumption. No mention of pooling, multiple worker subprocesses per plugin instance, or how the host scales a CPU-bound Python plugin across cores. One subprocess per plugin is the default and only shape described.

Source
