
PATTERN

Streaming markdown-to-native conversion

Definition

An adapter-layer pattern for chat platforms that do not natively stream rich markdown: the adapter receives a live markdown token stream from the LLM, and at each intermediate edit of the visible message it converts the current accumulated markdown to the platform's native formatting dialect (e.g. Slack's mrkdwn variant, rich text blocks, or plain text) — so the user never sees literal markup like **bold** even mid-stream.

This is contrasted with the native streaming path (Slack's case), where the platform renders bold, italic, lists, and other formatting inline as the response arrives, without adapter-side intervention.

"Slack has a native streaming path that renders bold, italic, lists, and other formatting in real time as the response arrives. Other platforms use a fallback streaming path, passing streamed text through each adapter's markdown-to-native conversion pipeline at each intermediate edit."

(Source: sources/2026-04-21-vercel-chat-sdk-brings-agents-to-your-users.)

The failure mode it fixes

Before the conversion pipeline existed in Chat SDK, "adapters received raw markdown strings, so users on Discord or Teams would see literal **bold** syntax until the final message resolved." Intermediate edits would repeatedly post-then-patch the raw markdown, and the user's eye would catch the literal asterisks and hashes during the stream, only to see them vanish in the final render — an actively worse experience than posting the full message at the end.

Structure

Per intermediate edit on a non-native-streaming platform:

  1. Accumulate the LLM's incremental token delta into a running markdown buffer.
  2. Parse the buffer as (partial) markdown — the parser must tolerate unclosed constructs (**bold with no closing **, an incomplete fenced code block, a list marker without a following item).
  3. Render to the platform's native dialect — Slack mrkdwn, Discord embed text, GitHub/Linear markdown, etc.
  4. Patch the on-platform message to the new rendered output, using whatever edit primitive the platform provides.
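The four steps above can be sketched as a small adapter loop. This is an illustrative reconstruction, not Chat SDK's actual API; `StreamingAdapter`, `render`, and `patchMessage` are hypothetical names standing in for steps 1–4.

```typescript
// Hypothetical sketch of the per-edit loop on a non-native-streaming platform.
// `render` stands in for steps 2-3 (tolerant parse + native-dialect render);
// `patchMessage` stands in for step 4 (the platform's edit primitive).
type RenderTarget = (markdown: string) => string;

class StreamingAdapter {
  private buffer = "";

  constructor(
    private render: RenderTarget,
    private patchMessage: (text: string) => void,
  ) {}

  onDelta(delta: string): void {
    this.buffer += delta;                    // step 1: accumulate the token delta
    const native = this.render(this.buffer); // steps 2-3: parse + render the whole buffer
    this.patchMessage(native);               // step 4: patch the visible message
  }
}
```

Note that each edit re-renders the full accumulated buffer rather than appending a rendered delta; that is what lets earlier, provisional renderings be corrected once later tokens arrive.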

Step (2) is the hard part: tolerant streaming-markdown parsing. The post's "partial bold" case (**bo) should render as either literal **bo or eagerly-rendered bo with bold pending — the SDK must pick one and be consistent, because flipping between them mid-stream causes flicker.
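One consistent policy for the partial-bold case, sketched below, is to render complete `**...**` pairs and leave a trailing unmatched `**` as literal text. This is only one of the two policies the text describes, and the function is an illustrative toy, not the SDK's parser:

```typescript
// Tolerant handling of unclosed bold: render closed **...** pairs,
// keep a trailing unmatched ** literal (policy: "literal until closed").
function renderBoldTolerant(md: string): string {
  const parts = md.split("**");
  let out = "";
  for (let i = 0; i < parts.length; i++) {
    if (i % 2 === 1) {
      // Odd index = text after an opening **.
      if (i + 1 < parts.length) out += `<b>${parts[i]}</b>`; // closing ** exists
      else out += `**${parts[i]}`;                           // unclosed: emit literally
    } else {
      out += parts[i];
    }
  }
  return out;
}
```

The alternative policy (eagerly render `bo` as bold-pending) is equally valid; the point in the text stands either way — pick one and never flip mid-stream.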

Cost structure

Each intermediate edit costs:

  • one parse-and-render pass on the accumulated buffer (cheap, CPU-bound);
  • one platform-API edit call (not free — rate-limited by the platform);
  • user-visible flicker if the edit rate is high.

Implementations typically rate-limit edit frequency (e.g. ≤ 1 edit / sec) and batch token deltas between edits. The post doesn't disclose Chat SDK's specific rate-limiting policy.
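A minimal rate-limiter of this shape might look like the following. The 1 edit/sec interval is the example figure from the text, not a disclosed Chat SDK setting, and the class and its names are hypothetical:

```typescript
// Illustrative edit throttle: accumulate deltas, flush at most once per interval.
// Because each edit replaces the whole visible message, the buffer is never cleared.
class EditThrottle {
  private buffer = "";
  private lastFlush = 0;

  constructor(
    private flush: (fullText: string) => void, // one platform edit call
    private minIntervalMs = 1000,              // example: <= 1 edit / sec
    private now: () => number = Date.now,      // injectable clock for testing
  ) {}

  push(delta: string): void {
    this.buffer += delta;
    const t = this.now();
    if (t - this.lastFlush >= this.minIntervalMs) {
      this.lastFlush = t;
      this.flush(this.buffer);
    }
  }

  finish(): void {
    this.flush(this.buffer); // the final message is always rendered in full
  }
}
```

The `finish()` call matters: deltas that arrive between the last throttled edit and the end of the stream must still reach the final render.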

The Slack append-only streaming sub-case

Slack's native append-only streaming API is a special case the conversion pipeline still has to interoperate with: "while models generate standard markdown, Slack does not natively support it. Chat SDK converts standard markdown to the Slack variant automatically. This conversion happens in real time, even when using Slack's native append-only streaming API."

So even on Slack — which has native streaming — Chat SDK runs a markdown-dialect conversion layer (standard markdown → Slack mrkdwn), just not the intermediate-edit patch-post pattern used on Discord / Teams. Two distinct conversions, both streaming-safe.
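The dialect conversion can be pictured with a toy converter for a small subset of the differences (Slack's mrkdwn uses `*bold*` and `_italic_` where standard markdown uses `**bold**` and `*italic*`, and `<url|text>` for links). This is a sketch of the idea only, not Chat SDK's converter, and it ignores code spans, escapes, and nesting:

```typescript
// Toy standard-markdown -> Slack mrkdwn conversion for a few constructs.
function toMrkdwn(md: string): string {
  return md
    .replace(/\*\*(.+?)\*\*/g, "\u0000$1\u0000")     // stash bold so italic pass skips it
    .replace(/\*(.+?)\*/g, "_$1_")                   // *italic* -> _italic_
    .replace(/\u0000(.+?)\u0000/g, "*$1*")           // restore bold as *bold*
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "<$2|$1>"); // [text](url) -> <url|text>
}
```

The stash-and-restore trick is needed because `*` is overloaded: converting bold and italic in a single naive pass would let the italic rule consume half of a `**bold**` marker.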

Relation to v0's LLM Suspense

This pattern is the chat-platform cousin of the streaming output rewrite pattern disclosed in Vercel's 2026-01-08 v0 retrospective, where v0 rewrites token-stream content on the fly during streaming to prevent users from seeing intermediate incorrect state (e.g. import { Triangle as VercelLogo } before the embedding-based name resolver fixes it).

The underlying insight is the same in both patterns: a streaming token stream is a rendering problem, not just a delivery problem. The rendering has to be live-safe under incremental growth.

Trade-offs

  • Edit-rate ceiling. Platform edit APIs are rate-limited; the conversion pipeline must batch. Batching trades smoothness for safety.
  • Parser incompleteness. Tolerant markdown parsing is imperfect; some edge cases will flicker.
  • Round-trip overhead. Even at 1 edit/sec, the platform round-trip adds visible latency vs Slack's native streaming path.
  • Not all platforms support edit. WhatsApp's adapter "does not support message history, editing, or deletion" — so the streaming conversion pattern can't apply, and the adapter instead uses auto-chunking (another WhatsApp-adapter feature named in the post). The pattern's applicability is platform-conditional.
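For the no-edit case, auto-chunking amounts to splitting the finished (or progressively completed) text into platform-sized messages posted in sequence. The following is a hypothetical sketch under assumed behavior — the post names the WhatsApp adapter's auto-chunking feature but does not describe its algorithm:

```typescript
// Hypothetical auto-chunking: split text into messages of at most maxLen
// characters, preferring to break at a newline, then at a space.
function chunkMessage(text: string, maxLen: number): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLen) {
    let cut = rest.lastIndexOf("\n", maxLen);          // prefer paragraph breaks
    if (cut <= 0) cut = rest.lastIndexOf(" ", maxLen); // then word breaks
    if (cut <= 0) cut = maxLen;                        // worst case: hard cut
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\s+/, "");
  }
  if (rest) chunks.push(rest);
  return chunks;
}
```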
