Make your text streams feel nice!

Smooth Streaming

When I was working on Text Streaming, I noticed that AI apps like OpenAI and Perplexity have a different feel when it comes to displaying inbound text. Mine feels slightly jankier, and it turns out Smooth Streaming is probably the reason why.

Playground

You can edit the markdown text on the left, stream it from my POST /api/echo endpoint, and compare the feel with SMOOTH ON versus OFF.

Hit SOUND to enable audio. Each chunk plays a click so you can hear the difference in pacing.


Feel the difference? SMOOTH ON gives you a steady flow of beepboops, but SMOOTH OFF gives you inconsistent beepboops.

waow so smooth? woaw

Hitting STREAM calls the POST /api/echo endpoint, which simulates bursty LLM output using a ReadableStream. It dumps 3–8 chunks rapidly (near-instant), then pauses to simulate "thinking" before the next burst:

// Context: `text`, `speedMultiplier`, and `stallProbability` come from the request
const encoder = new TextEncoder();
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const stream = new ReadableStream({
  async start(controller) {
    let cursor = 0;
    while (cursor < text.length) {
      // Each burst: 3–8 chunks fired rapidly
      const burstSize = 3 + Math.floor(Math.random() * 6);
      for (let i = 0; i < burstSize && cursor < text.length; i++) {
        const chunkSize = 1 + Math.floor(Math.random() * 12);
        const chunk = text.slice(cursor, cursor + chunkSize);
        cursor += chunkSize;
        controller.enqueue(encoder.encode(chunk));
        // Tiny delay within a burst (0–5ms) — near-instant
        await sleep((Math.random() * 5) / speedMultiplier);
      }
      // Pause between bursts: "thinking" time
      const pause = 80 + Math.random() * 200;
      const stallBonus = Math.random() < stallProbability
        ? 150 + Math.random() * 300 : 0;
      await sleep((pause + stallBonus) / speedMultiplier);
    }
    controller.close();
  },
});

The three controls in the playground map directly to this code:

  • Randomness sets stallProbability, which controls how often an extra stall gets added to the pause between bursts.
  • Speed sets speedMultiplier, which scales all delays up or down.
  • Smooth toggles whether the client pipes the raw response through a word transform and timed buffer before rendering.
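
For reference, here's roughly what the playground's STREAM request looks like with those values plugged in. The field names mirror the variables in the server snippet above; the exact request shape is my assumption:

// Sketch of the client call; `markdown` stands for the editable text
const response = await fetch("/api/echo", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: markdown,          // the text to echo back
    stallProbability: 0.15,  // RANDOMNESS
    speedMultiplier: 1.0,    // SPEED
  }),
});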

Client Streaming

Web APIs have a composable primitive called TransformStream that lets you perform transformations between streams. If you've used RxJS pipes before (I had to for a functional programming class in uni), this will feel familiar, except it's just the platform.

You start with the ReadableStream from fetch (response.body), pipe it through one or more transforms using pipeThrough, and read the result on the other end.

Here's what it looks like to chain two transforms together:

const wordStream = response.body
  .pipeThrough(createWordTransform())
  .pipeThrough(createTimedBuffer(10));

The first transform re-chunks raw bytes into words. It buffers incoming bytes, splits on word boundaries, and holds back the last token in case it's incomplete (the server might split mid-word). Everything before that last token is guaranteed complete and gets flushed immediately.

export function createWordTransform(): TransformStream<Uint8Array, string> {
  const decoder = new TextDecoder();
  let buffer = "";
  const WORD_BOUNDARY = /(\S+|\s+)/g;

  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true });
      const tokens = buffer.match(WORD_BOUNDARY);
      if (!tokens) return;
      buffer = tokens.pop() ?? "";
      for (const token of tokens) controller.enqueue(token);
    },
    flush(controller) {
      // Flush any bytes still buffered in the decoder, then the held-back token
      buffer += decoder.decode();
      if (buffer.length > 0) controller.enqueue(buffer);
    },
  });
}

The second transform adds a fixed delay after each word, spacing them out evenly. Because the stream waits for each transform() call's promise before pulling the next chunk, the platform's backpressure does the pacing for us:

export function createTimedBuffer(delayMs: number): TransformStream<string, string> {
  return new TransformStream({
    async transform(chunk, controller) {
      controller.enqueue(chunk);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    },
  });
}

What comes out the other end of pipeThrough is just another ReadableStream, so you read it the same way you'd read any stream. Grab a reader, loop until done, and append each word to your React state:

const reader = wordStream.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  setOutput((prev) => prev + value);
}

Each value is now a single word (or whitespace token) arriving at an even 10ms cadence, instead of a raw byte chunk of arbitrary size.

Why the Concept Is Important

Our implementation here is intentionally simple. We split on whitespace, add a fixed delay, and call it a day. But production smooth streaming has to handle a lot more.

The AI SDK's smoothStream transform supports word, line, custom regex, and custom function chunking:

chunking?: 'word' | 'line' | RegExp | ChunkDetector | Intl.Segmenter;
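
A custom function (the ChunkDetector above) receives the buffered text and returns the next chunk to emit, or null if nothing is ready yet, the same contract the segmenter code below follows. As a sketch, here's a detector that emits whole sentences (the regex and delayInMs value are my own, not from the SDK docs):

import { smoothStream } from "ai";

// Sketch: emit whole sentences instead of words. Assumes the detector's
// contract is "return the next chunk from the front of the buffer, or null"
const sentenceChunking = (buffer: string) => {
  const match = buffer.match(/^[^.!?]*[.!?]\s+/);
  return match?.[0] ?? null;
};

const transform = smoothStream({ delayInMs: 10, chunking: sentenceChunking });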

It also accepts an Intl.Segmenter for locale-aware word segmentation. This matters for CJK languages (Chinese, Japanese, Korean), where words aren't separated by spaces. The SDK duck-types for the segment method rather than using instanceof Intl.Segmenter, since Intl.Segmenter isn't available in all runtimes, and duck-typing also lets you pass any object with a .segment() method:

const segmenter = chunking as Intl.Segmenter;
detectChunk = (buffer: string) => {
  if (buffer.length === 0) return null;
  const iterator = segmenter.segment(buffer)[Symbol.iterator]();
  const first = iterator.next().value;
  return first?.segment || null;
};
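
Assuming your runtime ships Intl.Segmenter (Node 16+ and modern browsers do), passing a locale-aware segmenter might look like this sketch:

import { smoothStream } from "ai";

// Sketch: word-granularity segmentation for Japanese, which doesn't put
// spaces between words. Intl.Segmenter availability is runtime-dependent.
const transform = smoothStream({
  chunking: new Intl.Segmenter("ja", { granularity: "word" }),
});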

The AI SDK also uses a createStitchableStream helper to handle multi-step tool execution. When a model invokes a tool, the response splits into separate steps: the initial text, the tool result, then potentially another model call. Each step is its own stream, and the stitchable stream sequences them into a single flat output:

const stitchableStream = createStitchableStream<TextStreamPart<TOOLS>>();

// initial model response
self.addStream(streamWithToolResults.pipeThrough(...));

// tool execution step
self.addStream(toolExecutionStepStream);

// smoothStream is then applied on top as a pipeThrough transform
stream = stream.pipeThrough(transform({ tools }));

I'm in love with this pattern. Having a stream controller that you can just addStream into beats trying to manage nested yields or manually merging async iterators. The consumer just reads, the stitchable stream handles sequencing underneath, and you can apply transformations like smoothStream on top with a single pipeThrough.

Even if you're working in a framework that doesn't support native ReadableStreams, the concept is simple enough to implement yourself with a queue and a promise.
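
Here's a minimal sketch of that idea: a queue the producer pushes into, and a promise the consumer parks on when the queue runs dry. All names are mine, not the AI SDK's:

// Minimal "stitchable" queue: producers push chunks, the consumer reads
// one flat async sequence. Illustrative only, not the AI SDK's implementation.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiting: ((item: T | undefined) => void)[] = [];
  private closed = false;

  push(item: T) {
    const resolve = this.waiting.shift();
    if (resolve) resolve(item);
    else this.items.push(item);
  }

  close() {
    this.closed = true;
    // Wake any parked readers with "no more items"
    for (const resolve of this.waiting.splice(0)) resolve(undefined);
  }

  async *[Symbol.asyncIterator]() {
    while (true) {
      if (this.items.length > 0) {
        yield this.items.shift()!;
      } else if (this.closed) {
        return;
      } else {
        // Queue is dry: park on a promise until push() or close() resolves it
        const item = await new Promise<T | undefined>((resolve) =>
          this.waiting.push(resolve),
        );
        if (item === undefined) return;
        yield item;
      }
    }
  }
}

// The consumer just `for await`s; sequencing happens underneath.
const queue = new AsyncQueue<string>();
queue.push("hello ");
queue.push("world");
queue.close();
for await (const word of queue) console.log(word);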

In the Wild

Here's a side-by-side comparison of how OpenAI and Anthropic stream text in their chat interfaces. Scrub through and slow it down to see the difference in chunking and pacing. You'll notice both have far less jank than the raw, bursty streams you get with SMOOTH OFF.


If you made it this far, thanks for reading! I'd love to hear what you think or if you've built something similar. Feel free to reach out at .