Smooth Streaming
Make your text streams feel nice!
When I was working on Text Streaming, I noticed that streams in AI apps like OpenAI and Perplexity have a different feel when it comes to displaying inbound text. Mine felt slightly jankier, and it turns out smooth streaming is probably the reason why.
Playground
You can edit the markdown text on the left, stream it through my POST /api/echo endpoint, and compare how it feels with SMOOTH ON versus OFF.
Hit SOUND to enable audio. Each chunk plays a click so you can hear the difference in pacing.
Feel the difference? SMOOTH ON gives you a steady flow of beepboops, while SMOOTH OFF gives you inconsistent ones.
Hitting STREAM calls a POST /api/echo endpoint that simulates bursty LLM output using a ReadableStream. It dumps 3–8 chunks rapidly (near-instant), then pauses to simulate "thinking" before the next burst:
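Here's a hypothetical sketch of what that endpoint's stream could look like. The names (`burstyStream`, `stallProbability`, `speedMultiplier`) and the exact delay values are my guesses at the shape, not the real route code:

```typescript
const encoder = new TextEncoder();
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

function burstyStream(
  text: string,
  speedMultiplier = 1,
  stallProbability = 0.2,
): ReadableStream<Uint8Array> {
  const chunks = text.split(/(?<=\s)/); // keep whitespace attached to each token
  return new ReadableStream({
    async start(controller) {
      let i = 0;
      while (i < chunks.length) {
        // Dump 3–8 chunks near-instantly...
        const burst = 3 + Math.floor(Math.random() * 6);
        for (let j = 0; j < burst && i < chunks.length; j++, i++) {
          controller.enqueue(encoder.encode(chunks[i]));
          // ...occasionally stalling mid-burst.
          if (Math.random() < stallProbability) await sleep(150 * speedMultiplier);
        }
        // Pause to simulate "thinking" before the next burst.
        if (i < chunks.length) await sleep(400 * speedMultiplier);
      }
      controller.close();
    },
  });
}
```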
The three controls in the playground map directly to this code:
- Randomness sets stallProbability, which controls how often the stream randomly pauses mid-burst.
- Speed sets speedMultiplier, which scales all delays up or down.
- Smooth toggles whether the client pipes the raw response through a word transform and timed buffer before rendering.
Client Streaming
Web APIs have a composable primitive called TransformStream that lets you perform transformations between streams. If you've used RxJS pipes before (I had to for a functional programming class in uni), this will feel familiar, except it's just the platform.
You start with the ReadableStream from fetch, pipe it through one or more transforms using pipeThrough, and read the result on the other end.
Here's what it looks like to chain two transforms together:
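As a minimal, self-contained sketch of the mechanics, here are two toy transforms chained together (the real word and delay transforms come next):

```typescript
// Two toy transforms, just to show the chaining shape.
const upper = new TransformStream<string, string>({
  transform(chunk, controller) {
    controller.enqueue(chunk.toUpperCase());
  },
});
const exclaim = new TransformStream<string, string>({
  transform(chunk, controller) {
    controller.enqueue(chunk + "!");
  },
});

// Any ReadableStream works as a source (in the app, it's res.body from fetch).
const source = new ReadableStream<string>({
  start(controller) {
    controller.enqueue("hello");
    controller.enqueue("world");
    controller.close();
  },
});

const out: string[] = [];
const reader = source.pipeThrough(upper).pipeThrough(exclaim).getReader();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  out.push(value); // "HELLO!", then "WORLD!"
}
```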
The first transform re-chunks raw bytes into words. It buffers incoming bytes, splits on word boundaries, and holds the last token in case it's incomplete (the server might split mid-word). Punctuation-terminated tokens get flushed immediately since they're guaranteed to be complete.
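A sketch of that first transform; the name `wordTransform`, the punctuation set, and the split rule are my simplifications:

```typescript
// Re-chunk raw bytes into whole words.
function wordTransform(): TransformStream<Uint8Array, string> {
  const decoder = new TextDecoder();
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true });
      // Split into tokens, keeping trailing whitespace attached.
      const tokens = buffer.split(/(?<=\s)/);
      // Hold back the last token: the server may have split mid-word...
      buffer = tokens.pop() ?? "";
      // ...unless it ends in punctuation, which means it's complete.
      if (/[.,!?;:]$/.test(buffer)) {
        tokens.push(buffer);
        buffer = "";
      }
      for (const token of tokens) controller.enqueue(token);
    },
    flush(controller) {
      // Emit whatever is left when the stream ends.
      if (buffer) controller.enqueue(buffer);
    },
  });
}
```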
The second transform adds a fixed delay after each word, spacing them out evenly:
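A sketch of the delay transform (again, the name is mine). Because pipeThrough applies backpressure, awaiting inside transform naturally paces the whole pipeline:

```typescript
// Emit each word, then wait before letting the next one through.
function delayTransform(ms: number): TransformStream<string, string> {
  const sleep = () => new Promise<void>((r) => setTimeout(r, ms));
  return new TransformStream({
    async transform(chunk, controller) {
      controller.enqueue(chunk);
      await sleep(); // holds the pipeline for `ms` between words
    },
  });
}
```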
What comes out the other end of pipeThrough is just another ReadableStream, so you read it the same way you'd read any stream. Grab a reader, loop until done, and append each word to your React state:
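A sketch of that read loop, pulled into a helper so the stream source doesn't matter. In the component, the callback would be the React state setter, something like `setText((prev) => prev + word)`:

```typescript
// Read any ReadableStream<string> word by word, invoking a callback per word.
async function readWords(
  stream: ReadableStream<string>,
  onWord: (word: string) => void,
): Promise<void> {
  const reader = stream.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onWord(value);
  }
}
```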
Each value is now a single word (or whitespace token) arriving at an even 10ms cadence, instead of a raw byte chunk of arbitrary size.
Why the concept is important
Our implementation here is intentionally simple. We split on whitespace, add a fixed delay, and call it a day. But production smooth streaming has to handle a lot more.
The Vercel AI SDK's smoothStream supports word, line, custom regex, and custom function chunking:
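With the SDK it plugs straight into streamText. This is sketched from the docs as I understand them (the model and prompt are placeholders; check the current API before copying):

```typescript
import { streamText, smoothStream } from "ai";
import { openai } from "@ai-sdk/openai"; // any provider works

const result = streamText({
  model: openai("gpt-4o-mini"),
  prompt: "Explain smooth streaming in one paragraph.",
  // Re-chunk and pace the text parts of the stream before they reach the client.
  experimental_transform: smoothStream({
    delayInMs: 10,    // wait between chunks
    chunking: "word", // also accepts "line", a RegExp, or a custom function
  }),
});
```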
It also accepts an Intl.Segmenter for locale-aware word segmentation. This matters for CJK languages (Chinese, Japanese, Korean) where words aren't separated by spaces. The SDK duck-types for the segment method rather than using instanceof Intl.Segmenter, since Intl.Segmenter isn't available in all runtimes and this also lets you pass any object with a .segment() method:
The AI SDK also uses a createStitchableStream to handle multi-step tool execution. When a model invokes a tool, the response splits into separate steps: the initial text, the tool result, then potentially another model call. Each step is its own stream, and the stitchable stream sequences them into a single flat output:
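Here's my own minimal take on the idea; the SDK's internal createStitchableStream handles more edge cases (errors, teardown), but the shape is the same:

```typescript
function createStitchableStream<T>() {
  let controller!: ReadableStreamDefaultController<T>;
  const queue: ReadableStream<T>[] = [];
  let pumping = false;
  let closed = false;

  const stream = new ReadableStream<T>({
    start(c) {
      controller = c;
    },
  });

  // Drain queued streams in order into the single output stream.
  async function pump(): Promise<void> {
    if (pumping) return;
    pumping = true;
    while (queue.length > 0) {
      const reader = queue.shift()!.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        controller.enqueue(value);
      }
    }
    pumping = false;
    if (closed) controller.close();
  }

  return {
    stream,
    // Append another stream; its chunks follow everything queued so far.
    addStream(s: ReadableStream<T>) {
      queue.push(s);
      void pump();
    },
    // Signal that no more streams will be added; close once the queue drains.
    close() {
      closed = true;
      if (!pumping && queue.length === 0) controller.close();
    },
  };
}
```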
I'm in love with this pattern. Having a stream controller that you can just addStream into beats trying to manage nested yields or manually merging async iterators. The consumer just reads, the stitchable stream handles sequencing underneath, and you can apply transformations like smoothStream on top with a single pipeThrough.
Even if you're working in a framework that doesn't support native ReadableStreams, the concept is simple enough to implement yourself with a queue and a promise.
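For example, here's a sketch using a plain async generator instead of web streams. It's naive on purpose: each incoming chunk is split on its own, without the cross-chunk buffering the word transform does:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Re-time any async iterable of text, no TransformStream needed.
async function* smooth(
  source: AsyncIterable<string>,
  delayMs: number,
): AsyncGenerator<string> {
  for await (const chunk of source) {
    // Split on whitespace boundaries, keeping the whitespace attached.
    for (const word of chunk.split(/(?<=\s)/)) {
      yield word;
      await sleep(delayMs);
    }
  }
}
```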
In the Wild
Here's a side-by-side comparison of how OpenAI and Anthropic stream text in their chat interfaces. Scrub through and slow it down to see the difference in chunking and pacing. You'll notice how much less jank there is compared to a raw, unsmoothed stream.
If you made it this far, thanks for reading! I'd love to hear what you think or if you've built something similar. Feel free to reach out at zach@nightly.ink.