Make your text streams feel nice!

Smooth Streaming

When I was working on Text Streaming, I noticed that AI apps like OpenAI and Perplexity have a different feel when it comes to displaying inbound text. Mine feels slightly jankier, and it turns out Smooth Streaming is probably the reason why.

Playground

You can edit the markdown text on the left, stream it from my POST /api/echo endpoint, and compare the feel with SMOOTH ON versus OFF.

Hit SOUND to enable audio. Each chunk plays a click so you can hear the difference in pacing.


Feel the difference? SMOOTH ON gives you a steady flow of beepboops, but SMOOTH OFF gives you inconsistent beepboops.

waow so smooth? woaw

Hitting STREAM calls the POST /api/echo endpoint, which simulates bursty LLM output using a ReadableStream. It dumps 3–8 chunks rapidly (near-instant), then pauses to simulate "thinking" before the next burst:

// Context: `text`, `speedMultiplier`, and `stallProbability` come from the request
const encoder = new TextEncoder();
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const stream = new ReadableStream({
  async start(controller) {
    let cursor = 0;
    while (cursor < text.length) {
      // Each burst: 3–8 chunks fired rapidly
      const burstSize = 3 + Math.floor(Math.random() * 6);
      for (let i = 0; i < burstSize && cursor < text.length; i++) {
        const chunkSize = 1 + Math.floor(Math.random() * 12);
        const chunk = text.slice(cursor, cursor + chunkSize);
        cursor += chunkSize;
        controller.enqueue(encoder.encode(chunk));
        // Tiny delay within a burst (0–5ms) — near-instant
        await sleep((Math.random() * 5) / speedMultiplier);
      }
      // Pause between bursts: "thinking" time
      const pause = 80 + Math.random() * 200;
      const stallBonus = Math.random() < stallProbability
        ? 150 + Math.random() * 300 : 0;
      await sleep((pause + stallBonus) / speedMultiplier);
    }
    controller.close();
  },
});

The three controls in the playground map directly to this code:

  • Randomness sets stallProbability, which controls how often an extra stall gets added to the pause between bursts.
  • Speed sets speedMultiplier, which scales all delays up or down.
  • Smooth toggles whether the client pipes the raw response through a word transform and timed buffer before rendering.
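
For reference, here's roughly what the playground's STREAM request looks like with those values plugged in. The field names mirror the variables in the server snippet above; the exact request shape is my assumption:

// Sketch of the client call; `markdown` stands for the editable text
const response = await fetch("/api/echo", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    text: markdown,          // the text to echo back
    stallProbability: 0.15,  // RANDOMNESS
    speedMultiplier: 1.0,    // SPEED
  }),
});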

Client Streaming

Web APIs have a composable primitive called TransformStream that lets you perform transformations between streams. If you've used RxJS pipes before (I had to for a functional programming class in uni), this will feel familiar, except it's just the platform.

You start with the ReadableStream from fetch (response.body), pipe it through one or more transforms using pipeThrough, and read the result on the other end.

Here's what it looks like to chain two transforms together:

const wordStream = response.body
  .pipeThrough(createWordTransform())
  .pipeThrough(createTimedBuffer(10));

The first transform re-chunks raw bytes into words. It buffers incoming bytes, splits on word boundaries, and holds back the last token in case it's incomplete (the server might split mid-word). Everything before that last token is guaranteed complete and gets flushed immediately.

export function createWordTransform(): TransformStream<Uint8Array, string> {
  const decoder = new TextDecoder();
  let buffer = "";
  const WORD_BOUNDARY = /(\S+|\s+)/g;

  return new TransformStream({
    transform(chunk, controller) {
      buffer += decoder.decode(chunk, { stream: true });
      const tokens = buffer.match(WORD_BOUNDARY);
      if (!tokens) return;
      buffer = tokens.pop() ?? "";
      for (const token of tokens) controller.enqueue(token);
    },
    flush(controller) {
      // Flush any bytes still buffered in the decoder, then the held-back token
      buffer += decoder.decode();
      if (buffer.length > 0) controller.enqueue(buffer);
    },
  });
}

The second transform adds a fixed delay after each word, spacing them out evenly. Because the stream waits for each transform() call's promise before pulling the next chunk, the platform's backpressure does the pacing for us:

export function createTimedBuffer(delayMs: number): TransformStream<string, string> {
  return new TransformStream({
    async transform(chunk, controller) {
      controller.enqueue(chunk);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    },
  });
}

What comes out the other end of pipeThrough is just another ReadableStream, so you read it the same way you'd read any stream. Grab a reader, loop until done, and append each word to your React state:

const reader = wordStream.getReader();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  setOutput((prev) => prev + value);
}

Each value is now a single word (or whitespace token) arriving at an even 10ms cadence, instead of a raw byte chunk of arbitrary size.

Why the Concept Is Important

Our implementation here is intentionally simple. We split on whitespace, add a fixed delay, and call it a day. But production smooth streaming has to handle a lot more.

The AI SDK's smoothStream transform supports word, line, custom regex, and custom function chunking:

chunking?: 'word' | 'line' | RegExp | ChunkDetector | Intl.Segmenter;
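
A custom function (the ChunkDetector above) receives the buffered text and returns the next chunk to emit, or null if nothing is ready yet, the same contract the segmenter code below follows. As a sketch, here's a detector that emits whole sentences (the regex and delayInMs value are my own, not from the SDK docs):

import { smoothStream } from "ai";

// Sketch: emit whole sentences instead of words. Assumes the detector's
// contract is "return the next chunk from the front of the buffer, or null"
const sentenceChunking = (buffer: string) => {
  const match = buffer.match(/^[^.!?]*[.!?]\s+/);
  return match?.[0] ?? null;
};

const transform = smoothStream({ delayInMs: 10, chunking: sentenceChunking });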

It also accepts an Intl.Segmenter for locale-aware word segmentation. This matters for CJK languages (Chinese, Japanese, Korean), where words aren't separated by spaces. The SDK duck-types for the segment method rather than using instanceof Intl.Segmenter, since Intl.Segmenter isn't available in all runtimes, and duck-typing also lets you pass any object with a .segment() method:

const segmenter = chunking as Intl.Segmenter;
detectChunk = (buffer: string) => {
  if (buffer.length === 0) return null;
  const iterator = segmenter.segment(buffer)[Symbol.iterator]();
  const first = iterator.next().value;
  return first?.segment || null;
};
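
Assuming your runtime ships Intl.Segmenter (Node 16+ and modern browsers do), passing a locale-aware segmenter might look like this sketch:

import { smoothStream } from "ai";

// Sketch: word-granularity segmentation for Japanese, which doesn't put
// spaces between words. Intl.Segmenter availability is runtime-dependent.
const transform = smoothStream({
  chunking: new Intl.Segmenter("ja", { granularity: "word" }),
});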

The AI SDK also uses a createStitchableStream helper to handle multi-step tool execution. When a model invokes a tool, the response splits into separate steps: the initial text, the tool result, then potentially another model call. Each step is its own stream, and the stitchable stream sequences them into a single flat output:

const stitchableStream = createStitchableStream<TextStreamPart<TOOLS>>();

// initial model response
self.addStream(streamWithToolResults.pipeThrough(...));

// tool execution step
self.addStream(toolExecutionStepStream);

// smoothStream is then applied on top as a pipeThrough transform
stream = stream.pipeThrough(transform({ tools }));

I'm in love with this pattern. Having a stream controller that you can just addStream into beats trying to manage nested yields or manually merging async iterators. The consumer just reads, the stitchable stream handles sequencing underneath, and you can apply transformations like smoothStream on top with a single pipeThrough.

Even if you're working in a framework that doesn't support native ReadableStreams, the concept is simple enough to implement yourself with a queue and a promise.
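
Here's a minimal sketch of that idea: a queue the producer pushes into, and a promise the consumer parks on when the queue runs dry. All names are mine, not the AI SDK's:

// Minimal "stitchable" queue: producers push chunks, the consumer reads
// one flat async sequence. Illustrative only, not the AI SDK's implementation.
class AsyncQueue<T> {
  private items: T[] = [];
  private waiting: ((item: T | undefined) => void)[] = [];
  private closed = false;

  push(item: T) {
    const resolve = this.waiting.shift();
    if (resolve) resolve(item);
    else this.items.push(item);
  }

  close() {
    this.closed = true;
    // Wake any parked readers with "no more items"
    for (const resolve of this.waiting.splice(0)) resolve(undefined);
  }

  async *[Symbol.asyncIterator]() {
    while (true) {
      if (this.items.length > 0) {
        yield this.items.shift()!;
      } else if (this.closed) {
        return;
      } else {
        // Queue is dry: park on a promise until push() or close() resolves it
        const item = await new Promise<T | undefined>((resolve) =>
          this.waiting.push(resolve),
        );
        if (item === undefined) return;
        yield item;
      }
    }
  }
}

// The consumer just `for await`s; sequencing happens underneath.
const queue = new AsyncQueue<string>();
queue.push("hello ");
queue.push("world");
queue.close();
for await (const word of queue) console.log(word);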

In the Wild

Here's a side-by-side comparison of how OpenAI and Anthropic stream text in their chat interfaces. Scrub through and slow it down to see the difference in chunking and pacing. You'll notice both have far less jank than the raw, bursty streams you get with SMOOTH OFF.


If you made it this far, thanks for reading! I'd love to hear what you think or if you've built something similar. Feel free to reach out at .