The Vercel AI SDK’s streaming feature is about delivering LLM responses to the client in small chunks as they’re generated, rather than waiting for the entire response to complete. The result is a noticeably more interactive and responsive user experience.

Let’s see this in action with a simple Next.js app using the AI SDK to stream a response from OpenAI’s gpt-3.5-turbo.

// app/api/chat/route.ts
import { OpenAI } from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export const runtime = 'edge';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}

And on the client-side, typically in a React component:

// app/page.tsx
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();

  return (
    <div>
      {messages.map(message => (
        <div key={message.id}>
          {message.role === 'user' ? 'User: ' : 'AI: '}
          {message.content}
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          type="text"
          value={input}
          onChange={handleInputChange}
          placeholder="Ask a question..."
        />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

When you send a message, the handleSubmit function triggers a POST request to /api/chat. The server receives the messages, sends them to OpenAI with stream: true, and then pipes the resulting OpenAIStream into a StreamingTextResponse. The ai/react hook on the client then takes this stream and progressively updates the messages array, causing the UI to re-render with each new chunk of text.

The core problem this solves is the perceived latency of LLM responses. Without streaming, a user might see a blank screen for several seconds or even minutes while the LLM generates a lengthy answer. This leads to a poor user experience, with users assuming the application is broken or unresponsive. Streaming provides immediate feedback, showing the user that something is happening and gradually revealing the answer, making the interaction feel much more dynamic and engaging.

Internally, the OpenAIStream function is a crucial piece of the puzzle. It takes the raw streaming response object from the OpenAI API and transforms it into a Web-standard ReadableStream, where each enqueued chunk is a piece of text received from the LLM. StreamingTextResponse then wraps this stream in an HTTP response with the appropriate headers (by default, Content-Type: text/plain; charset=utf-8), allowing the client to consume it incrementally. The useChat hook from ai/react handles the client-side consumption, reading the response body as it arrives and updating local state with each decoded chunk.
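To make that concrete, here is a minimal sketch of consuming such a response body with the standard Streams API. The streamToText helper is my own name for illustration, not an SDK export; it assumes a Web-standard ReadableStream of UTF-8 bytes, which is what StreamingTextResponse produces.

```typescript
// Read a byte stream to completion and decode it as UTF-8 text.
// This is the same primitive the client hook builds on.
async function streamToText(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let result = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // { stream: true } holds back partial multi-byte characters
    // until the bytes that complete them arrive in a later chunk.
    result += decoder.decode(value, { stream: true });
  }
  return result + decoder.decode(); // flush any remaining bytes
}
```

In a real UI you would append each decoded chunk to state as it arrives rather than waiting for the full string, but the reading loop is the same.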

The export const runtime = 'edge' line in the API route is also significant. It tells Vercel to deploy this function to its Edge Runtime, which offers fast cold starts and first-class support for streaming responses. Because the function starts quickly and begins flushing data as soon as the first chunk arrives from the LLM provider, the time until the user sees the first token is reduced.

When you configure your LLM provider within the ai SDK, you’re essentially telling the SDK how to format the API call and how to interpret the response. For OpenAI, this involves setting stream: true and then using OpenAIStream to process the ChatCompletionChunk objects it returns. Each chunk contains a delta, which is the new piece of text generated by the model since the last chunk. The SDK concatenates these deltas to form the complete message content.
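The delta concatenation can be illustrated with a small sketch. The ChunkDelta shape below is a simplification of the delta field in OpenAI's ChatCompletionChunk objects, not the SDK's actual types.

```typescript
// Each streamed chunk carries only the text generated since the
// previous chunk; the final delta of a stream may be empty.
interface ChunkDelta {
  content?: string;
}

// Concatenate deltas in arrival order to rebuild the full message.
function accumulate(deltas: ChunkDelta[]): string {
  return deltas.reduce((acc, d) => acc + (d.content ?? ""), "");
}
```

For example, deltas of "Hel", "lo" and a final empty delta accumulate into the message "Hello".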

The useChat hook abstracts away much of this complexity. It manages the state of the messages, handles the API calls, and processes the incoming stream, providing a clean interface for building chat UIs. You can customize its behavior by providing options like initialMessages, id, or body for the POST request.

The ai SDK supports different LLM providers (OpenAI, Cohere, Anthropic, etc.) by providing specific *Stream functions (e.g., OpenAIStream, CohereStream). These functions know how to parse the unique streaming response formats of each provider. The underlying principle remains the same: convert the provider’s stream into a standard readable stream that the StreamingTextResponse can use.

A note on formats: OpenAI’s own streaming API uses Server-Sent Events (SSE), a format designed for unidirectional server-to-client communication in which each event’s payload is prefixed with data:  and events are separated by a blank line (\n\n). OpenAIStream parses these events on the server. The StreamingTextResponse your client receives, by contrast, defaults to Content-Type: text/plain; charset=utf-8, and the ai/react hook reads that body incrementally.
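To make the SSE framing concrete, here is a simplified sketch of extracting data: payloads from a complete SSE body. A production parser, like the one the SDK relies on, must also handle event: and id: fields, comment lines, and chunks that split an event mid-line.

```typescript
// Split an SSE body into events (separated by blank lines) and
// collect the payload of each event's "data: " lines.
function parseSSE(body: string): string[] {
  return body
    .split("\n\n")
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.startsWith("data: "))
        .map((line) => line.slice("data: ".length))
        .join("\n"),
    )
    .filter((data) => data.length > 0);
}
```

Applied to a body like data: {"a":1}\n\ndata: [DONE]\n\n, this yields the two payloads {"a":1} and [DONE].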

The ai SDK also accounts for errors during streaming. If an error occurs mid-generation, it can be forwarded through the stream or surface as a failed response, and useChat exposes an onError callback so the client can catch and display it. This ensures error states reach the user rather than silently truncating the response.

One subtle but powerful aspect of the ai SDK’s streaming is its ability to handle custom data within the stream. While the primary use is for text, you can, with custom stream parsing on the client, embed JSON objects or other structured data within the data: fields of your SSE events. This allows for richer interactions, like streaming UI updates or actions alongside the text response, though it requires more advanced client-side handling than the basic useChat hook provides.
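One hypothetical convention for such custom handling is to treat payloads that parse as JSON objects as structured data and everything else as plain text. The sketch below illustrates that idea only; it is not the SDK’s wire protocol, and the StreamPart type and classify function are names I’ve invented for the example.

```typescript
type StreamPart =
  | { kind: "text"; text: string }
  | { kind: "data"; value: unknown };

// Classify a payload: JSON objects become structured data,
// anything else (including bare numbers/strings) stays text.
function classify(payload: string): StreamPart {
  try {
    const value = JSON.parse(payload);
    if (value !== null && typeof value === "object") {
      return { kind: "data", value };
    }
  } catch {
    // not valid JSON: fall through and treat it as text
  }
  return { kind: "text", text: payload };
}
```

A client using this convention could, say, route {"action":"highlight"} to a UI handler while appending ordinary text payloads to the visible message.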

The next concept you’ll likely run into is managing the state and UI more granularly as the stream progresses, perhaps to display a typing indicator or to update specific parts of the UI based on different types of streamed data.
