Ruminations
Code better not harder


Streaming APIs: OpenAI's Responses vs. Chat Completions

Recently, I upgraded the chat assistant on this site to use OpenAI's latest v1/responses API, moving away from the more traditional v1/chat/completions endpoint. This change was driven by the desire to integrate native web search capabilities into the model's responses. The process revealed some interesting differences in their streaming architecture, which are worth discussing.

Here's a breakdown of the two APIs from a developer's perspective, especially when handling Server-Sent Events (SSE).

The Classic: Chat Completions API (/v1/chat/completions)

The Chat Completions API has been the standard for building conversational AI for some time. When used in streaming mode, it sends a sequence of events that are relatively straightforward to parse.
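To make that concrete, here is a representative stream chunk and a small helper that pulls the text fragment out of it (the chunk shown is a sketch, abridged from the fields the API actually sends):

```javascript
// A representative Chat Completions stream chunk (the JSON payload of
// one SSE `data:` line). The stream ends with a literal `data: [DONE]`.
const chunk = JSON.parse(`{
  "id": "chatcmpl-abc123",
  "object": "chat.completion.chunk",
  "choices": [{ "index": 0, "delta": { "content": "Hello" }, "finish_reason": null }]
}`);

// Extract the incremental text fragment from a parsed chunk, if any.
function extractDelta(parsed) {
  return parsed.choices?.[0]?.delta?.content ?? '';
}

console.log(extractDelta(chunk)); // "Hello"
```

Every chunk has the same shape, which is what makes this format so easy to consume.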

Pros:

- Simple, flat streaming format: every chunk carries the next text fragment in the same choices[0].delta shape.
- Widely understood and supported by virtually every client library and proxy.

Cons:

- No natively integrated tools such as web search; tool use means chaining API calls and post-processing the results yourself.
- Little visibility into the response lifecycle beyond a final finish_reason.

Here’s a simplified look at what a client-side parser for this stream does:

// Simplified logic for Chat Completions stream:
// every chunk carries the next fragment in choices[0].delta.content.
const content = parsed.choices?.[0]?.delta?.content ?? '';
if (content) {
  render(content);
}
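For context, here is a sketch of the read loop that produces `parsed` in the first place: a minimal fetch-based SSE reader. The model name is a placeholder, and error handling is omitted; `render` is assumed to append text to the page.

```javascript
// Split accumulated SSE text into complete `data:` payloads, returning
// any trailing partial line to carry into the next network chunk.
function drainSSE(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // possibly an incomplete line
  const payloads = lines
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice(6).trim());
  return { payloads, rest };
}

// Minimal reader for a Chat Completions stream (sketch; retries and
// error handling omitted).
async function streamChat(apiKey, messages, render) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'gpt-4o', messages, stream: true }),
  });

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const drained = drainSSE(buffer);
    buffer = drained.rest;
    for (const payload of drained.payloads) {
      if (payload === '[DONE]') return; // end-of-stream sentinel
      const parsed = JSON.parse(payload);
      const content = parsed.choices?.[0]?.delta?.content ?? '';
      if (content) render(content);
    }
  }
}
```

Buffering and re-splitting is necessary because a network chunk can end mid-line; `drainSSE` keeps the partial line for the next iteration.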

The Newcomer: Responses API (/v1/responses)

The Responses API is OpenAI's next-generation endpoint, designed to be more of a state machine that handles complex, multi-tool interactions natively. This is immediately apparent in its event-driven streaming format.
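Instead of a single repeating chunk shape, the stream is a series of typed events discriminated by a `type` field. An abridged, representative sequence (event names from the API; payloads simplified):

```javascript
// A representative (abridged) sequence of typed events from a
// Responses API stream. Each event carries a `type` discriminator.
const events = [
  { type: 'response.created' },
  { type: 'response.output_text.delta', delta: 'Hel' },
  { type: 'response.output_text.delta', delta: 'lo' },
  { type: 'response.output_text.done', text: 'Hello' },
  { type: 'response.completed', response: { /* full final response */ } },
];

// Reassemble the streamed text from the delta events alone.
const text = events
  .filter((e) => e.type === 'response.output_text.delta')
  .map((e) => e.delta)
  .join('');

console.log(text); // "Hello"
```

The explicit lifecycle events (`response.created`, `response.completed`) are what make the state-machine framing apt.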

Pros:

- Native tool integration (e.g. web search) with structured outputs such as citations, no secondary provider required.
- A typed, event-driven protocol that exposes the full response lifecycle, from response.created through response.completed.

Cons:

- More complex to parse: the client must dispatch on many distinct event types rather than one chunk shape.
- Newer, so less widely supported by existing libraries and tooling.

Here’s how the client-side logic changes:

// Simplified logic for Responses API stream
switch (parsed.type) {
  case 'response.output_text.delta':
    render(parsed.delta || '');
    break;
  case 'response.completed':
    // Finalize the response, collect all citations
    collectCitations(parsed.response.output);
    break;
  // ... handle other event types
}
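The `collectCitations` helper above is this site's own code, but a minimal sketch of what it might do is shown below, assuming the final `response.output` array contains message items whose `output_text` parts carry `url_citation` annotations (the structure the web search tool produces):

```javascript
// Sketch of a citation collector for a final Responses API `output`
// array. Assumes `url_citation` annotations with `url`/`title` fields;
// any other item or annotation types are ignored.
function collectCitations(output) {
  const citations = [];
  for (const item of output) {
    if (item.type !== 'message') continue;
    for (const part of item.content ?? []) {
      if (part.type !== 'output_text') continue;
      for (const ann of part.annotations ?? []) {
        if (ann.type === 'url_citation') {
          citations.push({ url: ann.url, title: ann.title });
        }
      }
    }
  }
  return citations;
}
```

Because the citations arrive as structured data on the final response, no post-processing of the model's text is needed to attribute sources.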

Conclusion

For simple, direct chat applications, the Chat Completions API remains a perfectly viable and simpler choice. Its straightforward streaming format is easy to implement and widely understood.

However, for building more advanced, agent-like experiences that require integrated tools like web search, the Responses API is the clear winner, despite its added complexity. The event-driven protocol gives you a much more robust and transparent way to handle the model's lifecycle. The native integration of tools and structured data like citations saves significant development effort that would otherwise be spent on chaining API calls and post-processing results.

For my own site, the switch was worth the effort. It simplified the backend by removing the need for a secondary search provider and enriched the user experience by providing properly attributed, up-to-date information directly from the primary model.