Internal Workings of Large Language Models (LLMs)

July 15, 2025

In many real-time applications like ChatGPT, communication with the model relies on event streams, especially when the response is streamed back to the client as it is generated.

How Requests Are Sent to LLMs like ChatGPT

🧵 1. Event Stream (Server-Sent Events / Streaming API)

When you chat with ChatGPT, especially in a real-time app, the request is sent once and the response comes back gradually as a stream of small text chunks, so the reply appears piece by piece instead of all at once.
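The "send once, receive gradually" pattern can be sketched without any network code. This is a minimal illustration, assuming a hypothetical `fake_model_tokens` generator in place of a real model; a real client would read the chunks from an HTTP response instead. The `on_token` callback (also a name chosen for this sketch) is where an app would update its UI.

```python
import time
from typing import Callable, Iterator


def fake_model_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields a canned reply one word-sized token at a time."""
    reply = f"You said: {prompt}"
    for i, word in enumerate(reply.split(" ")):
        time.sleep(0.01)  # simulate the delay of generating each token
        yield word if i == 0 else " " + word


def stream_reply(prompt: str, on_token: Callable[[str], None]) -> str:
    """Send the prompt once, then consume the reply piece by piece as it arrives."""
    parts: list[str] = []
    for token in fake_model_tokens(prompt):
        on_token(token)  # e.g. append the partial text to the chat window immediately
        parts.append(token)
    return "".join(parts)  # the complete reply, available only after the stream ends


if __name__ == "__main__":
    full = stream_reply("hello there", lambda t: print(t, end="", flush=True))
    print()  # newline after the streamed text
```

Running it prints `You said: hello there` word by word. The caller sees each token as soon as it exists, while the full string is only available at the end, which is the main reason streaming makes long replies feel more responsive.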

This is known as event streaming or streamed responses, and it's often handled using:

Server-Sent Events (SSE)

WebSockets (less common for the OpenAI API; more common in custom LLM apps)

HTTP Streaming (chunked responses)
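The mechanisms listed above differ in transport, but in the common SSE case the bytes on the wire are `data:` events, each ended by a blank line, carried over a streamed HTTP response. The sketch below is a simplified illustration, not a client library: the function names `encode_sse`, `parse_sse` and `collect_text` are hypothetical, the payload shape (`choices[0].delta.content`) and the `data: [DONE]` terminator are modeled on OpenAI's Chat Completions streaming format and should be checked against the current API docs, and the parser ignores `event:`, `id:` and `retry:` fields.

```python
import codecs
import json
from typing import Iterable, Iterator


def encode_sse(payload: dict) -> bytes:
    """Server side: frame one JSON payload as an SSE event ('data: ...' plus a blank line)."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n".encode("utf-8")


def parse_sse(chunks: Iterable[bytes]) -> Iterator[str]:
    """Client side: rebuild SSE events from network chunks that may split anywhere.

    Simplified: only 'data:' fields and ':' comment lines are handled;
    'event:', 'id:' and 'retry:' fields and CR-only line endings are not.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()  # safe across split multi-byte chars
    buffer = ""
    data_lines: list[str] = []
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")  # tolerate CRLF line endings
            if line == "":
                # A blank line ends the current event.
                if data_lines:
                    yield "\n".join(data_lines)
                    data_lines = []
            elif line.startswith(":"):
                continue  # comment, often used as a keep-alive ping
            elif line.startswith("data:"):
                value = line[len("data:"):]
                data_lines.append(value[1:] if value.startswith(" ") else value)


def collect_text(chunks: Iterable[bytes]) -> str:
    """Join the text deltas of an OpenAI-style chat stream until the [DONE] sentinel."""
    pieces: list[str] = []
    for data in parse_sse(chunks):
        if data == "[DONE]":
            break
        event = json.loads(data)
        pieces.append(event["choices"][0]["delta"].get("content", ""))
    return "".join(pieces)


if __name__ == "__main__":
    tokens = ["Hel", "lo", ", wor", "ld!"]
    wire = b"".join(encode_sse({"choices": [{"delta": {"content": t}}]}) for t in tokens)
    wire += b": keep-alive\n\n" + b"data: [DONE]\n\n"
    # Cut the byte stream into 7-byte pieces, as the network might deliver it.
    pieces = [wire[i:i + 7] for i in range(0, len(wire), 7)]
    print(collect_text(pieces))  # Hello, world!
```

Splitting the stream into 7-byte pieces mimics how chunked HTTP delivery can cut an event mid-line. Buffering until a full line, and then a full event, is available is what keeps the client correct regardless of where those cuts fall.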

BYTE THE FUTURE