</pre>
If stream is set to true, the model's response can be displayed while it is still being generated, so you no longer need to wait for the whole response to finish.
OpenAI uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with data: followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>
After that you'll receive one last chunk with the string "data: [DONE]".
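As an illustration, here is a minimal sketch of reading and assembling those chunks in Python using the requests library. The request payload and key handling are assumptions for the example, not part of an official client.
<pre>
# Minimal sketch: stream a chat completion and print the content as it arrives.
import json
import os

import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue  # blank lines separate server-sent events
    chunk = line.decode("utf-8")
    if not chunk.startswith("data: "):
        continue
    payload = chunk[len("data: "):]
    if payload == "[DONE]":
        break  # the final chunk signals the end of the stream
    data = json.loads(payload)
    # each chunk carries a small "delta" with the next piece of content
    delta = data["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
</pre>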
One thing we lose with streaming is the usage field. So if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="1">https://gpt.pomb.us/</ref>
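One option for counting locally is OpenAI's tiktoken library. The sketch below only counts the tokens of a plain string; chat completions add a few tokens of per-message overhead, so treat the result as an estimate.
<pre>
# Rough sketch: estimate token counts locally, since streaming omits usage.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text: str) -> int:
    # encode() returns the list of token ids for the given text
    return len(encoding.encode(text))

print(count_tokens("Hello, world!"))
</pre>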


==Response Fields==