</pre>
If stream is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response.
OpenAI uses server-sent events for the streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with data: followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}' | |||
</pre>
After the content chunks, you'll receive one last chunk containing the string "data: [DONE]".
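As a rough illustration, the sketch below reads the stream by hand with the Python requests library, parses each data: line, and prints the content deltas as they arrive. The API key, model, and message are placeholders; the official openai client library can also do this parsing for you.
<pre>
import json
import requests

# Minimal sketch of consuming the SSE stream by hand (placeholder key and model).
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue                      # SSE events are separated by blank lines
    chunk = line.decode("utf-8")
    if not chunk.startswith("data: "):
        continue
    payload = chunk[len("data: "):]
    if payload == "[DONE]":
        break                         # last chunk, the stream is finished
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
</pre>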
One thing we lose with streaming is the usage field, so if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="1">https://gpt.pomb.us/</ref>
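For example, the prompt and the text collected from the streamed deltas can be tokenized locally with the tiktoken library. This is only an approximation of what the API would have reported, since chat requests add a few tokens of per-message formatting overhead; the sample strings are placeholders.
<pre>
import tiktoken

# Sketch: count tokens locally because the streamed response omits the usage field.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Hello!"
streamed_reply = "Hi there! How can I help you today?"  # text collected from the deltas

prompt_tokens = len(encoding.encode(prompt))
completion_tokens = len(encoding.encode(streamed_reply))
print(prompt_tokens, completion_tokens, prompt_tokens + completion_tokens)
</pre>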
==Response Fields==