stream: true,
</pre>
If <code>stream</code> is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response.
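For concreteness, here is a minimal sketch of such a streaming request using <code>fetch</code> (available in browsers and Node 18+); the <code>OPENAI_API_KEY</code> environment variable and the example message are assumptions, while the endpoint and parameters match OpenAI's chat completions API:
<pre>
// Minimal sketch: a streaming chat completion request.
// Assumes OPENAI_API_KEY is set in the environment.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true, // ask for a stream of chunks instead of a single response
  }),
});
</pre>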
[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with <code>data: </code> followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>
After that, you'll receive one last chunk with the string "data: [DONE]".
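Putting this together, one way to consume the stream is with <code>response.body.getReader()</code>. This is a sketch for Node 18 or newer, assuming the <code>response</code> object from a fetch call like the one above, not the only way to do it:
<pre>
// Sketch: read the SSE stream and show the text as it arrives.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let fullResponseText = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep a possibly incomplete line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blank lines between events
    const data = line.slice("data: ".length);
    if (data === "[DONE]") continue; // final chunk; the stream ends right after
    const chunk = JSON.parse(data);
    const text = chunk.choices[0].delta.content;
    if (text) {                      // the first chunk only carries the role
      fullResponseText += text;
      process.stdout.write(text);    // display while still being generated
    }
  }
}
</pre>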
One thing we lose with streaming is the <code>usage</code> field, so if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="1">https://gpt.pomb.us/</ref>
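A tokenizer library can do that counting client-side. As a sketch, assuming the <code>js-tiktoken</code> package and the <code>fullResponseText</code> accumulated above:
<pre>
// Sketch: count completion tokens locally. js-tiktoken is an assumption;
// any tiktoken-compatible tokenizer works the same way.
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");
const completionTokens = enc.encode(fullResponseText).length;
console.log(`completion tokens: ${completionTokens}`);
</pre>
Note that counting prompt tokens this way slightly undercounts, because the chat format adds a few tokens of message overhead per message.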