</pre>
If stream is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response.
OpenAI uses server-sent events for the streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with data: followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}' | |||
</pre>
After the content chunks, you'll receive one last chunk containing the string "data: [DONE]".
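As a rough illustration, the sketch below reads the stream by hand with the Python requests library, parses each data: line, and prints the content deltas as they arrive. The API key, model, and message are placeholders; the official openai client library can also do this parsing for you.
<pre>
import json
import requests

# Minimal sketch of consuming the SSE stream by hand (placeholder key and model).
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue                      # SSE events are separated by blank lines
    chunk = line.decode("utf-8")
    if not chunk.startswith("data: "):
        continue
    payload = chunk[len("data: "):]
    if payload == "[DONE]":
        break                         # last chunk, the stream is finished
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
</pre>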
One thing we lose with streaming is the usage field, so if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="1">https://gpt.pomb.us/</ref>
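For example, the prompt and the text collected from the streamed deltas can be tokenized locally with the tiktoken library. This is only an approximation of what the API would have reported, since chat requests add a few tokens of per-message formatting overhead; the sample strings are placeholders.
<pre>
import tiktoken

# Sketch: count tokens locally because the streamed response omits the usage field.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

prompt = "Hello!"
streamed_reply = "Hi there! How can I help you today?"  # text collected from the deltas

prompt_tokens = len(encoding.encode(prompt))
completion_tokens = len(encoding.encode(streamed_reply))
print(prompt_tokens, completion_tokens, prompt_tokens + completion_tokens)
</pre>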
==Response Fields==