stream: true,
</pre>
If <code>stream</code> is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response.
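For concreteness, here is a minimal sketch of such a streaming request using <code>fetch</code> (available in browsers and Node 18+); the <code>OPENAI_API_KEY</code> environment variable and the example message are assumptions, while the endpoint and parameters match OpenAI's chat completions API:
<pre>
// Minimal sketch: a streaming chat completion request.
// Assumes OPENAI_API_KEY is set in the environment.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Hello!" }],
    stream: true, // ask for a stream of chunks instead of a single response
  }),
});
</pre>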
[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with <code>data: </code> followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>
After that, you'll receive one last chunk with the string "data: [DONE]".
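Putting this together, one way to consume the stream is with <code>response.body.getReader()</code>. This is a sketch for Node 18 or newer, assuming the <code>response</code> object from a fetch call like the one above, not the only way to do it:
<pre>
// Sketch: read the SSE stream and show the text as it arrives.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let fullResponseText = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep a possibly incomplete line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blank lines between events
    const data = line.slice("data: ".length);
    if (data === "[DONE]") continue; // final chunk; the stream ends right after
    const chunk = JSON.parse(data);
    const text = chunk.choices[0].delta.content;
    if (text) {                      // the first chunk only carries the role
      fullResponseText += text;
      process.stdout.write(text);    // display while still being generated
    }
  }
}
</pre>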
One thing we lose with streaming is the <code>usage</code> field, so if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="1">https://gpt.pomb.us/</ref>
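A tokenizer library can do that counting client-side. As a sketch, assuming the <code>js-tiktoken</code> package and the <code>fullResponseText</code> accumulated above:
<pre>
// Sketch: count completion tokens locally. js-tiktoken is an assumption;
// any tiktoken-compatible tokenizer works the same way.
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");
const completionTokens = enc.encode(fullResponseText).length;
console.log(`completion tokens: ${completionTokens}`);
</pre>
Note that counting prompt tokens this way slightly undercounts, because the chat format adds a few tokens of message overhead per message.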