GPT API



'''Uses:'''
#When you receive responses from the model, you can append the response to the ''[[#messages|messages]]'' array before the next ''[[#user|user]]'' message.
#You can supply ''assistant'' messages to show the model examples (see the sketch below).
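A minimal sketch of both uses; the conversation content is illustrative and not taken from the API itself:
<pre>
const messages = [
  // Use 2: a hand-written assistant message serves as an example (few-shot prompting).
  { role: "user", content: "Translate to French: Hello" },
  { role: "assistant", content: "Bonjour" },
  // The next real user message.
  { role: "user", content: "Translate to French: Good night" },
];

// Use 1: after receiving a response, append it to the array
// so the model sees the whole conversation on the next request.
messages.push({ role: "assistant", content: "Bonne nuit" });
messages.push({ role: "user", content: "Translate to French: See you tomorrow" });
</pre>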


<pre>
stream: true,
</pre>
If ''stream'' is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response.


[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.


Chunks are strings that start with "data: " followed by an object. The first chunk looks like this: 
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>


After that, you'll receive one last chunk with the string "data: [DONE]".


One thing we lose with streaming is the ''usage'' field, so if you need to know how many tokens the request used, you'll have to count them yourself.<ref name="”1”">https://gpt.pomb.us/</ref>
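How the stream is read depends on the tech stack. Below is a rough TypeScript sketch using ''fetch'' against the standard Chat Completions endpoint; the model, prompt, and error handling are placeholders, not part of the original article:
<pre>
// Rough sketch: stream a chat completion and collect the text as it arrives.
async function streamCompletion(apiKey: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: "Hello" }],
      stream: true,
    }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let text = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each read may carry several "data: ..." lines; production code should
    // also buffer lines that get split across reads.
    for (const line of decoder.decode(value, { stream: true }).split("\n")) {
      if (!line.startsWith("data: ")) continue;
      const payload = line.slice("data: ".length);
      if (payload === "[DONE]") continue; // last chunk; the stream ends after this
      const chunk = JSON.parse(payload);
      // The generated text arrives piece by piece in choices[0].delta.content.
      text += chunk.choices[0].delta.content ?? "";
      // Update the UI here to show the partial text while it is being generated.
    }
  }
  return text;
}
</pre>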
===temperature===
*0 = least random, 1 = most random
*default value is 1
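For example, a request that should stay close to deterministic might set (illustrative value):
<pre>
temperature: 0.2,
</pre>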
===n===
'''n''' is a number value that allows you to get multiple responses. Each response will be a different object inside the ''[[#choices|choices]]'' array.
*Note that the content of each choice may be the same, especially for short answers or if you're using a low ''[[#temperature|temperature]]''.<ref name="”1”"></ref>
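For example, a request asking for three alternative completions might set (illustrative value):
<pre>
n: 3,
</pre>
The response's ''choices'' array then contains three entries, one per completion.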
===stop===
'''stop''' is an array of strings that tells the model to stop generating text when it encounters one of them. You can provide up to 4 strings in the ''stop'' array. The stop string that was found will not be included in the response.
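For example, to stop generation at a blank line or at the word END (illustrative values):
<pre>
stop: ["\n\n", "END"],
</pre>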
===max_tokens===
<pre>
max_tokens: 100,
</pre>
'''max_tokens''' is a number value that indicates the maximum number of [[tokens]] the model will generate before stopping. For example, if max_tokens is 100, the model will generate at most 100 tokens (approximately 75 words) before stopping.
===presence_penalty===
'''presence_penalty''' is a number value that penalizes tokens that have already appeared in the text so far, encouraging the model to bring up new topics. The default value is 0.
===frequency_penalty===
'''frequency_penalty''' is a number value that penalizes tokens in proportion to how often they have already appeared, discouraging verbatim repetition. The default value is 0.
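For example, to mildly discourage repetition (illustrative values):
<pre>
presence_penalty: 0.5,
frequency_penalty: 0.5,
</pre>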


==Response Fields==