'''Uses:'''
#When you receive a response from the model, you can append it to the ''[[#messages|messages]]'' array before the next ''[[#user|user]]'' message, as sketched below.
#You can supply ''assistant'' messages to show the model examples of the responses you want.
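The multi-turn pattern from the first point looks roughly like this. It is a minimal sketch using ''fetch'' against the Chat Completions endpoint; the model name, the prompts, and the OPENAI_API_KEY environment variable are illustrative assumptions.
<pre>
// Minimal sketch (assumes Node 18+ for fetch and an API key in OPENAI_API_KEY).
const messages = [{ role: "user", content: "What is the capital of France?" }];

const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
});
const data = await res.json();

// Append the assistant's reply before adding the next user message,
// so the model sees the whole conversation on the next request.
messages.push(data.choices[0].message);
messages.push({ role: "user", content: "And what is its population?" });
</pre>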
<pre>
stream: true,
</pre>
If ''stream'' is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response before displaying anything.
[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with ''data:'' followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>
After that, you'll receive one last chunk with the string "data: [DONE]".
One thing we lose with streaming is the ''usage'' field, so if you need to know how many tokens the request used, you'll have to count them yourself.<ref name="”1”">https://gpt.pomb.us/</ref>
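As a rough illustration, this is one way to read the chunks with ''fetch'' and a stream reader. It is a simplified sketch: it assumes each network read contains whole ''data:'' lines, which a robust client should not rely on, and the model and prompt are placeholders.
<pre>
// Sketch: reading the server-sent event stream (Node 18+ or a modern browser).
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Write a haiku about rivers." }],
    stream: true,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let text = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each network chunk can contain several "data: ..." lines.
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const delta = JSON.parse(line.slice(6)).choices[0].delta;
    if (delta.content) text += delta.content; // render partial text as it arrives
  }
}
</pre>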
===temperature===
[[temperature]] accepts a value between 0 and 2. It impacts the randomness of the predictions made by the model. A lower temperature (e.g., close to 0) will cause the model to make more deterministic and confident predictions, picking the most likely next words or phrases. Higher temperatures (e.g., 1 or more) make the output more random and the model may generate less likely but more diverse outputs.
*0 = least random, 2 = most random
*The default value of temperature is 1
===top_p===
'''top_p''' accepts a value between 0 and 1 and is an alternative to ''[[#temperature|temperature]]'': with nucleus sampling the model only considers the tokens whose combined probability mass adds up to the ''top_p'' value.
*0 = least random, 1 = most random
*default value is 1
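Both sampling parameters go in the same request body as the ''[[#messages|messages]]''. A small illustrative sketch (the values are placeholders; OpenAI's documentation suggests altering ''temperature'' or ''top_p'', but not both):
<pre>
// Illustrative request body only; values are not recommendations.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Name three colors." }],
  temperature: 0.2, // near-deterministic; raise toward 2 for more varied output
  // top_p: 0.1,    // alternative: restrict sampling to the top 10% probability mass
};
</pre>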
===n===
'''n''' is a number value that lets you request multiple responses in one call. Each response will be a separate object inside the ''[[#choices|choices]]'' array, as in the sketch below.
*Note that the content of each choice may be the same, especially for short answers or if you're using a low ''[[#temperature|temperature]]''.<ref name="”1”"></ref>
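A hedged sketch of using ''n'' (the prompt and values are illustrative); each element of ''choices'' carries its own ''index'' and ''message'':
<pre>
// Sketch: ask for three completions in one request.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Suggest a name for a cat." }],
  n: 3,
  temperature: 1.2, // a higher temperature makes the three choices more likely to differ
};

// After sending the request (see the fetch example above), read each choice:
for (const choice of data.choices) {
  console.log(choice.index, choice.message.content);
}
</pre>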
===stop===
'''stop''' is an array of strings that tells the model to stop generating text when it encounters one of them. You can provide up to 4 strings in the stop array. The matched stop string will not be included in the response.
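For example, an illustrative sketch (the prompt and stop strings are placeholders):
<pre>
// Sketch: generation halts as soon as either stop string would be produced,
// and the stop string itself is not returned.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "List the planets, one per line." }],
  stop: ["\n\n", "Pluto"], // up to 4 strings are allowed
};
</pre>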
===max_tokens===
<pre>
max_tokens: 100,
</pre>
'''max_tokens''' is a number value that sets the maximum number of [[tokens]] the model will generate before stopping. For example, if max_tokens is 100, the model will generate at most 100 tokens (roughly 75 words) before stopping.
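If the limit is reached before the model finishes, the choice's ''finish_reason'' is "length" instead of "stop", which you can use to detect truncated answers. A small sketch, reusing the ''data'' object from the fetch example above:
<pre>
// Sketch: detect a response that was cut off by max_tokens.
if (data.choices[0].finish_reason === "length") {
  console.log("The answer was truncated; consider raising max_tokens.");
}
</pre>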
===presence_penalty===
'''presence_penalty''' is a number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text so far, making the model more likely to move on to new topics.
===frequency_penalty===
'''frequency_penalty''' is a number between -2.0 and 2.0. Positive values penalize tokens in proportion to how often they have appeared in the text so far, making the model less likely to repeat the same lines verbatim.
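An illustrative sketch combining both penalties (the values are placeholders, not recommendations):
<pre>
// Sketch: gently discourage repetition in longer generations.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Write a short poem about the sea." }],
  presence_penalty: 0.6,  // penalize tokens that have appeared at all
  frequency_penalty: 0.4, // penalize tokens by how often they have appeared
};
</pre>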
==Response Fields==