'''Uses:'''
#When you receive a response from the model, you can append it to the ''[[#messages|messages]]'' array before the next ''[[#user|user]]'' message, as sketched below.
#You can supply ''assistant'' messages to show the model examples of the responses you want.
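The multi-turn pattern from the first point looks roughly like this. It is a minimal sketch using ''fetch'' against the Chat Completions endpoint; the model name, the prompts, and the OPENAI_API_KEY environment variable are illustrative assumptions.
<pre>
// Minimal sketch (assumes Node 18+ for fetch and an API key in OPENAI_API_KEY).
const messages = [{ role: "user", content: "What is the capital of France?" }];

const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
});
const data = await res.json();

// Append the assistant's reply before adding the next user message,
// so the model sees the whole conversation on the next request.
messages.push(data.choices[0].message);
messages.push({ role: "user", content: "And what is its population?" });
</pre>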
<pre>
stream: true,
</pre>
If ''stream'' is true, the model's response can be shown while it is still being generated; we no longer need to wait for the whole response before displaying anything.
[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
Chunks are strings that start with ''data:'' followed by a JSON object. The first chunk looks like this:
<pre>
'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
</pre>
After that, you'll receive one last chunk with the string "data: [DONE]".
One thing we lose with streaming is the ''usage'' field, so if you need to know how many tokens the request used, you'll have to count them yourself.<ref name="”1”">https://gpt.pomb.us/</ref>
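As a rough illustration, this is one way to read the chunks with ''fetch'' and a stream reader. It is a simplified sketch: it assumes each network read contains whole ''data:'' lines, which a robust client should not rely on, and the model and prompt are placeholders.
<pre>
// Sketch: reading the server-sent event stream (Node 18+ or a modern browser).
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: "Write a haiku about rivers." }],
    stream: true,
  }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let text = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Each network chunk can contain several "data: ..." lines.
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
    const delta = JSON.parse(line.slice(6)).choices[0].delta;
    if (delta.content) text += delta.content; // render partial text as it arrives
  }
}
</pre>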
===temperature===
[[temperature]] accepts a value between 0 and 2. It impacts the randomness of the predictions made by the model. A lower temperature (e.g., close to 0) will cause the model to make more deterministic and confident predictions, picking the most likely next words or phrases. Higher temperatures (e.g., 1 or more) make the output more random and the model may generate less likely but more diverse outputs.
*0 = least random, 2 = most random
*The default value of temperature is 1
===top_p===
'''top_p''' accepts a value between 0 and 1 and is an alternative to ''[[#temperature|temperature]]'': with nucleus sampling the model only considers the tokens whose combined probability mass adds up to the ''top_p'' value.
*0 = least random, 1 = most random
*default value is 1
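Both sampling parameters go in the same request body as the ''[[#messages|messages]]''. A small illustrative sketch (the values are placeholders; OpenAI's documentation suggests altering ''temperature'' or ''top_p'', but not both):
<pre>
// Illustrative request body only; values are not recommendations.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Name three colors." }],
  temperature: 0.2, // near-deterministic; raise toward 2 for more varied output
  // top_p: 0.1,    // alternative: restrict sampling to the top 10% probability mass
};
</pre>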
===n===
'''n''' is a number value that lets you request multiple responses in one call. Each response will be a separate object inside the ''[[#choices|choices]]'' array, as in the sketch below.
*Note that the content of each choice may be the same, especially for short answers or if you're using a low ''[[#temperature|temperature]]''.<ref name="”1”"></ref>
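A hedged sketch of using ''n'' (the prompt and values are illustrative); each element of ''choices'' carries its own ''index'' and ''message'':
<pre>
// Sketch: ask for three completions in one request.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Suggest a name for a cat." }],
  n: 3,
  temperature: 1.2, // a higher temperature makes the three choices more likely to differ
};

// After sending the request (see the fetch example above), read each choice:
for (const choice of data.choices) {
  console.log(choice.index, choice.message.content);
}
</pre>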
===stop===
'''stop''' is an array of strings that tells the model to stop generating text when it encounters one of them. You can provide up to 4 strings in the stop array. The matched stop string will not be included in the response.
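For example, an illustrative sketch (the prompt and stop strings are placeholders):
<pre>
// Sketch: generation halts as soon as either stop string would be produced,
// and the stop string itself is not returned.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "List the planets, one per line." }],
  stop: ["\n\n", "Pluto"], // up to 4 strings are allowed
};
</pre>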
===max_tokens===
<pre>
max_tokens: 100,
</pre>
'''max_tokens''' is a number value that sets the maximum number of [[tokens]] the model will generate before stopping. For example, if max_tokens is 100, the model will generate at most 100 tokens (roughly 75 words) before stopping.
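If the limit is reached before the model finishes, the choice's ''finish_reason'' is "length" instead of "stop", which you can use to detect truncated answers. A small sketch, reusing the ''data'' object from the fetch example above:
<pre>
// Sketch: detect a response that was cut off by max_tokens.
if (data.choices[0].finish_reason === "length") {
  console.log("The answer was truncated; consider raising max_tokens.");
}
</pre>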
===presence_penalty===
'''presence_penalty''' is a number between -2.0 and 2.0. Positive values penalize tokens that have already appeared in the text so far, making the model more likely to move on to new topics.
===frequency_penalty===
'''frequency_penalty''' is a number between -2.0 and 2.0. Positive values penalize tokens in proportion to how often they have appeared in the text so far, making the model less likely to repeat the same lines verbatim.
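An illustrative sketch combining both penalties (the values are placeholders, not recommendations):
<pre>
// Sketch: gently discourage repetition in longer generations.
const body = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Write a short poem about the sea." }],
  presence_penalty: 0.6,  // penalize tokens that have appeared at all
  frequency_penalty: 0.4, // penalize tokens by how often they have appeared
};
</pre>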
==Response Fields==