GPT API: Difference between revisions

Revision as of 20:19, 15 July 2023

5,299 bytes added , 15 July 2023

→‎max_tokens

Onceuponatime

@@ Line 1: / Line 1: @@
 Documentation and Guide for [[OpenAI]]'s [[GPT]] API.
+==Request Fields==
+===model===
+<pre>
+ model: "gpt-3.5-turbo"
+</pre>
-[[Category:Guides]] [[Category:Documentation]]
+The value for the model field is a string that contains the name of the [[GPT]] [[model]] you want to use.
+The value for the model field can have up to 3 components:
+<pre>
+ model: "gpt-3.5-turbo-16k-0613"
+</pre>
+In the example above, '''gpt-3.5-turbo''' is the name of the model, '''16k''' is the [[context length]] in [[tokens]], and '''0613''' is the date on which the model [[snapshot]] was taken (June 13, 2023).
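As a rough illustration of the naming scheme, the sketch below splits a model string into its name, context length, and snapshot parts. The helper name <code>parseModelName</code> is hypothetical and not part of any OpenAI library; it assumes the snapshot is a trailing four-digit part and the context length is a trailing part ending in "k".

```javascript
// Split a model identifier into the three components described above.
// parseModelName is a hypothetical helper, not part of any OpenAI SDK.
function parseModelName(model) {
  const parts = model.split("-");
  let snapshot = null;
  let context = null;

  // Assumption: a trailing four-digit part is the snapshot date (MMDD).
  if (/^\d{4}$/.test(parts[parts.length - 1])) {
    snapshot = parts.pop();
  }
  // Assumption: a trailing part such as "16k" or "32k" is the context length.
  if (/^\d+k$/.test(parts[parts.length - 1])) {
    context = parts.pop();
  }
  return { name: parts.join("-"), context, snapshot };
}

console.log(parseModelName("gpt-3.5-turbo-16k-0613"));
console.log(parseModelName("gpt-4"));
```

Components that are absent come back as <code>null</code>, so a plain name such as "gpt-4" still parses.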
+'''Model Names and Context Window in # of Tokens'''
+{| class="wikitable"
+|-
+! Model
+! Context Window
+|-
+| gpt-3.5-turbo || 4,096 tokens
+|-
+| gpt-3.5-turbo-16k || 16,384 tokens
+|-
+| gpt-4 || 8,192 tokens
+|-
+| gpt-4-32k || 32,768 tokens
+|}
+*Note that 100 tokens is roughly 75 words of English text.
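The table and the tokens-to-words rule of thumb can be combined into a small lookup. This is only a sketch: the values are copied from the table above, and <code>approxWords</code> is a hypothetical helper that applies the 100 tokens to 75 words estimate.

```javascript
// Context windows copied from the table above (in tokens).
const CONTEXT_WINDOW = {
  "gpt-3.5-turbo": 4096,
  "gpt-3.5-turbo-16k": 16384,
  "gpt-4": 8192,
  "gpt-4-32k": 32768,
};

// Rule of thumb from the note above: 100 tokens is about 75 words.
function approxWords(tokens) {
  return Math.round(tokens * 0.75);
}

for (const [model, tokens] of Object.entries(CONTEXT_WINDOW)) {
  console.log(`${model}: ${tokens} tokens, roughly ${approxWords(tokens)} words`);
}
```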
+===messages===
+====role====
+<pre>
+messages: [
+  { role: "system", content: "Speak like Shakespeare" },
+  { role: "user", content: "How are you?" },
+  { role: "assistant", content: "In the sphere of my digital existence, there is neither joy nor sorrow, yet to serve thy query, all is well and I remain at thy service." },
+],
+</pre>
+=====system=====
+This role is used to provide high-level instructions that guide the behavior of the model throughout the conversation. It sets the context and tone of the interaction. For example, a system message might instruct the model to "Speak like Shakespeare," thereby guiding the model to generate responses in a Shakespearean style.
+=====user=====
+Messages with this role are input from the user. They are the questions, comments, or prompts that the user provides to the AI model. The user role instructs the model on what the user wants or expects in response.
+=====assistant=====
+This role represents the output from the AI model. These messages are the responses generated by the AI in reply to the user's input or following the instructions provided by the system.
+'''Uses:'''
+#When you receive a response from the model, you can append it to the ''[[#messages|messages]]'' array before the next ''[[#user|user]]'' message, so that the model sees the earlier turns of the conversation.
+#You can supply ''assistant'' messages yourself to show the model examples of the replies you want.
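The first use above can be sketched as follows. The helper <code>appendTurn</code> is hypothetical; it only shows how the conversation history might be extended between requests and makes no API call.

```javascript
// Conversation history kept as a plain array of { role, content } objects.
const conversation = [
  { role: "system", content: "Speak like Shakespeare" },
  { role: "user", content: "How are you?" },
];

// appendTurn is a hypothetical helper: it stores the model's reply and then
// the user's next message, so the next request carries the whole dialogue.
function appendTurn(history, assistantReply, nextUserMessage) {
  return [
    ...history,
    { role: "assistant", content: assistantReply },
    { role: "user", content: nextUserMessage },
  ];
}

const nextConversation = appendTurn(
  conversation,
  "All is well, and I remain at thy service.",
  "What is thy name?"
);
console.log(nextConversation.map((m) => m.role).join(" -> "));
```

Returning a new array instead of mutating the old one keeps earlier snapshots of the conversation intact, which can be convenient when retrying a request.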
+=====function=====
+===functions===
+===function_call===
+===stream===
+<pre>
+stream: true,
+</pre>
+If '''stream''' is true, the response is delivered piece by piece while it is still being generated, so you no longer need to wait for the whole response before showing it.
+[[OpenAI]] uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.
+Chunks are strings that start with <code>data:</code> followed by a JSON object. The first chunk looks like this:
+<pre>
+'data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}'
+</pre>
+Subsequent chunks carry pieces of the reply in <code>delta.content</code>. After the last of them, you'll receive one final chunk with the string "data: [DONE]".
+One thing we lose with streaming is the usage field, so if you need to know how many tokens the request used, you'll need to count them yourself.<ref name="”1”">https://gpt.pomb.us/</ref>
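A minimal sketch of reassembling the reply from streamed chunks is shown below. It assumes each chunk has already been separated into one <code>data:</code> string; a real network read may deliver several events at once or split one across reads, which this sketch does not handle. The function name <code>collectStream</code> and the sample chunks are made up for illustration.

```javascript
// Collect the text carried by a list of raw stream chunks.
// Assumption: each chunk is one complete "data: <json>" string and the
// stream ends with "data: [DONE]", as described above.
function collectStream(chunks) {
  let text = "";
  for (const chunk of chunks) {
    const payload = chunk.replace(/^data: /, "").trim();
    if (payload === "[DONE]") break;
    const delta = JSON.parse(payload).choices[0].delta;
    // The first chunk carries the role and empty content; later ones carry text.
    text += delta.content ?? "";
  }
  return text;
}

// Hand-written sample chunks, trimmed to the fields used here.
const sampleChunks = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  "data: [DONE]",
];

console.log(collectStream(sampleChunks));
```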
+===temperature===
+[[temperature]] accepts a value between 0 and 2. It impacts the randomness of the predictions made by the model. A lower temperature (e.g., close to 0) will cause the model to make more deterministic and confident predictions, picking the most likely next words or phrases. Higher temperatures (e.g., 1 or more) make the output more random and the model may generate less likely but more diverse outputs.
+*0 = least random, 2 = most random
+*The default value of temperature is 1
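To see why a lower temperature gives more deterministic output, the sketch below applies the textbook softmax-with-temperature to a few made-up scores. It is an illustration of the general technique only and is not taken from OpenAI's implementation; the function name and the scores are assumptions.

```javascript
// Turn raw scores (logits) into probabilities, scaled by a temperature.
// Textbook softmax-with-temperature, shown only as an illustration.
function softmaxWithTemperature(logits, temperature) {
  if (temperature === 0) {
    // Limit case: all probability goes to the highest-scoring option.
    const best = logits.indexOf(Math.max(...logits));
    return logits.map((_, i) => (i === best ? 1 : 0));
  }
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const exampleLogits = [2.0, 1.0, 0.1]; // made-up scores for three candidate words
for (const t of [0, 0.5, 1, 2]) {
  console.log(t, softmaxWithTemperature(exampleLogits, t).map((p) => p.toFixed(3)));
}
```

As the temperature rises, the probability of the top candidate shrinks and the distribution flattens, which is what makes less likely words more probable.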
+===top_p===
+[[top-p]] is a value between 0 and 1. Like ''[[#temperature|temperature]]'', it controls how random the model's predictions are. Instead of considering every possible next word, the model keeps only the smallest subset (the "nucleus") of the most likely next-word predictions whose cumulative probability reaches the chosen 'p' value. The next word is then randomly selected from this subset. For instance, if 'p' is set to 0.9, the model keeps the smallest set of words whose cumulative probability reaches 0.9 and samples the next word from that set.
+*0 = least random, 1 = most random
+*The default value of top_p is 1
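The nucleus idea can be sketched in a few lines. This is a generic illustration of top-p (nucleus) sampling under made-up probabilities, not OpenAI's code; <code>nucleus</code> and <code>pickToken</code> are hypothetical names.

```javascript
// Return the "nucleus": the smallest set of candidates whose cumulative
// probability reaches topP, with probabilities renormalised inside the set.
function nucleus(candidates, topP) {
  const sorted = [...candidates].sort((a, b) => b.prob - a.prob);
  const kept = [];
  let cumulative = 0;
  for (const c of sorted) {
    kept.push(c);
    cumulative += c.prob;
    if (cumulative >= topP) break;
  }
  return kept.map((c) => ({ token: c.token, prob: c.prob / cumulative }));
}

// Draw one token from a distribution; `random` is injectable for testing.
function pickToken(dist, random = Math.random) {
  let r = random();
  for (const c of dist) {
    r -= c.prob;
    if (r < 0) return c.token;
  }
  return dist[dist.length - 1].token; // guard against rounding error
}

// Made-up next-word probabilities.
const exampleCandidates = [
  { token: "a", prob: 0.3 },
  { token: "the", prob: 0.5 },
  { token: "this", prob: 0.05 },
  { token: "an", prob: 0.15 },
];

const kept = nucleus(exampleCandidates, 0.9);
console.log(kept.map((c) => c.token));
console.log(pickToken(kept));
```

With a small 'p' only the single most likely word survives, which is why low values behave almost deterministically.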
+===n===
+'''n''' is a number value that allows you to get multiple responses. Each response will be a different object inside the ''[[#choices|choices]]'' array.
+*Note that the content of each choice may be the same, especially for short answers or if you're using a low ''[[#temperature|temperature]]''.<ref name="”1”"></ref>
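Reading several choices out of a response might look like the sketch below. The response object is hand-written and trimmed to the fields used here (a real response carries more), and <code>choiceTexts</code> is a hypothetical helper.

```javascript
// A hand-written response object as it might look for n = 3, trimmed to the
// fields used here.
const sampleResponse = {
  choices: [
    { index: 0, message: { role: "assistant", content: "Hello!" }, finish_reason: "stop" },
    { index: 1, message: { role: "assistant", content: "Hi there!" }, finish_reason: "stop" },
    { index: 2, message: { role: "assistant", content: "Hello!" }, finish_reason: "stop" },
  ],
};

// Collect the text of every choice, in index order.
function choiceTexts(res) {
  return [...res.choices]
    .sort((a, b) => a.index - b.index)
    .map((c) => c.message.content);
}

const texts = choiceTexts(sampleResponse);
console.log(texts);
// Choices can repeat, so de-duplicate if only distinct answers are wanted.
console.log([...new Set(texts)]);
```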
+===stop===
+'''stop''' is a string or an array of strings that tells the model to stop generating text when it encounters one of them. You can provide up to 4 strings in the stop array. The stop string that was found is not included in the response.
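The effect of stop strings can be imitated locally to make the behaviour concrete. The sketch below cuts a text at the earliest stop string and drops the stop string itself; it only mimics the observable result and says nothing about how the API does it. <code>truncateAtStop</code> is a hypothetical helper that, for convenience, accepts either one string or an array.

```javascript
// Cut text at the earliest occurrence of any stop string, dropping the stop
// string itself. This imitates the effect locally; it is not the API's code.
function truncateAtStop(text, stop) {
  const stops = Array.isArray(stop) ? stop : [stop];
  let cut = text.length;
  for (const s of stops) {
    const i = text.indexOf(s);
    if (i !== -1 && i < cut) cut = i;
  }
  return text.slice(0, cut);
}

console.log(JSON.stringify(truncateAtStop("1. apples\n2. pears\n3. plums", ["3.", "\n\n"])));
```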
+===max_tokens===
+<pre>
+max_tokens: 100,
+</pre>
+'''max_tokens''' is a number value that sets the maximum number of [[tokens]] the model will generate before stopping. For example, if max_tokens is 100, the model will generate at most 100 tokens (approximately 75 words); it may stop earlier if it finishes its answer or encounters a ''[[#stop|stop]]'' string.
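The limits stated for the request fields on this page can be checked before sending a request. The sketch below is a hypothetical client-side check (<code>validateRequest</code> is not an OpenAI function, and the API performs its own validation); it covers only the ranges mentioned above.

```javascript
// Check a request body against the ranges stated on this page.
// validateRequest is a hypothetical helper; the API does its own validation.
function validateRequest(body) {
  const errors = [];
  const inRange = (v, lo, hi) => typeof v === "number" && v >= lo && v <= hi;

  if (typeof body.model !== "string") errors.push("model must be a string");
  if (!Array.isArray(body.messages)) errors.push("messages must be an array");
  if (body.temperature !== undefined && !inRange(body.temperature, 0, 2)) {
    errors.push("temperature must be between 0 and 2");
  }
  if (body.top_p !== undefined && !inRange(body.top_p, 0, 1)) {
    errors.push("top_p must be between 0 and 1");
  }
  if (Array.isArray(body.stop) && body.stop.length > 4) {
    errors.push("stop accepts at most 4 strings");
  }
  if (body.max_tokens !== undefined && !(Number.isInteger(body.max_tokens) && body.max_tokens > 0)) {
    errors.push("max_tokens must be a positive integer");
  }
  return errors;
}

const exampleBody = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "How are you?" }],
  temperature: 0.7,
  max_tokens: 100,
};
console.log(validateRequest(exampleBody));
console.log(validateRequest({ ...exampleBody, temperature: 3 }));
```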
+===presence_penalty===
+'''presence_penalty'''
+===frequency_penalty===
+'''frequency_penalty'''
+==Response Fields==
+===model===
+==References==
+<references />
+[[Category:GPT]] [[Category:APIs]] [[Category:Guides]] [[Category:Documentation]]