GPT API

Documentation and Guide for OpenAI's GPT API.

Request Fields

model

 model: "gpt-3.5-turbo"

The value for the model field is a string that contains the name of the GPT model you want to use.

The value for the model field can have up to 3 components:

 model: "gpt-3.5-turbo-16k-0613"

In the example above, gpt-3.5-turbo is the name of the model, 16k is the context length in tokens, and 0613 is the date on which the model snapshot was taken (June 13th).
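The three components can be pulled apart with a short helper. This is only a sketch: parseModelName is a hypothetical function, not part of any OpenAI library, and it assumes the components always appear in the order name, context length, snapshot date.

```javascript
// Hypothetical helper: split a model string into the components described above.
// Assumes the layout name[-context][-snapshot], e.g. "gpt-3.5-turbo-16k-0613".
function parseModelName(model) {
  const match = /^(.*?)(?:-(\d+k))?(?:-(\d{4}))?$/.exec(model);
  if (match === null) {
    throw new Error(`Unrecognized model string: ${model}`);
  }
  return {
    name: match[1],                    // e.g. "gpt-3.5-turbo"
    contextLength: match[2] ?? null,   // e.g. "16k", or null if absent
    snapshot: match[3] ?? null,        // e.g. "0613", or null if absent
  };
}

console.log(parseModelName("gpt-3.5-turbo-16k-0613"));
// { name: 'gpt-3.5-turbo', contextLength: '16k', snapshot: '0613' }
```

A string with fewer components, such as "gpt-4", yields null for the missing parts.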

Model Names and Context Window in # of Tokens

Model	Context Window
gpt-3.5-turbo	4,096 tokens
gpt-3.5-turbo-16k	16,384 tokens
gpt-4	8,192 tokens
gpt-4-32k	32,768 tokens

Note that 100 tokens correspond to roughly 75 words.
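As a rough illustration, the table and the 100-tokens-per-75-words rule can be combined to check whether a text is likely to fit in a model's context window. The names CONTEXT_WINDOWS, estimateTokens, and fitsInContext are made up for this sketch, and the result is only an estimate, not a real token count.

```javascript
// Context windows copied from the table above.
const CONTEXT_WINDOWS = {
  "gpt-3.5-turbo": 4096,
  "gpt-3.5-turbo-16k": 16384,
  "gpt-4": 8192,
  "gpt-4-32k": 32768,
};

// Rough estimate only: about 100 tokens for every 75 words.
function estimateTokens(wordCount) {
  return Math.ceil((wordCount * 100) / 75);
}

// Hypothetical check: does a text of wordCount words fit the model's window?
function fitsInContext(model, wordCount) {
  const limit = CONTEXT_WINDOWS[model];
  if (limit === undefined) {
    throw new Error(`Unknown model: ${model}`);
  }
  return estimateTokens(wordCount) <= limit;
}

console.log(estimateTokens(75)); // 100
console.log(fitsInContext("gpt-3.5-turbo", 3000)); // true (about 4,000 tokens)
console.log(fitsInContext("gpt-3.5-turbo", 3100)); // false (about 4,134 tokens)
```

Keep in mind that the context window is shared between the messages you send and the response the model generates, so a prompt that barely fits leaves little room for the answer.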

messages

role

messages: [
    { role: "system", content: "Speak like Shakespeare" },
    { role: "user", content: "How are you?" },
    { role: "assistant", content: "In the sphere of my digital existence, there is neither joy nor sorrow, yet to serve thy query, all is well and I remain at thy service." },
],

system

This role is used to provide high-level instructions that guide the behavior of the model throughout the conversation. It sets the context and tone of the interaction. For example, a system message might instruct the model to "Speak like Shakespeare," thereby guiding the model to generate responses in a Shakespearean style.

user

Messages with this role are input from the user. They are the questions, comments, or prompts that the user provides to the AI model. The user role instructs the model on what the user wants or expects in response.

assistant

This role represents the output from the AI model. These messages are the responses generated by the AI in reply to the user's input or following the instructions provided by the system.

Uses:

When you receive a response from the model, you can append it to the messages array before the next user message so that the model sees the earlier turns of the conversation.
You can also supply assistant messages yourself to show the model examples of the output you want.
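The first use can be sketched as follows. continueConversation is a hypothetical helper written for this example. It makes no API call and only shows how the messages array grows from one turn to the next.

```javascript
// Hypothetical helper: returns a new messages array with the assistant's
// reply and the next user message appended, leaving the input untouched.
function continueConversation(messages, assistantReply, nextUserMessage) {
  return [
    ...messages,
    { role: "assistant", content: assistantReply },
    { role: "user", content: nextUserMessage },
  ];
}

const history = [
  { role: "system", content: "Speak like Shakespeare" },
  { role: "user", content: "How are you?" },
];

// The reply text here is a stand-in for whatever the model returned.
const updated = continueConversation(
  history,
  "All is well and I remain at thy service.",
  "What is thy name?"
);

console.log(updated.map((m) => m.role).join(", "));
// system, user, assistant, user
```

The updated array is what you would send as the messages field of the next request.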

function

functions

function_call

stream

stream: true,

If stream is true, the model's response can be shown while it is still being generated, so you no longer need to wait for the whole response.

OpenAI uses server-sent events for streaming. How you process the stream depends on your tech stack, but the idea is the same: you receive a stream of chunks.

Chunks are strings that start with data: followed by a JSON object. The first chunk looks like this:

 data: {"id":"chatcmpl-xxxx","object":"chat.completion.chunk","created":1688198627,"model":"gpt-3.5-turbo-0613","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

Subsequent chunks follow the same format. After the final content chunk, you'll receive one last chunk with the string "data: [DONE]".
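A minimal sketch of handling such chunks is shown below. It assumes the stream has already been split into individual "data: ..." lines (how you get those lines depends on your stack) and that each later chunk carries a piece of the text in delta.content. parseChunk, collectContent, and the sample lines are made up for illustration, with most fields trimmed for brevity.

```javascript
// Hypothetical helper: parse one "data: ..." line from the stream.
// Returns null for the final "[DONE]" marker and for lines without the prefix.
function parseChunk(line) {
  const prefix = "data: ";
  if (!line.startsWith(prefix)) return null;
  const payload = line.slice(prefix.length).trim();
  if (payload === "[DONE]") return null;
  return JSON.parse(payload);
}

// Concatenate the content pieces carried in each chunk's delta.
function collectContent(lines) {
  let text = "";
  for (const line of lines) {
    const chunk = parseChunk(line);
    if (chunk === null) continue;
    // A delta may have no content (for example, the closing chunk).
    text += chunk.choices[0].delta.content ?? "";
  }
  return text;
}

// Made-up sample lines shaped like the first chunk above.
const sampleLines = [
  'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{"content":"All is "},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{"content":"well."},"finish_reason":null}]}',
  'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
  "data: [DONE]",
];

console.log(collectContent(sampleLines)); // All is well.
```

In a real client you would typically display each piece as it arrives instead of collecting everything first.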

One thing we lose with streaming is the usage field. If you need to know how many tokens the request used, you'll need to count them yourself. [1]

Response Fields

model

References

[1] https://gpt.pomb.us/