text-embedding-3
Last reviewed: Jun 3, 2026
Sources: 5 citations
Review status: Source-backed
Revision: v1 · 1,110 words
text-embedding-3 is a family of third-generation text embedding models released by OpenAI on January 25, 2024. The family consists of two models, text-embedding-3-small and text-embedding-3-large, both served through the OpenAI API. They succeeded text-embedding-ada-002, the single embedding model OpenAI had offered since December 2022, and outperformed it on standard benchmarks for both English and multilingual retrieval. The release also introduced a native ability to shorten the output vectors, allowing developers to trade a small amount of accuracy for reduced storage and compute cost. [1][2]
A text embedding is a list of floating-point numbers (a vector) that represents the meaning of a piece of text, such that texts with similar meanings produce vectors that are close together in the vector space. Embeddings are a foundational building block for semantic search, clustering, recommendation systems, and the retrieval step of retrieval-augmented generation. [3]
Before January 2024, OpenAI's recommended embedding model was text-embedding-ada-002, a 1,536-dimensional model released in December 2022 that had itself consolidated several earlier first-generation embedding models into a single endpoint. The text-embedding-3 generation kept the same general interface (text in, a fixed-length vector out) but improved measured quality, lowered prices, and added the ability to request shorter vectors directly from the API. OpenAI positioned the small model as a direct, lower-cost upgrade path for existing ada-002 users and the large model as a new top tier for accuracy-sensitive workloads. [1][2]
Both new models accept up to 8,191 tokens per input, the same context length as ada-002, and have a knowledge cutoff of September 2021. [4]
OpenAI shipped two models aimed at different points on the cost-versus-quality spectrum. text-embedding-3-small is the smaller, cheaper, faster option intended as a drop-in upgrade for most applications, while text-embedding-3-large targets the highest accuracy and produces larger vectors by default. [1]
| Model | Default dimensions | Maximum dimensions | Max input tokens | Price (per 1M tokens) |
|---|---|---|---|---|
| text-embedding-3-small | 1,536 | 1,536 | 8,191 | $0.02 |
| text-embedding-3-large | 3,072 | 3,072 | 8,191 | $0.13 |
The model is selected through the model field in a standard embeddings API call. Each request returns one vector per input string. The numbers in a vector are not individually interpretable; only their geometric relationships (typically compared using cosine similarity) carry meaning. [3][4]
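As a minimal sketch of how two vectors are compared, the function below computes cosine similarity with the Python standard library only. The three-dimensional vectors are invented for illustration and are not real model output; actual embeddings from these models have up to 1,536 or 3,072 entries.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example (not produced by any model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related texts should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A score near 1 indicates vectors pointing in nearly the same direction, and a score near 0 indicates unrelated directions.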
The headline new capability of the text-embedding-3 family is the dimensions API parameter, which lets a developer request an embedding shorter than the model's native length. For example, a value of 256 or 1,024 can be passed to receive a 256- or 1,024-dimensional vector from text-embedding-3-large instead of the full 3,072 dimensions. Shorter vectors use less memory and storage in a vector database and are faster to compare, at the cost of some accuracy. [1][2]
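The sketch below shows roughly what a request using this parameter might look like. It only builds the JSON body and sends nothing over the network. The helper name `build_embedding_request` is hypothetical, and the range check (1 up to the model's native length) is an assumption based on the maximum dimensions listed in the table above, not a documented validation rule.

```python
import json

# Native (maximum) output lengths, taken from the model table in this article.
MAX_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def build_embedding_request(texts, model, dimensions=None):
    """Build the JSON body for an embeddings request (hypothetical helper)."""
    if model not in MAX_DIMENSIONS:
        raise ValueError(f"unknown model: {model}")
    body = {"model": model, "input": list(texts)}
    if dimensions is not None:
        # Assumed bounds: a shortened vector cannot exceed the native length.
        if not 1 <= dimensions <= MAX_DIMENSIONS[model]:
            raise ValueError(
                f"dimensions must be between 1 and {MAX_DIMENSIONS[model]}"
            )
        body["dimensions"] = dimensions
    return body


# Ask for 256-dimensional vectors instead of the full 3,072.
request_body = build_embedding_request(
    ["What is an embedding?"], "text-embedding-3-large", dimensions=256
)
print(json.dumps(request_body, indent=2))
```

Omitting `dimensions` leaves the field out of the body, so the model returns vectors at its default length.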
This works because the models were trained with a technique called Matryoshka Representation Learning (MRL), named after Russian nesting dolls. MRL trains the model so that the most important semantic information is concentrated in the earliest entries of the vector. As a result, the embedding can be shortened simply by removing numbers from the end of the sequence, and the truncated vector still represents the text's concepts. When a vector is shortened manually after it has been generated, rather than by passing the dimensions parameter, OpenAI's recommended practice is to renormalize the truncated vector (rescale it to unit length) before use. [1][3]
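The truncate-then-renormalize step can be sketched in a few lines of standard-library Python. The function name `shorten_embedding` and the four-entry input are illustrative assumptions; a real input would be a full-length vector returned by one of the models.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` entries and rescale the result to unit length."""
    if not 1 <= dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]


# Toy unit-length vector, made up for this example.
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)

print(len(short))                             # number of entries kept
print(math.sqrt(sum(x * x for x in short)))   # length after rescaling, approximately 1
```

Renormalizing matters because dropping entries shrinks the vector's length, and similarity code that assumes unit-length vectors (for example, a plain dot product) would otherwise produce scores that are too low.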
The trade-off is favorable: OpenAI reported that on the MTEB benchmark, a text-embedding-3-large embedding shortened to 256 dimensions still outperforms an unshortened text-embedding-ada-002 embedding of 1,536 dimensions. This means a vector six times smaller than ada-002's can deliver higher measured quality. [1][2]
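To make the size difference concrete, the following back-of-the-envelope sketch estimates raw vector storage. It assumes each value is stored as a 32-bit float and ignores index structures, metadata, and any compression a particular vector database might apply; the corpus size of one million vectors is an arbitrary example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each vector entry is stored as a 32-bit float


def raw_vector_bytes(num_vectors: int, dimensions: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw storage for the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


corpus_size = 1_000_000  # arbitrary example corpus

ada_bytes = raw_vector_bytes(corpus_size, 1536)    # unshortened ada-002 vectors
short_bytes = raw_vector_bytes(corpus_size, 256)   # 3-large shortened to 256

print(ada_bytes)                # 6144000000
print(short_bytes)              # 1024000000
print(ada_bytes / short_bytes)  # 6.0
```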
OpenAI published average benchmark scores comparing the new models against ada-002 on two widely used evaluations: MTEB (the Massive Text Embedding Benchmark, measured in English) and MIRACL (a multilingual retrieval benchmark spanning many languages). On both, the text-embedding-3 models scored higher, with the largest gains appearing on the multilingual MIRACL benchmark. [1]
| Model | MTEB (English) average | MIRACL (multilingual) average |
|---|---|---|
| text-embedding-ada-002 | 61.0% | 31.4% |
| text-embedding-3-small | 62.3% | 44.0% |
| text-embedding-3-large | 64.6% | 54.9% |
The MIRACL improvement is substantial: text-embedding-3-large raised ada-002's multilingual score from 31.4% to 54.9%, a relative gain of about 75%, while even the smaller model reached 44.0%. The English MTEB gains were more modest, since ada-002 was already a strong English model, but text-embedding-3-large still set a new high of 64.6% among OpenAI's own models. [1][2]
text-embedding-3-small was priced at $0.00002 per 1,000 tokens (equivalently $0.02 per 1 million tokens), a roughly fivefold reduction from text-embedding-ada-002's $0.0001 per 1,000 tokens. text-embedding-3-large, the higher-accuracy model, was priced at $0.00013 per 1,000 tokens ($0.13 per 1 million tokens). [1]
| Model | Price per 1k tokens | Price per 1M tokens | Relative to ada-002 |
|---|---|---|---|
| text-embedding-ada-002 | $0.0001 | $0.10 | baseline |
| text-embedding-3-small | $0.00002 | $0.02 | about 5x cheaper |
| text-embedding-3-large | $0.00013 | $0.13 | about 1.3x the price |
Because billing is based only on input tokens (embeddings produce no generated output to charge for), the dimensions parameter does not change the price of a request; shortening vectors saves on downstream storage and similarity-search cost rather than on the embedding call itself. [3][4]
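A rough cost estimate follows directly from the per-million-token prices in the table above. The sketch below uses those launch prices, which may since have changed, and an arbitrary example workload of 500 million input tokens; the function name `embedding_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Launch prices in USD per 1 million input tokens, from the pricing table above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-ada-002": Decimal("0.10"),
    "text-embedding-3-small": Decimal("0.02"),
    "text-embedding-3-large": Decimal("0.13"),
}


def embedding_cost_usd(model: str, input_tokens: int) -> Decimal:
    """Estimate the cost of embedding `input_tokens` tokens with `model`."""
    price = PRICE_PER_MILLION_TOKENS[model]
    return price * Decimal(input_tokens) / Decimal(1_000_000)


corpus_tokens = 500_000_000  # arbitrary example workload

for model in PRICE_PER_MILLION_TOKENS:
    print(model, embedding_cost_usd(model, corpus_tokens))
```

The function takes no dimensions argument because, as noted above, the requested vector length does not affect the price of the call.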
OpenAI's documentation lists several standard applications for embeddings, such as search, clustering, and recommendations, all of which the text-embedding-3 models support. [3]
In practice the most common deployment is retrieval-augmented generation, in which documents are embedded and stored in a vector database, a user's query is embedded at request time, and the most similar documents are retrieved to ground a language model's answer. The dimensions parameter is particularly relevant here: teams operating at large scale can store 256- or 1,024-dimensional vectors from text-embedding-3-large to cut vector-database memory and similarity-search latency while retaining most of the model's quality advantage over ada-002. The text-embedding-3 models are text-only and not multimodal; for joint image-and-text embeddings, OpenAI's separate CLIP line is used instead. [1][3]
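The retrieval step of such a pipeline can be sketched without any external service. The example below ranks stored document vectors by cosine similarity to a query vector and returns the top matches. It is a brute-force illustration, not how a production vector database works internally; the document IDs and three-dimensional vectors are invented, and in a real system the vectors would come from the embeddings API.

```python
import math


def normalize(vector: list[float]) -> list[float]:
    """Rescale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k(query: list[float], documents: dict[str, list[float]], k: int) -> list[str]:
    """Return the IDs of the k documents most similar to the query."""
    q = normalize(query)
    scored = []
    for doc_id, vector in documents.items():
        d = normalize(vector)
        # For unit-length vectors the dot product equals cosine similarity.
        score = sum(a * b for a, b in zip(q, d))
        scored.append((score, doc_id))
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "vector database": made-up IDs and vectors for illustration only.
documents = {
    "pricing": [0.9, 0.1, 0.0],
    "benchmarks": [0.1, 0.9, 0.1],
    "history": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.3, 0.0]  # stands in for an embedded user question

print(top_k(query_vector, documents, k=2))
```

The retrieved documents would then be passed to a language model as context. Comparing every stored vector, as done here, scales linearly with corpus size, which is one reason shorter vectors reduce search latency.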