{{see also|LLM Benchmarks Timeline|LLM Rankings}}
Compare different [[large language models]] ([[LLM]]s).
{| class="wikitable sortable" | {| class="wikitable sortable" | ||
! Model | ! Model | ||
! Creator | ! Creator | ||
! Context Window | ! Context Window | ||
! Quality Index<br>(Normalized avg) | ! Quality Index<br>(Normalized avg) | ||
! Blended<br>(USD/1M Tokens) | ! Blended<br>(USD/1M Tokens) | ||
! Median<br>(Tokens/s) | ! Median<br>(Tokens/s) | ||
! Median<br>(First Chunk (s)) | ! Median<br>(First Chunk (s)) | ||
|-
| '''[[o1-preview]]''' || [[OpenAI]] || 128k || 86 || $27.56 || 143.7 || 21.33
|-
| '''[[o1-mini]]''' || [[OpenAI]] || 128k || 84 || $5.25 || 213.2 || 11.27
|-
| '''[[GPT-4o (Aug '24)]]''' || [[OpenAI]] || 128k || 78 || $4.38 || 83.5 || 0.67
|-
| '''[[GPT-4o (May '24)]]''' || [[OpenAI]] || 128k || 78 || $7.50 || 106.3 || 0.65
|-
| '''[[GPT-4o mini]]''' || [[OpenAI]] || 128k || 73 || $0.26 || 113.8 || 0.64
|-
| '''[[GPT-4o (Nov '24)]]''' || [[OpenAI]] || 128k || 73 || $4.38 || 116.4 || 0.39
|-
| '''[[GPT-4o mini Realtime (Dec '24)]]''' || [[OpenAI]] || 128k || || $0.00 || ||
|-
| '''[[GPT-4o Realtime (Dec '24)]]''' || [[OpenAI]] || 128k || || $0.00 || ||
|-
| '''[[Llama 3.3 70B]]''' || [[Meta]] || 128k || 74 || $0.69 || 71.8 || 0.49
|-
| '''[[Llama 3.1 405B]]''' || [[Meta]] || 128k || 74 || $3.50 || 30.2 || 0.71
|-
| '''[[Llama 3.1 70B]]''' || [[Meta]] || 128k || 68 || $0.72 || 72.8 || 0.44
|-
| '''[[Llama 3.2 90B (Vision)]]''' || [[Meta]] || 128k || 68 || $0.81 || 48.9 || 0.33
|-
| '''[[Llama 3.2 11B (Vision)]]''' || [[Meta]] || 128k || 54 || $0.18 || 131.2 || 0.28
|-
| '''[[Llama 3.1 8B]]''' || [[Meta]] || 128k || 54 || $0.10 || 184.9 || 0.33
|-
| '''[[Llama 3.2 3B]]''' || [[Meta]] || 128k || 49 || $0.06 || 201.4 || 0.38
|-
| '''[[Llama 3.2 1B]]''' || [[Meta]] || 128k || 26 || $0.04 || 468.6 || 0.37
|-
| '''[[Gemini 2.0 Flash (exp)]]''' || [[Google]] || 1m || 82 || $0.00 || 169.0 || 0.48
|-
| '''[[Gemini 1.5 Pro (Sep)]]''' || [[Google]] || 2m || 80 || $2.19 || 60.8 || 0.74
|-
| '''[[Gemini 1.5 Flash (Sep)]]''' || [[Google]] || 1m || 72 || $0.13 || 188.4 || 0.25
|-
| '''[[Gemma 2 27B]]''' || [[Google]] || 8k || 61 || $0.26 || 59.4 || 0.48
|-
| '''[[Gemma 2 9B]]''' || [[Google]] || 8k || 55 || $0.12 || 168.9 || 0.36
|-
| '''[[Gemini 1.5 Flash (May)]]''' || [[Google]] || 1m || || $0.13 || 310.6 || 0.29
|-
| '''[[Gemini Experimental (Nov)]]''' || [[Google]] || 2m || || $0.00 || 53.9 || 1.12
|-
| '''[[Gemini 1.5 Pro (May)]]''' || [[Google]] || 2m || || $2.19 || 66.9 || 0.49
|-
| '''[[Gemini 1.5 Flash-8B]]''' || [[Google]] || 1m || || $0.07 || 279.7 || 0.38
|-
| '''[[Claude 3.5 Sonnet (Oct)]]''' || [[Anthropic]] || 200k || 80 || $6.00 || 72.0 || 0.99
|-
| '''[[Claude 3.5 Sonnet (June)]]''' || [[Anthropic]] || 200k || 76 || $6.00 || 61.5 || 0.87
|-
| '''[[Claude 3 Opus]]''' || [[Anthropic]] || 200k || 70 || $30.00 || 25.9 || 2.00
|-
| '''[[Claude 3.5 Haiku]]''' || [[Anthropic]] || 200k || 68 || $1.60 || 65.1 || 0.71
|-
| '''[[Claude 3 Haiku]]''' || [[Anthropic]] || 200k || 55 || $0.50 || 121.6 || 0.72
|-
| '''[[Pixtral Large]]''' || [[Mistral]] || 128k || 74 || $3.00 || 36.5 || 0.40
|-
| '''[[Mistral Large 2 (Jul '24)]]''' || [[Mistral]] || 128k || 74 || $3.00 || 31.1 || 0.50
|-
| '''[[Mistral Large 2 (Nov '24)]]''' || [[Mistral]] || 128k || 74 || $3.00 || 37.4 || 0.52
|-
| '''[[Mistral Small (Sep '24)]]''' || [[Mistral]] || 33k || 61 || $0.30 || 61.5 || 0.32
|-
| '''[[Mixtral 8x22B]]''' || [[Mistral]] || 65k || 61 || $1.20 || 85.1 || 0.57
|-
| '''[[Pixtral 12B]]''' || [[Mistral]] || 128k || 56 || $0.13 || 70.3 || 0.37
|-
| '''[[Ministral 8B]]''' || [[Mistral]] || 128k || 56 || $0.10 || 136.1 || 0.30
|-
| '''[[Mistral NeMo]]''' || [[Mistral]] || 128k || 54 || $0.09 || 122.5 || 0.48
|-
| '''[[Ministral 3B]]''' || [[Mistral]] || 128k || 53 || $0.04 || 168.5 || 0.29
|-
| '''[[Mixtral 8x7B]]''' || [[Mistral]] || 33k || 41 || $0.50 || 110.6 || 0.36
|-
| '''[[Codestral-Mamba]]''' || [[Mistral]] || 256k || 33 || $0.25 || 95.8 || 0.44
|-
| '''[[Command-R+]]''' || [[Cohere]] || 128k || 55 || $5.19 || 50.7 || 0.47
|-
| '''[[Command-R+ (Apr '24)]]''' || [[Cohere]] || 128k || 45 || $6.00 || 49.3 || 0.51
|-
| '''[[Command-R (Mar '24)]]''' || [[Cohere]] || 128k || 36 || $0.75 || 108.1 || 0.36
|-
| '''[[Aya Expanse 8B]]''' || [[Cohere]] || 8k || || $0.75 || 165.5 || 0.16
|-
| '''[[Command-R]]''' || [[Cohere]] || 128k || || $0.51 || 111.8 || 0.32
|-
| '''[[Aya Expanse 32B]]''' || [[Cohere]] || 128k || || $0.75 || 120.4 || 0.18
|-
| '''[[Sonar 3.1 Small]]''' || [[Perplexity]] || 127k || || $0.20 || 203.8 || 0.34
|-
| '''[[Sonar 3.1 Large]]''' || [[Perplexity]] || 127k || || $1.00 || 57.7 || 0.31
|-
| '''[[Grok Beta]]''' || [[xAI]] || 128k || 72 || $7.50 || 66.7 || 0.42
|-
| '''[[Nova Pro]]''' || [[Amazon]] || 300k || 75 || $1.40 || 91.0 || 0.38
|-
| '''[[Nova Lite]]''' || [[Amazon]] || 300k || 70 || $0.10 || 148.0 || 0.33
|-
| '''[[Nova Micro]]''' || [[Amazon]] || 130k || 66 || $0.06 || 195.5 || 0.33
|-
| '''[[Phi-4]]''' || [[Microsoft Azure]] || 16k || 77 || $0.09 || 85.0 || 0.22
|-
| '''[[Phi-3 Mini]]''' || [[Microsoft Azure]] || 4k || || $0.00 || ||
|-
| '''[[Phi-3 Medium 14B]]''' || [[Microsoft Azure]] || 128k || || $0.30 || 50.4 || 0.43
|-
| '''[[Solar Mini]]''' || [[Upstage]] || 4k || 47 || $0.15 || 89.3 || 1.13
|-
| '''[[DBRX]]''' || [[Databricks]] || 33k || 46 || $1.16 || 78.3 || 0.42
|-
| '''[[Llama 3.1 Nemotron 70B]]''' || [[NVIDIA]] || 128k || 72 || $0.27 || 48.3 || 0.57
|-
| '''[[Reka Flash]]''' || [[Reka AI]] || 128k || 59 || $0.35 || ||
|-
| '''[[Reka Core]]''' || [[Reka AI]] || 128k || 58 || $2.00 || ||
|-
| '''[[Reka Flash (Feb '24)]]''' || [[Reka AI]] || 128k || 46 || $0.35 || ||
|-
| '''[[Reka Edge]]''' || [[Reka AI]] || 128k || 31 || $0.10 || ||
|-
| '''[[Jamba 1.5 Large]]''' || [[AI21 Labs]] || 256k || 64 || $3.50 || 51.0 || 0.71
|-
| '''[[Jamba 1.5 Mini]]''' || [[AI21 Labs]] || 256k || || $0.25 || 83.7 || 0.48
|-
| '''[[DeepSeek V3]]''' || [[DeepSeek]] || 128k || 80 || $0.90 || 20.9 || 0.94
|-
| '''[[DeepSeek-V2.5 (Dec '24)]]''' || [[DeepSeek]] || 128k || 72 || $0.17 || 61.8 || 1.15
|-
| '''[[DeepSeek-Coder-V2]]''' || [[DeepSeek]] || 128k || 71 || $0.17 || 62.0 || 1.11
|-
| '''[[DeepSeek-V2.5]]''' || [[DeepSeek]] || 128k || || $1.09 || 7.6 || 0.77
|-
| '''[[DeepSeek-V2]]''' || [[DeepSeek]] || 128k || || $0.17 || ||
|-
| '''[[Arctic]]''' || [[Snowflake]] || 4k || 51 || $0.00 || ||
|-
| '''[[Qwen2.5 72B]]''' || [[Alibaba]] || 131k || 77 || $0.40 || 67.6 || 0.53
|-
| '''[[Qwen2.5 Coder 32B]]''' || [[Alibaba]] || 131k || 72 || $0.80 || 84.0 || 0.38
|-
| '''[[Qwen2 72B]]''' || [[Alibaba]] || 131k || 72 || $0.63 || 46.5 || 0.30
|-
| '''[[QwQ 32B-Preview]]''' || [[Alibaba]] || 33k || 46 || $0.26 || 67.3 || 0.40
|-
| '''[[Yi-Large]]''' || [[01.AI]] || 32k || 61 || $3.00 || 68.1 || 0.47
|-
| '''[[GPT-4 Turbo]]''' || [[OpenAI]] || 128k || 75 || $15.00 || 43.3 || 1.20
|-
| '''[[GPT-4]]''' || [[OpenAI]] || 8k || || $37.50 || 28.4 || 0.75
|-
| '''[[Llama 3 70B]]''' || [[Meta]] || 8k || 48 || $0.89 || 48.9 || 0.38
|-
| '''[[Llama 3 8B]]''' || [[Meta]] || 8k || 45 || $0.15 || 117.3 || 0.34
|-
| '''[[Llama 2 Chat 70B]]''' || [[Meta]] || 4k || || $1.85 || ||
|-
| '''[[Llama 2 Chat 13B]]''' || [[Meta]] || 4k || || $0.00 || ||
|-
| '''[[Llama 2 Chat 7B]]''' || [[Meta]] || 4k || || $0.33 || 123.7 || 0.37
|-
| '''[[Gemini 1.0 Pro]]''' || [[Google]] || 33k || || $0.75 || 102.9 || 1.27
|-
| '''[[Claude 3 Sonnet]]''' || [[Anthropic]] || 200k || 57 || $6.00 || 68.2 || 0.76
|-
| '''[[Claude 2.1]]''' || [[Anthropic]] || 200k || || $12.00 || 14.1 || 1.24
|-
| '''[[Claude 2.0]]''' || [[Anthropic]] || 100k || || $12.00 || 29.9 || 0.81
|-
| '''[[Mistral Small (Feb '24)]]''' || [[Mistral]] || 33k || 59 || $1.50 || 53.5 || 0.37
|-
| '''[[Mistral Large (Feb '24)]]''' || [[Mistral]] || 33k || 56 || $6.00 || 38.9 || 0.44
|-
| '''[[Mistral 7B]]''' || [[Mistral]] || 8k || 28 || $0.16 || 112.5 || 0.26
|-
| '''[[Mistral Medium]]''' || [[Mistral]] || 33k || || $4.09 || 44.7 || 0.37
|-
| '''[[Codestral]]''' || [[Mistral]] || 33k || || $0.30 || 84.9 || 0.28
|-
| '''[[OpenChat 3.5]]''' || [[OpenChat]] || 8k || 44 || $0.06 || 73.5 || 0.30
|-
| '''[[Jamba Instruct]]''' || [[AI21 Labs]] || 256k || || $0.55 || 77.4 || 0.52
|}
<ref name="1">LLM Leaderboard - Comparison of GPT-4o, Llama 3, Mistral, Gemini and over 30 models https://artificialanalysis.ai/leaderboards/models</ref>
==Terms==
*'''Artificial Analysis Quality Index''': Average result across the source's evaluations covering different dimensions of model intelligence. Currently includes MMLU, GPQA, Math & HumanEval. OpenAI o1 model figures are preliminary and based on figures stated by OpenAI. See the source's methodology for more details.
*'''Context window''': Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varies by model).
*'''Output Speed''': Tokens per second received while the model is generating tokens (i.e. after the first chunk has been received from the API, for models which support streaming). A measurement sketch follows this list.
*'''Latency''': Time to first token received, in seconds, after the API request is sent. For models which do not support streaming, this represents the time to receive the completion.
*'''Price''': Price per token, represented as USD per million tokens. Price is a blend of input & output token prices at a 3:1 ratio.
*'''Output Price''': Price per token generated by the model (received from the API), represented as USD per million tokens.
*'''Input Price''': Price per token included in the request/message sent to the API, represented as USD per million tokens.
*'''Time period''': Metrics are 'live' and based on the past 14 days of measurements; measurements are taken 8 times a day for single requests and 2 times a day for parallel requests.
<ref name="1" />
==References==
<references />
[[Category:Important]] [[Category:Rankings]] [[Category:Aggregate pages]]