Large language models ranking

From AI Wiki

Revision as of 11:23, 28 November 2023

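Ranking of large language models (LLMs) by Arena Elo rating, with each model's MT-bench score, MMLU accuracy, and license. Models without an Arena Elo rating are listed last, ordered by MT-bench score. A blank cell means no value is given for that model.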

{| class="wikitable"
! Model !! ⭐ Arena Elo rating !! 📈 MT-bench (score) !! MMLU (%) !! License
|-
| GPT-4-Turbo || 1210 || 9.32 || || Proprietary
|-
| GPT-4 || 1159 || 8.99 || 86.4 || Proprietary
|-
| Claude-1 || 1146 || 7.9 || 77 || Proprietary
|-
| Claude-2 || 1125 || 8.06 || 78.5 || Proprietary
|-
| Claude-instant-1 || 1106 || 7.85 || 73.4 || Proprietary
|-
| GPT-3.5-turbo || 1103 || 7.94 || 70 || Proprietary
|-
| WizardLM-70b-v1.0 || 1093 || 7.71 || 63.7 || Llama 2 Community
|-
| Vicuna-33B || 1090 || 7.12 || 59.2 || Non-commercial
|-
| OpenChat-3.5 || 1070 || 7.81 || 64.3 || Apache-2.0
|-
| Llama-2-70b-chat || 1065 || 6.86 || 63 || Llama 2 Community
|-
| WizardLM-13b-v1.2 || 1047 || 7.2 || 52.7 || Llama 2 Community
|-
| zephyr-7b-beta || 1042 || 7.34 || 61.4 || MIT
|-
| MPT-30B-chat || 1031 || 6.39 || 50.4 || CC-BY-NC-SA-4.0
|-
| Vicuna-13B || 1031 || 6.57 || 55.8 || Llama 2 Community
|-
| QWen-Chat-14B || 1030 || 6.96 || 66.5 || Qianwen LICENSE
|-
| falcon-180b-chat || 1024 || || 68 || Falcon-180B TII License
|-
| zephyr-7b-alpha || 1024 || 6.88 || || MIT
|-
| CodeLlama-34B-instruct || 1022 || || 53.7 || Llama 2 Community
|-
| Guanaco-33B || 1021 || 6.53 || 57.6 || Non-commercial
|-
| Llama-2-13b-chat || 1021 || 6.65 || 53.6 || Llama 2 Community
|-
| Mistral-7B-Instruct-v0.1 || 1008 || 6.84 || 55.4 || Apache-2.0
|-
| Llama-2-7b-chat || 1001 || 6.27 || 45.8 || Llama 2 Community
|-
| Vicuna-7B || 994 || 6.17 || 49.8 || Llama 2 Community
|-
| PaLM-Chat-Bison-001 || 991 || 6.4 || || Proprietary
|-
| ChatGLM3-6B || 970 || || || Apache-2.0
|-
| Koala-13B || 955 || 5.35 || 44.7 || Non-commercial
|-
| GPT4All-13B-Snoozy || 925 || 5.41 || 43 || Non-commercial
|-
| MPT-7B-Chat || 918 || 5.42 || 32 || CC-BY-NC-SA-4.0
|-
| ChatGLM2-6B || 918 || 4.96 || 45.5 || Apache-2.0
|-
| RWKV-4-Raven-14B || 915 || 3.98 || 25.6 || Apache-2.0
|-
| Alpaca-13B || 893 || 4.53 || 48.1 || Non-commercial
|-
| OpenAssistant-Pythia-12B || 884 || 4.32 || 27 || Apache-2.0
|-
| ChatGLM-6B || 871 || 4.5 || 36.1 || Non-commercial
|-
| FastChat-T5-3B || 863 || 3.04 || 47.7 || Apache-2.0
|-
| StableLM-Tuned-Alpha-7B || 833 || 2.75 || 24.4 || CC-BY-NC-SA-4.0
|-
| Dolly-V2-12B || 810 || 3.28 || 25.7 || MIT
|-
| LLaMA-13B || 789 || 2.61 || 47 || Non-commercial
|-
| WizardLM-30B || || 7.01 || 58.7 || Non-commercial
|-
| Vicuna-13B-16k || || 6.92 || 54.5 || Llama 2 Community
|-
| WizardLM-13B-v1.1 || || 6.76 || 50 || Non-commercial
|-
| Tulu-30B || || 6.43 || 58.1 || Non-commercial
|-
| Guanaco-65B || || 6.41 || 62.1 || Non-commercial
|-
| OpenAssistant-LLaMA-30B || || 6.41 || 56 || Non-commercial
|-
| WizardLM-13B-v1.0 || || 6.35 || 52.3 || Non-commercial
|-
| Vicuna-7B-16k || || 6.22 || 48.5 || Llama 2 Community
|-
| Baize-v2-13B || || 5.75 || 48.9 || Non-commercial
|-
| XGen-7B-8K-Inst || || 5.55 || 42.1 || Non-commercial
|-
| Nous-Hermes-13B || || 5.51 || 49.3 || Non-commercial
|-
| MPT-30B-Instruct || || 5.22 || 47.8 || CC-BY-SA-3.0
|-
| Falcon-40B-Instruct || || 5.17 || 54.7 || Apache-2.0
|-
| H2O-Oasst-OpenLLaMA-13B || || 4.63 || 42.8 || Apache-2.0
|}
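The Arena Elo rating column is easiest to read through the conventional Elo model, in which a rating gap maps to an expected score: the probability of being preferred, counting a tie as half a win. The Python sketch below assumes the standard logistic form with base 10 and scale 400. The leaderboard's own fitting procedure may differ, and the function names `expected_score` and `elo_update` and the step size `k` are illustrative choices, not part of any published tool.

```python
"""Sketch: interpreting Arena Elo gaps under the conventional Elo model.

Assumption: the standard logistic Elo formula (base 10, scale 400).
The leaderboard's actual rating procedure may differ; illustrative only.
"""


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of model A against model B (win = 1, tie = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k is a hypothetical step size chosen for illustration only.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Zero-sum update: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings copied from the table above.
    ratings = {"GPT-4-Turbo": 1210, "GPT-4": 1159, "LLaMA-13B": 789}
    for opponent in ("GPT-4", "LLaMA-13B"):
        p = expected_score(ratings["GPT-4-Turbo"], ratings[opponent])
        print(f"GPT-4-Turbo vs {opponent}: {p:.3f}")
    # Prints:
    # GPT-4-Turbo vs GPT-4: 0.573
    # GPT-4-Turbo vs LLaMA-13B: 0.919
```

Under this assumption, the 51-point gap between GPT-4-Turbo and GPT-4 corresponds to an expected score of only about 0.57, while a 400-point gap corresponds to odds of roughly 10 to 1. Small Elo differences between adjacent rows therefore indicate modest preference margins.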