Large language models ranking

From AI Wiki
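Ranking of [[LLMs]] by Arena Elo rating, with the MT-bench score, MMLU score, and license of each model. Models without an Arena Elo rating are listed at the end, ordered by MT-bench score.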
{| class="wikitable"
! Model
! ⭐ Arena Elo rating
! 📈 MT-bench score
! MMLU (%)
! License
|-
| GPT-4-Turbo
| 1210
| 9.32
|
| Proprietary
|-
| GPT-4
| 1159
| 8.99
| 86.4
| Proprietary
|-
| Claude-1
| 1146
| 7.9
| 77
| Proprietary
|-
| Claude-2
| 1125
| 8.06
| 78.5
| Proprietary
|-
| Claude-instant-1
| 1106
| 7.85
| 73.4
| Proprietary
|-
| GPT-3.5-turbo
| 1103
| 7.94
| 70
| Proprietary
|-
| WizardLM-70b-v1.0
| 1093
| 7.71
| 63.7
| Llama 2 Community
|-
| Vicuna-33B
| 1090
| 7.12
| 59.2
| Non-commercial
|-
| OpenChat-3.5
| 1070
| 7.81
| 64.3
| Apache 2.0
|-
| Llama-2-70b-chat
| 1065
| 6.86
| 63
| Llama 2 Community
|-
| WizardLM-13b-v1.2
| 1047
| 7.2
| 52.7
| Llama 2 Community
|-
| zephyr-7b-beta
| 1042
| 7.34
| 61.4
| MIT
|-
| MPT-30B-chat
| 1031
| 6.39
| 50.4
| CC-BY-NC-SA-4.0
|-
| Vicuna-13B
| 1031
| 6.57
| 55.8
| Llama 2 Community
|-
| QWen-Chat-14B
| 1030
| 6.96
| 66.5
| Qianwen LICENSE
|-
| falcon-180b-chat
| 1024
|
| 68
| Falcon-180B TII License
|-
| zephyr-7b-alpha
| 1024
| 6.88
|
| MIT
|-
| CodeLlama-34B-instruct
| 1022
|
| 53.7
| Llama 2 Community
|-
| Guanaco-33B
| 1021
| 6.53
| 57.6
| Non-commercial
|-
| Llama-2-13b-chat
| 1021
| 6.65
| 53.6
| Llama 2 Community
|-
| Mistral-7B-Instruct-v0.1
| 1008
| 6.84
| 55.4
| Apache 2.0
|-
| Llama-2-7b-chat
| 1001
| 6.27
| 45.8
| Llama 2 Community
|-
| Vicuna-7B
| 994
| 6.17
| 49.8
| Llama 2 Community
|-
| PaLM-Chat-Bison-001
| 991
| 6.4
|
| Proprietary
|-
| ChatGLM3-6B
| 970
|
|
| Apache 2.0
|-
| Koala-13B
| 955
| 5.35
| 44.7
| Non-commercial
|-
| GPT4All-13B-Snoozy
| 925
| 5.41
| 43
| Non-commercial
|-
| MPT-7B-Chat
| 918
| 5.42
| 32
| CC-BY-NC-SA-4.0
|-
| ChatGLM2-6B
| 918
| 4.96
| 45.5
| Apache 2.0
|-
| RWKV-4-Raven-14B
| 915
| 3.98
| 25.6
| Apache 2.0
|-
| Alpaca-13B
| 893
| 4.53
| 48.1
| Non-commercial
|-
| OpenAssistant-Pythia-12B
| 884
| 4.32
| 27
| Apache 2.0
|-
| ChatGLM-6B
| 871
| 4.5
| 36.1
| Non-commercial
|-
| FastChat-T5-3B
| 863
| 3.04
| 47.7
| Apache 2.0
|-
| StableLM-Tuned-Alpha-7B
| 833
| 2.75
| 24.4
| CC-BY-NC-SA-4.0
|-
| Dolly-V2-12B
| 810
| 3.28
| 25.7
| MIT
|-
| LLaMA-13B
| 789
| 2.61
| 47
| Non-commercial
|-
| WizardLM-30B
|
| 7.01
| 58.7
| Non-commercial
|-
| Vicuna-13B-16k
|
| 6.92
| 54.5
| Llama 2 Community
|-
| WizardLM-13B-v1.1
|
| 6.76
| 50
| Non-commercial
|-
| Tulu-30B
|
| 6.43
| 58.1
| Non-commercial
|-
| Guanaco-65B
|
| 6.41
| 62.1
| Non-commercial
|-
| OpenAssistant-LLaMA-30B
|
| 6.41
| 56
| Non-commercial
|-
| WizardLM-13B-v1.0
|
| 6.35
| 52.3
| Non-commercial
|-
| Vicuna-7B-16k
|
| 6.22
| 48.5
| Llama 2 Community
|-
| Baize-v2-13B
|
| 5.75
| 48.9
| Non-commercial
|-
| XGen-7B-8K-Inst
|
| 5.55
| 42.1
| Non-commercial
|-
| Nous-Hermes-13B
|
| 5.51
| 49.3
| Non-commercial
|-
| MPT-30B-Instruct
|
| 5.22
| 47.8
| CC-BY-SA 3.0
|-
| Falcon-40B-Instruct
|
| 5.17
| 54.7
| Apache 2.0
|-
| H2O-Oasst-OpenLLaMA-13B
|
| 4.63
| 42.8
| Apache 2.0
|}
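
The Elo column is easier to read as a head-to-head expectation than as an absolute score. The sketch below converts a rating gap into an expected win rate using the standard Elo logistic formula. It assumes base 10 and a scale of 400, which this page does not state, and it ignores ties. The function name <code>expected_win_rate</code> is illustrative, and the ratings are copied from the table above.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of model A against model B under the standard Elo model.

    Assumption: base-10 logistic curve with a 400-point scale; the page itself
    does not say which parameters the Arena rating uses.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
ratings = {
    "GPT-4-Turbo": 1210,
    "GPT-4": 1159,
    "Llama-2-70b-chat": 1065,
}

for a, b in [("GPT-4-Turbo", "GPT-4"), ("GPT-4-Turbo", "Llama-2-70b-chat")]:
    p = expected_win_rate(ratings[a], ratings[b])
    print(f"{a} vs {b}: expected win rate {p:.2f}")
```

Under these assumptions, equal ratings give an expected win rate of 0.5, and the value depends only on the difference between the two ratings, not on their absolute level.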




[[Category:Important]]

Revision as of 11:18, 28 November 2023
