Large language models ranking

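Ranking of [[LLMs]] by Chatbot Arena Elo rating, alongside each model's MT-bench score, MMLU score, and license. Models without an Arena Elo rating are listed last, ordered by MT-bench score. A blank cell means no value is listed for that model.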
{| class="wikitable"
! [[Model]]
! ⭐ Arena Elo rating
! 📈 MT-bench (score)
! MMLU
! License
|-
| [[GPT-4-Turbo]]
| 1210
| 9.32
|
| Proprietary
|-
| [[GPT-4]]
| 1159
| 8.99
| 86.4
| Proprietary
|-
| [[Claude-1]]
| 1146
| 7.9
| 77
| Proprietary
|-
| [[Claude-2]]
| 1125
| 8.06
| 78.5
| Proprietary
|-
| [[Claude-instant-1]]
| 1106
| 7.85
| 73.4
| Proprietary
|-
| [[GPT-3.5-turbo]]
| 1103
| 7.94
| 70
| Proprietary
|-
| [[WizardLM-70b-v1.0]]
| 1093
| 7.71
| 63.7
| Llama 2 Community
|-
| [[Vicuna-33B]]
| 1090
| 7.12
| 59.2
| Non-commercial
|-
| [[OpenChat-3.5]]
| 1070
| 7.81
| 64.3
| Apache 2.0
|-
| [[Llama-2-70b-chat]]
| 1065
| 6.86
| 63
| Llama 2 Community
|-
| [[WizardLM-13b-v1.2]]
| 1047
| 7.2
| 52.7
| Llama 2 Community
|-
| [[zephyr-7b-beta]]
| 1042
| 7.34
| 61.4
| MIT
|-
| [[MPT-30B-chat]]
| 1031
| 6.39
| 50.4
| CC-BY-NC-SA-4.0
|-
| [[Vicuna-13B]]
| 1031
| 6.57
| 55.8
| Llama 2 Community
|-
| [[QWen-Chat-14B]]
| 1030
| 6.96
| 66.5
| Qianwen LICENSE
|-
| [[falcon-180b-chat]]
| 1024
|
| 68
| Falcon-180B TII License
|-
| [[zephyr-7b-alpha]]
| 1024
| 6.88
|
| MIT
|-
| [[CodeLlama-34B-instruct]]
| 1022
|
| 53.7
| Llama 2 Community
|-
| [[Guanaco-33B]]
| 1021
| 6.53
| 57.6
| Non-commercial
|-
| [[Llama-2-13b-chat]]
| 1021
| 6.65
| 53.6
| Llama 2 Community
|-
| [[Mistral-7B-Instruct-v0.1]]
| 1008
| 6.84
| 55.4
| Apache 2.0
|-
| [[Llama-2-7b-chat]]
| 1001
| 6.27
| 45.8
| Llama 2 Community
|-
| [[Vicuna-7B]]
| 994
| 6.17
| 49.8
| Llama 2 Community
|-
| [[PaLM-Chat-Bison-001]]
| 991
| 6.4
|
| Proprietary
|-
| [[ChatGLM3-6B]]
| 970
|
|
| Apache 2.0
|-
| [[Koala-13B]]
| 955
| 5.35
| 44.7
| Non-commercial
|-
| [[GPT4All-13B-Snoozy]]
| 925
| 5.41
| 43
| Non-commercial
|-
| [[MPT-7B-Chat]]
| 918
| 5.42
| 32
| CC-BY-NC-SA-4.0
|-
| [[ChatGLM2-6B]]
| 918
| 4.96
| 45.5
| Apache 2.0
|-
| [[RWKV-4-Raven-14B]]
| 915
| 3.98
| 25.6
| Apache 2.0
|-
| [[Alpaca-13B]]
| 893
| 4.53
| 48.1
| Non-commercial
|-
| [[OpenAssistant-Pythia-12B]]
| 884
| 4.32
| 27
| Apache 2.0
|-
| [[ChatGLM-6B]]
| 871
| 4.5
| 36.1
| Non-commercial
|-
| [[FastChat-T5-3B]]
| 863
| 3.04
| 47.7
| Apache 2.0
|-
| [[StableLM-Tuned-Alpha-7B]]
| 833
| 2.75
| 24.4
| CC-BY-NC-SA-4.0
|-
| [[Dolly-V2-12B]]
| 810
| 3.28
| 25.7
| MIT
|-
| [[LLaMA-13B]]
| 789
| 2.61
| 47
| Non-commercial
|-
| [[WizardLM-30B]]
|
| 7.01
| 58.7
| Non-commercial
|-
| [[Vicuna-13B-16k]]
|
| 6.92
| 54.5
| Llama 2 Community
|-
| [[WizardLM-13B-v1.1]]
|
| 6.76
| 50
| Non-commercial
|-
| [[Tulu-30B]]
|
| 6.43
| 58.1
| Non-commercial
|-
| [[Guanaco-65B]]
|
| 6.41
| 62.1
| Non-commercial
|-
| [[OpenAssistant-LLaMA-30B]]
|
| 6.41
| 56
| Non-commercial
|-
| [[WizardLM-13B-v1.0]]
|
| 6.35
| 52.3
| Non-commercial
|-
| [[Vicuna-7B-16k]]
|
| 6.22
| 48.5
| Llama 2 Community
|-
| [[Baize-v2-13B]]
|
| 5.75
| 48.9
| Non-commercial
|-
| [[XGen-7B-8K-Inst]]
|
| 5.55
| 42.1
| Non-commercial
|-
| [[Nous-Hermes-13B]]
|
| 5.51
| 49.3
| Non-commercial
|-
| [[MPT-30B-Instruct]]
|
| 5.22
| 47.8
| CC-BY-SA 3.0
|-
| [[Falcon-40B-Instruct]]
|
| 5.17
| 54.7
| Apache 2.0
|-
| [[H2O-Oasst-OpenLLaMA-13B]]
|
| 4.63
| 42.8
| Apache 2.0
|}
<ref name="chatbot-arena">Chatbot Arena Leaderboard, https://arena.lmsys.org/</ref>
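
The Arena Elo column is easier to read once a rating gap is translated into an expected head-to-head result. The sketch below assumes the standard Elo model (base 10, scale 400); the page does not state how the leaderboard computes its ratings, so treat the numbers as a rough interpretation only. The function name <code>expected_score</code> is illustrative, and the ratings are copied from the table above.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of model A against model B.

    Assumption: standard Elo formula with base 10 and scale 400.
    A value of 0.5 means the two models are expected to be evenly matched.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Arena Elo ratings copied from the table above.
arena_elo = {
    "GPT-4-Turbo": 1210,
    "GPT-4": 1159,
    "Llama-2-70b-chat": 1065,
}

if __name__ == "__main__":
    pairs = [("GPT-4-Turbo", "GPT-4"), ("GPT-4-Turbo", "Llama-2-70b-chat")]
    for a, b in pairs:
        p = expected_score(arena_elo[a], arena_elo[b])
        print(f"{a} vs {b}: expected score for {a} = {p:.2f}")
```

Under this assumption, only the difference between two ratings matters, so a gap of a few points (for example between rows rated 1031 and 1030) corresponds to an almost even matchup.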
==References==
<references />






[[Category:Important]]

Latest revision as of 11:24, 28 November 2023
