Large language models ranking

From AI Wiki

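Ranking of large language models (LLMs) by Chatbot Arena Elo rating, alongside each model's MT-bench score, MMLU score, and license. Models without an Arena Elo rating are listed last, ordered by MT-bench score.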

| Model | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU (%) | License |
|---|---|---|---|---|
| GPT-4-Turbo | 1210 | 9.32 | - | Proprietary |
| GPT-4 | 1159 | 8.99 | 86.4 | Proprietary |
| Claude-1 | 1146 | 7.9 | 77 | Proprietary |
| Claude-2 | 1125 | 8.06 | 78.5 | Proprietary |
| Claude-instant-1 | 1106 | 7.85 | 73.4 | Proprietary |
| GPT-3.5-turbo | 1103 | 7.94 | 70 | Proprietary |
| WizardLM-70b-v1.0 | 1093 | 7.71 | 63.7 | Llama 2 Community |
| Vicuna-33B | 1090 | 7.12 | 59.2 | Non-commercial |
| OpenChat-3.5 | 1070 | 7.81 | 64.3 | Apache-2.0 |
| Llama-2-70b-chat | 1065 | 6.86 | 63 | Llama 2 Community |
| WizardLM-13b-v1.2 | 1047 | 7.2 | 52.7 | Llama 2 Community |
| zephyr-7b-beta | 1042 | 7.34 | 61.4 | MIT |
| MPT-30B-chat | 1031 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
| Vicuna-13B | 1031 | 6.57 | 55.8 | Llama 2 Community |
| QWen-Chat-14B | 1030 | 6.96 | 66.5 | Qianwen LICENSE |
| falcon-180b-chat | 1024 | - | 68 | Falcon-180B TII License |
| zephyr-7b-alpha | 1024 | 6.88 | - | MIT |
| CodeLlama-34B-instruct | 1022 | - | 53.7 | Llama 2 Community |
| Guanaco-33B | 1021 | 6.53 | 57.6 | Non-commercial |
| Llama-2-13b-chat | 1021 | 6.65 | 53.6 | Llama 2 Community |
| Mistral-7B-Instruct-v0.1 | 1008 | 6.84 | 55.4 | Apache-2.0 |
| Llama-2-7b-chat | 1001 | 6.27 | 45.8 | Llama 2 Community |
| Vicuna-7B | 994 | 6.17 | 49.8 | Llama 2 Community |
| PaLM-Chat-Bison-001 | 991 | 6.4 | - | Proprietary |
| ChatGLM3-6B | 970 | - | - | Apache-2.0 |
| Koala-13B | 955 | 5.35 | 44.7 | Non-commercial |
| GPT4All-13B-Snoozy | 925 | 5.41 | 43 | Non-commercial |
| MPT-7B-Chat | 918 | 5.42 | 32 | CC-BY-NC-SA-4.0 |
| ChatGLM2-6B | 918 | 4.96 | 45.5 | Apache-2.0 |
| RWKV-4-Raven-14B | 915 | 3.98 | 25.6 | Apache-2.0 |
| Alpaca-13B | 893 | 4.53 | 48.1 | Non-commercial |
| OpenAssistant-Pythia-12B | 884 | 4.32 | 27 | Apache-2.0 |
| ChatGLM-6B | 871 | 4.5 | 36.1 | Non-commercial |
| FastChat-T5-3B | 863 | 3.04 | 47.7 | Apache-2.0 |
| StableLM-Tuned-Alpha-7B | 833 | 2.75 | 24.4 | CC-BY-NC-SA-4.0 |
| Dolly-V2-12B | 810 | 3.28 | 25.7 | MIT |
| LLaMA-13B | 789 | 2.61 | 47 | Non-commercial |
| WizardLM-30B | - | 7.01 | 58.7 | Non-commercial |
| Vicuna-13B-16k | - | 6.92 | 54.5 | Llama 2 Community |
| WizardLM-13B-v1.1 | - | 6.76 | 50 | Non-commercial |
| Tulu-30B | - | 6.43 | 58.1 | Non-commercial |
| Guanaco-65B | - | 6.41 | 62.1 | Non-commercial |
| OpenAssistant-LLaMA-30B | - | 6.41 | 56 | Non-commercial |
| WizardLM-13B-v1.0 | - | 6.35 | 52.3 | Non-commercial |
| Vicuna-7B-16k | - | 6.22 | 48.5 | Llama 2 Community |
| Baize-v2-13B | - | 5.75 | 48.9 | Non-commercial |
| XGen-7B-8K-Inst | - | 5.55 | 42.1 | Non-commercial |
| Nous-Hermes-13B | - | 5.51 | 49.3 | Non-commercial |
| MPT-30B-Instruct | - | 5.22 | 47.8 | CC-BY-SA-3.0 |
| Falcon-40B-Instruct | - | 5.17 | 54.7 | Apache-2.0 |
| H2O-Oasst-OpenLLaMA-13B | - | 4.63 | 42.8 | Apache-2.0 |

A dash (-) marks a value that is not listed for that model.

[1]
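
To give a sense of what a gap in the Arena Elo column means, the sketch below converts two ratings into an expected head-to-head score. It assumes the standard Elo logistic model (base 10, scale 400); the page itself does not state which parameters the leaderboard uses, so treat the output as illustrative rather than as the leaderboard's own calculation. The function name `expected_win_rate` and the `ratings` dictionary are hypothetical names introduced for this example; the rating values are copied from the table.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of model A against model B.

    Assumption: standard Elo logistic model with base 10 and the given scale.
    A tie counts as half a win in this expectation.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Arena Elo ratings copied from the table on this page.
ratings = {
    "GPT-4-Turbo": 1210,
    "GPT-4": 1159,
    "Llama-2-70b-chat": 1065,
}

if __name__ == "__main__":
    pairs = [("GPT-4-Turbo", "GPT-4"), ("GPT-4-Turbo", "Llama-2-70b-chat")]
    for a, b in pairs:
        p = expected_win_rate(ratings[a], ratings[b])
        print(f"{a} vs {b}: expected score {p:.1%}")
```

Under these assumptions a 51-point gap (GPT-4-Turbo over GPT-4) corresponds to an expected score of a little over 57%, so small differences between neighbouring rows imply only a modest preference.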

References

  1. Chatbot Arena Leaderboard https://arena.lmsys.org/