Large language models ranking
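Ranking of large language models (LLMs) by Chatbot Arena Elo rating, with MT-bench and MMLU scores and the license of each model. Models without an Arena Elo rating are listed at the end, ordered by MT-bench score. Empty cells mean that no value is reported.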
Model | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU | License |
---|---|---|---|---|
GPT-4-Turbo | 1210 | 9.32 | | Proprietary |
GPT-4 | 1159 | 8.99 | 86.4 | Proprietary |
Claude-1 | 1146 | 7.9 | 77 | Proprietary |
Claude-2 | 1125 | 8.06 | 78.5 | Proprietary |
Claude-instant-1 | 1106 | 7.85 | 73.4 | Proprietary |
GPT-3.5-turbo | 1103 | 7.94 | 70 | Proprietary |
WizardLM-70b-v1.0 | 1093 | 7.71 | 63.7 | Llama 2 Community |
Vicuna-33B | 1090 | 7.12 | 59.2 | Non-commercial |
OpenChat-3.5 | 1070 | 7.81 | 64.3 | Apache-2.0 |
Llama-2-70b-chat | 1065 | 6.86 | 63 | Llama 2 Community |
WizardLM-13b-v1.2 | 1047 | 7.2 | 52.7 | Llama 2 Community |
zephyr-7b-beta | 1042 | 7.34 | 61.4 | MIT |
MPT-30B-chat | 1031 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
Vicuna-13B | 1031 | 6.57 | 55.8 | Llama 2 Community |
QWen-Chat-14B | 1030 | 6.96 | 66.5 | Qianwen LICENSE |
falcon-180b-chat | 1024 | | 68 | Falcon-180B TII License |
zephyr-7b-alpha | 1024 | 6.88 | | MIT |
CodeLlama-34B-instruct | 1022 | | 53.7 | Llama 2 Community |
Guanaco-33B | 1021 | 6.53 | 57.6 | Non-commercial |
Llama-2-13b-chat | 1021 | 6.65 | 53.6 | Llama 2 Community |
Mistral-7B-Instruct-v0.1 | 1008 | 6.84 | 55.4 | Apache 2.0 |
Llama-2-7b-chat | 1001 | 6.27 | 45.8 | Llama 2 Community |
Vicuna-7B | 994 | 6.17 | 49.8 | Llama 2 Community |
PaLM-Chat-Bison-001 | 991 | 6.4 | | Proprietary |
ChatGLM3-6B | 970 | | | Apache-2.0 |
Koala-13B | 955 | 5.35 | 44.7 | Non-commercial |
GPT4All-13B-Snoozy | 925 | 5.41 | 43 | Non-commercial |
MPT-7B-Chat | 918 | 5.42 | 32 | CC-BY-NC-SA-4.0 |
ChatGLM2-6B | 918 | 4.96 | 45.5 | Apache-2.0 |
RWKV-4-Raven-14B | 915 | 3.98 | 25.6 | Apache 2.0 |
Alpaca-13B | 893 | 4.53 | 48.1 | Non-commercial |
OpenAssistant-Pythia-12B | 884 | 4.32 | 27 | Apache 2.0 |
ChatGLM-6B | 871 | 4.5 | 36.1 | Non-commercial |
FastChat-T5-3B | 863 | 3.04 | 47.7 | Apache 2.0 |
StableLM-Tuned-Alpha-7B | 833 | 2.75 | 24.4 | CC-BY-NC-SA-4.0 |
Dolly-V2-12B | 810 | 3.28 | 25.7 | MIT |
LLaMA-13B | 789 | 2.61 | 47 | Non-commercial |
WizardLM-30B | | 7.01 | 58.7 | Non-commercial |
Vicuna-13B-16k | | 6.92 | 54.5 | Llama 2 Community |
WizardLM-13B-v1.1 | | 6.76 | 50 | Non-commercial |
Tulu-30B | | 6.43 | 58.1 | Non-commercial |
Guanaco-65B | | 6.41 | 62.1 | Non-commercial |
OpenAssistant-LLaMA-30B | | 6.41 | 56 | Non-commercial |
WizardLM-13B-v1.0 | | 6.35 | 52.3 | Non-commercial |
Vicuna-7B-16k | | 6.22 | 48.5 | Llama 2 Community |
Baize-v2-13B | | 5.75 | 48.9 | Non-commercial |
XGen-7B-8K-Inst | | 5.55 | 42.1 | Non-commercial |
Nous-Hermes-13B | | 5.51 | 49.3 | Non-commercial |
MPT-30B-Instruct | | 5.22 | 47.8 | CC-BY-SA 3.0 |
Falcon-40B-Instruct | | 5.17 | 54.7 | Apache 2.0 |
H2O-Oasst-OpenLLaMA-13B | | 4.63 | 42.8 | Apache 2.0 |
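
A table like this is often filtered by license before comparing scores. The following is a minimal sketch of one way to do that, not part of the leaderboard itself. It assumes each row has all five pipe-delimited cells, with a missing score left as an empty cell. The names `COLUMNS`, `parse_row`, `top_by_elo`, and `SAMPLE_ROWS` are hypothetical and chosen for illustration; the sample rows are copied from the table.

```python
# Column names are illustrative labels for the five table columns.
COLUMNS = ["model", "arena_elo", "mt_bench", "mmlu", "license"]


def parse_row(line):
    """Split one pipe-delimited row into a dict; empty score cells become None."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop the single trailing delimiter
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}: {line!r}")
    row = dict(zip(COLUMNS, cells))
    for key in ("arena_elo", "mt_bench", "mmlu"):
        row[key] = float(row[key]) if row[key] else None
    return row


def top_by_elo(rows, licenses):
    """Return the highest-rated row whose license is in `licenses`, or None."""
    candidates = [
        r for r in rows
        if r["license"] in licenses and r["arena_elo"] is not None
    ]
    return max(candidates, key=lambda r: r["arena_elo"], default=None)


# A few rows from the table, in the assumed five-cell layout.
SAMPLE_ROWS = [
    "GPT-4 | 1159 | 8.99 | 86.4 | Proprietary |",
    "OpenChat-3.5 | 1070 | 7.81 | 64.3 | Apache-2.0 |",
    "zephyr-7b-beta | 1042 | 7.34 | 61.4 | MIT |",
    "ChatGLM3-6B | 970 | | | Apache-2.0 |",
]

rows = [parse_row(line) for line in SAMPLE_ROWS]
best_open = top_by_elo(rows, {"Apache-2.0", "MIT"})
print(best_open["model"])  # OpenChat-3.5
```

License strings are matched exactly, so spelling variants that appear in the table, such as "Apache-2.0" and "Apache 2.0", would both need to be listed in `licenses`.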
References
- Chatbot Arena Leaderboard. https://arena.lmsys.org/