Large language models ranking: Difference between revisions
Revision as of 11:18, 28 November 2023
Ranking of LLMs by Arena Elo rating; models without an Elo rating are listed afterwards, ordered by MT-bench score.
Model | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU | License |
---|---|---|---|---|
GPT-4-Turbo | 1210 | 9.32 | | Proprietary |
GPT-4 | 1159 | 8.99 | 86.4 | Proprietary |
Claude-1 | 1146 | 7.9 | 77 | Proprietary |
Claude-2 | 1125 | 8.06 | 78.5 | Proprietary |
Claude-instant-1 | 1106 | 7.85 | 73.4 | Proprietary |
GPT-3.5-turbo | 1103 | 7.94 | 70 | Proprietary |
WizardLM-70b-v1.0 | 1093 | 7.71 | 63.7 | Llama 2 Community |
Vicuna-33B | 1090 | 7.12 | 59.2 | Non-commercial |
OpenChat-3.5 | 1070 | 7.81 | 64.3 | Apache-2.0 |
Llama-2-70b-chat | 1065 | 6.86 | 63 | Llama 2 Community |
WizardLM-13b-v1.2 | 1047 | 7.2 | 52.7 | Llama 2 Community |
zephyr-7b-beta | 1042 | 7.34 | 61.4 | MIT |
MPT-30B-chat | 1031 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
Vicuna-13B | 1031 | 6.57 | 55.8 | Llama 2 Community |
QWen-Chat-14B | 1030 | 6.96 | 66.5 | Qianwen LICENSE |
falcon-180b-chat | 1024 | | 68 | Falcon-180B TII License |
zephyr-7b-alpha | 1024 | 6.88 | | MIT |
CodeLlama-34B-instruct | 1022 | | 53.7 | Llama 2 Community |
Guanaco-33B | 1021 | 6.53 | 57.6 | Non-commercial |
Llama-2-13b-chat | 1021 | 6.65 | 53.6 | Llama 2 Community |
Mistral-7B-Instruct-v0.1 | 1008 | 6.84 | 55.4 | Apache 2.0 |
Llama-2-7b-chat | 1001 | 6.27 | 45.8 | Llama 2 Community |
Vicuna-7B | 994 | 6.17 | 49.8 | Llama 2 Community |
PaLM-Chat-Bison-001 | 991 | 6.4 | | Proprietary |
ChatGLM3-6B | 970 | | | Apache-2.0 |
Koala-13B | 955 | 5.35 | 44.7 | Non-commercial |
GPT4All-13B-Snoozy | 925 | 5.41 | 43 | Non-commercial |
MPT-7B-Chat | 918 | 5.42 | 32 | CC-BY-NC-SA-4.0 |
ChatGLM2-6B | 918 | 4.96 | 45.5 | Apache-2.0 |
RWKV-4-Raven-14B | 915 | 3.98 | 25.6 | Apache 2.0 |
Alpaca-13B | 893 | 4.53 | 48.1 | Non-commercial |
OpenAssistant-Pythia-12B | 884 | 4.32 | 27 | Apache 2.0 |
ChatGLM-6B | 871 | 4.5 | 36.1 | Non-commercial |
FastChat-T5-3B | 863 | 3.04 | 47.7 | Apache 2.0 |
StableLM-Tuned-Alpha-7B | 833 | 2.75 | 24.4 | CC-BY-NC-SA-4.0 |
Dolly-V2-12B | 810 | 3.28 | 25.7 | MIT |
LLaMA-13B | 789 | 2.61 | 47 | Non-commercial |
WizardLM-30B | | 7.01 | 58.7 | Non-commercial |
Vicuna-13B-16k | | 6.92 | 54.5 | Llama 2 Community |
WizardLM-13B-v1.1 | | 6.76 | 50 | Non-commercial |
Tulu-30B | | 6.43 | 58.1 | Non-commercial |
Guanaco-65B | | 6.41 | 62.1 | Non-commercial |
OpenAssistant-LLaMA-30B | | 6.41 | 56 | Non-commercial |
WizardLM-13B-v1.0 | | 6.35 | 52.3 | Non-commercial |
Vicuna-7B-16k | | 6.22 | 48.5 | Llama 2 Community |
Baize-v2-13B | | 5.75 | 48.9 | Non-commercial |
XGen-7B-8K-Inst | | 5.55 | 42.1 | Non-commercial |
Nous-Hermes-13B | | 5.51 | 49.3 | Non-commercial |
MPT-30B-Instruct | | 5.22 | 47.8 | CC-BY-SA 3.0 |
Falcon-40B-Instruct | | 5.17 | 54.7 | Apache 2.0 |
H2O-Oasst-OpenLLaMA-13B | | 4.63 | 42.8 | Apache 2.0 |