Large language models ranking
Latest revision as of 11:24, 28 November 2023
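Ranking of large language models (LLMs) by Chatbot Arena Elo rating, MT-bench score, and MMLU score, with each model's license.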
Model | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU | License |
---|---|---|---|---|
GPT-4-Turbo | 1210 | 9.32 | | Proprietary |
GPT-4 | 1159 | 8.99 | 86.4 | Proprietary |
Claude-1 | 1146 | 7.9 | 77 | Proprietary |
Claude-2 | 1125 | 8.06 | 78.5 | Proprietary |
Claude-instant-1 | 1106 | 7.85 | 73.4 | Proprietary |
GPT-3.5-turbo | 1103 | 7.94 | 70 | Proprietary |
WizardLM-70b-v1.0 | 1093 | 7.71 | 63.7 | Llama 2 Community |
Vicuna-33B | 1090 | 7.12 | 59.2 | Non-commercial |
OpenChat-3.5 | 1070 | 7.81 | 64.3 | Apache-2.0 |
Llama-2-70b-chat | 1065 | 6.86 | 63 | Llama 2 Community |
WizardLM-13b-v1.2 | 1047 | 7.2 | 52.7 | Llama 2 Community |
zephyr-7b-beta | 1042 | 7.34 | 61.4 | MIT |
MPT-30B-chat | 1031 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
Vicuna-13B | 1031 | 6.57 | 55.8 | Llama 2 Community |
QWen-Chat-14B | 1030 | 6.96 | 66.5 | Qianwen LICENSE |
falcon-180b-chat | 1024 | | 68 | Falcon-180B TII License |
zephyr-7b-alpha | 1024 | 6.88 | | MIT |
CodeLlama-34B-instruct | 1022 | | 53.7 | Llama 2 Community |
Guanaco-33B | 1021 | 6.53 | 57.6 | Non-commercial |
Llama-2-13b-chat | 1021 | 6.65 | 53.6 | Llama 2 Community |
Mistral-7B-Instruct-v0.1 | 1008 | 6.84 | 55.4 | Apache 2.0 |
Llama-2-7b-chat | 1001 | 6.27 | 45.8 | Llama 2 Community |
Vicuna-7B | 994 | 6.17 | 49.8 | Llama 2 Community |
PaLM-Chat-Bison-001 | 991 | 6.4 | | Proprietary |
ChatGLM3-6B | 970 | | | Apache-2.0 |
Koala-13B | 955 | 5.35 | 44.7 | Non-commercial |
GPT4All-13B-Snoozy | 925 | 5.41 | 43 | Non-commercial |
MPT-7B-Chat | 918 | 5.42 | 32 | CC-BY-NC-SA-4.0 |
ChatGLM2-6B | 918 | 4.96 | 45.5 | Apache-2.0 |
RWKV-4-Raven-14B | 915 | 3.98 | 25.6 | Apache 2.0 |
Alpaca-13B | 893 | 4.53 | 48.1 | Non-commercial |
OpenAssistant-Pythia-12B | 884 | 4.32 | 27 | Apache 2.0 |
ChatGLM-6B | 871 | 4.5 | 36.1 | Non-commercial |
FastChat-T5-3B | 863 | 3.04 | 47.7 | Apache 2.0 |
StableLM-Tuned-Alpha-7B | 833 | 2.75 | 24.4 | CC-BY-NC-SA-4.0 |
Dolly-V2-12B | 810 | 3.28 | 25.7 | MIT |
LLaMA-13B | 789 | 2.61 | 47 | Non-commercial |
WizardLM-30B | | 7.01 | 58.7 | Non-commercial |
Vicuna-13B-16k | | 6.92 | 54.5 | Llama 2 Community |
WizardLM-13B-v1.1 | | 6.76 | 50 | Non-commercial |
Tulu-30B | | 6.43 | 58.1 | Non-commercial |
Guanaco-65B | | 6.41 | 62.1 | Non-commercial |
OpenAssistant-LLaMA-30B | | 6.41 | 56 | Non-commercial |
WizardLM-13B-v1.0 | | 6.35 | 52.3 | Non-commercial |
Vicuna-7B-16k | | 6.22 | 48.5 | Llama 2 Community |
Baize-v2-13B | | 5.75 | 48.9 | Non-commercial |
XGen-7B-8K-Inst | | 5.55 | 42.1 | Non-commercial |
Nous-Hermes-13B | | 5.51 | 49.3 | Non-commercial |
MPT-30B-Instruct | | 5.22 | 47.8 | CC-BY-SA 3.0 |
Falcon-40B-Instruct | | 5.17 | 54.7 | Apache 2.0 |
H2O-Oasst-OpenLLaMA-13B | | 4.63 | 42.8 | Apache 2.0 |
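
To help read the Arena Elo column, the sketch below converts a rating gap into an expected head-to-head win rate. It assumes the conventional Elo logistic model (base 10, scale 400). This page does not state which parameters the leaderboard uses, so treat the function name `elo_expected_score` and the `scale` default as illustrative assumptions rather than the leaderboard's actual method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: base-10 logistic with a 400-point scale (the conventional
    Elo parameters); the leaderboard's exact settings are not given here.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table above.
gpt4_turbo = 1210
gpt4 = 1159

# A 51-point gap corresponds to an expected score a little above one half.
print(round(elo_expected_score(gpt4_turbo, gpt4), 3))
```

Under these assumptions, a gap of a few dozen points means the higher-rated model is only modestly favored in any single pairwise comparison, so neighboring rows in the table should not be read as decisive differences.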
References
- Chatbot Arena Leaderboard, https://arena.lmsys.org/