Large language models ranking

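Ranking of [[LLMs]] by Arena Elo rating, MT-bench score, and MMLU. Models are sorted by Arena Elo rating in descending order; models without an Arena Elo rating follow, sorted by MT-bench score.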
{| class="wikitable"
! Model
! ⭐ Arena Elo rating
! 📈 MT-bench (score)
! MMLU (%)
! License
|-
| GPT-4-Turbo
| 1210
| 9.32
|
| Proprietary
|-
| GPT-4
| 1159
| 8.99
| 86.4
| Proprietary
|-
| Claude-1
| 1146
| 7.9
| 77
| Proprietary
|-
| Claude-2
| 1125
| 8.06
| 78.5
| Proprietary
|-
| Claude-instant-1
| 1106
| 7.85
| 73.4
| Proprietary
|-
| GPT-3.5-turbo
| 1103
| 7.94
| 70
| Proprietary
|-
| WizardLM-70b-v1.0
| 1093
| 7.71
| 63.7
| Llama 2 Community
|-
| Vicuna-33B
| 1090
| 7.12
| 59.2
| Non-commercial
|-
| OpenChat-3.5
| 1070
| 7.81
| 64.3
| Apache 2.0
|-
| Llama-2-70b-chat
| 1065
| 6.86
| 63
| Llama 2 Community
|-
| WizardLM-13b-v1.2
| 1047
| 7.2
| 52.7
| Llama 2 Community
|-
| zephyr-7b-beta
| 1042
| 7.34
| 61.4
| MIT
|-
| MPT-30B-chat
| 1031
| 6.39
| 50.4
| CC-BY-NC-SA-4.0
|-
| Vicuna-13B
| 1031
| 6.57
| 55.8
| Llama 2 Community
|-
| QWen-Chat-14B
| 1030
| 6.96
| 66.5
| Qianwen LICENSE
|-
| falcon-180b-chat
| 1024
|
| 68
| Falcon-180B TII License
|-
| zephyr-7b-alpha
| 1024
| 6.88
|
| MIT
|-
| CodeLlama-34B-instruct
| 1022
|
| 53.7
| Llama 2 Community
|-
| Guanaco-33B
| 1021
| 6.53
| 57.6
| Non-commercial
|-
| Llama-2-13b-chat
| 1021
| 6.65
| 53.6
| Llama 2 Community
|-
| Mistral-7B-Instruct-v0.1
| 1008
| 6.84
| 55.4
| Apache 2.0
|-
| Llama-2-7b-chat
| 1001
| 6.27
| 45.8
| Llama 2 Community
|-
| Vicuna-7B
| 994
| 6.17
| 49.8
| Llama 2 Community
|-
| PaLM-Chat-Bison-001
| 991
| 6.4
|
| Proprietary
|-
| ChatGLM3-6B
| 970
|
|
| Apache 2.0
|-
| Koala-13B
| 955
| 5.35
| 44.7
| Non-commercial
|-
| GPT4All-13B-Snoozy
| 925
| 5.41
| 43
| Non-commercial
|-
| MPT-7B-Chat
| 918
| 5.42
| 32
| CC-BY-NC-SA-4.0
|-
| ChatGLM2-6B
| 918
| 4.96
| 45.5
| Apache 2.0
|-
| RWKV-4-Raven-14B
| 915
| 3.98
| 25.6
| Apache 2.0
|-
| Alpaca-13B
| 893
| 4.53
| 48.1
| Non-commercial
|-
| OpenAssistant-Pythia-12B
| 884
| 4.32
| 27
| Apache 2.0
|-
| ChatGLM-6B
| 871
| 4.5
| 36.1
| Non-commercial
|-
| FastChat-T5-3B
| 863
| 3.04
| 47.7
| Apache 2.0
|-
| StableLM-Tuned-Alpha-7B
| 833
| 2.75
| 24.4
| CC-BY-NC-SA-4.0
|-
| Dolly-V2-12B
| 810
| 3.28
| 25.7
| MIT
|-
| LLaMA-13B
| 789
| 2.61
| 47
| Non-commercial
|-
| WizardLM-30B
|
| 7.01
| 58.7
| Non-commercial
|-
| Vicuna-13B-16k
|
| 6.92
| 54.5
| Llama 2 Community
|-
| WizardLM-13B-v1.1
|
| 6.76
| 50
| Non-commercial
|-
| Tulu-30B
|
| 6.43
| 58.1
| Non-commercial
|-
| Guanaco-65B
|
| 6.41
| 62.1
| Non-commercial
|-
| OpenAssistant-LLaMA-30B
|
| 6.41
| 56
| Non-commercial
|-
| WizardLM-13B-v1.0
|
| 6.35
| 52.3
| Non-commercial
|-
| Vicuna-7B-16k
|
| 6.22
| 48.5
| Llama 2 Community
|-
| Baize-v2-13B
|
| 5.75
| 48.9
| Non-commercial
|-
| XGen-7B-8K-Inst
|
| 5.55
| 42.1
| Non-commercial
|-
| Nous-Hermes-13B
|
| 5.51
| 49.3
| Non-commercial
|-
| MPT-30B-Instruct
|
| 5.22
| 47.8
| CC-BY-SA 3.0
|-
| Falcon-40B-Instruct
|
| 5.17
| 54.7
| Apache 2.0
|-
| H2O-Oasst-OpenLLaMA-13B
|
| 4.63
| 42.8
| Apache 2.0
|}
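
To help read the Arena Elo column, the following is a minimal sketch of how a rating gap translates into an expected head-to-head score. It assumes the conventional Elo logistic formula with base 10 and a scale of 400; that the Arena ratings in the table follow exactly this convention is an assumption, and the function name <code>elo_expected_score</code> is hypothetical. The ratings are copied from the table.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of model A against model B (a tie counts as half a win).

    Assumes the conventional Elo logistic curve: base 10, scale 400.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings copied from the table; only a few rows are shown here.
ratings = {
    "GPT-4-Turbo": 1210,
    "GPT-4": 1159,
    "Llama-2-70b-chat": 1065,
}

# Expected score of GPT-4-Turbo in a pairwise comparison with GPT-4.
p_turbo_vs_gpt4 = elo_expected_score(ratings["GPT-4-Turbo"], ratings["GPT-4"])
print(f"GPT-4-Turbo vs GPT-4: {p_turbo_vs_gpt4:.2f}")

# Expected score of GPT-4 in a pairwise comparison with Llama-2-70b-chat.
p_gpt4_vs_llama = elo_expected_score(ratings["GPT-4"], ratings["Llama-2-70b-chat"])
print(f"GPT-4 vs Llama-2-70b-chat: {p_gpt4_vs_llama:.2f}")
```

Under this assumption, equal ratings give an expected score of 0.5, and the two models' expected scores always sum to 1, so only the rating difference matters, not the absolute values.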




[[Category:Important]]