Large language models ranking: Difference between revisions

Revision as of 11:18, 28 November 2023

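Ranking of large language models (LLMs) by Arena Elo rating, with the MT-bench score, MMLU score, and license of each model. Models without an Arena Elo rating are listed at the end, ordered by MT-bench score. Blank cells mean that no value is listed.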

Model	Arena Elo rating	MT-bench score (out of 10)	MMLU (%)	License
GPT-4-Turbo	1210	9.32		Proprietary
GPT-4	1159	8.99	86.4	Proprietary
Claude-1	1146	7.9	77	Proprietary
Claude-2	1125	8.06	78.5	Proprietary
Claude-instant-1	1106	7.85	73.4	Proprietary
GPT-3.5-turbo	1103	7.94	70	Proprietary
WizardLM-70b-v1.0	1093	7.71	63.7	Llama 2 Community
Vicuna-33B	1090	7.12	59.2	Non-commercial
OpenChat-3.5	1070	7.81	64.3	Apache 2.0
Llama-2-70b-chat	1065	6.86	63	Llama 2 Community
WizardLM-13b-v1.2	1047	7.2	52.7	Llama 2 Community
zephyr-7b-beta	1042	7.34	61.4	MIT
MPT-30B-chat	1031	6.39	50.4	CC-BY-NC-SA-4.0
Vicuna-13B	1031	6.57	55.8	Llama 2 Community
QWen-Chat-14B	1030	6.96	66.5	Qianwen LICENSE
falcon-180b-chat	1024		68	Falcon-180B TII License
zephyr-7b-alpha	1024	6.88		MIT
CodeLlama-34B-instruct	1022		53.7	Llama 2 Community
Guanaco-33B	1021	6.53	57.6	Non-commercial
Llama-2-13b-chat	1021	6.65	53.6	Llama 2 Community
Mistral-7B-Instruct-v0.1	1008	6.84	55.4	Apache 2.0
Llama-2-7b-chat	1001	6.27	45.8	Llama 2 Community
Vicuna-7B	994	6.17	49.8	Llama 2 Community
PaLM-Chat-Bison-001	991	6.4		Proprietary
ChatGLM3-6B	970			Apache 2.0
Koala-13B	955	5.35	44.7	Non-commercial
GPT4All-13B-Snoozy	925	5.41	43	Non-commercial
MPT-7B-Chat	918	5.42	32	CC-BY-NC-SA-4.0
ChatGLM2-6B	918	4.96	45.5	Apache 2.0
RWKV-4-Raven-14B	915	3.98	25.6	Apache 2.0
Alpaca-13B	893	4.53	48.1	Non-commercial
OpenAssistant-Pythia-12B	884	4.32	27	Apache 2.0
ChatGLM-6B	871	4.5	36.1	Non-commercial
FastChat-T5-3B	863	3.04	47.7	Apache 2.0
StableLM-Tuned-Alpha-7B	833	2.75	24.4	CC-BY-NC-SA-4.0
Dolly-V2-12B	810	3.28	25.7	MIT
LLaMA-13B	789	2.61	47	Non-commercial
WizardLM-30B		7.01	58.7	Non-commercial
Vicuna-13B-16k		6.92	54.5	Llama 2 Community
WizardLM-13B-v1.1		6.76	50	Non-commercial
Tulu-30B		6.43	58.1	Non-commercial
Guanaco-65B		6.41	62.1	Non-commercial
OpenAssistant-LLaMA-30B		6.41	56	Non-commercial
WizardLM-13B-v1.0		6.35	52.3	Non-commercial
Vicuna-7B-16k		6.22	48.5	Llama 2 Community
Baize-v2-13B		5.75	48.9	Non-commercial
XGen-7B-8K-Inst		5.55	42.1	Non-commercial
Nous-Hermes-13B		5.51	49.3	Non-commercial
MPT-30B-Instruct		5.22	47.8	CC-BY-SA 3.0
Falcon-40B-Instruct		5.17	54.7	Apache 2.0
H2O-Oasst-OpenLLaMA-13B		4.63	42.8	Apache 2.0
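
To give a rough sense of what a gap in the Arena Elo column means, the sketch below applies the standard Elo expected-score formula (base 10, scale 400) to a few ratings copied from the table. The function name `expected_score` and the `ratings` dictionary are illustrative, and it is an assumption that the leaderboard's ratings follow exactly this parameterization, so treat the results as approximate.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of model A against model B under the standard Elo model.

    A tie counts as half a win, so the value is a win probability only
    when ties are ignored. The 400-point scale is the conventional Elo
    choice and is assumed here, not taken from the table.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Arena Elo ratings copied from the table above.
ratings = {
    "GPT-4-Turbo": 1210,
    "GPT-4": 1159,
    "Llama-2-70b-chat": 1065,
}

for a, b in [("GPT-4-Turbo", "GPT-4"), ("GPT-4", "Llama-2-70b-chat")]:
    p = expected_score(ratings[a], ratings[b])
    print(f"{a} vs {b}: expected score {p:.2f}")
```

Under these assumptions, a gap of about 50 points corresponds to an expected score only slightly above one half, so models that sit close together in the table are not clearly separated by rating alone.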