GPT-4

Exams

Exam	GPT-4 Points	GPT-4 Percentile	GPT-4 (no vision) Points	GPT-4 (no vision) Percentile	GPT-3.5 Points	GPT-3.5 Percentile
Uniform Bar Exam (MBE+MEE+MPT)	298 / 400	~90th	298 / 400	~90th	213 / 400	~10th
LSAT	163	~88th	161	~83rd	149	~40th
SAT Evidence-Based Reading & Writing	710 / 800	~93rd	710 / 800	~93rd	670 / 800	~87th
SAT Math	700 / 800	~89th	690 / 800	~89th	590 / 800	~70th
Graduate Record Examination (GRE) Quantitative	163 / 170	~80th	157 / 170	~62nd	147 / 170	~25th
Graduate Record Examination (GRE) Verbal	169 / 170	~99th	165 / 170	~96th	154 / 170	~63rd
Graduate Record Examination (GRE) Writing	4 / 6	~54th	4 / 6	~54th	4 / 6	~54th
USABO Semifinal Exam 2020	87 / 150	99th–100th	87 / 150	99th–100th	43 / 150	31st–33rd
USNCO Local Section Exam 2022	36 / 60		38 / 60		24 / 60
Medical Knowledge Self-Assessment Program	75%		75%		53%
Codeforces Rating	392	below 5th	392	below 5th	260	below 5th
AP Art History	5	86th–100th	5	86th–100th	5	86th–100th
AP Biology	5	85th–100th	5	85th–100th	4	62nd–85th
AP Calculus BC	4	43rd–59th	4	43rd–59th	1	0th–7th

Benchmark	GPT-4	GPT-4 setting (evaluated few-shot)	GPT-3.5	GPT-3.5 setting (evaluated few-shot)	LM SOTA	Best external LM (evaluated few-shot)	SOTA	Best external model (includes benchmark-specific training)
MMLU	86.4%	5-shot	70.0%	5-shot	70.7%	5-shot U-PaLM	75.2%	5-shot Flan-PaLM
HellaSwag	95.3%	10-shot	85.5%	10-shot	84.2%	LLaMA (validation set)	85.6%	ALUM
AI2 Reasoning Challenge (ARC)	96.3%	25-shot	85.2%	25-shot	85.2%	8-shot PaLM	86.5%	ST-MOE
WinoGrande	87.5%	5-shot	81.6%	5-shot	85.1%	5-shot PaLM	85.1%	5-shot PaLM
HumanEval	67.0%	0-shot	48.1%	0-shot	26.2%	0-shot PaLM	65.8%	CodeT + GPT-3.5
DROP (F1 score)	80.9	3-shot	64.1	3-shot	70.8	1-shot PaLM	88.4