LLaMA/Model Card: Difference between revisions

233 bytes added, 24 February 2023

@@ Line 74: / Line 74: @@
 We present our results on eight standard common sense reasoning benchmarks in the table below.
-LLaMa	Reasoning tasks
+{| class="wikitable"
-Number of parameters	BoolQ	PIQA	SIQA	HellaSwag	WinoGrande	ARC-e	ARC-c	OBQA	COPA
+|-
-B	76.5	79.8	48.9	76.1	70.1	76.7	47.6	57.2	93
+|+ style="caption-side:bottom"|Table 2 - Summary of LLaMA model performance on reasoning tasks
-B	78.1	80.1	50.4	79.2	73	78.1	52.7	56.4	94
+|-
-B	83.1	82.3	50.4	82.8	76	81.4	57.8	58.6	92
+! colspan="1"| LLaMA
-B	85.3	82.8	52.3	84.2	77	81.5	56	60.2	94
+! colspan="9"| Reasoning tasks
-Table 2 - Summary of LLama Model Performance on Reasoning tasks
+|-
+!# of parameters
+!BoolQ
+!PIQA
+!SIQA
+!HellaSwag
+!WinoGrande
+!ARC-e
+!ARC-c
+!OBQA
+!COPA
+|-
+|7B || 76.5 || 79.8 || 48.9 || 76.1 || 70.1 || 76.7 || 47.6 || 57.2 || 93
+|-
+|13B || 78.1 || 80.1 || 50.4 || 79.2 || 73 || 78.1 || 52.7 || 56.4 || 94
+|-
+|33B || 83.1 || 82.3 || 50.4 || 82.8 || 76 || 81.4 || 57.8 || 58.6 || 92
+|-
+|65B || 85.3 || 82.8 || 52.3 || 84.2 || 77 || 81.5 || 56 || 60.2 || 94
+|-
+|}
 We present our results on bias in the table below. Note that a lower value is better, indicating lower bias.