We present our results on eight standard common sense reasoning benchmarks in the table below.
{| class="wikitable"
|+ style="caption-side:bottom"|Table 2 - Summary of LLaMA model performance on reasoning tasks
|-
! colspan="1"| LLaMA
! colspan="9"| Benchmark
|-
!# of parameters
!BoolQ
!PIQA
!SIQA
!HellaSwag
!WinoGrande
!ARC-e
!ARC-c
!OBQA
!COPA
|-
|7B || 76.5 || 79.8 || 48.9 || 76.1 || 70.1 || 76.7 || 47.6 || 57.2 || 93
|-
|13B || 78.1 || 80.1 || 50.4 || 79.2 || 73 || 78.1 || 52.7 || 56.4 || 94
|-
|33B || 83.1 || 82.3 || 50.4 || 82.8 || 76 || 81.4 || 57.8 || 58.6 || 92
|-
|65B || 85.3 || 82.8 || 52.3 || 84.2 || 77 || 81.5 || 56 || 60.2 || 94
|}
We present our results on bias in the table below. Lower values are better, indicating lower bias.