
LLaMA/Model Card: Difference between revisions

We present our results on eight standard common sense reasoning benchmarks in the table below.


{| class="wikitable"
|+ style="caption-side:bottom"|Table 2 - Summary of LLaMA model performance on reasoning tasks
|-
! colspan="1"| LLaMA
! colspan="9"| Reasoning tasks
|-
!Number of parameters
!BoolQ
!PIQA
!SIQA
!HellaSwag
!WinoGrande
!ARC-e
!ARC-c
!OBQA
!COPA
|-
|7B || 76.5 || 79.8 || 48.9 || 76.1 || 70.1 || 76.7 || 47.6 || 57.2 || 93
|-
|13B || 78.1 || 80.1 || 50.4 || 79.2 || 73 || 78.1 || 52.7 || 56.4 || 94
|-
|33B || 83.1 || 82.3 || 50.4 || 82.8 || 76 || 81.4 || 57.8 || 58.6 || 92
|-
|65B || 85.3 || 82.8 || 52.3 || 84.2 || 77 || 81.5 || 56 || 60.2 || 94
|}


We present our results on bias in the table below. Note that a lower value is better, indicating lower bias.