
LLaMA/Model Card: Difference between revisions

Hyperparameters for the model architecture


{| class="wikitable"
|-
! colspan="1"| LLaMA
! colspan="6"| Model hyperparameters
|-
! Number of parameters
! dimension
! n heads
! n layers
! Learning rate
! Batch size
! n tokens
|-
| 7B || 4096 || 32 || 32 || 3.0E-04 || 4M || 1T
|-
| 13B || 5120 || 40 || 40 || 3.0E-04 || 4M || 1T
|-
| 33B || 6656 || 52 || 60 || 1.5E-04 || 4M || 1.4T
|-
| 65B || 8192 || 64 || 80 || 1.5E-04 || 4M || 1.4T
|}
Table 1 - Summary of LLaMA model hyperparameters
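For readers who want to use these values programmatically, the table can be expressed as a small configuration mapping. The sketch below is illustrative only: the class and field names are assumptions, not identifiers from the released LLaMA code. It also checks one relationship implicit in the table, namely that the per-head dimension (dimension divided by n heads) works out to 128 for every model size.

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass(frozen=True)
class LlamaHyperParams:
    """Hypothetical container for the values in Table 1 (names are illustrative)."""
    n_params: str        # nominal parameter count, e.g. "7B"
    dim: int             # model (embedding) dimension
    n_heads: int         # attention heads per layer
    n_layers: int        # transformer blocks
    learning_rate: float # peak learning rate
    batch_tokens: int    # tokens per batch (4M)
    train_tokens: float  # total training tokens (1T or 1.4T)

# Values copied directly from Table 1.
LLAMA_CONFIGS = {
    "7B":  LlamaHyperParams("7B",  4096, 32, 32, 3.0e-4, 4_000_000, 1.0e12),
    "13B": LlamaHyperParams("13B", 5120, 40, 40, 3.0e-4, 4_000_000, 1.0e12),
    "33B": LlamaHyperParams("33B", 6656, 52, 60, 1.5e-4, 4_000_000, 1.4e12),
    "65B": LlamaHyperParams("65B", 8192, 64, 80, 1.5e-4, 4_000_000, 1.4e12),
}

if __name__ == "__main__":
    for name, cfg in LLAMA_CONFIGS.items():
        # Per-head dimension = dim / n_heads; it is 128 for every size in the table.
        print(name, "head dim =", cfg.dim // cfg.n_heads)
</syntaxhighlight>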