==Model details==
;Organization developing the model
:The FAIR team of Meta AI.
;Model date
:LLaMA was trained between December 2022 and February 2023.
;Model version
:This is version 1 of the model.
;Model type
:LLaMA is an auto-regressive language model based on the transformer architecture. The model comes in four sizes: 7B, 13B, 33B and 65B parameters. A minimal sketch of auto-regressive generation follows below.
;Paper or resources for more information
:More information can be found in the paper "LLaMA: Open and Efficient Foundation Language Models".
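As a rough illustration of what "auto-regressive" means here, the sketch below shows greedy decoding in plain Python: each step feeds the tokens generated so far back into the model and appends the highest-scoring next token. The <code>model</code> callable and <code>toy_model</code> stand-in are illustrative assumptions, not Meta's implementation.

<syntaxhighlight lang="python">
from typing import Callable, List

def generate(model: Callable[[List[int]], List[float]],
             prompt_ids: List[int], max_new_tokens: int) -> List[int]:
    """Greedy auto-regressive decoding: every step conditions on all
    tokens produced so far and appends the highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(ids)                        # scores over the vocabulary
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)                        # prediction becomes input
    return ids

# Toy stand-in for a causal LM over a 10-token vocabulary:
# it always scores token 0 highest, just to make the loop runnable.
toy_model = lambda ids: [1.0] + [0.0] * 9
print(generate(toy_model, [5, 7], 3))              # -> [5, 7, 0, 0, 0]
</syntaxhighlight>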
;Hyperparameters for the model architecture
{| class="wikitable"
|-
! colspan="1"| LLaMA
! colspan="6"| Model hyperparameters
|-
! Number of parameters
! dimension
! n heads
! n layers
! Learning rate
! Batch size
! n tokens
|-
| 7B || 4096 || 32 || 32 || 3.0E-04 || 4M || 1T
|-
| 13B || 5120 || 40 || 40 || 3.0E-04 || 4M || 1T
|-
| 33B || 6656 || 52 || 60 || 1.5E-04 || 4M || 1.4T
|-
| 65B || 8192 || 64 || 80 || 1.5E-04 || 4M || 1.4T
|}
Table 1 - Summary of LLaMA model hyperparameters
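For working with these settings programmatically, here is a minimal sketch encoding Table 1 in plain Python. The <code>LlamaConfig</code> class and its field names are illustrative assumptions, not part of any LLaMA codebase; the values are taken directly from the table. The closing loop doubles as a sanity check: every size keeps a per-head dimension of dim / n heads = 128.

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass(frozen=True)
class LlamaConfig:
    """One row of Table 1; field names are illustrative, not Meta's API."""
    dim: int            # hidden dimension
    n_heads: int        # attention heads
    n_layers: int       # transformer layers
    learning_rate: float
    batch_tokens: int   # tokens per batch (4M)
    train_tokens: int   # total training tokens (1T or 1.4T)

CONFIGS = {
    "7B":  LlamaConfig(4096, 32, 32, 3.0e-4, 4_000_000, 1_000_000_000_000),
    "13B": LlamaConfig(5120, 40, 40, 3.0e-4, 4_000_000, 1_000_000_000_000),
    "33B": LlamaConfig(6656, 52, 60, 1.5e-4, 4_000_000, 1_400_000_000_000),
    "65B": LlamaConfig(8192, 64, 80, 1.5e-4, 4_000_000, 1_400_000_000_000),
}

# Consistency check derivable from the table: dim / n_heads == 128 at every size.
for name, cfg in CONFIGS.items():
    assert cfg.dim % cfg.n_heads == 0 and cfg.dim // cfg.n_heads == 128
</syntaxhighlight>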