
LLaMA/Model Card: Difference between revisions

Hyperparameters for the model architecture


{| class="wikitable"
|-
! colspan="1"| LLaMA
! colspan="6"| Model hyperparameters
|-
! Number of parameters
! dimension
! n heads
! n layers
! Learning rate
! Batch size
! n tokens
|-
| 7B || 4096 || 32 || 32 || 3.0E-04 || 4M || 1T
|-
| 13B || 5120 || 40 || 40 || 3.0E-04 || 4M || 1T
|-
| 33B || 6656 || 52 || 60 || 1.5E-04 || 4M || 1.4T
|-
| 65B || 8192 || 64 || 80 || 1.5E-04 || 4M || 1.4T
|}
Table 1 - Summary of LLaMA model hyperparameters
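For readers who want to use these values programmatically, the table can be expressed as a small configuration mapping. The sketch below is illustrative only: the class and field names are assumptions, not identifiers from the released LLaMA code. It also checks one relationship implicit in the table, namely that the per-head dimension (dimension divided by n heads) works out to 128 for every model size.

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass(frozen=True)
class LlamaHyperParams:
    """Hypothetical container for the values in Table 1 (names are illustrative)."""
    n_params: str        # nominal parameter count, e.g. "7B"
    dim: int             # model (embedding) dimension
    n_heads: int         # attention heads per layer
    n_layers: int        # transformer blocks
    learning_rate: float # peak learning rate
    batch_tokens: int    # tokens per batch (4M)
    train_tokens: float  # total training tokens (1T or 1.4T)

# Values copied directly from Table 1.
LLAMA_CONFIGS = {
    "7B":  LlamaHyperParams("7B",  4096, 32, 32, 3.0e-4, 4_000_000, 1.0e12),
    "13B": LlamaHyperParams("13B", 5120, 40, 40, 3.0e-4, 4_000_000, 1.0e12),
    "33B": LlamaHyperParams("33B", 6656, 52, 60, 1.5e-4, 4_000_000, 1.4e12),
    "65B": LlamaHyperParams("65B", 8192, 64, 80, 1.5e-4, 4_000_000, 1.4e12),
}

if __name__ == "__main__":
    for name, cfg in LLAMA_CONFIGS.items():
        # Per-head dimension = dim / n_heads; it is 128 for every size in the table.
        print(name, "head dim =", cfg.dim // cfg.n_heads)
</syntaxhighlight>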