LLaMA/Model Card

Model performance measures: We use the following measures to evaluate the model:

*Accuracy for common sense reasoning, reading comprehension, natural language understanding (MMLU), BIG-bench hard, WinoGender and CrowS-Pairs,
*Exact match for question answering,
*The toxicity score from Perspective API on RealToxicityPrompts.
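The exact-match measure for question answering can be sketched as follows. This is a minimal illustration, not the model card's actual evaluation script; the normalization steps (lowercasing, stripping punctuation and articles) are common QA-evaluation conventions assumed here, not taken from the card itself.

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace.

    A common normalization convention for QA exact-match scoring
    (assumed here, not specified by the model card)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions, references):
    """Fraction of predictions whose normalized form exactly matches
    the normalized reference answer."""
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(predictions)
```

Under this sketch, `exact_match(["The Eiffel Tower", "Paris"], ["eiffel tower", "London"])` scores 0.5: the first prediction matches after normalization, the second does not.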
Decision thresholds: Not applicable.