Model performance measures: We use the following measures to evaluate the model:
*Accuracy for common sense reasoning, reading comprehension, natural language understanding (MMLU), BIG-bench hard, WinoGender and CrowS-Pairs,
*Exact match for question answering,
*The toxicity score from Perspective API on RealToxicityPrompts.
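The first two measures listed above can be illustrated with a minimal sketch. The function names and the answer normalization (lowercasing and collapsing whitespace) are assumptions made for illustration; the model card does not specify how answers are normalized before comparison, and this is not the evaluation code used for the model.

```python
def accuracy(predictions, references):
    """Fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def normalize(text):
    # Assumed normalization (not stated in the model card):
    # lowercase the text and collapse runs of whitespace.
    return " ".join(text.lower().split())


def exact_match(prediction, accepted_answers):
    """Return 1.0 if the normalized prediction equals any accepted answer, else 0.0."""
    pred = normalize(prediction)
    return float(any(pred == normalize(answer) for answer in accepted_answers))
```

For a multiple-choice benchmark, `accuracy` would be called once over all predicted and gold labels. For question answering, `exact_match` would be computed per question and averaged over the dataset.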
Decision thresholds: Not applicable.