Several techniques have been proposed to improve the accuracy and effectiveness of CoT prompting:
*[[Self-consistency sampling]], as suggested by Wang et al. (2022a), can improve reasoning accuracy by sampling a number of diverse answers and taking the majority vote.<ref name="”118”"></ref>
*Wang et al. (2022b) proposed using ensemble learning by altering the example order or replacing human-written rationales with model-generated ones, introducing randomness during multiple sample trials. Model outputs can then be aggregated using a majority vote to obtain the final answer.
*If training examples have only true answers and no rationales, the STaR (Self-Taught Reasoner) method by Zelikman et al. (2022) can be followed: (1) ask the model to generate reasoning chains and keep only those leading to correct answers; (2) fine-tune the model with the generated rationales and repeat the process until convergence. Higher temperature settings are more likely to generate incorrect rationales that still reach correct answers.
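
The majority-vote aggregation used in self-consistency sampling, and the filtering step (1) of STaR, can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the names <code>majority_vote</code>, <code>self_consistent_answer</code>, <code>star_filter</code> and <code>sample_fn</code> are hypothetical, and <code>sample_fn</code> stands in for any call that samples one reasoning chain from a model and returns a (rationale, answer) pair. The tie-breaking rule (first answer seen wins) is likewise an assumption.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Optional, Tuple

# Hypothetical type: a sampler returns one (rationale, final_answer) pair per call.
Sampler = Callable[[str], Tuple[str, str]]


def majority_vote(answers: List[str]) -> Optional[str]:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def self_consistent_answer(sample_fn: Sampler, prompt: str, n_samples: int = 5) -> Optional[str]:
    """Sample several reasoning chains and aggregate their final answers by vote."""
    answers = [sample_fn(prompt)[1] for _ in range(n_samples)]
    return majority_vote(answers)


def star_filter(samples: List[Tuple[str, str]], gold_answer: str) -> List[str]:
    """STaR step (1): keep only rationales whose final answer matches the true answer."""
    return [rationale for rationale, answer in samples if answer == gold_answer]


# Stand-in sampler that replays canned outputs instead of calling a real model.
_canned = cycle([
    ("3 + 5 = 8, then 8 * 2 = 16", "16"),
    ("3 + 5 = 9, then 9 * 2 = 18", "18"),
    ("5 + 3 = 8, doubled is 16", "16"),
])


def fake_sampler(prompt: str) -> Tuple[str, str]:
    return next(_canned)


if __name__ == "__main__":
    print(self_consistent_answer(fake_sampler, "What is (3 + 5) * 2?", n_samples=3))
```

In a real setting the sampler would call a language model with a non-zero temperature so that the sampled chains differ. The rationales kept by <code>star_filter</code> would then serve as fine-tuning data for step (2), which is not shown here.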