*[[Self-consistency sampling]], as suggested by Wang et al. (2022a), can improve reasoning accuracy by sampling a number of diverse answers and taking the majority vote.<ref name="”118”"></ref>
*Wang et al. (2022b) proposed an ensemble approach that introduces randomness across multiple sample trials by altering the example order or by replacing human-written rationales with model-generated ones. The model outputs are then aggregated by majority vote to obtain the final answer.<ref name="”120”">Wang et al. (2022b) Rationale-Augmented Ensembles in Language Models https://arxiv.org/abs/2207.00747</ref>
*If training examples only have true answers but no rationales, the [[STaR]] ([[Self-Taught Reasoner]]) method by Zelikman et al. (2022) can be followed: (1) ask the model to generate reasoning chains and keep only those leading to correct answers; (2) fine-tune the model with generated rationales and repeat the process until convergence. Higher temperature settings are more likely to generate incorrect rationales with correct answers.<ref name="”121”">Zelikman et al. (2022) STaR: Bootstrapping Reasoning With Reasoning https://arxiv.org/abs/2203.14465</ref>
*Fu et al. (2023) found that prompts with demonstrations of higher reasoning complexity lead to better performance. They also suggested that using newline (\n) symbols to separate reasoning steps works better than step indicators, periods, or semicolons.
*Complexity-based consistency, as proposed by Fu et al. (2023), explicitly prefers complex chains among all generations by taking a majority vote among only the most complex chains.
*Shum et al. (2023) found that CoT prompts containing only complex examples improve accuracy on complex questions but perform poorly on simple questions. This finding was based on evidence from the [[GSM8k]] dataset.
*Fu et al. (2023) found that changing "Q:" to "Question:" in the prompts is helpful.
*Ye & Durrett (2022) observed that including explanations in prompts has a small to moderate effect on [[NLP]] tasks that involve reasoning over text, such as [[question-answering]] (QA) and [[natural language inference]] (NLI). They also noted that nonfactual explanations are more likely to lead to incorrect predictions than inconsistent explanations.
*[[Self-Ask]], a method proposed by Press et al. (2022), repeatedly prompts the model to ask follow-up questions, constructing the thought process iteratively. Search engine results can be used to answer these follow-up questions. Similarly, IRCoT (Interleaving Retrieval CoT; Trivedi et al. 2022) and ReAct (Reason + Act; Yao et al. 2023) combine iterative CoT prompting with queries to Wikipedia APIs. These methods search for relevant entities and content and then add the retrieved information back into the context, giving the model more evidence to reason over.
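
The two voting schemes in the list above (plain self-consistency and complexity-based consistency) can be illustrated with a minimal Python sketch. It assumes the sampled generations have already been collected as <code>(reasoning_chain, answer)</code> pairs; the function names, the <code>top_k</code> parameter, and the use of newline-separated step count as the complexity measure are illustrative assumptions, not code from the cited papers.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def count_steps(chain):
    """Assumed complexity proxy: number of non-empty newline-separated steps."""
    return sum(1 for line in chain.split("\n") if line.strip())


def complexity_based_vote(samples, top_k):
    """Vote only among the top_k chains with the most reasoning steps.

    samples: list of (reasoning_chain, answer) pairs from repeated sampling.
    """
    ranked = sorted(samples, key=lambda s: count_steps(s[0]), reverse=True)
    return majority_vote([answer for _, answer in ranked[:top_k]])


# Hypothetical sampled generations for one question.
samples = [
    ("step 1\nstep 2\nstep 3", "18"),
    ("step 1", "20"),
    ("step 1\nstep 2\nstep 3\nstep 4", "18"),
    ("step 1\nstep 2", "20"),
    ("step 1", "20"),
]

plain = majority_vote([answer for _, answer in samples])
complex_only = complexity_based_vote(samples, top_k=3)
```

With these made-up samples the two schemes disagree: the plain vote is dominated by short chains, while the complexity-based vote considers only the three longest chains.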
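
The STaR procedure described in the list above (generate rationales, keep those that reach the correct answer, fine-tune, repeat) can be sketched as a loop. This is a rough outline under stated assumptions, not the authors' implementation: <code>generate</code> and <code>fine_tune</code> are hypothetical callables supplied by the caller, and a fixed <code>max_rounds</code> stands in for a real convergence check.

```python
def star_round(examples, generate, n_samples=4):
    """Keep one rationale per question whose sampled answer matches the gold answer.

    examples: list of (question, gold_answer) pairs.
    generate: hypothetical callable, question -> (rationale, answer).
    """
    kept = []
    for question, gold in examples:
        for _ in range(n_samples):
            rationale, answer = generate(question)
            if answer == gold:
                kept.append((question, rationale, gold))
                break  # one correct rationale per question is enough here
    return kept


def star(examples, generate, fine_tune, max_rounds=3):
    """Alternate filtering and fine-tuning for a fixed number of rounds.

    fine_tune: hypothetical callable, kept triples -> new generate function.
    """
    for _ in range(max_rounds):
        kept = star_round(examples, generate)
        if not kept:
            break  # nothing to learn from in this round
        generate = fine_tune(kept)
    return generate


# Toy stand-ins so the sketch runs without a real model.
def toy_generate(question):
    return ("uppercase the input", question.upper())


toy_examples = [("a", "A"), ("b", "X")]
kept_triples = star_round(toy_examples, toy_generate)
```

In the toy run only the first example survives the filter, because the stand-in model never produces the gold answer for the second one.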