
Prompt engineering for text generation: Difference between revisions

Line 205: Line 205:
Self-consistency sampling is a method for generating multiple outputs using a [[temperature]] greater than 0 and selecting the best candidate from the generated outputs. The criteria for choosing the best candidate may vary according to the task. A common approach is to use [[majority vote]]. In tasks that are easy to validate, such as programming questions with unit tests, the outputs can be run through an interpreter and their correctness can be verified using unit tests.
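The majority-vote variant described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `self_consistency` and the `sample` callable are hypothetical names, and `sample` stands in for whatever call draws one answer from a model at a temperature greater than 0.

```python
from collections import Counter
from typing import Callable, List


def self_consistency(sample: Callable[[], str], n: int = 5) -> str:
    """Draw n candidate answers and return the most frequent one."""
    candidates: List[str] = [sample() for _ in range(n)]
    # most_common(1) returns [(answer, count)]; on a tie, the answer
    # that was seen first wins.
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer


# Stand-in for a stochastic model call (assumption, for illustration only).
answers = iter(["18", "17", "18", "18", "20"])
print(self_consistency(lambda: next(answers), n=5))  # prints 18
```

For tasks with a cheap validator, such as unit tests, the vote could be replaced by a filter that keeps only the candidates that pass.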


==Chain of Thought (CoT) Prompting==
==Chain of Thought Prompting==
{{see also|Chain of Thought Prompting}}
Chain-of-thought (CoT) prompting, proposed by Wei et al. (2022), is a technique that prompts the model to generate a sequence of short sentences describing its reasoning step by step. These sequences, also known as reasoning chains or rationales, eventually lead to the final answer. CoT is particularly beneficial for complicated reasoning tasks and is more effective when used with large language models (e.g., models with over 50 billion parameters). For simpler tasks, CoT prompting provides only slight improvements.
==Chain-of-Thought Prompting==
Chain-of-Thought (CoT) prompting, introduced by Wei et al. (2022), prompts the model to generate a sequence of short sentences describing its step-by-step reasoning, known as reasoning chains or rationales, that lead to the final answer. CoT prompting is particularly useful for complex reasoning tasks when applied to large language models (e.g., those with over 50 billion parameters), while simpler tasks may benefit only marginally.


===Types of CoT Prompts===
==Types of CoT Prompts==
There are two primary types of CoT prompting:
There are two main types of CoT prompting:


'''Few-shot CoT:''' This type of prompting provides the model with a few demonstrations, each containing manually written or model-generated high-quality reasoning chains.
===Few-shot CoT===
Few-shot CoT prompting involves providing the model with a limited number of demonstrations, each containing either manually written or model-generated high-quality reasoning chains. Examples of such demonstrations are provided in the original article, showcasing how this type of prompting is used to solve various mathematical reasoning problems.


'''Zero-shot CoT:''' This type of prompting uses natural language statements, such as "Let's think step by step" or "Let's work this out step by step to be sure we have the right answer," to explicitly encourage the model to first generate reasoning chains and then produce answers (Kojima et al. 2022; Zhou et al. 2022).
===Zero-shot CoT===
Zero-shot CoT prompting uses natural language statements, such as "Let's think step by step" or "Let's work this out step by step to be sure we have the right answer," to explicitly encourage the model to generate reasoning chains. Following this, a statement like "Therefore, the answer is" is used to prompt the model to produce the final answer (Kojima et al. 2022; Zhou et al. 2022).
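The two-stage flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: `zero_shot_cot` and the `generate` callable are hypothetical names (any function that maps a prompt string to a completion string would do), and the `Q:`/`A:` layout is just one plausible prompt format.

```python
from typing import Callable

REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer is"


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> str:
    """Run zero-shot CoT with a caller-supplied text-completion function."""
    # Stage 1: elicit a reasoning chain.
    reasoning_prompt = f"Q: {question}\nA: {REASONING_TRIGGER}"
    rationale = generate(reasoning_prompt)
    # Stage 2: feed the rationale back and ask for the final answer.
    answer_prompt = f"{reasoning_prompt} {rationale}\n{ANSWER_TRIGGER}"
    return generate(answer_prompt).strip()
```

Passing `generate` in as a parameter keeps the sketch independent of any particular model API.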


===Tips and Extensions===
==Tips and Extensions==
Self-consistency sampling can enhance reasoning accuracy by generating diverse answers and selecting the majority vote (Wang et al. 2022a).
Several techniques have been proposed to improve the accuracy and effectiveness of CoT prompting:
Altering example order or using model-generated rationales instead of human-written ones introduces randomness during multiple sample trials. The final answer can be obtained by aggregating model outputs using majority vote (Wang et al. 2022b).
The STaR (Self-Taught Reasoner) method, proposed by Zelikman et al. (2022), can be used when training examples have true answers but no rationales. The method involves asking the language model to generate reasoning chains, keeping only those that lead to correct answers, and fine-tuning the model with generated rationales until convergence.
Using prompts with demonstrations of higher reasoning complexity, as measured by the number of reasoning steps in the chains, can achieve better performance (Fu et al. 2023).
Complexity-based consistency involves selecting complex chains from all generations by taking the majority vote among only the top complex chains (Fu et al. 2023).
Shum et al. (2023) found that CoT prompts containing only complex examples improve accuracy on complex questions but perform poorly on simple questions.
Changing "Q:" to "Question:" has been shown to be helpful (Fu et al. 2023).
Ye & Durrett (2022) observed that the benefit of including explanations in the prompt is small to moderate for NLP tasks involving reasoning over text (e.g., QA and NLI). They also found that nonfactual explanations are more likely to lead to incorrect predictions.
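Complexity-based consistency, mentioned above, can be sketched as a vote restricted to the most complex chains. This is an illustrative sketch only: the function names are hypothetical, and counting non-empty lines as reasoning steps is an assumption about how chains are formatted.

```python
from collections import Counter
from typing import List, Tuple


def count_steps(chain: str) -> int:
    """Approximate complexity as the number of non-empty lines (assumption)."""
    return sum(1 for line in chain.splitlines() if line.strip())


def complexity_based_vote(samples: List[Tuple[str, str]], top_k: int) -> str:
    """samples holds (reasoning_chain, answer) pairs from repeated sampling.

    Only the top_k chains with the most steps take part in the vote.
    """
    ranked = sorted(samples, key=lambda s: count_steps(s[0]), reverse=True)
    votes = Counter(answer for _chain, answer in ranked[:top_k])
    return votes.most_common(1)[0][0]


samples = [
    ("a\nb\nc", "10"),
    ("a\nb\nc\nd", "10"),
    ("a", "7"),
    ("a", "7"),
    ("a\nb", "7"),
]
print(complexity_based_vote(samples, top_k=3))  # prints 10
```

In this toy input a plain majority vote over all five samples would return "7"; restricting the vote to the three longest chains returns "10" instead.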


===Iterative Methods with External Queries===
*Self-consistency sampling, as suggested by Wang et al. (2022a), can improve reasoning accuracy by sampling a number of diverse answers and taking the majority vote.
Methods such as Self-Ask (Press et al. 2022), IRCoT (Interleaving Retrieval CoT; Trivedi et al. 2022), and ReAct (Reason + Act; Yao et al. 2023) involve prompting the model to ask follow-up questions, constructing the thought process iteratively.
*Wang et al. (2022b) proposed using ensemble learning by altering the example order or replacing human-written rationales with model-generated ones, introducing randomness during multiple sample trials. Model outputs can then be aggregated using a majority vote to obtain the final answer.
*If training examples only have true answers but no rationales, the STaR (Self-Taught Reasoner) method by Zelikman et al. (2022) can be followed: (1) ask the model to generate reasoning chains and keep only those leading to correct answers; (2) fine-tune the model with generated rationales and repeat the process until convergence. Higher temperature settings are more likely to generate incorrect rationales with correct answers.
*Fu et al. (2023) found that prompts with demonstrations of higher reasoning complexity lead to better performance. They also suggested that using newline (\n) symbols to separate reasoning steps works better than step indicators, periods, or semicolons.
*Complexity-based consistency, as proposed by Fu et al. (2023), involves explicitly preferring complex chains among all generations by taking a majority vote among only the top complex chains.
*Shum et al. (2023) found that CoT prompts containing only complex examples improve accuracy on complex questions but perform poorly on simple questions. This finding was based on evidence from the GSM8k dataset.
*Fu et al. (2023) found that changing "Q:" to "Question:" in the prompts is helpful.
*Ye & Durrett (2022) observed that including explanations in prompts has a small to moderate effect on NLP tasks that involve reasoning over text, such as question-answering (QA) and natural language inference (NLI). They also noted that nonfactual explanations are more likely to lead to incorrect predictions than inconsistent explanations.
*Self-Ask, a method proposed by Press et al. (2022), repeatedly prompts the model to ask follow-up questions, constructing the thought process iteratively. Search engine results can be used to answer these follow-up questions. Similarly, IRCoT (Interleaving Retrieval CoT; Trivedi et al.


==Prompt Engineering for Code Generation Models==