Prompt engineering for text generation: Difference between revisions

(5 intermediate revisions by 2 users not shown)
Line 140: Line 140:


====Embeddings via Contrastive Learning====
====Embeddings via Contrastive Learning====
Rubin et al. (2022) suggested training embeddings through [[contrastive learning]] specific to one [[training dataset]] for in-context learning sample selection. This approach measures the quality of an example based on the conditional probability assigned by the language model.<ref name="”114”">Rubin et al. (2022) Learning To Retrieve Prompts for In-Context Learning https://arxiv.org/abs/2112.08633</ref>
Rubin et al. (2022) suggested training embeddings through [[contrastive learning]] specific to one [[training dataset]] for [[in-context learning]] sample selection. This approach measures the quality of an example based on the conditional probability assigned by the language model.<ref name="”114”">Rubin et al. (2022) Learning To Retrieve Prompts for In-Context Learning https://arxiv.org/abs/2112.08633</ref>
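
A minimal sketch of how such scores could be turned into contrastive training labels. The helper <code>label_candidates</code>, the callable <code>score_fn</code> (a hypothetical stand-in for the probability the language model assigns to the correct output when a candidate is used as the demonstration), and the top-k / bottom-k split are all illustrative assumptions, not details taken from the text above.

```python
from typing import Callable, List, Tuple


def label_candidates(
    candidates: List[str],
    score_fn: Callable[[str], float],
    k: int = 1,
) -> Tuple[List[str], List[str]]:
    """Rank candidate demonstrations by score.

    Returns (positives, negatives): the k highest-scoring and the
    k lowest-scoring candidates. This split is an assumption made
    for illustration.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:k], ranked[-k:]


# Toy scores standing in for language-model probabilities (made-up values).
toy_scores = {"demo A": 0.71, "demo B": 0.12, "demo C": 0.45}
positives, negatives = label_candidates(list(toy_scores), toy_scores.get, k=1)
```

The resulting positive and negative sets could then serve as training pairs for a contrastive objective over the embedding model.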


====Q-Learning====
====Q-Learning====
Line 215: Line 215:


====Zero-shot CoT====
====Zero-shot CoT====
[[Zero-shot CoT prompting]] uses natural language statements, such as "Let's think step by step" or "Let's work this out step by step to be sure we have the right answer," to explicitly encourage the model to generate reasoning chains. Following this, a statement like "Therefore, the answer is" is used to prompt the model to produce the final answer (Kojima et al. 2022; Zhou et al. 2022).
[[Zero-shot CoT prompting]] uses natural language statements, such as "Let's think step by step" or "Let's work this out step by step to be sure we have the right answer," to explicitly encourage the model to generate reasoning chains. Following this, a statement like "Therefore, the answer is" is used to prompt the model to produce the final answer.<ref name="”128”">Kojima et al. (2022) Large Language Models are Zero-Shot Reasoners https://arxiv.org/abs/2205.11916</ref><ref name="”129”">Zhou et al. (2022) Large Language Models Are Human-Level Prompt Engineers https://arxiv.org/abs/2211.01910</ref>
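
The two-stage prompting described above can be sketched as follows. This is a hedged illustration: <code>generate</code> is a hypothetical callable standing in for any text-completion model, <code>fake_model</code> is a stub used only so the example runs, and the "Question:/Answer:" layout is an assumption.

```python
from typing import Callable

REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer is"


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> str:
    """Run two-stage zero-shot CoT with a caller-supplied model function."""
    # Stage 1: elicit a reasoning chain.
    reasoning_prompt = f"Question: {question}\nAnswer: {REASONING_TRIGGER}"
    reasoning = generate(reasoning_prompt)
    # Stage 2: append the reasoning and ask for the final answer.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{ANSWER_TRIGGER}"
    return generate(answer_prompt).strip()


def fake_model(prompt: str) -> str:
    """Stub model (hypothetical) that returns canned text for the demo."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 8."
    return "There are 3 + 5 = 8 apples."


answer = zero_shot_cot("I have 3 apples and buy 5 more. How many?", fake_model)
```

In practice <code>fake_model</code> would be replaced by a call to a real language model; the function itself only assembles the two prompts.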


===Tips and Extensions===
===Tips and Extensions===
Line 226: Line 226:
*If training examples only have true answers but no rationales, the [[STaR]] ([[Self-Taught Reasoner]]) method by Zelikman et al. (2022) can be followed: (1) ask the model to generate reasoning chains and keep only those leading to correct answers; (2) fine-tune the model with generated rationales and repeat the process until convergence. Higher temperature settings are more likely to generate incorrect rationales with correct answers.<ref name="”121”">Zelikman et al. (2022) STaR: Bootstrapping Reasoning With Reasoning https://arxiv.org/abs/2203.14465</ref>
*If training examples only have true answers but no rationales, the [[STaR]] ([[Self-Taught Reasoner]]) method by Zelikman et al. (2022) can be followed: (1) ask the model to generate reasoning chains and keep only those leading to correct answers; (2) fine-tune the model with generated rationales and repeat the process until convergence. Higher temperature settings are more likely to generate incorrect rationales with correct answers.<ref name="”121”">Zelikman et al. (2022) STaR: Bootstrapping Reasoning With Reasoning https://arxiv.org/abs/2203.14465</ref>


*Fu et al. (2023) found that prompts with demonstrations of higher reasoning complexity lead to better performance. They also suggested that using newline (\n) symbols to separate reasoning steps works better than step indicators, periods, or semicolons.
*Fu et al. (2023) found that prompts with demonstrations of higher reasoning complexity lead to better performance. They also suggested that using newline (\n) symbols to separate reasoning steps works better than step indicators, periods, or semicolons.<ref name="”122”">Fu et al. (2023) Complexity-Based Prompting for Multi-Step Reasoning https://arxiv.org/abs/2210.00720</ref>


*Complexity-based consistency, as proposed by Fu et al. (2023), involves explicitly preferring complex chains among all generations by taking a majority vote among only the top complex chains.
*Complexity-based consistency, as proposed by Fu et al. (2023), involves explicitly preferring complex chains among all generations by taking a majority vote among only the top complex chains.<ref name="”122”"></ref>


*Shum et al. (2023) discovered that CoT prompts with only complex examples improve the accuracy of complex questions but perform poorly on simple questions. This finding was based on evidence from the [[GSM8k]] dataset.
*Shum et al. (2023) discovered that CoT prompts with only complex examples improve the accuracy of complex questions but perform poorly on simple questions. This finding was based on evidence from the [[GSM8k]] dataset.<ref name="”123”">Shum et al. (2023) Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data https://arxiv.org/abs/2302.12822</ref>


*Fu et al. (2023) found that changing "Q:" to "Question:" in the prompts is helpful.
*Fu et al. (2023) found that changing "Q:" to "Question:" in the prompts is helpful.<ref name="”122”"></ref>


*Ye & Durrett (2022) observed that including explanations in prompts has a small to moderate effect on [[NLP]] tasks that involve reasoning over text, such as [[question-answering]] (QA) and [[natural language inference]] (NLI). They also noted that nonfactual explanations are more likely to lead to incorrect predictions than inconsistent explanations.
*Ye & Durrett (2022) observed that including explanations in prompts has a small to moderate effect on [[NLP]] tasks that involve reasoning over text, such as [[question-answering]] (QA) and [[natural language inference]] (NLI). They also noted that nonfactual explanations are more likely to lead to incorrect predictions than inconsistent explanations.<ref name="”124”">Ye & Durrett (2022) The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning https://arxiv.org/abs/2205.03401</ref>


*[[Self-Ask]], a method proposed by Press et al. (2022), repeatedly prompts the model to ask follow-up questions, constructing the thought process iteratively. Search engine results can be used to answer these follow-up questions. Similarly, IRCoT (Interleaving Retrieval CoT; Trivedi et al. 2022) and ReAct (Reason + Act; Yao et al. 2023) combine iterative CoT prompting with queries to Wikipedia APIs. These methods search for relevant entities and content and then incorporate the retrieved information back into the context, further enhancing the model's reasoning capabilities.
*[[Self-Ask]], a method proposed by Press et al. (2022), repeatedly prompts the model to ask follow-up questions, constructing the thought process iteratively.<ref name="”125”">Press et al. (2022) Measuring and Narrowing the Compositionality Gap in Language Models https://arxiv.org/abs/2210.03350</ref> Search engine results can be used to answer these follow-up questions. Similarly, [[IRCoT]] ([[Interleaving Retrieval CoT]]; Trivedi et al. 2022) and [[ReAct]] ([[Reason + Act]]; Yao et al. 2023) combine iterative CoT prompting with queries to Wikipedia APIs. These methods search for relevant entities and content and then incorporate the retrieved information back into the context, further enhancing the model's reasoning capabilities.<ref name="”126”">Trivedi et al. (2022) Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions https://arxiv.org/abs/2212.10509</ref><ref name="”127”">Yao et al. (2023) ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629</ref>
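
The complexity-based consistency item in the list above can be sketched as a vote restricted to the most complex sampled chains. This is a minimal illustration under stated assumptions: complexity is approximated by the number of newline-separated reasoning steps, and the names <code>complexity</code>, <code>complexity_vote</code>, and <code>top_k</code> are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def complexity(chain: str) -> int:
    """Approximate complexity as the number of non-empty lines (steps)."""
    return sum(1 for line in chain.split("\n") if line.strip())


def complexity_vote(samples: List[Tuple[str, str]], top_k: int) -> str:
    """Majority vote over the answers of the top_k most complex chains.

    Each sample is a (reasoning_chain, final_answer) pair.
    """
    ranked = sorted(samples, key=lambda s: complexity(s[0]), reverse=True)
    votes = Counter(answer for _, answer in ranked[:top_k])
    return votes.most_common(1)[0][0]


# Toy sampled generations (made-up data): chain text and final answer.
samples = [
    ("step 1\nstep 2\nstep 3", "18"),
    ("step 1\nstep 2\nstep 3\nstep 4", "18"),
    ("step 1", "20"),
    ("step 1\nstep 2", "20"),
    ("step 1\nstep 2\nstep 3", "17"),
]
best = complexity_vote(samples, top_k=3)
```

With these toy samples a plain majority vote over all five chains would tie between "18" and "20", whereas voting only among the three longest chains selects "18".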
 
==26 Principles for Good Prompts==
{{see also|26 Principles of Good Prompts}}
{{:26 Principles of Good Prompts}}


==Prompt Engineering for Code Generation Models==
==Prompt Engineering for Code Generation Models==
Line 314: Line 318:
*[[Vividness]] -
*[[Vividness]] -
*[[Ecclesiastical]] -
*[[Ecclesiastical]] -
==Connecting External APIs==


==Resources==
==Resources==