Prompt engineering for text generation: Difference between revisions

Line 133: Line 133:
Zhao et al. (2021) investigated few-shot classification using LLMs, specifically GPT-3. They identified several biases that contribute to high variance in performance: (1) majority label bias, (2) recency bias, and (3) common token bias. To address these biases, they proposed a method to calibrate label probabilities output by the model to be uniform when the input string is N/A.
Zhao et al. (2021) investigated few-shot classification using LLMs, specifically GPT-3. They identified several biases that contribute to high variance in performance: (1) majority label bias, (2) recency bias, and (3) common token bias. To address these biases, they proposed a method to calibrate label probabilities output by the model to be uniform when the input string is N/A.


==Tips for Example Selection==
====Tips for Example Selection====
===Semantically Similar Examples===
=====Semantically Similar Examples=====
Liu et al. (2021) suggested choosing examples that are semantically similar to the test example by employing nearest neighbor (NN) clustering in the embedding space.
Liu et al. (2021) suggested choosing examples that are semantically similar to the test example by employing nearest neighbor (NN) clustering in the embedding space.


===Diverse and Representative Examples===
=====Diverse and Representative Examples=====
Su et al. (2022) proposed a graph-based approach to select a diverse and representative set of examples: (1) construct a directed graph based on the cosine similarity between samples in the embedding space (e.g., using SBERT or other embedding models), and (2) start with a set of selected samples and a set of remaining samples, scoring each sample to encourage diverse selection.
Su et al. (2022) proposed a graph-based approach to select a diverse and representative set of examples: (1) construct a directed graph based on the cosine similarity between samples in the embedding space (e.g., using SBERT or other embedding models), and (2) start with a set of selected samples and a set of remaining samples, scoring each sample to encourage diverse selection.


===Embeddings via Contrastive Learning===
=====Embeddings via Contrastive Learning=====
Rubin et al. (2022) suggested training embeddings through contrastive learning specific to one training dataset for in-context learning sample selection. This approach measures the quality of an example based on a conditioned probability assigned by the language model.
Rubin et al. (2022) suggested training embeddings through contrastive learning specific to one training dataset for in-context learning sample selection. This approach measures the quality of an example based on a conditioned probability assigned by the language model.


===Q-Learning===
=====Q-Learning=====
Zhang et al. (2022) explored using Q-Learning for sample selection in LLM training.
Zhang et al. (2022) explored using Q-Learning for sample selection in LLM training.


===Uncertainty-Based Active Learning===
=====Uncertainty-Based Active Learning=====
Diao et al. (2023) proposed identifying examples with high disagreement or entropy among multiple sampling trials based on uncertainty-based active learning. These examples can then be annotated and used in few-shot prompts.
Diao et al. (2023) proposed identifying examples with high disagreement or entropy among multiple sampling trials based on uncertainty-based active learning. These examples can then be annotated and used in few-shot prompts.


370

edits