</poem>
==Few-shot Prompting (Basic)==
[[Few-shot prompting]] has a task description, a few examples, and then a prompt.
===For example===
<poem style="border: 1px solid; padding: 1rem">
*Vacationing in Florida is fun:
</poem>
===Example output===
<poem style="border: 1px solid; padding: 1rem">
</poem>
==Few-shot Prompting (Advanced)==
In few-shot prompting, the model is presented with high-quality demonstrations, including the input and desired output, for the target task. This helps the model better understand the human intent and the desired criteria for answers, and often improves performance compared to zero-shot prompting. However, it comes at the cost of increased token consumption, and longer inputs and outputs may hit the context length limit.
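The following minimal sketch illustrates this structure in Python. The task description, demonstrations, and helper function are hypothetical and only show how a few-shot prompt can be assembled from input/output pairs.

<syntaxhighlight lang="python">
# Hypothetical sketch: assemble a few-shot prompt from a task description,
# a handful of input/output demonstrations, and the final query.
def build_few_shot_prompt(task_description, demonstrations, query):
    lines = [task_description, ""]
    for demo_input, demo_output in demonstrations:
        lines.append(f"Input: {demo_input}")
        lines.append(f"Output: {demo_output}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model completes from here
    return "\n".join(lines)

demos = [
    ("The movie was a waste of time.", "negative"),
    ("I loved every minute of it.", "positive"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    demos,
    "The plot was predictable but the acting was great.",
)
# Note: every demonstration adds tokens, so long examples can exhaust the
# model's context window, as noted above.
</syntaxhighlight>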
Zhao et al. (2021) investigated [[few-shot classification]] using LLMs, specifically [[GPT-3]]. They identified several biases that contribute to high [[variance]] in performance: (1) majority [[label bias]], (2) [[recency bias]], and (3) [[common token bias]]. To address these [[biases]], they proposed a method to calibrate label probabilities output by the model to be uniform when the input string is N/A.<ref name="”111”">Zhao et al. (2021) Calibrate Before Use: Improving Few-Shot Performance of Language Models https://arxiv.org/abs/2102.09690</ref>
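A rough sketch of this calibration idea, assuming a hypothetical helper <code>label_probs_for(text, labels)</code> that returns the model's normalized probabilities over the label words:

<syntaxhighlight lang="python">
import numpy as np

def calibrated_prediction(label_probs_for, test_input, labels):
    # Estimate the model's bias from a content-free input such as "N/A".
    p_cf = np.array(label_probs_for("N/A", labels))
    # Build a correction that would map the content-free probabilities to uniform.
    W = np.diag(1.0 / p_cf)
    # Apply the same correction to the test input and renormalize.
    p_test = np.array(label_probs_for(test_input, labels))
    q = W @ p_test
    q = q / q.sum()
    return labels[int(np.argmax(q))], q
</syntaxhighlight>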
===Tips for Example Selection===
====Semantically Similar Examples====
Liu et al. (2021) suggested choosing examples that are semantically similar to the test example by retrieving its [[k-nearest neighbors]] (KNN) in the [[embedding space]].
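A minimal sketch of this selection step, assuming a hypothetical <code>embed(text)</code> function (e.g., a sentence-embedding model) that returns a vector:

<syntaxhighlight lang="python">
import numpy as np

def select_knn_examples(embed, candidates, test_input, k=4):
    # Embed and normalize all candidate demonstrations and the test input.
    cand_vecs = np.array([embed(c) for c in candidates])
    cand_vecs /= np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    query = np.array(embed(test_input))
    query /= np.linalg.norm(query)
    # Cosine similarity, then keep the k closest candidates.
    sims = cand_vecs @ query
    top_k = np.argsort(-sims)[:k]
    return [candidates[i] for i in top_k]
</syntaxhighlight>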
====Diverse and Representative Examples====
Su et al. (2022) proposed a [[graph-based approach]] to select a diverse and representative set of examples: (1) construct a directed graph based on the cosine similarity between samples in the embedding space (e.g., using [[SBERT]] or other [[embedding models]]), and (2) start with a set of selected samples and a set of remaining samples, scoring each sample to encourage [[diverse selection]].
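A simplified sketch of this idea (the exact edge weights and discounting in Su et al. (2022) differ): connect each candidate to its nearest neighbors by cosine similarity, then greedily pick samples whose neighbors are not already covered by earlier picks, which pushes the selection toward diverse, representative regions of the embedding space.

<syntaxhighlight lang="python">
import numpy as np

def select_diverse(embeddings, num_select, k_neighbors=5, discount=10.0):
    # Directed graph: each sample points to its k nearest neighbors by cosine similarity.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = X @ X.T
    neighbors = np.argsort(-sims, axis=1)[:, 1:k_neighbors + 1]

    selected, covered = [], np.zeros(len(X))
    for _ in range(num_select):
        scores = []
        for i in range(len(X)):
            if i in selected:
                scores.append(-np.inf)
            else:
                # Neighbors already close to selected samples count for less.
                scores.append(sum(discount ** (-covered[j]) for j in neighbors[i]))
        best = int(np.argmax(scores))
        selected.append(best)
        covered[neighbors[best]] += 1
    return selected
</syntaxhighlight>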
====Embeddings via Contrastive Learning====
Rubin et al. (2022) suggested training embeddings through [[contrastive learning]] specific to one [[training dataset]] for in-context learning sample selection. This approach measures the quality of an example based on a conditioned probability assigned by the language model.
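A conceptual sketch of that scoring step, assuming a hypothetical <code>lm_log_prob(prompt, target)</code> function that returns log P(target | prompt) under a scoring language model; the highest- and lowest-scoring candidates can then serve as positives and negatives for contrastive training of a retriever:

<syntaxhighlight lang="python">
def score_candidates(lm_log_prob, candidates, test_input, target_output):
    # Score each candidate demonstration by how likely the desired output becomes
    # when the model is conditioned on that demonstration plus the test input.
    scores = []
    for demo_input, demo_output in candidates:
        prompt = f"{demo_input}\n{demo_output}\n{test_input}\n"
        scores.append(lm_log_prob(prompt, target_output))
    return scores
</syntaxhighlight>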
====Q-Learning====
Zhang et al. (2022) explored using [[Q-Learning]] for in-context example selection.
====Uncertainty-Based Active Learning====
Diao et al. (2023) proposed identifying examples with [[high disagreement]] or [[entropy]] among multiple sampling trials based on [[uncertainty-based active learning]]. These examples can then be annotated and used in few-shot prompts.
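A minimal sketch of this uncertainty estimate, assuming a hypothetical <code>sample_answer(question)</code> function that queries the model once at a nonzero temperature:

<syntaxhighlight lang="python">
import math
from collections import Counter

def uncertainty(sample_answer, question, num_samples=5):
    # Sample several answers and measure how much they disagree.
    answers = [sample_answer(question) for _ in range(num_samples)]
    counts = Counter(answers)
    disagreement = len(counts) / num_samples  # unique answers / total samples
    entropy = -sum((c / num_samples) * math.log(c / num_samples)
                   for c in counts.values())
    return disagreement, entropy

def select_for_annotation(sample_answer, questions, top_n=8):
    # Questions with the highest disagreement are candidates for human annotation.
    scored = sorted(questions, key=lambda q: uncertainty(sample_answer, q)[0],
                    reverse=True)
    return scored[:top_n]
</syntaxhighlight>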