Numerous studies have explored how to construct in-context examples to maximize performance. [[Prompt format]], [[training examples]], and [[example order]] can lead to dramatically different performance outcomes, ranging from near-random guessing to near state-of-the-art (SoTA) results.
Zhao et al. (2021) investigated [[few-shot classification]] using LLMs, specifically [[GPT-3]]. They identified several biases that contribute to high [[variance]] in performance: (1) majority [[label bias]], (2) [[recency bias]], and (3) [[common token bias]]. To address these [[biases]], they proposed a method to calibrate the label probabilities output by the model to be uniform when the input string is N/A.<ref name="111">Zhao et al. (2021) Calibrate Before Use: Improving Few-Shot Performance of Language Models https://arxiv.org/abs/2102.09690</ref>
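The calibration step can be sketched as follows: query the model with a content-free input ("N/A"), use the resulting label probabilities to build a correction, and apply that correction to real inputs. This is a minimal sketch of the idea; <code>get_label_probs</code> is a hypothetical helper standing in for an actual LLM query.
<syntaxhighlight lang="python">
import numpy as np

def get_label_probs(prompt: str, input_text: str) -> np.ndarray:
    """Hypothetical helper: return the model's probability for each
    candidate label given the few-shot prompt with input_text inserted."""
    raise NotImplementedError

def calibrated_label_probs(prompt: str, input_text: str) -> np.ndarray:
    # Estimate the model's bias from a content-free input.
    p_cf = get_label_probs(prompt, "N/A")
    # Diagonal correction that maps the content-free probabilities
    # to a uniform distribution over labels.
    W = np.diag(1.0 / p_cf)
    # Correct the real input's probabilities and renormalize.
    p = W @ get_label_probs(prompt, input_text)
    return p / p.sum()
</syntaxhighlight>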
====Tips for Example Selection====
=====Semantically Similar Examples=====
Liu et al. (2021) suggested choosing examples that are semantically similar to the test example by retrieving its [[k-nearest neighbors]] (KNN) in the [[embedding space]].
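A minimal sketch of this retrieval step, assuming a sentence-transformers model (the model name <code>all-MiniLM-L6-v2</code> is an arbitrary choice, not one prescribed by the paper):
<syntaxhighlight lang="python">
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def knn_examples(test_input: str, train_inputs: list[str], k: int = 4) -> list[str]:
    # Embed the candidate pool and the test input.
    emb = model.encode(train_inputs, normalize_embeddings=True)
    q = model.encode([test_input], normalize_embeddings=True)[0]
    # On normalized vectors, cosine similarity is a dot product.
    sims = emb @ q
    # Return the k most similar training examples.
    return [train_inputs[i] for i in np.argsort(-sims)[:k]]
</syntaxhighlight>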
=====Diverse and Representative Examples=====
Su et al. (2022) proposed a [[graph-based approach]] to select a diverse and representative set of examples: (1) construct a directed graph based on the cosine similarity between samples in the embedding space (e.g., using [[SBERT]] or other [[embedding models]]), and (2) starting from a set of selected samples and a set of remaining samples, score each remaining sample while discounting samples whose neighbors are already close to the selected set, thereby encouraging [[diverse selection]].
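One way to realize this, as a hedged sketch loosely following the paper's vote-k idea; the neighborhood size <code>k</code>, discount base <code>rho</code>, and the exact scoring rule here are illustrative assumptions rather than the published algorithm:
<syntaxhighlight lang="python">
import numpy as np

def vote_k_select(embeddings: np.ndarray, n_select: int, k: int = 10, rho: float = 10.0):
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T
    # Directed graph: each node points to its k most similar other samples.
    neighbors = np.argsort(-sims, axis=1)[:, 1:k + 1]
    selected, remaining = [], set(range(len(emb)))
    while len(selected) < n_select and remaining:
        scores = {}
        for u in remaining:
            score = 0.0
            for v in neighbors[u]:
                # Discount votes from neighbors already covered by selected
                # samples, pushing selection toward unexplored regions.
                overlap = sum(int(v in neighbors[s]) for s in selected)
                score += rho ** (-overlap)
            scores[u] = score
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
</syntaxhighlight>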
=====Embeddings via Contrastive Learning=====
Rubin et al. (2022) suggested training embeddings through [[contrastive learning]] specific to one [[training dataset]] for in-context learning sample selection. This approach measures the quality of a candidate example by the conditional probability the language model assigns to the target output when that example is included in the prompt.
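The scoring signal can be sketched as below; <code>lm_log_prob</code> is a hypothetical helper for the language model's log-probability of a continuation, and in the paper's setup such scores are used to form positive and negative pairs for contrastive training of a retriever.
<syntaxhighlight lang="python">
def lm_log_prob(prompt: str, continuation: str) -> float:
    """Hypothetical helper: total log-probability the language model
    assigns to `continuation` given `prompt`."""
    raise NotImplementedError

def example_score(cand_input: str, cand_output: str,
                  test_input: str, gold_output: str) -> float:
    # A candidate is an (input, output) demonstration pair: the higher the
    # log-probability of the gold output when the candidate is prepended,
    # the better that candidate serves as an in-context example.
    prompt = f"{cand_input}\n{cand_output}\n{test_input}\n"
    return lm_log_prob(prompt, gold_output)
</syntaxhighlight>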
=====Q-Learning=====
Zhang et al. (2022) explored using [[Q-Learning]] for sample selection in LLM training.
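For illustration only, here is a generic tabular Q-learning update of the kind such a formulation builds on, where a state might encode the examples chosen so far, an action picks the next example, and the reward reflects downstream model performance; this is a sketch, not Zhang et al.'s exact algorithm.
<syntaxhighlight lang="python">
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] -> estimated return
ALPHA, GAMMA = 0.1, 0.9  # learning rate and discount (illustrative values)

def q_update(state, action, reward, next_state, next_actions):
    # Standard Q-learning backup:
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
</syntaxhighlight>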
=====Uncertainty-Based Active Learning=====
Diao et al. (2023) proposed identifying examples with [[high disagreement]] or [[entropy]] among multiple sampling trials, based on [[uncertainty-based active learning]]. These examples can then be annotated and used in few-shot prompts.
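A minimal sketch of the uncertainty measure, assuming a hypothetical <code>sample_answer</code> helper that queries the model once with nonzero temperature:
<syntaxhighlight lang="python">
import math
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical helper: one stochastic model answer (temperature > 0)."""
    raise NotImplementedError

def answer_entropy(question: str, n_trials: int = 10) -> float:
    answers = [sample_answer(question) for _ in range(n_trials)]
    counts = Counter(answers)
    # Shannon entropy of the empirical answer distribution: higher entropy
    # means more disagreement, marking the question as worth annotating.
    return -sum((c / n_trials) * math.log(c / n_trials) for c in counts.values())
</syntaxhighlight>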
====Tips for Example Ordering====