Zhao et al. (2021) investigated few-shot classification using LLMs, specifically GPT-3. They identified several biases that contribute to high variance in performance: (1) majority label bias, (2) recency bias, and (3) common token bias. To counteract these biases, they proposed calibrating the label probabilities output by the model so that they are uniform when the input is a content-free string such as "N/A".
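As a rough illustration of this contextual calibration, the correction can be implemented as a diagonal reweighting of the test-time label probabilities by the inverse of the probabilities obtained on the content-free input. The sketch below assumes label probabilities have already been extracted from the model; the function name and toy numbers are illustrative, not the paper's exact code.

<syntaxhighlight lang="python">
import numpy as np

def calibrate(label_probs_test, label_probs_content_free):
    """Contextual calibration in the spirit of Zhao et al. (2021).

    `label_probs_content_free` are the label probabilities the model
    assigns when the real input is replaced by a content-free string
    such as "N/A"; dividing by them counters majority-label, recency,
    and common-token biases, then we renormalize.
    """
    w = 1.0 / np.asarray(label_probs_content_free)  # W = diag(p_cf)^-1
    q = w * np.asarray(label_probs_test)            # reweight test probabilities
    return q / q.sum()                              # renormalize to a distribution

# Example (toy numbers): a model biased toward the first label on "N/A".
p_cf = [0.7, 0.3]    # P(label | prompt with "N/A" as input)
p_test = [0.6, 0.4]  # P(label | prompt with the real test input)
print(calibrate(p_test, p_cf))  # -> roughly [0.39, 0.61]: the bias is corrected
</syntaxhighlight>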
==Tips for Example Selection==
===Semantically Similar Examples===
Liu et al. (2021) suggested choosing examples that are semantically similar to the test example by employing nearest neighbor (NN) clustering in the embedding space.
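A minimal sketch of this selection strategy, assuming a sentence-embedding model from the <code>sentence-transformers</code> library (the model name, candidate pool, and test input below are placeholders):

<syntaxhighlight lang="python">
import numpy as np
from sentence_transformers import SentenceTransformer

# Embed the labeled candidate pool and the test input, then pick the
# k most similar candidates as few-shot examples.
model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

pool = ["example text 1", "example text 2", "example text 3"]  # labeled candidates
test_input = "text to classify"

pool_emb = model.encode(pool, normalize_embeddings=True)
test_emb = model.encode([test_input], normalize_embeddings=True)[0]

scores = pool_emb @ test_emb         # cosine similarity (vectors are unit-norm)
k = 2
nearest = np.argsort(-scores)[:k]    # indices of the k nearest neighbors
few_shot_examples = [pool[i] for i in nearest]
</syntaxhighlight>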
===Diverse and Representative Examples===
Su et al. (2022) proposed a graph-based approach to select a diverse and representative set of examples: (1) construct a directed graph based on the cosine similarity between samples in the embedding space (e.g., using SBERT or other embedding models), and (2) start with a set of selected samples and a set of remaining samples, scoring each sample to encourage diverse selection.
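The sketch below illustrates one way such a greedy graph-based selection could look, loosely following Su et al.'s vote-k idea: each node points to its nearest neighbors, and a candidate scores highly when many of its neighbors are still unselected, with neighbors close to already-selected items discounted. The neighborhood size, discount factor <code>rho</code>, and scoring details are assumptions rather than the paper's exact formulation.

<syntaxhighlight lang="python">
import numpy as np

def vote_k_select(embeddings, n_select, k=10, rho=10.0):
    """Greedy diverse/representative selection sketch (vote-k-style).

    `embeddings` is assumed to be an (n, d) L2-normalized numpy array.
    Each node gets directed edges to its k nearest neighbors by cosine
    similarity; a candidate's score sums over its unselected neighbors,
    each discounted by rho ** -(neighbor's overlap with the selected set).
    """
    n = len(embeddings)
    sim = embeddings @ embeddings.T
    np.fill_diagonal(sim, -np.inf)                 # no self-edges
    neighbors = [set(np.argsort(-sim[i])[:k]) for i in range(n)]

    selected = []
    while len(selected) < n_select:
        best, best_score = None, -1.0
        for u in range(n):
            if u in selected:
                continue
            score = sum(
                rho ** -len(neighbors[v] & set(selected))
                for v in neighbors[u] if v not in selected
            )
            if score > best_score:
                best, best_score = u, score
        selected.append(best)
    return selected
</syntaxhighlight>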
===Embeddings via Contrastive Learning===
Rubin et al. (2022) suggested training embeddings through contrastive learning specific to one training dataset for in-context learning sample selection. This approach measures the quality of an example based on a conditioned probability assigned by the language model.
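As a sketch of how such contrastive training data could be constructed: score each candidate example by how much it helps the LM predict the gold output, then treat the top-scoring candidates as positives and the bottom-scoring ones as hard negatives for training the retriever. The <code>lm_log_prob</code> helper and the prompt format below are assumptions for illustration.

<syntaxhighlight lang="python">
# Build contrastive pairs in the spirit of Rubin et al. (2022):
# candidates under which the LM assigns high probability to the gold
# output y become positives; low-scoring candidates become hard
# negatives. `lm_log_prob` is an assumed helper that returns
# log p_LM(target | prompt) for some language model.

def build_contrastive_pairs(x, y, candidates, lm_log_prob, n=2):
    scored = sorted(
        candidates,
        key=lambda c: lm_log_prob(prompt=f"{c}\n{x}", target=y),
        reverse=True,
    )
    positives = scored[:n]   # examples most helpful for predicting y
    negatives = scored[-n:]  # least helpful -> hard negatives
    return positives, negatives
</syntaxhighlight>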
===Q-Learning===
Zhang et al. (2022) explored using Q-Learning to select in-context examples for LLM prompts.
===Uncertainty-Based Active Learning===
Diao et al. (2023) proposed identifying examples with high disagreement or entropy among multiple sampling trials based on uncertainty-based active learning. These examples can then be annotated and used in few-shot prompts.
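A minimal sketch of the uncertainty metric, assuming each question has been answered several times by the model at nonzero temperature; the entropy-over-answers measure and the toy data are illustrative (counting distinct answers as a disagreement score is an alternative).

<syntaxhighlight lang="python">
from collections import Counter
import math

def answer_entropy(answers):
    """Entropy over a question's answers across multiple sampling trials,
    a sketch of the uncertainty signal used by active-prompting approaches
    such as Diao et al. (2023)."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Rank questions by uncertainty and annotate the most uncertain ones.
# `sampled_answers` maps question -> sampled answers (illustrative data).
sampled_answers = {
    "q1": ["12", "12", "12", "12"],  # consistent answers -> low entropy
    "q2": ["7", "9", "7", "13"],     # disagreement -> high entropy
}
ranked = sorted(sampled_answers,
                key=lambda q: answer_entropy(sampled_answers[q]),
                reverse=True)
print(ranked)  # -> ['q2', 'q1']: annotate q2 first
</syntaxhighlight>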