See also: Machine learning terms
Zero-shot, one-shot, and few-shot learning are three closely related settings in machine learning and prompt engineering that describe how many labelled examples a model sees of a target task or class before it has to make a prediction. Zero-shot uses no task-specific examples, one-shot uses exactly one, and few-shot uses a small handful (commonly anywhere from two to a few dozen). All three are responses to the same practical problem: building useful systems when labelled data for the exact thing you care about is scarce, expensive, or simply nonexistent.
The terms have two distinct lineages that often get conflated, and it is worth pulling them apart before going any further. The older lineage is classical zero-shot and few-shot learning, a body of work in computer vision and NLP that started in 2008 with Larochelle, Erhan, and Bengio's "Zero-data Learning of New Tasks." Here, the model is trained with auxiliary information (attributes, class descriptions, semantic embeddings) so that at test time it can generalize to classes that were never in the training set. The newer lineage is zero-shot, one-shot, and few-shot prompting, a vocabulary popularized by Brown et al. in the 2020 GPT-3 paper "Language Models are Few-Shot Learners." Here, a pretrained large language model is conditioned on zero, one, or several input-output examples placed inside the prompt, with no parameter updates at all. Both lineages share a goal (rapid generalization with little data) but the mechanisms are very different.
The table below summarizes how the three settings differ, using the GPT-3 framing that has become standard in the LLM era.
| Setting | Examples in prompt or support set | Typical use | Example |
|---|---|---|---|
| Zero-shot | 0 | The model is given only a task description or instruction. It must rely entirely on knowledge acquired during pretraining or auxiliary class information. | "Translate to French: cheese ->" |
| One-shot | 1 | A single demonstration is provided. Often used when a task is hard to describe in words but easy to show. | "Translate to French: sea otter -> loutre de mer. cheese ->" |
| Few-shot | A small handful, commonly 2 to 32 (Brown et al. allowed up to 100) | Multiple demonstrations let the model infer format, label space, and edge cases. | A list of 5 to 20 worked English-to-French pairs followed by a new English word. |
In classical few-shot learning the analogous structure is the N-way K-shot task: at test time the model sees a small support set of N novel classes with K labelled examples each, then must classify a query set drawn from the same N classes. A 5-way 1-shot Omniglot task, for example, gives the model one example each of five new handwritten characters and asks it to identify another image of one of them.
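To make the episodic structure concrete, here is a minimal Python sketch of sampling a single N-way K-shot episode. The `data` dictionary mapping class names to lists of examples is a hypothetical stand-in for a dataset such as Omniglot; real implementations add batching and tensor conversion.

```python
import random

def sample_episode(data, n_way=5, k_shot=1, n_queries=5):
    """Sample one N-way K-shot episode from a dict mapping
    class name -> list of examples (e.g. image arrays)."""
    classes = random.sample(list(data), n_way)        # N novel classes
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data[cls], k_shot + n_queries)
        support += [(x, label) for x in examples[:k_shot]]   # K labelled shots
        query += [(x, label) for x in examples[k_shot:]]     # held-out queries
    random.shuffle(query)
    return support, query  # the model classifies `query` using only `support`
```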
It is worth noting that the cutoffs are conventional, not principled. Brown et al. drew the line at "as many demonstrations as fit in the context window," while the Omniglot tradition tends to use small fixed K values such as 1 and 5. Different papers report numbers differently, and "few-shot" in a 2024 LLM paper often means three to ten examples while "few-shot" in a 2018 vision paper might mean exactly one or five.
The classical zero-shot setting was first formalized by Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio in 2008 under the name "zero-data learning," applied to character recognition and a multi-task drug discovery problem. A year later, Mark Palatucci and colleagues at Carnegie Mellon coined the now-standard term "zero-shot learning" in a NIPS 2009 paper that decoded fMRI brain activity to predict which word a person was thinking of, even for words not in the training set. The trick was to map both the input (brain activity) and the output (a word) into a shared semantic space defined by hand-crafted features.
The attribute-based variant was popularized in computer vision the same year by Christoph Lampert, Hannes Nickisch, and Stefan Harmeling, whose CVPR 2009 paper "Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer" introduced the Animals with Attributes dataset. The idea is that animals can be described by a fixed vocabulary of attributes ("has stripes," "four legs," "black and white," "lives in Africa") and a model that predicts attributes well can identify a zebra at test time even if it has never seen one labelled, as long as it knows the zebra's attribute profile. The paper introduced two methods, Direct Attribute Prediction (DAP) and Indirect Attribute Prediction (IAP), achieving 40.5% and 27.8% accuracy respectively on Animals with Attributes.
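The mechanism is simple enough to sketch in a few lines. What follows is a minimal illustration of the direct-attribute-prediction idea, not Lampert et al.'s exact probabilistic formulation; the attribute profiles and probabilities are invented for the example.

```python
import numpy as np

# Binary attribute profiles for classes never seen during training
# (rows: unseen classes; columns: "has stripes", "four legs",
#  "hooved", "lives in Africa").
unseen_profiles = np.array([
    [1, 1, 1, 1],   # zebra
    [0, 1, 1, 1],   # rhinoceros
    [0, 0, 0, 0],   # dolphin
])
unseen_names = ["zebra", "rhinoceros", "dolphin"]

def classify_unseen(attribute_probs):
    """attribute_probs: per-attribute probabilities predicted by a model
    trained only on seen classes. Score each unseen class by the
    log-likelihood of its attribute profile under those predictions."""
    p = np.clip(attribute_probs, 1e-6, 1 - 1e-6)
    scores = unseen_profiles @ np.log(p) + (1 - unseen_profiles) @ np.log(1 - p)
    return unseen_names[int(np.argmax(scores))]

print(classify_unseen(np.array([0.9, 0.95, 0.8, 0.7])))  # -> zebra
```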
With the rise of word embedding methods in 2013, embedding-based zero-shot approaches took over. Andrea Frome and colleagues at Google introduced DeViSE at NIPS 2013, which trained a CNN to project images into a word embedding space (specifically a word2vec-style skip-gram space) so that an unseen class label like "okapi" could still be located in the same vector space as the image. Richard Socher, Milind Ganjoo, Christopher Manning, and Andrew Ng's "Zero-Shot Learning Through Cross-Modal Transfer" (NIPS 2013) made a similar move using a different embedding architecture. Mohammad Norouzi and colleagues' ConSE method (ICLR 2014) showed that you could simply take a probability-weighted average (a convex combination) of the word embeddings of an off-the-shelf classifier's top-K predicted classes and get strong zero-shot results without any joint training.
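ConSE in particular is nearly a one-liner. Here is a hedged numpy sketch of its convex-combination rule, assuming `seen_embeddings` and `unseen_embeddings` are precomputed word-embedding matrices whose rows correspond to class labels:

```python
import numpy as np

def conse_embed(class_probs, seen_embeddings, top_k=10):
    """ConSE: embed an image as the probability-weighted average of the
    word embeddings of its top-K predicted *seen* classes."""
    top = np.argsort(class_probs)[-top_k:]
    weights = class_probs[top] / class_probs[top].sum()   # renormalize
    return weights @ seen_embeddings[top]                 # convex combination

def zero_shot_predict(class_probs, seen_embeddings, unseen_embeddings):
    """Nearest unseen class to the image's induced embedding, by cosine."""
    v = conse_embed(class_probs, seen_embeddings)
    sims = unseen_embeddings @ v / (
        np.linalg.norm(unseen_embeddings, axis=1) * np.linalg.norm(v))
    return int(np.argmax(sims))  # index of the predicted unseen class
```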
A practical complication that this body of work surfaced is the gap between conventional zero-shot evaluation (where test classes are disjoint from train classes) and generalized zero-shot learning, where at test time you might see either a familiar class or a brand-new one and the model must handle both. Generalized zero-shot is much harder because models tend to be biased toward familiar classes. Yongqin Xian, Christoph Lampert, Bernt Schiele, and Zeynep Akata's 2018 "Zero-Shot Learning: A Comprehensive Evaluation of the Good, the Bad and the Ugly" exposed how much published numbers depended on quirks in evaluation protocols and proposed a unified benchmark that is still cited today.
Few-shot learning as a research area really took off after Brenden Lake, Ruslan Salakhutdinov, and Joshua Tenenbaum's 2015 Science paper "Human-level concept learning through probabilistic program induction." The paper introduced the Omniglot dataset of 1,623 handwritten characters from 50 alphabets and showed that a Bayesian program learning model could match human performance on one-shot classification. Omniglot quickly became the "MNIST of few-shot learning" and is still the standard sanity check.
The deep learning community responded with three families of approaches that are now standard reference points.
| Family | Core idea | Representative papers |
|---|---|---|
| Metric-based | Learn an embedding space where same-class examples are close; classify by nearest neighbour or class prototype | Siamese Networks (Koch, Zemel, and Salakhutdinov, 2015); Matching Networks (Vinyals et al., 2016); Prototypical Networks (Snell, Swersky, and Zemel, 2017); Relation Networks (Sung et al., 2018) |
| Optimization-based | Learn an initialization or update rule that adapts quickly to new tasks with few gradient steps | MAML (Finn, Abbeel, and Levine, 2017); Reptile and FOMAML (Nichol, Achiam, and Schulman, 2018); Meta-SGD (Li et al., 2017) |
| Model-based | Use external memory or specialized architectures so the model can store and retrieve task information at inference time | Memory-Augmented Neural Networks (Santoro et al., 2016); SNAIL (Mishra et al., 2018) |
Vinyals et al.'s Matching Networks paper deserves a special mention because it introduced the episodic training paradigm now standard in the field. Instead of training on a fixed set of classes, the model is trained on a stream of small N-way K-shot tasks sampled from a large pool of classes, so that learning to handle new tasks itself becomes the training objective. Matching Networks also introduced mini-ImageNet, the most heavily used few-shot benchmark, derived from ImageNet.
Chelsea Finn, Pieter Abbeel, and Sergey Levine's 2017 ICML paper on MAML is probably the most influential optimization-based result. MAML finds an initialization such that one or a few gradient descent steps on the support set produce good predictions on the query set. Crucially, MAML is model-agnostic: the same idea works for convolutional neural network image classifiers, recurrent neural network sequence models, and reinforcement learning policies. Snell, Swersky, and Zemel's Prototypical Networks (NeurIPS 2017) showed that something much simpler often works comparably well: compute a class prototype as the mean of the embedded support examples, classify a query by Euclidean distance plus a softmax, and stop there.
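The prototypical-network rule is compact enough to show directly. A minimal PyTorch sketch, assuming `embed` is any trained backbone mapping a batch of inputs to embedding vectors:

```python
import torch

def proto_classify(embed, support_x, support_y, query_x, n_way):
    """Prototype = mean embedding of each class's support examples;
    logits = negative squared Euclidean distance to each prototype
    (the classification rule of Snell, Swersky, and Zemel, 2017)."""
    z_support = embed(support_x)                    # (N*K, D)
    z_query = embed(query_x)                        # (Q, D)
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_way)
    ])                                              # (N, D)
    dists = torch.cdist(z_query, prototypes) ** 2   # (Q, N)
    return (-dists).log_softmax(dim=1)              # log-probability per class
```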
A recurring uncomfortable finding is that strong transfer learning baselines (pretrain a backbone on the base classes, then fit a linear classifier on top of the frozen features for the novel classes) are often competitive with sophisticated meta-learning methods, especially as backbones get larger. Wei-Yu Chen et al.'s ICLR 2019 paper "A Closer Look at Few-Shot Classification" was an early statement of this.
The phrase "few-shot learning" took on a second meaning when Tom Brown and colleagues at OpenAI published "Language Models are Few-Shot Learners" in 2020, the paper that introduced GPT-3 and its 175 billion parameters. Brown et al. evaluated GPT-3 in three modes that map directly onto the terminology you find in modern prompt engineering tutorials.
| Brown et al. setting | Demonstrations | Description |
|---|---|---|
| Zero-shot | 0 | Only a natural-language task description is given (for example: "Translate English to French:"). |
| One-shot | 1 | Task description plus a single English-French pair, then the new English word. |
| Few-shot | Up to ~100 | Task description plus as many demonstrations as fit in the 2,048-token context window. |
The striking result was not just that GPT-3 could do this at all but that performance on many tasks scaled smoothly with both model size and the number of in-context examples, often approaching the accuracy of dedicated fine-tuned models without a single gradient update. This phenomenon is now called in-context learning (ICL), and Brown et al.'s vocabulary of zero-shot, one-shot, and few-shot prompting has become the lingua franca of prompt engineering.
In-context learning is fundamentally different from classical few-shot learning in three ways. First, no weights are updated; the "learning" happens entirely in the forward pass. Second, the demonstrations consume context window space, so there is a hard cap on how many you can use, especially for longer tasks. Third, what is actually being exploited is the model's pretraining: the same model with random weights would learn nothing from a few examples in a prompt. Researchers have argued that ICL is an emergent ability, only appearing reliably above a certain model scale, although whether such emergence is genuine or an artifact of how it is measured is contested.
A related paradigm is chain-of-thought (CoT) prompting. Jason Wei et al.'s 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" showed that giving an LLM a few demonstrations that include intermediate reasoning steps dramatically boosts performance on multi-step problems; with eight chain-of-thought exemplars, a 540B-parameter PaLM hit then-state-of-the-art on the GSM8K math word problem benchmark. Takeshi Kojima et al.'s NeurIPS 2022 paper "Large Language Models are Zero-Shot Reasoners" went further and showed that simply appending the phrase "Let's think step by step" to a question, with no demonstrations at all, lifted accuracy on MultiArith from 17.7% to 78.7% with text-davinci-002. Zero-shot CoT is now a routine prompt-engineering trick.
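Because zero-shot CoT is pure prompt construction, it fits in a few lines. Below is a simplified sketch of Kojima et al.'s two-stage prompting; the trigger phrase is from the paper, while the function and the answer-extraction wording are illustrative.

```python
from typing import Optional

def zero_shot_cot(question: str, reasoning: Optional[str] = None) -> str:
    """Stage 1: append the reasoning trigger with no demonstrations.
    Stage 2: feed the model's generated reasoning back in and ask for
    the final answer."""
    prompt = f"Q: {question}\nA: Let's think step by step."
    if reasoning is not None:
        prompt += f" {reasoning}\nTherefore, the answer is"
    return prompt
```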
A few worked examples make the differences concrete.
A classifier is trained on hundreds of animal species labelled with a fixed attribute vocabulary ("has stripes," "four legs," "hooved," "lives in Africa"). It learns to predict attributes from images. At test time the system is shown a picture of a zebra, an animal it has never seen labelled. Even so, the model can predict that the image has stripes, four legs, hooves, and an African habitat. Cross-referencing this attribute profile against a database of unseen species (which includes "zebra") returns the correct class. This is the recipe behind Lampert et al.'s 2009 work and the Animals with Attributes dataset.
Alec Radford et al.'s 2021 OpenAI paper "Learning Transferable Visual Models From Natural Language Supervision" trained CLIP on roughly 400 million (image, caption) pairs scraped from the web. To classify a new image into a set of categories, you simply embed the image and embed text prompts like "a photo of a [category]," then pick the category whose text embedding has the highest cosine similarity. CLIP achieved 76.2% top-1 accuracy on ImageNet without ever training on the ImageNet labels, matching the original ResNet-50 and shrinking the robustness gap to natural distribution shifts by up to 75%.
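In code, CLIP-style zero-shot classification looks roughly like the sketch below, using OpenAI's open-source `clip` package; the label set and image filename are invented for the example.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["zebra", "okapi", "rhinoceros"]
texts = clip.tokenize([f"a photo of a {c}" for c in labels]).to(device)
image = preprocess(Image.open("animal.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(texts)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)  # cosine similarity =
    text_feat /= text_feat.norm(dim=-1, keepdim=True)    # normalized dot product
    sims = (image_feat @ text_feat.T).squeeze(0)

print(labels[sims.argmax().item()])  # zero-shot: no training on these labels
```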
Imagine you want a system that can recognize a specific Tibetan character it has never seen before. You show it one example of the character, then ask it to identify which of several test images contains the same character. Classical machine learning would need many copies of the character to learn the relevant features. A Siamese network trained on Omniglot, by contrast, learns to compare pairs of character images and can score the test images by similarity to the single example, often coming close to human accuracy. Lake, Salakhutdinov, and Tenenbaum's 2015 Bayesian program learning model achieved roughly 95% on the 20-way 1-shot Omniglot task, matching human performance.
A fine-grained image classifier has been trained on dozens of fruit species. Given five photos of a new fruit (say, the rambutan) it has never seen, a Prototypical Network embeds the five images, averages their embeddings into a class prototype, and classifies new fruit images by Euclidean distance to that prototype and the prototypes of the other classes. This is a 5-shot learning task, and it works because the embedding space was trained to put similar fruits close together regardless of which classes were used during training.
A prompt to GPT-3 or GPT-4 might look like this:
```
Classify the sentiment of each review as positive or negative.

Review: "The food was cold and the service was terrible."
Sentiment: negative

Review: "Loved every bite. Coming back next week."
Sentiment: positive

Review: "Service was fine but the menu felt overpriced."
Sentiment:
```
Given this three-shot prompt, the model is expected to output "negative" (or possibly "mixed," depending on how strict you want it to be). No weights are updated; the model is just continuing the pattern.
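In practice such a prompt is sent to a model through an API. Here is a sketch using the OpenAI Python SDK; the model name is illustrative, and any instruction-following chat model would behave similarly.

```python
from openai import OpenAI

PROMPT = """Classify the sentiment of each review as positive or negative.

Review: "The food was cold and the service was terrible."
Sentiment: negative

Review: "Loved every bite. Coming back next week."
Sentiment: positive

Review: "Service was fine but the menu felt overpriced."
Sentiment:"""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model works
    messages=[{"role": "user", "content": PROMPT}],
    max_tokens=4,
    temperature=0,        # deterministic continuation of the pattern
)
print(response.choices[0].message.content.strip())  # expected: "negative"
```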
Different communities have settled on different evaluation suites.
| Benchmark | Type | Classes | Year | Source |
|---|---|---|---|---|
| Omniglot | Few-shot character recognition | 1,623 | 2015 | Lake, Salakhutdinov, and Tenenbaum |
| mini-ImageNet | Few-shot image classification | 100 (64 train / 16 val / 20 test) | 2016 | Vinyals et al. |
| tieredImageNet | Few-shot image classification (harder splits) | 608 | 2018 | Ren et al. |
| Animals with Attributes 2 | Zero-shot image classification | 50 | 2018 | Xian et al. |
| CUB-200-2011 | Fine-grained zero/few-shot bird classification | 200 | 2011 | Wah et al. |
| Meta-Dataset | Cross-domain few-shot | 10 datasets | 2020 | Triantafillou et al. |
| SuperGLUE / BIG-bench / MMLU | Zero-shot and few-shot language tasks | Varies | 2019 to 2022 | Multiple |
The gap between vision and language benchmarks reflects the gap between the two lineages. Vision benchmarks fix the N-way K-shot protocol and measure accuracy in episodes. Language benchmarks measure zero-shot and few-shot prompted accuracy on a set of NLP tasks; the demonstrations and the test query both live inside a single text prompt.
When working with LLMs, the choice between zero-shot, one-shot, and few-shot is mostly a question of how much you trust the model to infer the task from instructions versus how much you can constrain it by example.
Demonstration selection matters more than novices expect. Prefer examples that are diverse and representative rather than near-duplicates of one another, balance the classes, and avoid long runs of demonstrations with the same label; imbalanced or poorly ordered labels trigger the majority-label and recency biases documented below.
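One cheap heuristic is to shuffle demonstrations within each label and then round-robin across labels, so that no class dominates the end of the prompt. A sketch follows; the function and its name are illustrative rather than taken from any of the cited papers.

```python
import random
from collections import defaultdict

def interleave_demonstrations(demos):
    """Order (input, label) demonstrations so the same label rarely
    repeats consecutively, mitigating majority-label and recency
    biases (a heuristic, not a guarantee)."""
    by_label = defaultdict(list)
    for x, y in demos:
        by_label[y].append((x, y))
    for group in by_label.values():
        random.shuffle(group)
    ordered = []
    while any(by_label.values()):          # round-robin until exhausted
        for y in list(by_label):
            if by_label[y]:
                ordered.append(by_label[y].pop())
    return ordered
```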
Few-shot prompting looks magical until you measure how brittle it is. The literature has documented several biases that consistently hurt accuracy.
| Failure mode | What happens | Source |
|---|---|---|
| Demonstration ordering | Performance can swing from near-state-of-the-art to near-random based on the order of the same examples; a good order for one model often does not transfer to another | Lu et al., "Fantastically Ordered Prompts and Where to Find Them," ACL 2022 |
| Majority-label bias | If most demonstrations belong to one class, the model is biased toward predicting that class | Zhao et al., "Calibrate Before Use," ICML 2021 |
| Recency bias | The model is biased toward classes that appear near the end of the prompt | Zhao et al., 2021 |
| Common token bias | The model prefers answer tokens that are frequent in pretraining data | Zhao et al., 2021 |
| Label correctness matters less than format | Randomly relabeling demonstrations barely hurts accuracy in many settings; the model is mostly using demonstrations to infer label space, input distribution, and output format rather than to learn from the (input, label) mapping | Min et al., "Rethinking the Role of Demonstrations," EMNLP 2022 |
| Generalized zero-shot bias | When test inputs include both seen and unseen classes, models heavily favour seen classes | Xian et al., 2018 |
| Domain shift | Few-shot accuracy collapses when novel classes come from a visibly different domain than base classes (for example, training on natural images and testing on satellite imagery) | Triantafillou et al., "Meta-Dataset," ICLR 2020 |
Zhao et al.'s contextual calibration trick (ask the model what it would predict for a content-free input like "N/A," then subtract that bias) recovered up to 30 percentage points of absolute accuracy on some tasks, which is a strong indicator of how unstable raw few-shot prompting can be.
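The correction itself is one line once you have the model's probabilities over the label tokens. A minimal numpy sketch of the idea, simplified from Zhao et al.'s diagonal-matrix formulation:

```python
import numpy as np

def contextual_calibration(label_probs, content_free_probs):
    """label_probs: model probabilities over label tokens for the real input.
    content_free_probs: same probabilities for a content-free input ("N/A").
    Dividing out the content-free distribution removes the prompt's bias."""
    p_cf = content_free_probs / content_free_probs.sum()
    calibrated = label_probs / p_cf        # equivalent to W = diag(1/p_cf)
    return calibrated / calibrated.sum()   # renormalize to a distribution

# A prompt biased toward "positive" gets corrected:
# contextual_calibration(np.array([0.6, 0.4]), np.array([0.7, 0.3]))
# -> array([0.391..., 0.608...]), now favouring "negative"
```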
One consequence of these failure modes is that small benchmarks of zero-shot or few-shot performance can be misleading. Reported numbers depend on the prompt template, the exact demonstrations chosen, the order, the temperature, and the model version. When comparing methods, it is good practice to report mean and standard deviation across many random seeds and demonstration orderings.
Zero-shot, one-shot, and few-shot learning sit at the intersection of several broader ideas in machine learning. The table below sketches how they relate.
| Related technique | Relationship |
|---|---|
| Transfer learning | The umbrella concept: reuse knowledge from one task or distribution on another. Zero/few-shot learning is transfer with extreme data scarcity at the target. |
| Meta-learning | "Learning to learn." Trains a model across many small tasks so it adapts quickly to a new one. Most pre-2020 few-shot work is meta-learning. |
| In-context learning | The mechanism by which an LLM does few-shot prompting at inference time. No gradient updates. |
| Fine-tuning | Updates model parameters on labelled data. Zero/few-shot prompting is the no-update alternative. |
| Instruction tuning | Fine-tuning a base LLM on instruction-following data so that zero-shot prompting works better. |
| Chain-of-thought prompting | Augments zero/few-shot prompting with intermediate reasoning steps, often boosting accuracy on multi-step problems. |
| Self-supervised pretraining | Provides the broad foundation of features and knowledge that makes zero/few-shot generalization possible. |
| Data augmentation | Synthetically expands a few labelled examples; complementary to few-shot learning rather than a substitute. |
The zero/few-shot vocabulary now extends well outside its original homes of image classification and text NLP.
Imagine you have never seen a duck-billed platypus before. If a friend says, "It's a small swimming animal with fur, a beak like a duck, and a flat tail," and you spot one in the wild, you can probably point and say "that's a platypus." That is zero-shot learning: you used a description, not examples. If your friend instead shows you one photo of a platypus, that is one-shot learning. If they show you five photos taken from different angles, that is few-shot learning. Computers do something similar: zero-shot models use general descriptions or knowledge, one-shot models use a single example, and few-shot models use a handful. The whole point is to learn quickly when you do not have thousands of training pictures lying around.