Papers

Attention Is All You Need	https://arxiv.org/abs/1706.03762	Influential paper that introduced the Transformer architecture
An Image is Worth 16x16 Words	https://arxiv.org/abs/2010.11929	Transformers for Image Recognition at Scale; introduced the Vision Transformer (ViT)
Block-Recurrent Transformers	https://arxiv.org/abs/2203.07852
Language Models are Few-Shot Learners	https://arxiv.org/abs/2005.14165	GPT-3
Memorizing Transformers	https://arxiv.org/abs/2203.08913
MobileViT	https://arxiv.org/abs/2110.02178	Light-weight, General-purpose, and Mobile-friendly Vision Transformer
OpenAI CLIP	https://arxiv.org/abs/2103.00020, https://openai.com/blog/clip/	Learning Transferable Visual Models From Natural Language Supervision
STaR	https://arxiv.org/abs/2203.14465	Bootstrapping Reasoning With Reasoning
Transformer-XL	https://arxiv.org/abs/1901.02860	Attentive Language Models Beyond a Fixed-Length Context