| Name | Submission Date | Source | Note |
| --- | --- | --- | --- |
| ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) | 2012 | AlexNet Paper | |
| Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet) | 2014/09/04 | arxiv:1409.1556 | |
| Deep Residual Learning for Image Recognition (ResNet) | 2015/12/10 | arxiv:1512.03385 | |
| Going Deeper with Convolutions (GoogLeNet) | 2014/09/17 | arxiv:1409.4842 | |
| Attention Is All You Need (Transformer) | 2017/06/12 | arxiv:1706.03762 | Influential paper that introduced the Transformer architecture |
| Transformer-XL | 2019/01/09 | arxiv:1901.02860 | Attentive Language Models Beyond a Fixed-Length Context |
| Language Models are Few-Shot Learners | 2020/05/28 | arxiv:2005.14165 | GPT-3 |
| An Image is Worth 16x16 Words | 2020/10/22 | arxiv:2010.11929 | Transformers for Image Recognition at Scale - Vision Transformer (ViT) |
| OpenAI CLIP | 2021/02/26 | arxiv:2103.00020, OpenAI Blog | Learning Transferable Visual Models From Natural Language Supervision |
| MobileViT | 2021/10/05 | arxiv:2110.02178 | Light-weight, General-purpose, and Mobile-friendly Vision Transformer |
| Block-Recurrent Transformers | 2022/03/11 | arxiv:2203.07852 | |
| Memorizing Transformers | 2022/03/16 | arxiv:2203.08913 | |
| STaR | 2022/03/28 | arxiv:2203.14465 | Bootstrapping Reasoning With Reasoning |