Papers

Revision as of 11:24, 28 February 2023 by Elegant angel

Important Papers

Name Date Source Type Organization Product Note
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) 2012 AlexNet Paper Computer Vision AlexNet
Efficient Estimation of Word Representations in Vector Space (Word2Vec) 2013/01/16 arxiv:1301.3781 Natural Language Processing Word2Vec
Playing Atari with Deep Reinforcement Learning (DQN) 2013/12/19 arxiv:1312.5602 DQN (Deep Q-Learning)
Generative Adversarial Networks (GAN) 2014/06/10 arxiv:1406.2661 GAN (Generative Adversarial Network)
Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet) 2014/09/04 arxiv:1409.1556 Computer Vision VGGNet
Sequence to Sequence Learning with Neural Networks (Seq2Seq) 2014/09/10 arxiv:1409.3215 Natural Language Processing Seq2Seq
Adam: A Method for Stochastic Optimization 2014/12/22 arxiv:1412.6980 Adam
Deep Residual Learning for Image Recognition (ResNet) 2015/12/10 arxiv:1512.03385 Computer Vision ResNet
Going Deeper with Convolutions (GoogLeNet) 2014/09/17 arxiv:1409.4842 Computer Vision Google GoogLeNet
Asynchronous Methods for Deep Reinforcement Learning (A3C) 2016/02/04 arxiv:1602.01783 A3C
WaveNet: A Generative Model for Raw Audio 2016/09/12 arxiv:1609.03499 Audio WaveNet
Attention Is All You Need (Transformer) 2017/06/12 arxiv:1706.03762 Natural Language Processing Google Transformer Influential paper
Proximal Policy Optimization Algorithms (PPO) 2017/07/20 arxiv:1707.06347 PPO (Proximal Policy Optimization)
Improving Language Understanding by Generative Pre-Training (GPT) 2018 paper source Natural Language Processing OpenAI GPT (Generative Pre-Training)
Deep contextualized word representations (ELMo) 2018/02/15 arxiv:1802.05365 Natural Language Processing ELMo
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding 2018/04/20 arxiv:1804.07461, website Natural Language Processing GLUE (General Language Understanding Evaluation)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2018/10/11 arxiv:1810.04805 Natural Language Processing Google BERT (Bidirectional Encoder Representations from Transformers)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context 2019/01/09 arxiv:1901.02860, GitHub Transformer-XL
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) 2019/11/19 arxiv:1911.08265 MuZero
Language Models are Few-Shot Learners (GPT-3) 2020/05/28 arxiv:2005.14165 Natural Language Processing OpenAI GPT-3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) 2020/10/22 arxiv:2010.11929, GitHub Computer Vision Google ViT (Vision Transformer)
Learning Transferable Visual Models From Natural Language Supervision (CLIP) 2021/02/26 arxiv:2103.00020, blog post Computer Vision OpenAI CLIP (Contrastive Language-Image Pre-Training)
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer 2021/10/05 arxiv:2110.02178, GitHub Computer Vision Apple MobileViT
LaMDA: Language Models for Dialog Applications 2022/01/20 arxiv:2201.08239, blog post Natural Language Processing Google LaMDA (Language Models for Dialog Applications)
Block-Recurrent Transformers 2022/03/11 arxiv:2203.07852
Memorizing Transformers 2022/03/16 arxiv:2203.08913
STaR: Bootstrapping Reasoning With Reasoning 2022/03/28 arxiv:2203.14465 STaR (Self-Taught Reasoner)

Other Papers

Name Date Source Type Organization Product Note
Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) 2023/02/27 arxiv:2302.14045 Natural Language Processing Microsoft Kosmos-1
LLaMA: Open and Efficient Foundation Language Models 2023/02/25 paper, blog post, GitHub Natural Language Processing Meta LLaMA
Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1) 2023/02/06 arxiv:2302.03011, blog post Video-to-Video Runway Gen-1
Dreamix: Video Diffusion Models are General Video Editors 2023/02/03 arxiv:2302.01329, blog post Google Dreamix
FLAME: A small language model for spreadsheet formulas 2023/01/31 arxiv:2301.13779 Microsoft FLAME
SingSong: Generating musical accompaniments from singing 2023/01/30 arxiv:2301.12662, blog post Audio SingSong
MusicLM: Generating Music From Text 2023/01/26 arxiv:2301.11325, blog post Audio Google MusicLM
Mastering Diverse Domains through World Models (DreamerV3) 2023/01/10 arxiv:2301.04104v1, blog post DeepMind DreamerV3
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) 2023/01/05 arxiv:2301.02111, demo Microsoft VALL-E
Muse: Text-To-Image Generation via Masked Generative Transformers 2023/01/02 arxiv:2301.00704, blog post Computer Vision Google Muse
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model 2022/11/09 arxiv:2211.05100, blog post Natural Language Processing Hugging Face BLOOM Open-source LLM that competes with GPT-3
AudioLM: a Language Modeling Approach to Audio Generation 2022/09/07 arxiv:2209.03143, web page, blog post Audio Google AudioLM
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) 2022/05/23 arxiv:2205.11487, blog post Computer Vision Google Imagen
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback 2022/04/12 arxiv:2204.05862, GitHub Natural Language Processing Anthropic RLHF (Reinforcement Learning from Human Feedback)
PaLM: Scaling Language Modeling with Pathways 2022/04/05 arxiv:2204.02311, blog post Natural Language Processing Google PaLM (Pathways Language Model)
Constitutional AI: Harmlessness from AI Feedback 2022/12/15 arxiv:2212.08073 Natural Language Processing Anthropic Constitutional AI, Claude
Improving language models by retrieving from trillions of tokens (RETRO) 2021/12/08 arxiv:2112.04426, blog post Natural Language Processing DeepMind RETRO (Retrieval Enhanced Transformer)
InstructPix2Pix: Learning to Follow Image Editing Instructions 2022/11/17 arxiv:2211.09800, blog post Computer Vision UC Berkeley InstructPix2Pix
REALM: Retrieval-Augmented Language Model Pre-Training 2020/02/10 arxiv:2002.08909, blog post Natural Language Processing Google REALM (Retrieval-Augmented Language Model Pre-Training)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) 2019/10/23 arxiv:1910.10683, blog post Natural Language Processing Google T5 (Text-To-Text Transfer Transformer)
RoBERTa: A Robustly Optimized BERT Pretraining Approach 2019/07/26 arxiv:1907.11692, blog post Natural Language Processing Meta RoBERTa (Robustly Optimized BERT Pretraining Approach)
Probabilistic Face Embeddings 2019/04/21 arxiv:1904.09658 Computer Vision PFEs (Probabilistic Face Embeddings)