Papers

Important Papers

Name Date Source Type Organization Product Note
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) 2012 AlexNet Paper Computer Vision AlexNet
Efficient Estimation of Word Representations in Vector Space (Word2Vec) 2013/01/16 arxiv:1301.3781 Natural Language Processing Word2Vec
Playing Atari with Deep Reinforcement Learning (DQN) 2013/12/19 arxiv:1312.5602 DQN (Deep Q-Learning)
Generative Adversarial Networks (GAN) 2014/06/10 arxiv:1406.2661 GAN (Generative Adversarial Network)
Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet) 2014/09/04 arxiv:1409.1556 Computer Vision VGGNet
Sequence to Sequence Learning with Neural Networks (Seq2Seq) 2014/09/10 arxiv:1409.3215 Natural Language Processing Seq2Seq
Adam: A Method for Stochastic Optimization 2014/12/22 arxiv:1412.6980 Adam
Deep Residual Learning for Image Recognition (ResNet) 2015/12/10 arxiv:1512.03385 Computer Vision ResNet
Going Deeper with Convolutions (GoogLeNet) 2014/09/17 arxiv:1409.4842 Computer Vision Google GoogLeNet
Asynchronous Methods for Deep Reinforcement Learning (A3C) 2016/02/04 arxiv:1602.01783 A3C
WaveNet: A Generative Model for Raw Audio 2016/09/12 arxiv:1609.03499 Audio WaveNet
Attention Is All You Need (Transformer) 2017/06/12 arxiv:1706.03762 Natural Language Processing Google Transformer Influential paper
Proximal Policy Optimization Algorithms (PPO) 2017/07/20 arxiv:1707.06347 PPO (Proximal Policy Optimization)
Improving Language Understanding by Generative Pre-Training (GPT) 2018 paper source Natural Language Processing OpenAI GPT (Generative Pre-trained Transformer)
Deep contextualized word representations (ELMo) 2018/02/15 arxiv:1802.05365 Natural Language Processing ELMo
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding 2018/04/20 arxiv:1804.07461
website
Natural Language Processing GLUE (General Language Understanding Evaluation)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2018/10/11 arxiv:1810.04805 Natural Language Processing Google BERT (Bidirectional Encoder Representations from Transformers)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context 2019/01/09 arxiv:1901.02860
github
Transformer-XL
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) 2019/11/19 arxiv:1911.08265 MuZero
Language Models are Few-Shot Learners (GPT-3) 2020/05/28 arxiv:2005.14165 Natural Language Processing OpenAI GPT-3
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) 2020/10/22 arxiv:2010.11929
GitHub
Computer Vision Google ViT (Vision Transformer)
Learning Transferable Visual Models From Natural Language Supervision (CLIP) 2021/02/26 arxiv:2103.00020
Blog Post
Computer Vision OpenAI CLIP (Contrastive Language-Image Pre-Training)
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer 2021/10/05 arxiv:2110.02178
GitHub
Computer Vision Apple MobileViT
LaMDA: Language Models for Dialog Applications 2022/01/20 arxiv:2201.08239
Blog Post
Natural Language Processing Google LaMDA (Language Models for Dialog Applications)
Block-Recurrent Transformers 2022/03/11 arxiv:2203.07852
Memorizing Transformers 2022/03/16 arxiv:2203.08913
STaR: Bootstrapping Reasoning With Reasoning 2022/03/28 arxiv:2203.14465 STaR (Self-Taught Reasoner)
GPT-4 Technical Report 2023/03/15 arxiv:2303.08774
blog post
system card
Natural Language Processing
Multimodal
OpenAI GPT-4

Other Papers

Name Date Source Type Organization Product Note
Self-Rewarding Language Models 2024/01/18 arxiv:2401.10020 Natural Language Processing Meta
LLM in a flash: Efficient Large Language Model Inference with Limited Memory 2023/12/12 arxiv:2312.11514
HuggingFace
Natural Language Processing Apple
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation 2023/12/07 arxiv:2311.17117
Website
Video
GitHub
Tutorial
Computer Vision Alibaba Animate Anyone
MatterGen: a generative model for inorganic materials design 2023/12/06 arxiv:2312.03687
Tweet
Materials Science Microsoft MatterGen
Audiobox: Generating audio from voice and natural language prompts 2023/11/30 Paper
Website
Audio Meta Audiobox
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text 2023/11/30 arxiv:2311.18805 Natural Language Processing University of Tokyo Scrambled Bench
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers 2023/11/27 arxiv:2311.15475
Website
Computer Vision MeshGPT
Ferret: Refer and Ground Anything Anywhere at Any Granularity 2023/10/11 arxiv:2310.07704
GitHub
Multimodal
Natural Language Processing
Apple Ferret
SeamlessM4T - Massively Multilingual & Multimodal Machine Translation 2023/08/23 Paper
Website
Blogpost
Demo
GitHub
Natural Language Processing Meta SeamlessM4T
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control 2023/08/01 arxiv:2307.15818
Website
Blogpost
Robotics Google RT-2
Towards Generalist Biomedical AI 2023/07/26 arxiv:2307.14334 Natural Language Processing Google Med-PaLM
Large Language Models Understand and Can be Enhanced by Emotional Stimuli 2023/07/14 arxiv:2307.11760 Natural Language Processing EmotionPrompt
MusicGen: Simple and Controllable Music Generation 2023/06/08 arxiv:2306.05284
GitHub
Example
Audio Meta MusicGen
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM 2023/05/31 arxiv:2306.00029
GitHub
Natural Language Processing Salesforce CodeTF
Bytes Are All You Need: Transformers Operating Directly On File Bytes 2023/05/31 arxiv:2306.00238 Computer Vision Apple
Scaling Speech Technology to 1,000+ Languages 2023/05/22 Paper
Blogpost
GitHub
Languages covered
Natural Language Processing Meta Massively Multilingual Speech (MMS)
RWKV: Reinventing RNNs for the Transformer Era 2023/05/22 arxiv:2305.13048 Natural Language Processing Receptance Weighted Key Value (RWKV)
ImageBind: One Embedding Space To Bind Them All 2023/05/09 arxiv:2305.05665
Website
Demo
Blog
GitHub
Multimodal
Computer Vision
Natural Language Processing
Meta ImageBind
Real-Time Neural Appearance Models 2023/05/05 Paper
Blog
NVIDIA
Poisoning Language Models During Instruction Tuning 2023/05/01 arxiv:2305.00944 Natural Language Processing
Generative Agents: Interactive Simulacra of Human Behavior 2023/04/07 arxiv:2304.03442 Human-AI Interaction
Natural Language Processing
Stanford Generative agents
Segment Anything 2023/04/05 Paper
Website
Blog
GitHub
Computer Vision Meta Segment Anything Model (SAM)
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (Microsoft JARVIS) 2023/03/30 arxiv:2303.17580
HuggingFace Space
JARVIS GitHub
Natural Language Processing
Multimodal
Microsoft
Hugging Face
HuggingGPT
JARVIS
BloombergGPT: A Large Language Model for Finance 2023/03/30 arxiv:2303.17564
press release
twitter thread
Natural Language Processing Bloomberg BloombergGPT
Sparks of Artificial General Intelligence: Early experiments with GPT-4 2023/03/22 arxiv:2303.12712 Natural Language Processing
Multimodal
Microsoft
Reflexion: an autonomous agent with dynamic memory and self-reflection 2023/03/20 arxiv:2303.11366
blog post
GitHub
GitHub 2
Natural Language Processing Reflexion
PaLM-E: An Embodied Multimodal Language Model 2023/03/06 arxiv:2303.03378
blog
Natural Language Processing
Multimodal
Google PaLM-E
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages 2023/03/02 arxiv:2303.01037
blog
Natural Language Processing Google Universal Speech Model (USM)
Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) 2023/02/27 arxiv:2302.14045 Natural Language Processing Microsoft Kosmos-1
LLaMA: Open and Efficient Foundation Language Models 2023/02/25 paper
blog post
github
Natural Language Processing Meta LLaMA
Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1) 2023/02/06 arxiv:2302.03011
blog post
Video-to-Video Runway Gen-1
Dreamix: Video Diffusion Models are General Video Editors 2023/02/03 arxiv:2302.01329
blog post
Google Dreamix
FLAME: A small language model for spreadsheet formulas 2023/01/31 arxiv:2301.13779 Microsoft FLAME
SingSong: Generating musical accompaniments from singing 2023/01/30 arxiv:2301.12662
blog post
Audio SingSong
MusicLM: Generating Music From Text 2023/01/26 arxiv:2301.11325
blog post
Audio Google MusicLM
Mastering Diverse Domains through World Models (DreamerV3) 2023/01/10 arxiv:2301.04104v1
blogpost
DeepMind DreamerV3
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) 2023/01/05 arxiv:2301.02111
Demo
Microsoft VALL-E
Muse: Text-To-Image Generation via Masked Generative Transformers 2023/01/02 arxiv:2301.00704
blog post
Computer Vision Google Muse
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model 2022/11/09 arxiv:2211.05100
Blog Post
Natural Language Processing Hugging Face BLOOM Open source LLM that is a competitor to GPT-3
ReAct: Synergizing Reasoning and Acting in Language Models 2022/10/06 arxiv:2210.03629
Blogpost
GitHub
Natural Language Processing Google ReAct
AudioLM: a Language Modeling Approach to Audio Generation 2022/09/07 arxiv:2209.03143
web page
blog post
Audio Google AudioLM
Emergent Abilities of Large Language Models 2022/06/15 arxiv:2206.07682
Blog Post
Natural Language Processing Google Emergent Abilities
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) 2022/05/23 arxiv:2205.11487
Blog Post
Computer Vision Google Imagen
A Generalist Agent (Gato) 2022/05/12 arxiv:2205.06175
Blog Post
Natural Language Processing
Multimodal
DeepMind Gato
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback 2022/04/12 arxiv:2204.05862
GitHub
Natural Language Processing Anthropic RLHF (Reinforcement Learning from Human Feedback)
PaLM: Scaling Language Modeling with Pathways 2022/04/05 arxiv:2204.02311
Blog Post
Natural Language Processing Google PaLM (Pathways Language Model)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 2022/01/28 arxiv:2201.11903
Blog Post
Natural Language Processing Google Chain of Thought Prompting (CoT prompting)
Constitutional AI: Harmlessness from AI Feedback 2022/12/15 arxiv:2212.08073 Natural Language Processing Anthropic Constitutional AI, Claude
Improving language models by retrieving from trillions of tokens (RETRO) 2021/12/08 arxiv:2112.04426
Blog post
Natural Language Processing DeepMind RETRO (Retrieval-Enhanced Transformer)
InstructPix2Pix: Learning to Follow Image Editing Instructions 2022/11/17 arxiv:2211.09800
Blog Post
Computer Vision UC Berkeley InstructPix2Pix
REALM: Retrieval-Augmented Language Model Pre-Training 2020/02/10 arxiv:2002.08909
Blog Post
Natural Language Processing Google REALM (Retrieval-Augmented Language Model Pre-Training)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) 2019/10/23 arxiv:1910.10683 Natural Language Processing
blog post
Google T5 (Text-To-Text Transfer Transformer)
RoBERTa: A Robustly Optimized BERT Pretraining Approach 2019/07/26 arxiv:1907.11692 Natural Language Processing
blog post
Meta RoBERTa (Robustly Optimized BERT Pretraining Approach)
Probabilistic Face Embeddings 2019/04/21 arxiv:1904.09658 Computer Vision PFEs (Probabilistic Face Embeddings)
Language Models are Unsupervised Multitask Learners (GPT-2) 2018 paper Natural Language Processing OpenAI GPT-2
Deep reinforcement learning from human preferences 2017/06/12 arxiv:1706.03741
Blog post
GitHub
OpenAI RLHF (Reinforcement Learning from Human Feedback)