| Name | Date | Source | Type | Organization | Product | Note |
|---|---|---|---|---|---|---|
| ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) | 2012 | AlexNet Paper | Computer Vision | University of Toronto | AlexNet | |
| Efficient Estimation of Word Representations in Vector Space (Word2Vec) | 2013/01/16 | arxiv:1301.3781 | Natural Language Processing | Google | Word2Vec | |
| Playing Atari with Deep Reinforcement Learning (DQN) | 2013/12/19 | arxiv:1312.5602 | Reinforcement Learning | DeepMind | DQN (Deep Q-Learning) | |
| Generative Adversarial Networks (GAN) | 2014/06/10 | arxiv:1406.2661 | | Université de Montréal | GAN (Generative Adversarial Network) | |
| Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet) | 2014/09/04 | arxiv:1409.1556 | Computer Vision | University of Oxford | VGGNet | |
| Sequence to Sequence Learning with Neural Networks (Seq2Seq) | 2014/09/10 | arxiv:1409.3215 | Natural Language Processing | Google | Seq2Seq | |
| Adam: A Method for Stochastic Optimization | 2014/12/22 | arxiv:1412.6980 | | | Adam | |
| Deep Residual Learning for Image Recognition (ResNet) | 2015/12/10 | arxiv:1512.03385 | Computer Vision | Microsoft | ResNet | |
| Going Deeper with Convolutions (GoogLeNet) | 2014/09/17 | arxiv:1409.4842 | Computer Vision | Google | GoogLeNet | |
| Asynchronous Methods for Deep Reinforcement Learning (A3C) | 2016/02/04 | arxiv:1602.01783 | Reinforcement Learning | DeepMind | A3C | |
| WaveNet: A Generative Model for Raw Audio | 2016/09/12 | arxiv:1609.03499 | Audio | DeepMind | WaveNet | |
| Attention Is All You Need (Transformer) | 2017/06/12 | arxiv:1706.03762 | Natural Language Processing | Google | Transformer | Influential paper |
| Proximal Policy Optimization Algorithms (PPO) | 2017/07/20 | arxiv:1707.06347 | Reinforcement Learning | OpenAI | PPO (Proximal Policy Optimization) | |
| Improving Language Understanding by Generative Pre-Training (GPT) | 2018 | paper source | Natural Language Processing | OpenAI | GPT (Generative Pre-Training Transformer) | |
| Deep contextualized word representations (ELMo) | 2018/02/15 | arxiv:1802.05365 | Natural Language Processing | Allen Institute for AI | ELMo | |
| GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | 2018/04/20 | arxiv:1804.07461 website | Natural Language Processing | | GLUE (General Language Understanding Evaluation) | |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018/10/11 | arxiv:1810.04805 | Natural Language Processing | Google | BERT (Bidirectional Encoder Representations from Transformers) | |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | 2019/01/09 | arxiv:1901.02860 github | Natural Language Processing | | Transformer-XL | |
| Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero) | 2019/11/19 | arxiv:1911.08265 | Reinforcement Learning | DeepMind | MuZero | |
| Language Models are Few-Shot Learners (GPT-3) | 2020/05/28 | arxiv:2005.14165 | Natural Language Processing | OpenAI | GPT-3 | |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) | 2020/10/22 | arxiv:2010.11929 GitHub | Computer Vision | Google | ViT (Vision Transformer) | |
| Learning Transferable Visual Models From Natural Language Supervision (CLIP) | 2021/02/26 | arxiv:2103.00020 Blog Post | Computer Vision | OpenAI | CLIP (Contrastive Language-Image Pre-Training) | |
| MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | 2021/10/05 | arxiv:2110.02178 GitHub | Computer Vision | Apple | MobileViT | |
| LaMDA: Language Models for Dialog Applications | 2022/01/20 | arxiv:2201.08239 Blog Post | Natural Language Processing | Google | LaMDA (Language Models for Dialog Applications) | |
| Block-Recurrent Transformers | 2022/03/11 | arxiv:2203.07852 | ||||
| Memorizing Transformers | 2022/03/16 | arxiv:2203.08913 | ||||
| STaR: Bootstrapping Reasoning With Reasoning | 2022/03/28 | arxiv:2203.14465 | Natural Language Processing | | STaR (Self-Taught Reasoner) | |
| GPT-4 Technical Report | 2023/03/15 | arxiv:2303.08774 blog post system card | Natural Language Processing, Multimodal | OpenAI | GPT-4 | |
| Name | Date | Source | Type | Organization | Product | Note |
|---|---|---|---|---|---|---|
| Self-Rewarding Language Models | 2024/01/18 | arxiv:2401.10020 | Natural Language Processing | Meta | ||
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | 2023/12/12 | arxiv:2312.11514 HuggingFace | Natural Language Processing | Apple | | |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | 2023/12/07 | arxiv:2311.17117 Website Video GitHub Tutorial | Computer Vision | Alibaba | Animate Anyone | |
| MatterGen: a generative model for inorganic materials design | 2023/12/06 | arxiv:2312.03687 Tweet | Materials Science | Microsoft | MatterGen | |
| Audiobox: Generating audio from voice and natural language prompts | 2023/11/30 | Paper Website | Audio | Meta | Audiobox | |
| Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text | 2023/11/30 | arxiv:2311.18805 | Natural Language Processing | University of Tokyo | Scrambled Bench | |
| MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | 2023/11/27 | arxiv:2311.15475 Website | Computer Vision | | MeshGPT | |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | 2023/10/11 | arxiv:2310.07704 GitHub | Multimodal, Natural Language Processing | Apple | Ferret | |
| SeamlessM4T - Massively Multilingual & Multimodal Machine Translation | 2023/08/23 | Paper Website Blogpost Demo GitHub | Natural Language Processing | Meta | SeamlessM4T | |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | 2023/08/01 | arxiv:2307.15818 Website Blogpost | Robotics | Google | RT-2 | |
| Towards Generalist Biomedical AI | 2023/07/26 | arxiv:2307.14334 | Natural Language Processing | Med-PaLM | ||
| Large Language Models Understand and Can be Enhanced by Emotional Stimuli | 2023/07/14 | arxiv:2307.11760 | Natural Language Processing | | EmotionPrompt | |
| MusicGen: Simple and Controllable Music Generation | 2023/06/08 | arxiv:2306.05284 GitHub Example | Audio | Meta | MusicGen | |
| CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | 2023/05/31 | arxiv:2306.00029 GitHub | Natural Language Processing | Salesforce | CodeTF | |
| Bytes Are All You Need: Transformers Operating Directly On File Bytes | 2023/05/31 | arxiv:2306.00238 | Computer Vision | Apple | ||
| Scaling Speech Technology to 1,000+ Languages | 2023/05/22 | Paper Blogpost GitHub Languages covered | Natural Language Processing | Meta | Massively Multilingual Speech (MMS) | |
| RWKV: Reinventing RNNs for the Transformer Era | 2023/05/22 | arxiv:2305.13048 | Natural Language Processing | | RWKV (Receptance Weighted Key Value) | |
| ImageBind: One Embedding Space To Bind Them All | 2023/05/09 | arxiv:2305.05665 Website Demo Blog GitHub | Multimodal, Computer Vision, Natural Language Processing | Meta | ImageBind | |
| Real-Time Neural Appearance Models | 2023/05/05 | Paper Blog | | NVIDIA | | |
| Poisoning Language Models During Instruction Tuning | 2023/05/01 | arxiv:2305.00944 | Natural Language Processing | |||
| Generative Agents: Interactive Simulacra of Human Behavior | 2023/04/07 | arxiv:2304.03442 | Human-AI Interaction, Natural Language Processing | Stanford | Generative agents | |
| Segment Anything | 2023/04/05 | Paper Website Blog GitHub | Computer Vision | Meta | Segment Anything Model (SAM) | |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (Microsoft JARVIS) | 2023/03/30 | arxiv:2303.17580 HuggingFace Space JARVIS GitHub | Natural Language Processing, Multimodal | Microsoft, Hugging Face | HuggingGPT, JARVIS | |
| BloombergGPT: A Large Language Model for Finance | 2023/03/30 | arxiv:2303.17564 press release twitter thread | Natural Language Processing | Bloomberg | BloombergGPT | |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | 2023/03/22 | arxiv:2303.12712 | Natural Language Processing, Multimodal | Microsoft | | |
| Reflexion: an autonomous agent with dynamic memory and self-reflection | 2023/03/20 | arxiv:2303.11366 blog post GitHub GitHub 2 | Natural Language Processing | | Reflexion | |
| PaLM-E: An Embodied Multimodal Language Model | 2023/03/06 | arxiv:2303.03378 blog | Natural Language Processing, Multimodal | Google | PaLM-E | |
| Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | 2023/03/02 | arxiv:2303.01037 blog | Natural Language Processing | Google | Universal Speech Model (USM) | |
| Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) | 2023/02/27 | arxiv:2302.14045 | Natural Language Processing | Microsoft | Kosmos-1 | |
| LLaMA: Open and Efficient Foundation Language Models | 2023/02/25 | paper blog post github | Natural Language Processing | Meta | LLaMA | |
| Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1) | 2023/02/06 | arxiv:2302.03011 blog post | Video-to-Video | Runway | Gen-1 | |
| Dreamix: Video Diffusion Models are General Video Editors | 2023/02/03 | arxiv:2302.01329 blog post | | Google | Dreamix | |
| FLAME: A small language model for spreadsheet formulas | 2023/01/31 | arxiv:2301.13779 | Natural Language Processing | Microsoft | FLAME | |
| SingSong: Generating musical accompaniments from singing | 2023/01/30 | arxiv:2301.12662 blog post | Audio | Google | SingSong | |
| MusicLM: Generating Music From Text | 2023/01/26 | arxiv:2301.11325 blog post | Audio | Google | MusicLM | |
| Mastering Diverse Domains through World Models (DreamerV3) | 2023/01/10 | arxiv:2301.04104v1 blogpost | Reinforcement Learning | DeepMind | DreamerV3 | |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) | 2023/01/05 | arxiv:2301.02111 Demo | Audio | Microsoft | VALL-E | |
| Muse: Text-To-Image Generation via Masked Generative Transformers | 2023/01/02 | arxiv:2301.00704 blog post | Computer Vision | Google | Muse | |
| BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 2022/11/09 | arxiv:2211.05100 Blog Post | Natural Language Processing | Hugging Face | BLOOM | Open-source LLM competitor to GPT-3 |
| ReAct: Synergizing Reasoning and Acting in Language Models | 2022/10/06 | arxiv:2210.03629 Blogpost GitHub | Natural Language Processing | | ReAct | |
| AudioLM: a Language Modeling Approach to Audio Generation | 2022/09/07 | arxiv:2209.03143 web page blog post | Audio | Google | AudioLM | |
| Emergent Abilities of Large Language Models | 2022/06/15 | arxiv:2206.07682 Blog Post | Natural Language Processing | Google | Emergent Abilities | |
| Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) | 2022/05/23 | arxiv:2205.11487 Blog Post | Computer Vision | Google | Imagen | |
| A Generalist Agent (Gato) | 2022/05/12 | arxiv:2205.06175 Blog Post | Natural Language Processing, Multimodal | DeepMind | Gato | |
| Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | 2022/04/12 | arxiv:2204.05862 GitHub | Natural Language Processing | Anthropic | RLHF (Reinforcement Learning from Human Feedback) | |
| PaLM: Scaling Language Modeling with Pathways | 2022/04/05 | arxiv:2204.02311 Blog Post | Natural Language Processing | Google | PaLM (Pathways Language Model) | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | 2022/01/28 | arxiv:2201.11903 Blog Post | Natural Language Processing | Google | Chain of Thought Prompting (CoT prompting) | |
| Constitutional AI: Harmlessness from AI Feedback | 2022/12/15 | arxiv:2212.08073 | Natural Language Processing | Anthropic | Constitutional AI, Claude | |
| Improving language models by retrieving from trillions of tokens (RETRO) | 2021/12/08 | arxiv:2112.04426 Blog post | Natural Language Processing | DeepMind | RETRO (Retrieval Enhanced Transformer) | |
| InstructPix2Pix: Learning to Follow Image Editing Instructions | 2022/11/17 | arxiv:2211.09800 Blog Post | Computer Vision | UC Berkeley | InstructPix2Pix | |
| REALM: Retrieval-Augmented Language Model Pre-Training | 2020/02/10 | arxiv:2002.08909 Blog Post | Natural Language Processing | Google | REALM (Retrieval-Augmented Language Model Pre-Training) | |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) | 2019/10/23 | arxiv:1910.10683 blog post | Natural Language Processing | Google | T5 (Text-To-Text Transfer Transformer) | |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019/07/26 | arxiv:1907.11692 blog post | Natural Language Processing | Meta | RoBERTa (Robustly Optimized BERT Pretraining Approach) | |
| Probabilistic Face Embeddings | 2019/04/21 | arxiv:1904.09658 | Computer Vision | | PFEs (Probabilistic Face Embeddings) | |
| Language Models are Unsupervised Multitask Learners (GPT-2) | 2019 | paper | Natural Language Processing | OpenAI | GPT-2 | |
| Deep reinforcement learning from human preferences | 2017/06/12 | arxiv:1706.03741 Blog post GitHub | Reinforcement Learning | OpenAI | RLHF (Reinforcement Learning from Human Feedback) | |