| Self-Rewarding Language Models | 2024/01/18 | arxiv:2401.10020 | Natural Language Processing | Meta | | |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | 2023/12/12 | arxiv:2312.11514, HuggingFace | Natural Language Processing | Apple | | |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | 2023/12/07 | arxiv:2311.17117, Website, Video, GitHub, Tutorial | Computer Vision | Alibaba | Animate Anyone | |
| MatterGen: a generative model for inorganic materials design | 2023/12/06 | arxiv:2312.03687, Tweet | Materials Science | Microsoft | MatterGen | |
| Audiobox: Generating audio from voice and natural language prompts | 2023/11/30 | Paper, Website | Audio | Meta | Audiobox | |
| Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text | 2023/11/30 | arxiv:2311.18805 | Natural Language Processing | University of Tokyo | Scrambled Bench | |
| MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | 2023/11/27 | arxiv:2311.15475, Website | Computer Vision | | MeshGPT | |
| Ferret: Refer and Ground Anything Anywhere at Any Granularity | 2023/10/11 | arxiv:2310.07704, GitHub | Multimodal, Natural Language Processing | Apple | Ferret | |
| SeamlessM4T - Massively Multilingual & Multimodal Machine Translation | 2023/08/23 | Paper, Website, Blogpost, Demo, GitHub | Natural Language Processing | Meta | SeamlessM4T | |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | 2023/08/01 | arxiv:2307.15818, Website, Blogpost | Robotics | Google | RT-2 | |
| Towards Generalist Biomedical AI | 2023/07/26 | arxiv:2307.14334 | Natural Language Processing | Google | Med-PaLM | |
| Large Language Models Understand and Can be Enhanced by Emotional Stimuli | 2023/07/14 | arxiv:2307.11760 | Natural Language Processing | | EmotionPrompt | |
| MusicGen: Simple and Controllable Music Generation | 2023/06/08 | arxiv:2306.05284, GitHub, Example | Audio | Meta | MusicGen | |
| CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | 2023/05/31 | arxiv:2306.00029, GitHub | Natural Language Processing | Salesforce | CodeTF | |
| Bytes Are All You Need: Transformers Operating Directly On File Bytes | 2023/05/31 | arxiv:2306.00238 | Computer Vision | Apple | | |
| Scaling Speech Technology to 1,000+ Languages | 2023/05/22 | Paper, Blogpost, GitHub, Languages covered | Natural Language Processing | Meta | Massively Multilingual Speech (MMS) | |
| RWKV: Reinventing RNNs for the Transformer Era | 2023/05/22 | arxiv:2305.13048 | Natural Language Processing | | Receptance Weighted Key Value (RWKV) | |
| ImageBind: One Embedding Space To Bind Them All | 2023/05/09 | arxiv:2305.05665, Website, Demo, Blog, GitHub | Multimodal, Computer Vision, Natural Language Processing | Meta | ImageBind | |
| Real-Time Neural Appearance Models | 2023/05/05 | Paper, Blog | | NVIDIA | | |
| Poisoning Language Models During Instruction Tuning | 2023/05/01 | arxiv:2305.00944 | Natural Language Processing | | | |
| Generative Agents: Interactive Simulacra of Human Behavior | 2023/04/07 | arxiv:2304.03442 | Human-AI Interaction, Natural Language Processing | Stanford | Generative agents | |
| Segment Anything | 2023/04/05 | Paper, Website, Blog, GitHub | Computer Vision | Meta | Segment Anything Model (SAM) | |
| HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (Microsoft JARVIS) | 2023/03/30 | arxiv:2303.17580, HuggingFace Space, JARVIS GitHub | Natural Language Processing, Multimodal | Microsoft, Hugging Face | HuggingGPT, JARVIS | |
| BloombergGPT: A Large Language Model for Finance | 2023/03/30 | arxiv:2303.17564, press release, twitter thread | Natural Language Processing | Bloomberg | BloombergGPT | |
| Sparks of Artificial General Intelligence: Early experiments with GPT-4 | 2023/03/22 | arxiv:2303.12712 | Natural Language Processing, Multimodal | Microsoft | | |
| Reflexion: an autonomous agent with dynamic memory and self-reflection | 2023/03/20 | arxiv:2303.11366, blog post, GitHub, GitHub 2 | Natural Language Processing | | Reflexion | |
| PaLM-E: An Embodied Multimodal Language Model | 2023/03/06 | arxiv:2303.03378, blog | Natural Language Processing, Multimodal | Google | PaLM-E | |
| Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages | 2023/03/02 | arxiv:2303.01037, blog | Natural Language Processing | Google | Universal Speech Model (USM) | |
| Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) | 2023/02/27 | arxiv:2302.14045 | Natural Language Processing | Microsoft | Kosmos-1 | |
| LLaMA: Open and Efficient Foundation Language Models | 2023/02/25 | paper, blog post, GitHub | Natural Language Processing | Meta | LLaMA | |
| Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1) | 2023/02/06 | arxiv:2302.03011, blog post | Video-to-Video | Runway | Gen-1 | |
| Dreamix: Video Diffusion Models are General Video Editors | 2023/02/03 | arxiv:2302.01329, blog post | | Google | Dreamix | |
| FLAME: A small language model for spreadsheet formulas | 2023/01/31 | arxiv:2301.13779 | | Microsoft | FLAME | |
| SingSong: Generating musical accompaniments from singing | 2023/01/30 | arxiv:2301.12662, blog post | Audio | | SingSong | |
| MusicLM: Generating Music From Text | 2023/01/26 | arxiv:2301.11325, blog post | Audio | Google | MusicLM | |
| Mastering Diverse Domains through World Models (DreamerV3) | 2023/01/10 | arxiv:2301.04104v1, blogpost | | DeepMind | DreamerV3 | |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) | 2023/01/05 | arxiv:2301.02111, Demo | | Microsoft | VALL-E | |
| Muse: Text-To-Image Generation via Masked Generative Transformers | 2023/01/02 | arxiv:2301.00704, blog post | Computer Vision | Google | Muse | |
| BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 2022/11/09 | arxiv:2211.05100, Blog Post | Natural Language Processing | Hugging Face | BLOOM | Open source LLM that is a competitor to GPT-3 |
| ReAct: Synergizing Reasoning and Acting in Language Models | 2022/10/06 | arxiv:2210.03629, Blogpost, GitHub | Natural Language Processing | Google | ReAct | |
| AudioLM: a Language Modeling Approach to Audio Generation | 2022/09/07 | arxiv:2209.03143, web page, blog post | Audio | Google | AudioLM | |
| Emergent Abilities of Large Language Models | 2022/06/15 | arxiv:2206.07682, Blog Post | Natural Language Processing | Google | Emergent Abilities | |
| Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) | 2022/05/23 | arxiv:2205.11487, Blog Post | Computer Vision | Google | Imagen | |
| A Generalist Agent (Gato) | 2022/05/23 | arxiv:2205.06175, Blog Post | Natural Language Processing, Multimodal | DeepMind | Gato | |
| Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback | 2022/04/12 | arxiv:2204.05862, GitHub | Natural Language Processing | Anthropic | RLHF (Reinforcement Learning from Human Feedback) | |
| PaLM: Scaling Language Modeling with Pathways | 2022/04/05 | arxiv:2204.02311, Blog Post | Natural Language Processing | Google | PaLM (Pathways Language Model) | |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | 2022/01/28 | arxiv:2201.11903, Blog Post | Natural Language Processing | Google | Chain of Thought Prompting (CoT prompting) | |
| Constitutional AI: Harmlessness from AI Feedback | 2022/12/15 | arxiv:2212.08073 | Natural Language Processing | Anthropic | Constitutional AI, Claude | |
| Improving language models by retrieving from trillions of tokens (RETRO) | 2021/12/08 | arxiv:2112.04426, Blog post | Natural Language Processing | DeepMind | RETRO (Retrieval Enhanced Transformer) | |
| InstructPix2Pix: Learning to Follow Image Editing Instructions | 2022/11/17 | arxiv:2211.09800, Blog Post | Computer Vision | UC Berkeley | InstructPix2Pix | |
| REALM: Retrieval-Augmented Language Model Pre-Training | 2020/02/10 | arxiv:2002.08909, Blog Post | Natural Language Processing | Google | REALM (Retrieval-Augmented Language Model Pre-Training) | |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) | 2019/10/23 | arxiv:1910.10683, blog post | Natural Language Processing | Google | T5 (Text-To-Text Transfer Transformer) | |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | 2019/07/26 | arxiv:1907.11692, blog post | Natural Language Processing | Meta | RoBERTa (Robustly Optimized BERT Pretraining Approach) | |
| Probabilistic Face Embeddings | 2019/04/21 | arxiv:1904.09658 | Computer Vision | | PFEs (Probabilistic Face Embeddings) | |
| Language Models are Unsupervised Multitask Learners (GPT-2) | 2019/02/14 | paper | Natural Language Processing | OpenAI | GPT-2 | |
| Deep reinforcement learning from human preferences | 2017/06/12 | arxiv:1706.03741, Blog post, GitHub | | OpenAI | RLHF (Reinforcement Learning from Human Feedback) | |