Name |
Date |
Source |
Type |
Organization |
Product |
Note |
MusicGen: Simple and Controllable Music Generation |
2023/06/08 |
arxiv:2306.05284 GitHub Example |
Audio |
Meta |
MusicGen |
|
CodeTF: One-stop Transformer Library for State-of-the-art Code LLM |
2023/05/31 |
arxiv:2306.00029 GitHub |
Natural Language Processing |
Salesforce |
CodeTF |
|
Bytes Are All You Need: Transformers Operating Directly On File Bytes |
2023/05/31 |
arxiv:2306.00238 |
Computer Vision |
Apple |
|
|
Scaling Speech Technology to 1,000+ Languages |
2023/05/22 |
Paper Blogpost GitHub Languages covered |
Natural Language Processing |
Meta |
Massively Multilingual Speech (MMS) |
|
RWKV: Reinventing RNNs for the Transformer Era |
2023/05/22 |
arxiv:2305.13048 |
Natural Language Processing |
|
Receptance Weighted Key Value (RWKV) |
|
ImageBind: One Embedding Space To Bind Them All |
2023/05/09 |
arxiv:2305.05665 Website Demo Blog GitHub |
Multimodal Computer Vision Natural Language Processing |
Meta |
ImageBind |
|
Real-Time Neural Appearance Models |
2023/05/05 |
Paper Blog |
|
NVIDIA |
|
|
Poisoning Language Models During Instruction Tuning |
2023/05/01 |
arxiv:2305.00944 |
Natural Language Processing |
|
|
|
Generative Agents: Interactive Simulacra of Human Behavior |
2023/04/07 |
arxiv:2304.03442 |
Human-AI Interaction Natural Language Processing |
Stanford |
Generative agents |
|
Segment Anything |
2023/04/05 |
Paper Website Blog GitHub |
Computer Vision |
Meta |
Segment Anything Model (SAM) |
|
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (Microsoft JARVIS) |
2023/03/30 |
arxiv:2303.17580 HuggingFace Space JARVIS GitHub |
Natural Language Processing Multimodal |
Microsoft Hugging Face |
HuggingGPT JARVIS |
|
BloombergGPT: A Large Language Model for Finance |
2023/03/30 |
arxiv:2303.17564 press release twitter thread |
Natural Language Processing |
Bloomberg |
BloombergGPT |
|
Sparks of Artificial General Intelligence: Early experiments with GPT-4 |
2023/03/22 |
arxiv:2303.12712 |
Natural Language Processing Multimodal |
Microsoft |
|
|
Reflexion: an autonomous agent with dynamic memory and self-reflection |
2023/03/20 |
arxiv:2303.11366 blog post GitHub GitHub 2 |
Natural Language Processing |
|
Reflexion |
|
PaLM-E: An Embodied Multimodal Language Model |
2023/03/06 |
arxiv:2303.03378 blog |
Natural Language Processing Multimodal |
Google |
PaLM-E |
|
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages |
2023/03/02 |
arxiv:2303.01037 blog |
Natural Language Processing |
Google |
Universal Speech Model (USM) |
|
Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1) |
2023/02/27 |
arxiv:2302.14045 |
Natural Language Processing |
Microsoft |
Kosmos-1 |
|
LLaMA: Open and Efficient Foundation Language Models |
2023/02/25 |
paper blog post github |
Natural Language Processing |
Meta |
LLaMA |
|
Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1) |
2023/02/06 |
arxiv:2302.03011 blog post |
Video-to-Video |
Runway |
Gen-1 |
|
Dreamix: Video Diffusion Models are General Video Editors |
2023/02/03 |
arxiv:2302.01329 blog post |
|
Google |
Dreamix |
|
FLAME: A small language model for spreadsheet formulas |
2023/01/31 |
arxiv:2301.13779 |
|
Microsoft |
FLAME |
|
SingSong: Generating musical accompaniments from singing |
2023/01/30 |
arxiv:2301.12662 blog post |
Audio |
|
SingSong |
|
MusicLM: Generating Music From Text |
2023/01/26 |
arxiv:2301.11325 blog post |
Audio |
Google |
MusicLM |
|
Mastering Diverse Domains through World Models (DreamerV3) |
2023/01/10 |
arxiv:2301.04104v1 blogpost |
|
DeepMind |
DreamerV3 |
|
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E) |
2023/01/05 |
arxiv:2301.02111 Demo |
|
Microsoft |
VALL-E |
|
Muse: Text-To-Image Generation via Masked Generative Transformers |
2023/01/02 |
arxiv:2301.00704 blog post |
Computer Vision |
Google |
Muse |
|
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
2022/11/09 |
arxiv:2211.05100 Blog Post |
Natural Language Processing |
Hugging Face |
BLOOM |
Open-source LLM that competes with GPT-3 |
ReAct: Synergizing Reasoning and Acting in Language Models |
2022/10/06 |
arxiv:2210.03629 Blogpost GitHub |
Natural Language Processing |
Google |
ReAct |
|
AudioLM: a Language Modeling Approach to Audio Generation |
2022/09/07 |
arxiv:2209.03143 web page blog post |
Audio |
Google |
AudioLM |
|
Emergent Abilities of Large Language Models |
2022/06/15 |
arxiv:2206.07682 Blog Post |
Natural Language Processing |
Google |
Emergent Abilities |
|
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen) |
2022/05/23 |
arxiv:2205.11487 Blog Post |
Computer Vision |
Google |
Imagen |
|
A Generalist Agent (Gato) |
2022/05/12 |
arxiv:2205.06175 Blog Post |
Natural Language Processing Multimodal |
DeepMind |
Gato |
|
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback |
2022/04/12 |
arxiv:2204.05862 GitHub |
Natural Language Processing |
Anthropic |
RLHF (Reinforcement Learning from Human Feedback) |
|
PaLM: Scaling Language Modeling with Pathways |
2022/04/05 |
arxiv:2204.02311 Blog Post |
Natural Language Processing |
Google |
PaLM (Pathways Language Model) |
|
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
2022/01/28 |
arxiv:2201.11903 Blog Post |
Natural Language Processing |
Google |
Chain of Thought Prompting (CoT prompting) |
|
Constitutional AI: Harmlessness from AI Feedback |
2022/12/15 |
arxiv:2212.08073 |
Natural Language Processing |
Anthropic |
Constitutional AI, Claude |
|
Improving language models by retrieving from trillions of tokens (RETRO) |
2021/12/08 |
arxiv:2112.04426 Blog post |
Natural Language Processing |
DeepMind |
RETRO (Retrieval Enhanced Transformer) |
|
InstructPix2Pix: Learning to Follow Image Editing Instructions |
2022/11/17 |
arxiv:2211.09800 Blog Post |
Computer Vision |
UC Berkeley |
InstructPix2Pix |
|
REALM: Retrieval-Augmented Language Model Pre-Training |
2020/02/10 |
arxiv:2002.08909 Blog Post |
Natural Language Processing |
Google |
REALM (Retrieval-Augmented Language Model Pre-Training) |
|
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) |
2019/10/23 |
arxiv:1910.10683 blog post |
Natural Language Processing |
Google |
T5 (Text-To-Text Transfer Transformer) |
|
RoBERTa: A Robustly Optimized BERT Pretraining Approach |
2019/07/26 |
arxiv:1907.11692 blog post |
Natural Language Processing |
Meta |
RoBERTa (Robustly Optimized BERT Pretraining Approach) |
|
Probabilistic Face Embeddings |
2019/04/21 |
arxiv:1904.09658 |
Computer Vision |
|
PFEs (Probabilistic Face Embeddings) |
|
Language Models are Unsupervised Multitask Learners (GPT-2) |
2019 |
paper |
Natural Language Processing |
OpenAI |
GPT-2 |
|