{{Needs Expansion}}
__TOC__
==Important Papers==
{| class="wikitable sortable"
|-
!Name
!Date
!Source
!Type
!Organization
!Product
!Note
|-
|[[ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)]] || 2012 || [https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf AlexNet Paper] || [[Computer Vision]] || || [[AlexNet]] ||
|-
|[[Efficient Estimation of Word Representations in Vector Space (Word2Vec)]] || 2013/01/16 || [[arxiv:1301.3781]] || [[Natural Language Processing]] || || [[Word2Vec]] ||
|-
|[[Playing Atari with Deep Reinforcement Learning (DQN)]] || 2013/12/19 || [[arxiv:1312.5602]] || || || [[DQN]] ([[Deep Q-Learning]]) ||
|-
|[[Generative Adversarial Networks (GAN)]] || 2014/06/10 || [[arxiv:1406.2661]] || || || [[GAN]] ([[Generative Adversarial Network]]) ||
|-
|[[Very Deep Convolutional Networks for Large-Scale Image Recognition (VGGNet)]] || 2014/09/04 || [[arxiv:1409.1556]] || [[Computer Vision]] || || [[VGGNet]] ||
|-
|[[Sequence to Sequence Learning with Neural Networks (Seq2Seq)]] || 2014/09/10 || [[arxiv:1409.3215]] || [[Natural Language Processing]] || || [[Seq2Seq]] ||
|-
|[[Adam: A Method for Stochastic Optimization]] || 2014/12/22 || [[arxiv:1412.6980]] || || || [[Adam]] ||
|-
|[[Deep Residual Learning for Image Recognition (ResNet)]] || 2015/12/10 || [[arxiv:1512.03385]] || [[Computer Vision]] || || [[ResNet]] ||
|-
|[[Going Deeper with Convolutions (GoogleNet)]] || 2014/09/17 || [[arxiv:1409.4842]] || [[Computer Vision]] || [[Google]] || [[GoogleNet]] ||
|-
|[[Asynchronous Methods for Deep Reinforcement Learning (A3C)]] || 2016/02/04 || [[arxiv:1602.01783]] || || || [[A3C]] ||
|-
|[[WaveNet: A Generative Model for Raw Audio]] || 2016/09/12 || [[arxiv:1609.03499]] || [[Audio]] || || [[WaveNet]] ||
|-
|[[Attention Is All You Need (Transformer)]] || 2017/06/12 || [[arxiv:1706.03762]] || [[Natural Language Processing]] || [[Google]] || [[Transformer]] || Introduced the Transformer architecture, the basis of most subsequent large language models
|-
|[[Proximal Policy Optimization Algorithms (PPO)]] || 2017/07/20 || [[arxiv:1707.06347]] || || || [[PPO]] ([[Proximal Policy Optimization]]) ||
|-
|[[Improving Language Understanding by Generative Pre-Training (GPT)]] || 2018 || [https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf paper source] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT]] ([[Generative Pre-Training Transformer]]) ||
|-
|[[Deep contextualized word representations (ELMo)]] || 2018/02/15 || [[arxiv:1802.05365]] || [[Natural Language Processing]] || || [[ELMo]] ||
|-
|[[GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding]] || 2018/04/20 || [[arxiv:1804.07461]]<br>[https://gluebenchmark.com/ website] || [[Natural Language Processing]] || || [[GLUE]] ([[General Language Understanding Evaluation]]) ||
|-
|[[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]] || 2018/10/11 || [[arxiv:1810.04805]] || [[Natural Language Processing]] || [[Google]] || [[BERT]] ([[Bidirectional Encoder Representations from Transformers]]) ||
|-
|[[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context]] || 2019/01/09 || [[arxiv:1901.02860]]<br>[https://github.com/kimiyoung/transformer-xl github] || || || [[Transformer-XL]] ||
|-
|[[Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (MuZero)]] || 2019/11/19 || [[arxiv:1911.08265]] || || || [[MuZero]] ||
|-
|[[Language Models are Few-Shot Learners (GPT-3)]] || 2020/05/28 || [[arxiv:2005.14165]] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT-3]] ||
|-
|[[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)]] || 2020/10/22 || [[arxiv:2010.11929]]<br>[https://github.com/google-research/vision_transformer GitHub] || [[Computer Vision]] || [[Google]] || [[ViT]] ([[Vision Transformer]]) ||
|-
|[[Learning Transferable Visual Models From Natural Language Supervision (CLIP)]] || 2021/02/26 || [[arxiv:2103.00020]]<br>[https://openai.com/blog/clip/ Blog Post] || [[Computer Vision]] || [[OpenAI]] || [[CLIP]] ([[Contrastive Language-Image Pre-Training]]) ||
|-
|[[MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer]] || 2021/10/05 || [[arxiv:2110.02178]]<br>[https://github.com/apple/ml-cvnets GitHub] || [[Computer Vision]] || [[Apple]] || [[MobileViT]] ||
|-
|[[LaMDA: Language Models for Dialog Applications]] || 2022/01/20 || [[arxiv:2201.08239]]<br>[https://blog.google/technology/ai/lamda/ Blog Post] || [[Natural Language Processing]] || [[Google]] || [[LaMDA]] ([[Language Models for Dialog Applications]]) ||
|-
|[[Block-Recurrent Transformers]] || 2022/03/11 || [[arxiv:2203.07852]] || || || ||
|-
|[[Memorizing Transformers]] || 2022/03/16 || [[arxiv:2203.08913]] || || || ||
|-
|[[STaR: Bootstrapping Reasoning With Reasoning]] || 2022/03/28 || [[arxiv:2203.14465]] || || || [[STaR]] ([[Self-Taught Reasoner]]) ||
|-
|[[GPT-4 Technical Report]] || 2023/03/15 || [[arxiv:2303.08774]]<br>[https://openai.com/research/gpt-4 blog post]<br>[https://cdn.openai.com/papers/gpt-4-system-card.pdf system card] || [[Natural Language Processing]]<br>[[Multimodal]] || [[OpenAI]] || [[GPT-4]] ||
|-
|}
==Other Papers==
{| class="wikitable sortable"
|-
!Name
!Date
!Source
!Type
!Organization
!Product
!Note
|-
|[[Self-Rewarding Language Models]] || 2024/01/18 || [[arxiv:2401.10020]] || [[Natural Language Processing]] || [[Meta]] || ||
|-
|[[LLM in a flash: Efficient Large Language Model Inference with Limited Memory]] || 2023/12/12 || [[arxiv:2312.11514]]<br>[https://huggingface.co/papers/2312.11514 HuggingFace] || [[Natural Language Processing]] || [[Apple]] || ||
|-
|[[Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation]] || 2023/12/07 || [[arxiv:2311.17117]]<br>[https://humanaigc.github.io/animate-anyone/ Website]<br>[https://www.youtube.com/watch?v=8PCn5hLKNu4 Video]<br>[https://github.com/HumanAIGC/AnimateAnyone GitHub]<br>[https://www.alibabacloud.com/blog/quickly-set-up-a-virtual-clothes-try-on-services-with-pai_600484 Tutorial] || [[Computer Vision]] || [[Alibaba]] || [[Animate Anyone]] ||
|-
|[[MatterGen: a generative model for inorganic materials design]] || 2023/12/06 || [[arxiv:2312.03687]]<br>[https://twitter.com/xie_tian/status/1732798976779595968 Tweet] || [[Materials Science]] || [[Microsoft]] || [[MatterGen]] ||
|-
|[[Audiobox: Generating audio from voice and natural language prompts]] || 2023/11/30 || [https://ai.meta.com/research/publications/audiobox-unified-audio-generation-with-natural-language-prompts/ Paper]<br>[https://ai.meta.com/blog/audiobox-generating-audio-voice-natural-language-prompts/ Website] || [[Audio]] || [[Meta]] || [[Audiobox]] ||
|-
|[[Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text]] || 2023/11/30 || [[arxiv:2311.18805]] || [[Natural Language Processing]] || [[University of Tokyo]] || [[Scrambled Bench]] ||
|-
|[[MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers]] || 2023/11/27 || [[arxiv:2311.15475]]<br>[https://nihalsid.github.io/mesh-gpt/ Website] || [[Computer Vision]] || || [[MeshGPT]] ||
|-
|[[Ferret: Refer and Ground Anything Anywhere at Any Granularity]] || 2023/10/11 || [[arxiv:2310.07704]]<br>[https://github.com/apple/ml-ferret GitHub] || [[Multimodal]]<br>[[Natural Language Processing]] || [[Apple]] || [[Ferret]] ||
|-
|[[SeamlessM4T - Massively Multilingual & Multimodal Machine Translation]] || 2023/08/23 || [https://ai.meta.com/research/publications/seamless-m4t/ Paper]<br>[https://ai.meta.com/resources/models-and-libraries/seamless-communication/ Website]<br>[https://seamless.metademolab.com/ Demo]<br>[https://github.com/facebookresearch/seamless_communication GitHub] || [[Natural Language Processing]] || [[Meta]] || [[SeamlessM4T]] ||
|-
|[[RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control]] || 2023/08/01 || [[arxiv:2307.15818]]<br>[https://robotics-transformer2.github.io/ Website]<br>[https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action Blogpost] || [[Robotics]] || [[Google]] || [[RT-2]] ||
|-
|[[Towards Generalist Biomedical AI]] || 2023/07/26 || [[arxiv:2307.14334]] || [[Natural Language Processing]] || [[Google]] || [[Med-PaLM]] ||
|-
|[[Large Language Models Understand and Can be Enhanced by Emotional Stimuli]] || 2023/07/14 || [[arxiv:2307.11760]] || [[Natural Language Processing]] || || [[EmotionPrompt]] ||
|-
|[[MusicGen: Simple and Controllable Music Generation]] || 2023/06/08 || [[arxiv:2306.05284]]<br>[https://github.com/facebookresearch/audiocraft GitHub]<br>[https://ai.honu.io/papers/musicgen/ Example] || [[Audio]] || [[Meta]] || [[MusicGen]] ||
|-
|[[CodeTF: One-stop Transformer Library for State-of-the-art Code LLM]] || 2023/05/31 || [[arxiv:2306.00029]]<br>[https://github.com/salesforce/CodeTF GitHub] || [[Natural Language Processing]] || [[Salesforce]] || [[CodeTF]] ||
|-
|[[Bytes Are All You Need: Transformers Operating Directly On File Bytes]] || 2023/05/31 || [[arxiv:2306.00238]] || [[Computer Vision]] || [[Apple]] || ||
|-
|[[Scaling Speech Technology to 1,000+ Languages]] || 2023/05/22 || [https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/ Paper]<br>[https://ai.facebook.com/blog/multilingual-model-speech-recognition/ Blogpost]<br>[https://github.com/facebookresearch/fairseq/tree/main/examples/mms GitHub]<br>[https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html Languages covered] || [[Natural Language Processing]] || [[Meta]] || [[Massively Multilingual Speech]] ([[MMS]]) ||
|-
|[[RWKV: Reinventing RNNs for the Transformer Era]] || 2023/05/22 || [[arxiv:2305.13048]] || [[Natural Language Processing]] || || [[Receptance Weighted Key Value]] ([[RWKV]]) ||
|-
|[[ImageBind: One Embedding Space To Bind Them All]] || 2023/05/09 || [[arxiv:2305.05665]]<br>[https://imagebind.metademolab.com/ Website]<br>[https://imagebind.metademolab.com/demo Demo]<br>[https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/ Blog]<br>[https://github.com/facebookresearch/ImageBind GitHub] || [[Multimodal]]<br>[[Computer Vision]]<br>[[Natural Language Processing]] || [[Meta]] || [[ImageBind]] ||
|-
|[[Real-Time Neural Appearance Models]] || 2023/05/05 || [https://research.nvidia.com/labs/rtr/neural_appearance_models/assets/nvidia_neural_materials_paper-2023-05.pdf Paper]<br>[https://research.nvidia.com/labs/rtr/neural_appearance_models/ Blog] || || [[NVIDIA]] || ||
|-
|[[Poisoning Language Models During Instruction Tuning]] || 2023/05/01 || [[arxiv:2305.00944]] || [[Natural Language Processing]] || || ||
|-
|[[Generative Agents: Interactive Simulacra of Human Behavior]] || 2023/04/07 || [[arxiv:2304.03442]] || [[Human-AI Interaction]]<br>[[Natural Language Processing]] || [[Stanford]] || [[Generative agents]] ||
|-
|[[Segment Anything]] || 2023/04/05 || [https://ai.facebook.com/research/publications/segment-anything/ Paper]<br>[https://segment-anything.com/ Website]<br>[https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/ Blog]<br>[https://github.com/facebookresearch/segment-anything GitHub] || [[Computer Vision]] || [[Meta]] || [[Segment Anything Model]] ([[SAM]]) ||
|-
|[[HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (Microsoft JARVIS)]] || 2023/03/30 || [[arxiv:2303.17580]]<br>[https://huggingface.co/spaces/microsoft/HuggingGPT HuggingFace Space]<br>[https://github.com/microsoft/JARVIS JARVIS GitHub] || [[Natural Language Processing]]<br>[[Multimodal]] || [[Microsoft]]<br>[[Hugging Face]] || [[HuggingGPT]]<br>[[JARVIS]] ||
|-
|[[BloombergGPT: A Large Language Model for Finance]] || 2023/03/30 || [[arxiv:2303.17564]]<br>[https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/ press release]<br>[https://twitter.com/rasbt/status/1642880757566676992 twitter thread] || [[Natural Language Processing]] || [[Bloomberg]] || [[BloombergGPT]] ||
|-
|[[Sparks of Artificial General Intelligence: Early experiments with GPT-4]] || 2023/03/22 || [[arxiv:2303.12712]] || [[Natural Language Processing]]<br>[[Multimodal]] || [[Microsoft]] || ||
|-
|[[Reflexion: an autonomous agent with dynamic memory and self-reflection]] || 2023/03/20 || [[arxiv:2303.11366]]<br>[https://nanothoughts.substack.com/p/reflecting-on-reflexion blog post]<br>[https://github.com/noahshinn024/reflexion GitHub]<br>[https://github.com/GammaTauAI/reflexion-human-eval GitHub 2] || [[Natural Language Processing]] || || [[Reflexion]] ||
|-
|[[PaLM-E: An Embodied Multimodal Language Model]] || 2023/03/06 || [[arxiv:2303.03378]]<br>[https://palm-e.github.io/ blog] || [[Natural Language Processing]]<br>[[Multimodal]] || [[Google]] || [[PaLM-E]] ||
|-
|[[Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages]] || 2023/03/02 || [[arxiv:2303.01037]]<br>[https://sites.research.google/usm/ blog] || [[Natural Language Processing]] || [[Google]] || [[Universal Speech Model]] ([[USM]]) ||
|-
|[[Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)]] || 2023/02/27 || [[arxiv:2302.14045]] || [[Natural Language Processing]] || [[Microsoft]] || [[Kosmos-1]] ||
|-
|[[LLaMA: Open and Efficient Foundation Language Models]] || 2023/02/25 || [https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/ paper]<br>[https://ai.facebook.com/blog/large-language-model-llama-meta-ai/ blog post]<br>[https://github.com/facebookresearch/llama github] || [[Natural Language Processing]] || [[Meta]] || [[LLaMA]] ||
|-
|[[Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1)]] || 2023/02/06 || [[arxiv:2302.03011]]<br>[https://research.runwayml.com/gen1 blog post] || [[Video-to-Video]] || [[Runway]] || [[Gen-1]] ||
|-
|[[Dreamix: Video Diffusion Models are General Video Editors]] || 2023/02/03 || [[arxiv:2302.01329]]<br>[https://dreamix-video-editing.github.io/ blog post] || || [[Google]] || [[Dreamix]] ||
|-
|[[FLAME: A small language model for spreadsheet formulas]] || 2023/01/31 || [[arxiv:2301.13779]] || || [[Microsoft]] || [[FLAME]] ||
|-
|[[SingSong: Generating musical accompaniments from singing]] || 2023/01/30 || [[arxiv:2301.12662]]<br>[https://storage.googleapis.com/sing-song/index.html blog post] || [[Audio]] || || [[SingSong]] ||
|-
|[[MusicLM: Generating Music From Text]] || 2023/01/26 || [[arxiv:2301.11325]]<br>[https://google-research.github.io/seanet/musiclm/examples/ blog post] || [[Audio]] || [[Google]] || [[MusicLM]] ||
|-
|[[Mastering Diverse Domains through World Models (DreamerV3)]] || 2023/01/10 || [[arxiv:2301.04104v1]]<br>[https://danijar.com/project/dreamerv3/ blogpost] || || [[DeepMind]] || [[DreamerV3]] ||
|-
|[[Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)]] || 2023/01/05 || [[arxiv:2301.02111]]<br>[https://valle-demo.github.io/ Demo] || || [[Microsoft]] || [[VALL-E]] ||
|-
|[[Muse: Text-To-Image Generation via Masked Generative Transformers]] || 2023/01/02 || [[arxiv:2301.00704]]<br>[https://muse-model.github.io/ blog post] || [[Computer Vision]] || [[Google]] || [[Muse]] ||
|-
|[[BLOOM: A 176B-Parameter Open-Access Multilingual Language Model]] || 2022/11/09 || [[arxiv:2211.05100]]<br>[https://bigscience.huggingface.co/blog/bloom Blog Post] || [[Natural Language Processing]] || [[Hugging Face]] || [[BLOOM]] || Open source [[LLM]] that is a competitor to [[GPT-3]]
|-
|[[ReAct: Synergizing Reasoning and Acting in Language Models]] || 2022/10/06 || [[arxiv:2210.03629]]<br>[https://ai.googleblog.com/2022/11/react-synergizing-reasoning-and-acting.html Blogpost]<br>[https://github.com/ysymyth/ReAct GitHub] || [[Natural Language Processing]] || [[Google]] || [[ReAct]] ||
|-
|[[AudioLM: a Language Modeling Approach to Audio Generation]] || 2022/09/07 || [[arxiv:2209.03143]]<br>[https://google-research.github.io/seanet/audiolm/examples/ web page]<br>[https://ai.googleblog.com/2022/10/audiolm-language-modeling-approach-to.html blog post] || [[Audio]] || [[Google]] || [[AudioLM]] ||
|-
|[[Emergent Abilities of Large Language Models]] || 2022/06/15 || [[arxiv:2206.07682]]<br>[https://ai.googleblog.com/2022/11/characterizing-emergent-phenomena-in.html Blog Post] || [[Natural Language Processing]] || [[Google]] || [[Emergent Abilities]] ||
|-
|[[Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)]] || 2022/05/23 || [[arxiv:2205.11487]]<br>[https://imagen.research.google/ Blog Post] || [[Computer Vision]] || [[Google]] || [[Imagen]] ||
|-
|[[A Generalist Agent (Gato)]] || 2022/05/12 || [[arxiv:2205.06175]]<br>[https://www.deepmind.com/blog/a-generalist-agent Blog Post] || [[Natural Language Processing]]<br>[[Multimodal]] || [[DeepMind]] || [[Gato]] ||
|-
|[[Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback]] || 2022/04/12 || [[arxiv:2204.05862]]<br>[https://github.com/anthropics/hh-rlhf GitHub] || [[Natural Language Processing]] || [[Anthropic]] || [[RLHF]] ([[Reinforcement Learning from Human Feedback]]) ||
|-
|[[PaLM: Scaling Language Modeling with Pathways]] || 2022/04/05 || [[arxiv:2204.02311]]<br>[https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Blog Post] || [[Natural Language Processing]] || [[Google]] || [[PaLM]] ([[Pathways Language Model]]) ||
|-
|[[Chain-of-Thought Prompting Elicits Reasoning in Large Language Models]] || 2022/01/28 || [[arxiv:2201.11903]]<br>[https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html Blog Post] || [[Natural Language Processing]] || [[Google]] || [[Chain of Thought Prompting]] (CoT prompting) ||
|-
|[[Constitutional AI: Harmlessness from AI Feedback]] || 2022/12/15 || [[arxiv:2212.08073]] || [[Natural Language Processing]] || [[Anthropic]] || [[Constitutional AI]], [[Claude]] ||
|-
|[[Improving language models by retrieving from trillions of tokens (RETRO)]] || 2021/12/08 || [[arxiv:2112.04426]]<br>[https://www.deepmind.com/blog/improving-language-models-by-retrieving-from-trillions-of-tokens Blog post] || [[Natural Language Processing]] || [[DeepMind]] || [[RETRO]] ([[Retrieval Enhanced Transformer]]) ||
|-
|[[InstructPix2Pix: Learning to Follow Image Editing Instructions]] || 2022/11/17 || [[arxiv:2211.09800]]<br>[https://www.timothybrooks.com/instruct-pix2pix Blog Post] || [[Computer Vision]] || [[UC Berkeley]] || [[InstructPix2Pix]] ||
|-
|[[REALM: Retrieval-Augmented Language Model Pre-Training]] || 2020/02/10 || [[arxiv:2002.08909]]<br>[https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html Blog Post] || [[Natural Language Processing]] || [[Google]] || [[REALM]] ([[Retrieval-Augmented Language Model Pre-Training]]) ||
|-
|[[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)]] || 2019/10/23 || [[arxiv:1910.10683]]<br>[https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html blog post] || [[Natural Language Processing]] || [[Google]] || [[T5]] ([[Text-To-Text Transfer Transformer]]) ||
|-
|[[RoBERTa: A Robustly Optimized BERT Pretraining Approach]] || 2019/07/26 || [[arxiv:1907.11692]]<br>[https://ai.facebook.com/blog/roberta-an-optimized-method-for-pretraining-self-supervised-nlp-systems/ blog post] || [[Natural Language Processing]] || [[Meta]] || [[RoBERTa]] ([[Robustly Optimized BERT Pretraining Approach]]) ||
|-
|[[Probabilistic Face Embeddings]] || 2019/04/21 || [[arxiv:1904.09658]] || [[Computer Vision]] || || [[PFE]]s ([[Probabilistic Face Embeddings]]) ||
|-
|[[Language Models are Unsupervised Multitask Learners (GPT-2)]] || 2019 || [https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf paper] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT-2]] ||
|-
|[[Deep reinforcement learning from human preferences]] || 2017/06/12 || [[arxiv:1706.03741]]<br>[https://openai.com/research/learning-from-human-preferences Blog post]<br>[https://github.com/mrahtz/learning-from-human-preferences GitHub] || || [[OpenAI]] || [[RLHF]] ([[Reinforcement Learning from Human Feedback]]) ||
|-
|} |