Papers
{| class="wikitable"
!Title
!Date
!Links
!Field
!Organization
!Product
!Note
|-
|[[Self-Rewarding Language Models]] || 2024/01/18 || [[arxiv:2401.10020]] || [[Natural Language Processing]] || [[Meta]] || ||
|-
|[[LLM in a flash: Efficient Large Language Model Inference with Limited Memory]] || 2023/12/12 || [[arxiv:2312.11514]]<br>[https://huggingface.co/papers/2312.11514 HuggingFace] || [[Natural Language Processing]] || [[Apple]] || ||
|-
|[[Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation]] || 2023/12/07 || [[arxiv:2311.17117]]<br>[https://humanaigc.github.io/animate-anyone/ Website]<br>[https://www.youtube.com/watch?v=8PCn5hLKNu4 Video]<br>[https://github.com/HumanAIGC/AnimateAnyone GitHub]<br>[https://www.alibabacloud.com/blog/quickly-set-up-a-virtual-clothes-try-on-services-with-pai_600484 Tutorial] || [[Computer Vision]] || [[Alibaba]] || [[Animate Anyone]] ||
|-
|[[MatterGen: a generative model for inorganic materials design]] || 2023/12/06 || [[arxiv:2312.03687]]<br>[https://twitter.com/xie_tian/status/1732798976779595968 Tweet] || [[Materials Science]] || [[Microsoft]] || [[MatterGen]] ||
|-
|[[Audiobox: Generating audio from voice and natural language prompts]] || 2023/11/30 || [https://ai.meta.com/research/publications/audiobox-unified-audio-generation-with-natural-language-prompts/ Paper]<br>[https://ai.meta.com/blog/audiobox-generating-audio-voice-natural-language-prompts/ Website] || [[Audio]] || [[Meta]] || [[Audiobox]] ||
|-
|[[Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text]] || 2023/11/30 || [[arxiv:2311.18805]] || [[Natural Language Processing]] || [[University of Tokyo]] || [[Scrambled Bench]] ||
|-
|[[MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers]] || 2023/11/27 || [[arxiv:2311.15475]]<br>[https://nihalsid.github.io/mesh-gpt/ Website] || [[Computer Vision]] || || [[MeshGPT]] ||
|-
|[[Ferret: Refer and Ground Anything Anywhere at Any Granularity]] || 2023/10/11 || [[arxiv:2310.07704]]<br>[https://github.com/apple/ml-ferret GitHub] || [[Multimodal]]<br>[[Natural Language Processing]] || [[Apple]] || [[Ferret]] ||
|-
|[[SeamlessM4T - Massively Multilingual & Multimodal Machine Translation]] || 2023/08/23 || [https://ai.meta.com/research/publications/seamless-m4t/ Paper]<br>[https://ai.meta.com/resources/models-and-libraries/seamless-communication/ Website]<br>[https://seamless.metademolab.com/ Demo]<br>[https://github.com/facebookresearch/seamless_communication GitHub] || [[Natural Language Processing]] || [[Meta]] || [[SeamlessM4T]] ||
|-
|[[RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control]] || 2023/08/01 || [[arxiv:2307.15818]]<br>[https://robotics-transformer2.github.io/ Website]<br>[https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action Blogpost] || [[Robotics]] || [[Google]] || [[RT-2]] ||
|-
|[[Towards Generalist Biomedical AI]] || 2023/07/26 || [[arxiv:2307.14334]] || [[Natural Language Processing]] || [[Google]] || [[Med-PaLM]] ||
|-
|[[Large Language Models Understand and Can be Enhanced by Emotional Stimuli]] || 2023/07/14 || [[arxiv:2307.11760]] || [[Natural Language Processing]] || || [[EmotionPrompt]] ||
|-
|[[MusicGen: Simple and Controllable Music Generation]] || 2023/06/08 || [[arxiv:2306.05284]]<br>[https://github.com/facebookresearch/audiocraft GitHub]<br>[https://ai.honu.io/papers/musicgen/ Example] || [[Audio]] || [[Meta]] || [[MusicGen]] ||
|-
|[[CodeTF: One-stop Transformer Library for State-of-the-art Code LLM]] || 2023/05/31 || [[arxiv:2306.00029]]<br>[https://github.com/salesforce/CodeTF GitHub] || [[Natural Language Processing]] || [[Salesforce]] || [[CodeTF]] ||
|-
|[[Bytes Are All You Need: Transformers Operating Directly On File Bytes]] || 2023/05/31 || [[arxiv:2306.00238]] || [[Computer Vision]] || [[Apple]] || ||
|-
|[[Scaling Speech Technology to 1,000+ Languages]] || 2023/05/22 || [https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/ Paper]<br>[https://ai.facebook.com/blog/multilingual-model-speech-recognition/ Blogpost]<br>[https://github.com/facebookresearch/fairseq/tree/main/examples/mms GitHub]<br>[https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html Languages covered] || [[Natural Language Processing]] || [[Meta]] || [[Massively Multilingual Speech]] ([[MMS]]) ||
|-
|[[RWKV: Reinventing RNNs for the Transformer Era]] || 2023/05/22 || [[arxiv:2305.13048]] || [[Natural Language Processing]] || || [[Receptance Weighted Key Value]] ([[RWKV]]) ||
|-
|[[ImageBind: One Embedding Space To Bind Them All]] || 2023/05/09 || [[arxiv:2305.05665]]<br>[https://imagebind.metademolab.com/ Website]<br>[https://imagebind.metademolab.com/demo Demo]<br>[https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/ Blog]<br>[https://github.com/facebookresearch/ImageBind GitHub] || [[Multimodal]]<br>[[Computer Vision]]<br>[[Natural Language Processing]] || [[Meta]] || [[ImageBind]] ||
|-
|[[Real-Time Neural Appearance Models]] || 2023/05/05 || [https://research.nvidia.com/labs/rtr/neural_appearance_models/assets/nvidia_neural_materials_paper-2023-05.pdf Paper]<br>[https://research.nvidia.com/labs/rtr/neural_appearance_models/ Blog] || || [[NVIDIA]] || ||
|-
|[[Poisoning Language Models During Instruction Tuning]] || 2023/05/01 || [[arxiv:2305.00944]] || [[Natural Language Processing]] || || ||
|-
|[[Generative Agents: Interactive Simulacra of Human Behavior]] || 2023/04/07 || [[arxiv:2304.03442]] || [[Human-AI Interaction]]<br>[[Natural Language Processing]] || [[Stanford]] || [[Generative agents]] ||
|-
|[[Language Models are Unsupervised Multitask Learners (GPT-2)]] || 2019/02/14 || [https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf Paper] || [[Natural Language Processing]] || [[OpenAI]] || [[GPT-2]] ||
|-
|[[Deep reinforcement learning from human preferences]] || 2017/06/12 || [[arxiv:1706.03741]]<br>[https://openai.com/research/learning-from-human-preferences Blog post]<br>[https://github.com/mrahtz/learning-from-human-preferences GitHub] || || [[OpenAI]] || [[RLHF]] ([[Reinforcement Learning from Human Feedback]]) ||
|-
|}