Papers: Difference between revisions
Line 84: | Line 84: | ||
|- | |- | ||
|[[Bytes Are All You Need: Transformers Operating Directly On File Bytes]] || 2023/05/31 || [[arxiv:2306.00238]] || [[Computer Vision]] || [[Apple]] || || | |[[Bytes Are All You Need: Transformers Operating Directly On File Bytes]] || 2023/05/31 || [[arxiv:2306.00238]] || [[Computer Vision]] || [[Apple]] || || | ||
|- | |||
|[[Scaling Speech Technology to 1,000+ Languages]] || 2023/05/22 || [https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/ Paper]<br><[https://ai.facebook.com/blog/multilingual-model-speech-recognition/ Blogpost]<br>[https://github.com/facebookresearch/fairseq/tree/main/examples/mms GitHub]<br>[https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html Languages covered] || [[Natural Language Processing]] || [[Meta]] || [[Massively Multilingual Speech]] ([[MMS]]) || | |||
|- | |- | ||
|[[ImageBind: One Embedding Space To Bind Them All]] || 2023/05/09 || [[arxiv:2305.05665]]<br>[https://imagebind.metademolab.com/ Website]<br>[https://imagebind.metademolab.com/demo Demo]<br>[https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/ Blog]<br>[https://github.com/facebookresearch/ImageBind GitHub] || [[Multimodal]]<br>[[Computer Vision]]<br>[[Natural Language Processing]] || [[Meta]] || [[ImageBind]] || | |[[ImageBind: One Embedding Space To Bind Them All]] || 2023/05/09 || [[arxiv:2305.05665]]<br>[https://imagebind.metademolab.com/ Website]<br>[https://imagebind.metademolab.com/demo Demo]<br>[https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/ Blog]<br>[https://github.com/facebookresearch/ImageBind GitHub] || [[Multimodal]]<br>[[Computer Vision]]<br>[[Natural Language Processing]] || [[Meta]] || [[ImageBind]] || |