Papers: Difference between revisions

46 bytes added, 7 February 2023
no edit summary
Line 52:
|[[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)]] || 2020/10/22 || [[arxiv:2010.11929]]<br>[https://github.com/google-research/vision_transformer GitHub] || [[Computer Vision]] || [[Google]] || [[Vision Transformer]] ([[ViT]])
|-
|[[Learning Transferable Visual Models From Natural Language Supervision (CLIP)]] || 2021/02/26 || [[arxiv:2103.00020]]<br>[https://openai.com/blog/clip/ Blog Post] || [[Computer Vision]] || [[OpenAI]] || [[CLIP]]
|[[Learning Transferable Visual Models From Natural Language Supervision (CLIP)]] || 2021/02/26 || [[arxiv:2103.00020]]<br>[https://openai.com/blog/clip/ Blog Post] || [[Computer Vision]] || [[OpenAI]] || [[CLIP]] ([[Contrastive Language-Image Pre-Training]])
|-
|[[MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer]] || 2021/10/05 || [[arxiv:2110.02178]]<br>[https://github.com/apple/ml-cvnets GitHub] || [[Computer Vision]] || [[Apple]] || [[MobileViT]]