Multimodal AI
22 articles
CLIP Score
AI Benchmarks, Computer Vision, Image Generation
EgoSchema
AI Benchmarks, Computer Vision, Video Understanding
GPT Image 1
AI Models, Generative AI, Image Generation
Gemini (app)
AI Chatbots, Consumer AI, Google
Gemini 2.5 Flash
AI Models, Google DeepMind, Large Language Models
Gemini 3
Frontier Models, Gemini, Google
Gemini 3 Pro
Google, Models
Gemma 3
AI Models, Google, Google DeepMind
LINGO-2 (Wayve)
AI Models, Autonomous Vehicles, Vision-Language Models
LLaVA (Large Language and Vision Assistant)
Open-Source Models, Vision-Language Models
Llama 3.2
AI Models, LLaMA, Large Language Models
Llama 4 Scout and Maverick
AI Models, LLaMA, Large Language Models
MMMU
AI Benchmarks, Machine Learning
MMMU-Pro
AI Benchmarks, Computer Vision, Large Language Models
MathVista
Benchmarks
PaLM-E: An Embodied Multimodal Language Model
Embodied AI, Google DeepMind
Paper2Video
AI Research, Benchmarks
Pika (video generation)
AI Companies, AI Models, Artificial Intelligence
Pixtral
AI Models, Large Language Models, Mistral AI
Reka AI
AI Companies, Large Language Models
Sora 2
AI Models, Generative AI, OpenAI
Video-MME
AI Benchmarks, Computer Vision, Video Understanding