Multimodal AI
93 articles
Baidu ERNIE
Chinese AI, Large Language Models
CLIP (Contrastive Language-Image Pre-training)
Computer Vision, Deep Learning, Machine Learning
CLIP Score
AI Benchmarks, Computer Vision, Image Generation
CM3leon
Image Generation, Meta AI
Chameleon (Meta AI)
AI Models, Meta AI
Claude Sonnet 4.5
AI Code Generation, AI Tools & Products, Anthropic
CogAgent
AI Agents, Chinese AI
CogVLM
Chinese AI, Open Source AI
Computer-use agent
AI Agents, Artificial Intelligence, Computer Science
DeepSeek Janus
AI Models, Chinese AI
DeepSeek-OCR
Chinese AI, Computer Vision, Open Source AI
DeepSeek-VL
AI Models, Chinese AI
DeepSeek-VL2
Chinese AI, Mixture of Experts
Document Question Answering Models
AI Models
Doubao Seed 1.6
AI Models, Chinese AI, Large Language Models
ERNIE 4.5
Chinese AI, Large Language Models, Open Source AI
ERNIE 5.0
Chinese AI, Large Language Models
ERQA
AI Benchmarks, Embodied AI, Google DeepMind
EgoSchema
AI Benchmarks, Computer Vision
Feature Extraction Models
AI Models
Flamingo (visual language model)
AI Models, Google DeepMind
Fox (benchmark)
AI Benchmarks, Computer Vision
GPT Image 1
AI Models, Generative AI, Image Generation
GPT-4V (Vision)
Large Language Models, OpenAI
GPT-4o mini
OpenAI, Small Language Models
Gemini (app)
AI Tools & Products, Conversational AI, Google
Gemini 1.0
Google DeepMind, Large Language Models
Gemini 1.5 Flash
Google DeepMind, Large Language Models
Gemini 1.5 Pro
Google DeepMind, Large Language Models
Gemini 2.0 Flash
Google DeepMind, Large Language Models
Gemini 2.0 Flash-Lite
Google DeepMind, Large Language Models
Gemini 2.5 Flash
AI Models, Google DeepMind, Large Language Models
Gemini 3
AI Models, Google, Google DeepMind
Gemini 3 Flash
AI Models, Google, Large Language Models
Gemini 3 Pro
AI Models, Google
Gemini 3.1 Pro
Google DeepMind, Large Language Models
Gemini 3.5 Flash
Google DeepMind, Large Language Models
Gemini Ultra
Google DeepMind, Large Language Models
Gemma 3
AI Models, Google, Google DeepMind
HY-World 2.0
Chinese AI, World Models
Image-to-Text Models
Machine Learning
ImageBind
AI Models, Meta AI
InternVL
Chinese AI, Open Source AI
LINGO-2 (Wayve)
AI Models, Autonomous Vehicles
LLaVA (Large Language and Vision Assistant)
Open Source AI
LayoutLM
Large Language Models
Llama 3.2
AI Models, Large Language Models, Meta AI
Llama 3.2 Vision
Large Language Models, Meta AI
Llama 4 Scout and Maverick
AI Models, Large Language Models, Meta AI
MM-BrowseComp
AI Agents, AI Benchmarks
MMMU
AI Benchmarks, Machine Learning
MMMU-Pro
AI Benchmarks, Computer Vision, Large Language Models
MMStar
AI Benchmarks
MathVista
AI Benchmarks
MetaCLIP
Data & Datasets, Meta AI
MiniCPM-V
Chinese AI, Open Source AI
Molmo
AI Models, Open Source AI
Muse Spark
Meta AI, Reasoning Models
NVIDIA Cosmos Reason
Embodied AI, NVIDIA
NVLM
Large Language Models, NVIDIA