Category: Multimodal AI

29 articles

CLIP Score

AI Benchmarks, Computer Vision, Image Generation

ERNIE 4.5

Chinese AI, Large Language Models, Open Source AI

ERNIE 5.0

Chinese AI, Large Language Models

EgoSchema

AI Benchmarks, Computer Vision, Video Understanding

GPT Image 1

AI Models, Generative AI, Image Generation

Gemini (app)

AI Chatbots, Consumer AI, Google

Gemini 2.5 Flash

AI Models, Google DeepMind, Large Language Models

Gemini 3

Frontier Models, Gemini, Google

Gemini 3 Pro

Google, Models

Gemma 3

AI Models, Google, Google DeepMind

LINGO-2 (Wayve)

AI Models, Autonomous Vehicles, Vision Language Models

LLaVA (Large Language and Vision Assistant)

Open-Source Models, Vision-Language Models

Llama 3.2

AI Models, LLaMA, Large Language Models

Llama 4 Scout and Maverick

AI Models, LLaMA, Large Language Models

MMMU

AI Benchmarks, Machine Learning

MMMU-Pro

AI Benchmarks, Computer Vision, Large Language Models

MathVista

Benchmarks

Nano Banana Pro

Generative AI, Image Generation

PaLM-E: An Embodied Multimodal Language Model

Embodied AI, Google DeepMind

Paper2Video

AI Research, Benchmarks

Pika (video generation)

AI Companies, AI Models, Artificial Intelligence

Pixtral

AI Models, Large Language Models, Mistral AI

Qwen3-Omni

Chinese AI, Large Language Models, Open Source AI

Qwen3-VL

Chinese AI, Large Language Models, Open Source AI

Reka AI

AI Companies, Large Language Models

Seedream 4.0

Chinese AI, Generative AI, Image Generation

Sora 2

AI Models, Generative AI, OpenAI

Video-MME

AI Benchmarks, Computer Vision, Video Understanding

ZeroBench

AI Benchmarks, Computer Vision