AI Wiki
Category: Multimodal AI

93 articles

Baidu ERNIE

Chinese AI, Large Language Models

CLIP (Contrastive Language-Image Pre-training)

Computer Vision, Deep Learning, Machine Learning

CLIP Score

AI Benchmarks, Computer Vision, Image Generation

CM3leon

Image Generation, Meta AI

Chameleon (Meta AI)

AI Models, Meta AI

Claude Sonnet 4.5

AI Code Generation, AI Tools & Products, Anthropic

CogAgent

AI Agents, Chinese AI

CogVLM

Chinese AI, Open Source AI

Computer-use agent

AI Agents, Artificial Intelligence, Computer Science

DeepSeek Janus

AI Models, Chinese AI

DeepSeek-OCR

Chinese AI, Computer Vision, Open Source AI

DeepSeek-VL

AI Models, Chinese AI

DeepSeek-VL2

Chinese AI, Mixture of Experts

Document Question Answering Models

AI Models

Doubao Seed 1.6

AI Models, Chinese AI, Large Language Models

ERNIE 4.5

Chinese AI, Large Language Models, Open Source AI

ERNIE 5.0

Chinese AI, Large Language Models

ERQA

AI Benchmarks, Embodied AI, Google DeepMind

EgoSchema

AI Benchmarks, Computer Vision

Feature Extraction Models

AI Models

Flamingo (visual language model)

AI Models, Google DeepMind

Fox (benchmark)

AI Benchmarks, Computer Vision

GPT Image 1

AI Models, Generative AI, Image Generation

GPT-4V (Vision)

Large Language Models, OpenAI

GPT-4o mini

OpenAI, Small Language Models

Gemini (app)

AI Tools & Products, Conversational AI, Google

Gemini 1.0

Google DeepMind, Large Language Models

Gemini 1.5 Flash

Google DeepMind, Large Language Models

Gemini 1.5 Pro

Google DeepMind, Large Language Models

Gemini 2.0 Flash

Google DeepMind, Large Language Models

Gemini 2.0 Flash-Lite

Google DeepMind, Large Language Models

Gemini 2.5 Flash

AI Models, Google DeepMind, Large Language Models

Gemini 3

AI Models, Google, Google DeepMind

Gemini 3 Flash

AI Models, Google, Large Language Models

Gemini 3 Pro

AI Models, Google

Gemini 3.1 Pro

Google DeepMind, Large Language Models

Gemini 3.5 Flash

Google DeepMind, Large Language Models

Gemini Ultra

Google DeepMind, Large Language Models

Gemma 3

AI Models, Google, Google DeepMind

HY-World 2.0

Chinese AI, World Models

Image-to-Text Models

Machine Learning

ImageBind

AI Models, Meta AI

InternVL

Chinese AI, Open Source AI

LINGO-2 (Wayve)

AI Models, Autonomous Vehicles

LLaVA (Large Language and Vision Assistant)

Open Source AI

LayoutLM

Large Language Models

Llama 3.2

AI Models, Large Language Models, Meta AI

Llama 3.2 Vision

Large Language Models, Meta AI

Llama 4 Scout and Maverick

AI Models, Large Language Models, Meta AI

MM-BrowseComp

AI Agents, AI Benchmarks

MMMU

AI Benchmarks, Machine Learning

MMMU-Pro

AI Benchmarks, Computer Vision, Large Language Models

MMStar

AI Benchmarks

MathVista

AI Benchmarks

MetaCLIP

Data & Datasets, Meta AI

MiniCPM-V

Chinese AI, Open Source AI

Molmo

AI Models, Open Source AI

Muse Spark

Meta AI, Reasoning Models

NVIDIA Cosmos Reason

Embodied AI, NVIDIA

NVLM

Large Language Models, NVIDIA