Ollama is an open-source tool designed to simplify the deployment and management of large language models (LLMs) locally on personal computers and servers.[1] It provides a streamlined interface for downloading, running, and managing various open-source LLMs without requiring cloud computing services or extensive technical expertise, positioning itself as "Docker for LLMs."[2]
Ollama was founded in 2021 by Jeffrey Morgan and Michael Chiang in Palo Alto, California.[3][4] The company participated in Y Combinator's Winter 2021 batch and raised $125,000 in pre-seed funding from investors including Y Combinator, Essence Venture Capital, Rogue Capital, and Sunflower Capital.[5]
Prior to founding Ollama, Morgan and Chiang, along with Sean Li, created Kitematic, a tool designed to simplify Docker container management on macOS, which was eventually acquired by Docker, Inc.[6][7] Jeffrey Morgan and Sean Li graduated from the University of Waterloo (BASc 2013, Software Engineering), while Michael Chiang was an electrical engineering student there at the time of Kitematic's acquisition.[7] This experience in making complex command-line tools accessible through simpler interfaces directly influenced Ollama's design philosophy.[6]
The platform quickly gained traction in the open-source AI community for its ease of use and Docker-like simplicity in managing LLMs.[5] Initial releases focused on core functionality for running models like LLaMA 2, with subsequent updates introducing features such as multimodal support and tool calling.[8]
| Date | Milestone | Notes |
|---|---|---|
| 2021 | Company Founded | Participated in Y Combinator W21 batch |
| March 23, 2021 | Pre-seed Funding | Raised $125,000 from Y Combinator and other investors[9] |
| 2023 | Public Launch | Basic model management and inference capabilities |
| February 8, 2024 | OpenAI Compatibility | Initial compatibility with the OpenAI Chat Completions API at /v1/chat/completions[10] |
| February 15, 2024 | Windows Preview | Native Windows build with built-in GPU acceleration and always-on API[11] |
| March 14, 2024 | AMD GPU Preview | Preview acceleration on supported AMD Radeon/Instinct cards on Windows and Linux[12] |
| July 30, 2025 | Desktop App | Official GUI app for macOS/Windows with file drag-and-drop and context-length controls[13] |
| September 19, 2025 | Cloud Models (Preview) | Option to run larger models on datacenter hardware while maintaining local workflows[14] |
| September 26, 2025 | Version 0.12.3 | Latest stable release with web search API and performance optimizations[15] |
Ollama is built primarily in Go and leverages llama.cpp as its underlying inference engine through CGo bindings.[16][17] The llama.cpp project, created by Georgi Gerganov in March 2023, provides an efficient C++ implementation of LLaMA and other language models, enabling them to run on consumer-grade hardware.[16]
Ollama primarily uses the GGUF (GPT-Generated Unified Format) file format for storing and loading models.[18] GGUF replaced the earlier GGML format and provides better compatibility, metadata handling, and performance optimization for quantized models.[19] This quantization is what allows massive models (for example 70 billion parameters) to run on machines with limited VRAM.
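The memory savings from quantization follow from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. A back-of-envelope sketch (this ignores KV-cache and activation overhead, so real usage is somewhat higher):

```python
def model_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate in gigabytes: params * bits / 8 bytes.

    Ignores KV cache and activation overhead, so actual usage is higher.
    """
    return num_params * bits_per_weight / 8 / 1e9

# A 70-billion-parameter model in 16-bit floats needs ~140 GB for weights alone,
full_precision = model_memory_gb(70e9, 16)   # 140.0
# while a 4-bit GGUF quantization brings that down to ~35 GB.
quantized = model_memory_gb(70e9, 4)         # 35.0
```

This 4x reduction is why a 4-bit 70B model fits on a high-end workstation while the full-precision version does not.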
Ollama can also import models from specific Safetensors directories for supported architectures (for example Llama, Mistral, Gemma, Phi).[20]
| Area | Details | Notes |
|---|---|---|
| Server & Port | Local HTTP server at 127.0.0.1:11434 | Configurable via OLLAMA_HOST environment variable[21] |
| Core Endpoints | /api/generate, /api/chat, /api/embeddings, model management | Streaming JSON supported[22] |
| OpenAI Compatibility | /v1/chat/completions | Drop-in replacement for many OpenAI-based clients[10] |
| Local-First Design | All processing occurs locally | Ensures complete data privacy[2] |
| Multimodal Support | Text, images, and other data types | Self-contained projection layers[23] |
| Tool Calling | External function calls | Enhances reasoning and automation[8] |
| Cloud Integration | Hybrid mode for larger models | Maintains local workflows (v0.12.0+)[14] |
| Performance | Flash attention, GPU/CPU overlap | Batch processing for efficiency[15] |
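Because the /v1/chat/completions endpoint mirrors OpenAI's request schema, an OpenAI-style client generally only needs its base URL pointed at the local server. A minimal sketch using only the Python standard library (the model name and prompt are illustrative):

```python
import json
import urllib.request

def openai_chat_request(model: str, user_msg: str,
                        base_url: str = "http://localhost:11434/v1") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = openai_chat_request("llama3.2", "Hello")
# Actually sending it requires a running Ollama server:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

The response body follows the OpenAI format (a `choices` list with a `message` per choice), which is what makes existing OpenAI client libraries work unmodified against the local endpoint.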
A key component of Ollama is the Modelfile, which serves as a blueprint for creating and sharing models.[20] Similar to a Dockerfile, the Modelfile defines model behavior and configuration.
| Instruction | Description | Example |
|---|---|---|
| FROM | (Required) Specifies the base model or local GGUF path | FROM llama3.2 or FROM ./model.gguf |
| PARAMETER | Sets model parameters | PARAMETER temperature 0.7, PARAMETER num_ctx 4096 |
| SYSTEM | Defines system message/persona | SYSTEM "You are a helpful assistant" |
| TEMPLATE | Sets prompt template format | TEMPLATE "[INST] {{ .System }} {{ .Prompt }} [/INST]" |
| ADAPTER | Applies LoRA/QLoRA adapters | ADAPTER /path/to/adapter.bin |
| LICENSE | Specifies model license | LICENSE "MIT" |
| MESSAGE | Provides conversation history for few-shot learning | MESSAGE user "What is 1+1?", MESSAGE assistant "2" |
An example Modelfile:

```
# Specify the base model
FROM llama3.2

# Set model parameters
PARAMETER temperature 0.8
PARAMETER num_ctx 4096
PARAMETER stop "</s>"

# Set the system message
SYSTEM """
You are an expert Python programming assistant.
Always provide clear, concise code examples.
Your responses must be formatted in Markdown.
"""

# Define the chat template
TEMPLATE """<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
```
This custom model can be created with `ollama create my-assistant -f ./Modelfile`.
| Command | Description | Example |
|---|---|---|
| ollama run | Runs a model interactively | ollama run llama3.2 |
| ollama pull | Downloads a model | ollama pull gemma:2b |
| ollama create | Creates a custom model from a Modelfile | ollama create mymodel -f ./Modelfile |
| ollama list | Lists installed models | ollama list |
| ollama rm | Removes a model | ollama rm llama3.2 |
| ollama cp | Copies a model | ollama cp llama3.2 mymodel |
| ollama push | Uploads a model to the registry | ollama push mymodel |
| ollama serve | Starts the Ollama server | ollama serve |
Ollama exposes a REST API on port 11434 by default, providing programmatic access to model functionality:[22]
| Endpoint | Method | Description | Example |
|---|---|---|---|
| /api/generate | POST | Generate text completion | curl -X POST http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}' |
| /api/chat | POST | Chat conversation interface | curl -X POST http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}' |
| /api/embeddings | POST | Generate text embeddings | curl -X POST http://localhost:11434/api/embeddings -d '{"model":"llama3.2","prompt":"Test"}' |
| /api/pull | POST | Download a model | curl -X POST http://localhost:11434/api/pull -d '{"name":"llama3.2"}' |
| /api/show | POST | Show model information | curl -X POST http://localhost:11434/api/show -d '{"name":"llama3.2"}' |
| /api/tags | GET | List installed models | curl http://localhost:11434/api/tags |
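By default these endpoints stream their responses as newline-delimited JSON, one object per token batch, with the final object carrying "done": true. A small, illustrative parser for that framing (field names follow the /api/generate response format):

```python
import json

def assemble_stream(ndjson: str) -> str:
    """Concatenate the 'response' fragments from a streamed /api/generate reply."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # final object signals end of stream
            break
    return "".join(parts)

# Example frames as the server might emit them:
stream = (
    '{"model":"llama3.2","response":"Hel","done":false}\n'
    '{"model":"llama3.2","response":"lo!","done":true}\n'
)
print(assemble_stream(stream))  # Hello!
```

Clients that prefer a single JSON object can instead pass `"stream": false` in the request body.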
Ollama supports a wide range of open-source language models, continuously updated as new models are released:[24]
| Model Name | Parameters | Category | Description | Creator |
|---|---|---|---|---|
| Llama 3.2 | 1B, 3B, 11B, 90B | Source/Fine-Tuned | Meta's latest open-source LLM with improved reasoning and tool support | Meta AI |
| Gemma 2 | 2B, 9B, 27B | Source | Google's lightweight models for efficient on-device inference | Google |
| DeepSeek-R1 | 1.5B–671B | Reasoning | Open reasoning model with explicit thinking traces for complex tasks | DeepSeek AI |
| Mistral / Mixtral | 7B, 8x7B, 8x22B | Source | High-efficiency models, often outperforming larger models | Mistral AI |
| Qwen2.5 | Up to 72B | Source | Alibaba's multilingual models supporting 128K tokens | Alibaba |
| Phi 4 | 3B, 14B | Fine-Tuned | Microsoft's small language models for efficient reasoning | Microsoft |
| CodeLlama | 7B, 13B, 34B | Code | Specialized for code generation and programming tasks | Meta AI |
| LLaVA | 7B, 13B | Multimodal | Visual language model for text and image understanding | Various |
| Snowflake Arctic Embed | 568M | Embedding | Multilingual embedding model for retrieval tasks | Snowflake |
| Firefunction | 7B | Tools | Function-calling model for automation and integration | Various |
| Vicuna, Alpaca | Various | Fine-Tuned | LLaMA derivatives with specialized capabilities | Various |
| Platform | Minimum Version | Installation Method |
|---|---|---|
| macOS | 11 Big Sur or later | Download .dmg from official website |
| Linux | Ubuntu 18.04 or equivalent | curl -fsSL https://ollama.com/install.sh \| sh |
| Windows | Windows 10 22H2 or later | Download .exe installer |
| Docker | Any platform | docker pull ollama/ollama |
| Model Size | RAM Required | Storage | GPU VRAM (Optional) |
|---|---|---|---|
| 3B parameters | 8GB | 10GB+ | 4GB |
| 7B parameters | 16GB | 20GB+ | 8GB |
| 13B parameters | 32GB | 40GB+ | 16GB |
| 70B parameters | 64GB+ | 100GB+ | 48GB+ |
Ollama provides official client libraries for Python (pip install ollama) and JavaScript (npm install ollama).[25]

By default, Ollama operates entirely locally:[21] the API binds to 127.0.0.1:11434 (loopback interface only). To expose it on a network, users must explicitly set the OLLAMA_HOST environment variable (for example OLLAMA_HOST=0.0.0.0:11434).
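Client code typically honors the same convention. A sketch of resolving the server address from OLLAMA_HOST with a loopback default (the helper name is illustrative, not part of any official library):

```python
import os

def resolve_ollama_host(default: str = "http://127.0.0.1:11434") -> str:
    """Return the Ollama base URL, honoring the OLLAMA_HOST environment variable."""
    host = os.environ.get("OLLAMA_HOST", "").strip()
    if not host:
        return default
    # OLLAMA_HOST may omit the scheme (e.g. "0.0.0.0:11434").
    if not host.startswith(("http://", "https://")):
        host = "http://" + host
    return host.rstrip("/")

print(resolve_ollama_host())  # http://127.0.0.1:11434 when the variable is unset
```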
Ollama has addressed several security vulnerabilities:[29]
| CVE | Description | Affected Versions | Status |
|---|---|---|---|
| CVE-2024-37032 | Remote code execution via path traversal in the model pull API ("Probllama") | <0.1.34 | Fixed[30] |
| CVE-2025-0312 | Malicious GGUF model exploitation | ≤0.3.14 | Fixed[31] |
| CNVD-2025-04094 | Unauthorized access due to improper configuration | Various | Configuration issue[32] |
Users are advised to keep Ollama updated and configure securely, especially when exposing APIs.
| Feature | Ollama | LM Studio |
|---|---|---|
| Interface | CLI-focused | GUI-focused |
| License | MIT (open source) | Proprietary (free to use) |
| Model Sources | Ollama registry + GGUF | Hugging Face + GGUF |
| Concurrent Handling | Excellent (batching) | Limited |
| macOS Performance | Good | Better (MLX support) |
| Automation | Excellent | Limited |
| API | Built-in REST API | Available |
Ollama is preferred for its flexibility in integrations and developer-centric approach, while LM Studio offers a more user-friendly GUI.[33]
Ollama has a vibrant community on GitHub, with over 90,000 stars and active contributions focusing on performance and compatibility.[34] It has been praised for advancing local AI technologies, cost savings, and accessibility.[35]
Some criticism has arisen regarding licensing compliance issues with dependencies like llama.cpp, with community members raising concerns about proper attribution.[36] The Ollama team has been working to address these concerns.
Ollama has been a key driver in the democratization of large language models, lowering the barrier to running them on consumer hardware.[37] The tool is widely used in education, research, and enterprise for privacy-sensitive applications and has become a foundational tool in the open-source AI movement.