See also: LLMs, GPT-4, Llama, Apache 2.0 License
The split between proprietary and open-source large language models (LLMs) is one of the defining structural choices in modern artificial intelligence. On one side are closed, API-only systems built and operated by companies like OpenAI, Anthropic, and Google DeepMind. On the other are downloadable models with publicly released weights, often produced by companies like Meta AI, Mistral AI, DeepSeek, and Alibaba's Qwen team, as well as nonprofit groups like Allen AI and EleutherAI. The two camps differ in licensing, deployment, cost structure, customization, transparency, and the kind of risk a user takes on. This article covers the differences in detail, surveys the major models in each category as of early 2026, and walks through the practical tradeoffs that engineering and product teams weigh when choosing between them.
what "proprietary" and "open source" actually mean
A proprietary LLM is a model whose weights are held privately by the developer. The model is normally accessed through a paid API, a chat product, or a partner cloud (for example, Anthropic's Claude on Amazon Bedrock). The customer does not download the network parameters and cannot run the model on their own hardware. Use is governed by the provider's terms of service, which usually include rate limits, content policies, and per-token pricing.
An open-source LLM (more accurately, an open-weights LLM) is a model whose trained parameters are publicly downloadable, normally from a hub like Hugging Face. The user can run the model on their own infrastructure, fine-tune it, redistribute it (subject to its license), and inspect its size and architecture. Some open releases also include the training code, training data, intermediate checkpoints, or the alignment recipe; the most thorough releases publish all of these together.
The boundary is not always clean. A model can be:
- Open weights only: weights are downloadable, but the training code, data mix, and recipe are private. Most popular "open" models fall here, including all current Llama and Qwen releases.
- Source-available: weights and code are published but the license restricts who can use the model or for what purpose. Llama is the canonical example.
- Fully open: weights, training code, training data, and detailed methodology are released under a permissive license. This is much rarer; Allen AI's OLMo family and EleutherAI's Pythia are the clearest examples.
- Proprietary: nothing of substance is released except an API.
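The four categories above can be sketched as a small classifier. This is only an illustration: the `Release` fields and the order of the checks are assumptions made for the example, and real releases often straddle categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """What a developer has published. All field names are illustrative."""
    weights: bool = False
    training_code: bool = False
    training_data: bool = False
    methodology: bool = False
    permissive_license: bool = False  # e.g. Apache 2.0 or MIT


def openness(release: Release) -> str:
    """Map a release onto the four categories described in the text."""
    if not release.weights:
        # Nothing downloadable: access is through an API or product only.
        return "proprietary"
    fully_open = (
        release.training_code
        and release.training_data
        and release.methodology
        and release.permissive_license
    )
    if fully_open:
        return "fully open"
    if release.training_code and not release.permissive_license:
        # Weights and code are out, but the license restricts use.
        return "source-available"
    return "open weights only"


print(openness(Release(weights=True)))  # open weights only
```

A release with weights, code, data, methodology, and a permissive license comes out as "fully open"; dropping any one of those moves it into a lower category.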
the terminology problem
The phrase "open source" is used loosely in the LLM space, in ways that would not pass muster for traditional software. The Open Source Initiative (OSI) addressed this in October 2024 with the Open Source AI Definition version 1.0. To qualify as open-source AI under the definition, a system must give users freedom to use, study, modify, and share it for any purpose, and must publish three things together: the model parameters (weights), the source code used to train and run the system, and sufficiently detailed information about the training data so that a skilled person could build a substantially equivalent system. Crucially, the definition does not require releasing the raw training data itself, partly because training corpora often include copyrighted, private, or personally identifiable material. It does require enough documentation that the data could in principle be reconstructed.
By this strict bar, very few "open" LLMs qualify. Llama, Qwen, Mistral's open releases, and DeepSeek typically publish the weights but withhold the training data and detailed data documentation. Allen AI's OLMo 2 and OLMo 3, along with EleutherAI's Pythia, are among the few well-known model families that publish the weights, code, training data, intermediate checkpoints, and training logs.
A more accurate label for most "open" LLMs is open-weights or source-available. The looser usage continues mostly because "open source" carries strong positive associations from the software world, and because most practitioners care about being able to download and run the model rather than about the philosophical purity of the release.
license taxonomy
LLM licenses fall into a small number of recognizable buckets. The license matters: it determines whether you can use the model commercially, whether you can fine-tune it on your own data, whether you can use it to train a competing model, and whether your downstream users have any obligations. The table below summarizes the most common license categories.
| License type | Examples | Commercial use | Modification and fine-tuning | Notable restrictions |
|---|---|---|---|---|
| Permissive open-source (Apache 2.0, MIT, BSD) | Mistral 7B, Mixtral 8x7B, Falcon, Pythia, OLMo, gpt-oss, DeepSeek-V3.1 weights | Allowed | Allowed | Patent grant (Apache 2.0) and attribution requirements |
| Restricted source-available (Llama Community License, Gemma Terms of Use) | Llama 2, Llama 3, Llama 3.1, Llama 3.3, Llama 4, Gemma family | Allowed with conditions | Allowed | 700 million MAU cap (Llama); ban on training competing AI models with outputs; Acceptable Use Policy |
| Responsible AI (OpenRAIL family) | BLOOM, several Stability AI releases, original DeepSeek-V3 model weights | Allowed | Allowed | Behavioral restrictions on harmful uses, military uses, etc. |
| Research / non-commercial | Llama 1 (original release), OPT, several academic releases | Not allowed | Allowed for research | Output is for research only; cannot be deployed in products |
| Custom restricted | Various smaller releases | Varies | Varies | Read each license individually |
The Llama license is the most commercially important non-permissive license in the LLM ecosystem. It allows broad commercial use but contains three notable carve-outs: a monthly active user cap of 700 million above which the licensee must request a separate license from Meta at Meta's discretion, a competitor restriction preventing the use of Llama or its outputs to improve any other large language model (other than a Llama derivative), and adherence to Meta's Acceptable Use Policy. The 700-million-user cap is rarely binding in practice (it targets a handful of hyperscalers) but the no-train-competing-model clause has significant downstream effects. It means a company cannot legally use Llama outputs to distill into a non-Llama base model.
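As a rough illustration, the three carve-outs can be written as a pre-deployment checklist. The function and parameter names below are hypothetical, the 700 million figure comes from the paragraph above, and the sketch is a simplification of the license text rather than a substitute for reading it.

```python
LLAMA_MAU_CAP = 700_000_000  # monthly active user cap cited in the text


def llama_license_flags(monthly_active_users: int,
                        trains_non_llama_model: bool,
                        follows_acceptable_use_policy: bool) -> list[str]:
    """Return the carve-outs a planned deployment would run into."""
    flags = []
    if monthly_active_users > LLAMA_MAU_CAP:
        flags.append("above 700M MAU: separate license from Meta required")
    if trains_non_llama_model:
        flags.append("uses Llama or its outputs to improve a non-Llama model")
    if not follows_acceptable_use_policy:
        flags.append("conflicts with the Acceptable Use Policy")
    return flags


# A mid-sized product that only fine-tunes Llama derivatives raises no flags.
print(llama_license_flags(2_000_000, False, True))  # []
```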
major proprietary LLMs
The proprietary tier is dominated by a small number of well-funded labs. As of April 2026, the most notable closed-weight models in production are:
| Model family | Provider | Latest release (date) | Modalities | Access |
|---|---|---|---|---|
| GPT-4, GPT-4o, GPT-4.1, o-series, GPT-5 | OpenAI | GPT-5 (Aug 7, 2025); GPT-4o (May 13, 2024) | Text, image, audio (GPT-4o); reasoning (o-series) | API; ChatGPT; Microsoft Copilot |
| Claude 3, 3.5, 4, 4.5, 4.6, Opus 4.7 | Anthropic | Claude Opus 4.7 (Apr 16, 2026) | Text, vision | API; Claude.ai; AWS Bedrock; Google Vertex AI; Microsoft Foundry |
| Gemini 1.5, 2.0, 2.5, 3 | Google DeepMind | Gemini 3 Pro (Nov 18, 2025) | Text, image, audio, video | Gemini app; AI Studio; Vertex AI |
| Grok | xAI | Grok 4 (2025) | Text, vision | X (Twitter); xAI API |
Proprietary models tend to lead the published frontier benchmarks, particularly on hard reasoning and agentic-coding tasks. As of April 6, 2026, Anthropic's Claude Opus 4.6 Thinking topped the LMArena (formerly Chatbot Arena) leaderboard at an Elo of 1504, with Gemini 3.1 Pro Preview at 1493 and GPT-5.4 High at 1484. The spread among the top closed models is small: 20 Elo points separate first place from third.
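To make the size of these rating gaps concrete, the sketch below converts an Elo difference into an expected head-to-head win rate. It assumes the standard logistic Elo formula with a 400-point scale; the leaderboard's own fitting procedure may differ in detail.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of pairwise votes won by model A against model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# First place (1504) against third place (1484) from the figures above.
print(f"{expected_win_rate(1504, 1484):.3f}")  # 0.529
```

Under this assumption, a 20-point lead corresponds to winning roughly 53 percent of head-to-head comparisons, which is why gaps of this size are hard to notice in everyday use.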
major open-weight LLMs
Open-weight releases multiplied rapidly after the Llama 1 leak in March 2023. The table below covers the most widely used open-weight families as of early 2026.
| Model family | Developer | Notable releases | License | Architecture notes |
|---|---|---|---|---|
| Llama | Meta AI | Llama 1 (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1, Llama 3.3 (Dec 6, 2024, 70B), Llama 4 Scout/Maverick (Apr 5, 2025) | Llama Community License | Llama 4 introduced Mixture-of-Experts: Scout has 17B active / 109B total / 16 experts; Maverick has 17B active / 400B total / 128 experts |
| Mistral | Mistral AI | Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Mistral Large 2, Mistral 3 family and Mistral Large 3 (Dec 2025, 41B active / 675B total) | Apache 2.0 (most open releases); Mistral Research License (some larger models) | Sliding window attention; sparse MoE in larger releases |
| DeepSeek | DeepSeek (China) | DeepSeek-V2, DeepSeek-V3 (Dec 26, 2024, 671B / 37B active), DeepSeek-R1 (Jan 20, 2025, MIT-licensed reasoning model), V3.1 | MIT (R1, V3.1 weights); custom OpenRAIL-derived (original V3 weights) | MoE with multi-head latent attention; R1 trained with reinforcement learning for chain-of-thought reasoning |
| Qwen | Alibaba | Qwen 2, Qwen 2.5, Qwen 3 (Apr 28, 2025) | Apache 2.0 | Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B) and MoE (30B-A3B, 235B-A22B); 36T training tokens; 119 languages |
| Gemma | Google | Gemma, Gemma 2, Gemma 3 (Mar 12, 2025), Gemma 3 270M | Gemma Terms of Use (permissive but custom) | Gemma 3 sizes are 1B, 4B, 12B, 27B; 4B and up are vision-language; 128k context |
| Falcon | Technology Innovation Institute (UAE) | Falcon-7B, Falcon-40B, Falcon-180B | Apache 2.0 (most variants) | Early high-profile permissive open release in 2023 |
| OLMo | Allen Institute for AI | OLMo, OLMo 2 (7B, 13B, 32B), OLMo 3 | Apache 2.0 | Fully open: weights, training data, code, intermediate checkpoints, and training logs all released |
| Pythia / GPT-NeoX | EleutherAI | Pythia (70M to 12B suite) | Apache 2.0 | Designed primarily for interpretability and scaling research; full data and checkpoints |
| gpt-oss | OpenAI | gpt-oss-120b, gpt-oss-20b (Aug 5, 2025) | Apache 2.0 | OpenAI's first open-weight LLMs since GPT-2; 120B variant runs on a single 80 GB GPU |
The open-weight side has closed much of the quality gap with proprietary frontier models since late 2024. DeepSeek-V3 and DeepSeek-R1 in particular drew attention because R1 reached performance comparable to OpenAI's o1 on math, code, and reasoning benchmarks despite being trained on a fraction of the rumored proprietary budget. According to the LMArena leaderboard, the gap between the top open and the top closed model briefly narrowed to about 4 Elo points in February 2025 before widening again as proprietary labs shipped new versions.
cost structure
The two delivery models price differently, and the choice has compounding effects on a product's unit economics.
| Cost item | Proprietary API | Self-hosted open weights |
|---|---|---|
| Per-token cost | Direct, posted price (for example, Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens) | Effectively zero variable cost on top of GPU amortization |
| Infrastructure capex | None (provider absorbs it) | GPU servers, networking, racks, or long reservation contracts on a cloud |
| Inference engineering | None required | Need to operate vLLM, TGI, SGLang, or similar; manage batching, KV cache, autoscaling |
| Idle cost | None | Reserved GPUs cost the same whether or not you serve requests |
| Scaling spikes | Handled by provider, subject to rate limits | You must provision for peak; bursting is harder |
| Compliance and SOC2 | Provided | You inherit the entire stack |
The right answer depends on volume. A thin product that handles a few million tokens per month almost always costs less on a proprietary API than on rented H100s. A heavy product with predictable, sustained throughput (an internal copilot used by 50,000 employees, a search reranker, a customer support classifier) often crosses over and becomes cheaper to self-host. Managed open-weight inference, offered by companies like Together AI, Anyscale, Fireworks AI, and Replicate, sits in between: per-token billing on open models, often at one-tenth or less of the proprietary price for similar quality tiers, with the operations handled by the provider.
In a 2025 industry analysis, Stripe reported a 73 percent reduction in inference costs for the affected workloads after migrating its inference stack to vLLM. That kind of saving is real but only materializes at scale. For most teams the choice is less binary than it looks; running a permissive open model behind a managed inference provider gives most of the cost savings without the operational burden.
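The volume crossover described in this section can be estimated with simple arithmetic. The sketch below uses the $5 and $25 per-million-token prices from the table as API defaults; the GPU count and the $3 hourly rate are hypothetical placeholders, and the comparison ignores engineering time, quality differences, and whether the chosen GPUs can actually sustain the throughput.

```python
HOURS_PER_MONTH = 730.0  # average number of hours in a month


def api_monthly_cost(input_tokens: float, output_tokens: float,
                     input_price: float = 5.0,
                     output_price: float = 25.0) -> float:
    """Monthly API bill in dollars; prices are per million tokens."""
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price


def self_hosted_monthly_cost(gpu_count: int, gpu_hourly_rate: float) -> float:
    """Monthly bill for always-on GPUs; idle hours cost the same as busy ones."""
    return gpu_count * gpu_hourly_rate * HOURS_PER_MONTH


def cheaper_option(input_tokens: float, output_tokens: float,
                   gpu_count: int, gpu_hourly_rate: float) -> str:
    """Compare the two monthly bills and name the cheaper delivery model."""
    api = api_monthly_cost(input_tokens, output_tokens)
    hosted = self_hosted_monthly_cost(gpu_count, gpu_hourly_rate)
    return "proprietary API" if api <= hosted else "self-hosted"


# Thin product: 3M input and 1M output tokens per month, one rented GPU.
print(cheaper_option(3_000_000, 1_000_000, 1, 3.0))          # proprietary API
# Heavy product: 2B input and 500M output tokens per month, eight GPUs.
print(cheaper_option(2_000_000_000, 500_000_000, 8, 3.0))    # self-hosted
```

With these placeholder numbers the thin product pays $40 per month on the API against $2,190 for a single idle-most-of-the-time GPU, while the heavy product pays $22,500 on the API against $17,520 for eight GPUs.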
benchmarks and quality
Four evaluations dominate public discussion of LLM quality:
- LMArena (formerly Chatbot Arena) uses anonymous pairwise human preference voting and reports an Elo rating. It captures perceived helpfulness on open-ended chat better than fixed benchmarks.
- MMLU measures multitask knowledge across 57 subjects, mostly at undergraduate level. Frontier models now score above 85 percent.
- GPQA is a graduate-level science Q&A benchmark designed to resist memorization. Frontier reasoning models reached human-expert performance during 2024 and 2025.
- SWE-bench evaluates a model's ability to resolve real GitHub issues in open-source repositories. It has become the headline benchmark for coding agents.
The pattern that has emerged is that proprietary models lead the absolute peak by a small margin, while the best open models trail by months rather than years. The frontier reasoning capabilities introduced by OpenAI's o1 in late 2024 were matched by DeepSeek-R1 within a few months, on permissive weights. Coding agents based on Claude and Gemini 3 still lead their respective leaderboards, but the open side narrows that gap with each release cycle.
pros and cons compared
No summary fully captures every team's situation, but the table below covers the recurring tradeoffs.
| Dimension | Proprietary | Open weights |
|---|---|---|
| Peak quality | Usually slightly ahead at the absolute frontier | Catching up; close enough for most workloads |
| Time to first deployment | Provider handles inference; you call an API | Requires GPU provisioning and a serving stack |
| Data residency | Sent to provider's servers; subject to their policies | Stays on your infrastructure; full control |
| Fine-tuning | Limited or unavailable; even when offered, restrictive | Full control: LoRA, QLoRA, full SFT, RLHF, distillation |
| Cost at low volume | Cheaper (no capex, no idle GPUs) | More expensive (you pay for GPUs whether you use them or not) |
| Cost at high volume | More expensive (per-token markup) | Cheaper (amortized GPU cost over many tokens) |
| Vendor lock-in | High; switching providers means re-prompting and re-evaluation | Low; you own the weights |
| Auditability | Limited; the provider's behavior can change without notice | High; you can inspect, freeze, and version the model |
| Safety and alignment | Provider-managed; behavior is shaped by their policies | Your responsibility; base models are often less aligned than chat models |
| Long-term availability | Subject to deprecation; older models get retired | Permanent; you keep the weights as long as you keep the storage |
| Operational burden | Low | High; you run the inference stack, monitor it, and patch it |
The deprecation risk is worth calling out. OpenAI, Anthropic, and Google all retire older models on published schedules, and applications calibrated to one version often need re-evaluation when forced to upgrade. With open weights, deprecation is your decision.
hybrid strategies
Most mature production stacks use both. A few patterns recur:
- Tiered routing. A cheaper model (often an open-weight one) handles routine requests; a frontier proprietary model handles hard ones. Routers can use simple heuristics, classifiers, or learned policies.
- Proprietary as judge. Many teams use a top-tier closed model to grade outputs from cheaper models during evaluation, even if the cheap model serves production. This decouples evaluation quality from inference cost.
- Distillation. A proprietary teacher generates training data for a smaller open student, which is then deployed. License terms matter here: the Llama license forbids using outputs to train non-Llama models, and many proprietary providers prohibit using their API outputs to train competing models, which constrains who can do this legally.
- Open base, proprietary fallback. Self-host an open model for the common case and call a proprietary API only when the open model abstains or expresses low confidence.
- Sensitive-data isolation. Use an open model on-premises for any request that touches regulated or confidential data; use proprietary APIs for everything else.
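Two of the patterns above, tiered routing and sensitive-data isolation, combine naturally into a single routing function. The following is a minimal sketch under stated assumptions: the `Request` fields, the difficulty score, the 0.7 threshold, and the backend labels are all hypothetical, and a production router would more likely use a trained classifier or learned policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    prompt: str
    touches_regulated_data: bool = False
    difficulty: float = 0.0  # hypothetical score in [0, 1] from a heuristic or classifier


def route(request: Request, hard_threshold: float = 0.7) -> str:
    """Pick a backend label for one request."""
    # Sensitive-data isolation takes priority: regulated data never leaves
    # local infrastructure, however hard the request is.
    if request.touches_regulated_data:
        return "self-hosted-open"
    # Tiered routing: only hard requests go to the frontier proprietary model.
    if request.difficulty >= hard_threshold:
        return "proprietary-api"
    return "self-hosted-open"


print(route(Request("Summarize this ticket", difficulty=0.2)))       # self-hosted-open
print(route(Request("Refactor this large module", difficulty=0.9)))  # proprietary-api
```

Checking the data-sensitivity flag before the difficulty score is the key design choice: it guarantees that a hard request touching confidential data is still served locally.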
regulatory considerations
The regulatory picture is starting to differentiate between proprietary and open models, in ways that often favor open releases.
The EU AI Act, whose obligations for general-purpose AI (GPAI) model providers became applicable on August 2, 2025, contains a partial exemption for open-source GPAI models. Providers that release a model under a free and open license, with weights, parameters, architecture, and usage information publicly available, are exempt from the technical documentation and downstream-information requirements that apply to closed providers. They still must publish a copyright policy and a sufficiently detailed summary of training data. The exemption does not apply to models with systemic risk, defined as those trained with more than 10^25 floating-point operations of compute. The largest open releases (such as Llama 3.1 405B and the largest Llama 4 variants) cross this threshold and so receive no exemption.
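Whether a model is likely to cross the 10^25 threshold can be estimated from its size and training set. The sketch below assumes the common rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per token; this approximation is not part of the Act, and for Mixture-of-Experts models the active parameter count is the more appropriate input. The example model sizes are hypothetical.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # compute threshold cited in the text


def estimated_training_flops(parameters: float, tokens: float) -> float:
    """Approximate training compute as 6 * parameters * tokens (rule of thumb)."""
    return 6.0 * parameters * tokens


def presumed_systemic_risk(parameters: float, tokens: float) -> bool:
    """True when the estimate exceeds the systemic-risk threshold."""
    return estimated_training_flops(parameters, tokens) > SYSTEMIC_RISK_FLOPS


# Hypothetical 400B-parameter dense model trained on 15T tokens: about 3.6e25.
print(presumed_systemic_risk(400e9, 15e12))  # True
# Hypothetical 7B-parameter model trained on 2T tokens: about 8.4e22.
print(presumed_systemic_risk(7e9, 2e12))     # False
```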
US export controls also affect the open-weights ecosystem. Restrictions on advanced GPU exports to China have not stopped Chinese labs from releasing capable open weights (DeepSeek, Qwen, Yi), but they shape what hardware can train them and where the resulting models can be served. Some US legislators have proposed restrictions on the open release of model weights above certain capability thresholds; as of early 2026, no such restriction has become law.
notable historical events
A few episodes shaped the current landscape:
- Llama 1 leak, March 2023. Meta released Llama 1 in February 2023 under a non-commercial research license, restricted to approved researchers. On March 3, 2023, the weights were posted as a torrent on 4chan and spread rapidly. Within weeks, Stanford released Alpaca (Llama 1 fine-tuned on instruction-following data) and the Vicuna team produced a chat model that approached GPT-3.5 quality. The leak is widely credited with catalyzing the open-weights ecosystem and with pushing Meta to release Llama 2 in July 2023 under a commercial license.
- Mistral 7B, September 2023. A small French startup released a 7B model under Apache 2.0 that outperformed Llama 2 13B on most benchmarks. It demonstrated that competitive permissive releases were possible from outside the US hyperscalers.
- DeepSeek-V3 and R1, late 2024 to early 2025. DeepSeek-V3 (December 26, 2024) and DeepSeek-R1 (January 20, 2025) showed that open weights could match closed-model frontier performance on reasoning tasks. R1 was released under MIT license. The releases triggered a sharp drop in some US AI stocks in late January 2025 and prompted reassessments of the assumed lead held by US labs.
- OpenAI's gpt-oss, August 2025. OpenAI released gpt-oss-120b and gpt-oss-20b under Apache 2.0 on August 5, 2025. These were OpenAI's first open-weight language models since GPT-2 in 2019. The 120B variant runs on a single 80 GB GPU and reaches near-parity with OpenAI's own o4-mini on core reasoning benchmarks, signaling that even the closed labs see strategic value in shipping some open weights.
- Mistral 3 and Large 3, December 2025. Mistral released a frontier-tier open-weights model (Mistral Large 3, 41B active / 675B total parameters) and a family of small dense models (3B, 8B, 14B), all under Apache 2.0. The release demonstrated that competitive frontier models can ship under fully permissive licenses.
how to choose
The right answer depends on your constraints, not on philosophy. A few rules of thumb:
- If your data cannot leave your infrastructure for legal, regulatory, or security reasons, you need open weights, period.
- If your volume is small and you want to ship quickly, use a proprietary API. The dollar cost is rarely the binding constraint at low volume.
- If your volume is large and your workload is predictable, run an open model on rented or owned GPUs. Per-token economics will dominate.
- If you need maximum capability on the hardest tasks, the top proprietary models still tend to lead, though by less than they used to.
- If you need full control over fine-tuning, alignment, or behavior, open weights are the only credible option.
- For most teams, the best stack uses both: open weights for routine work and the most capable proprietary model for hard requests, evaluation, and synthesis.
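The rules of thumb above can be condensed into a small decision helper for choosing a default stack. This is a sketch, not a policy: the parameter names, the order in which constraints are checked, and the one-billion-token cutoff for "large volume" are assumptions made for the example.

```python
def recommend(data_must_stay_on_prem: bool,
              needs_full_finetune_control: bool,
              needs_peak_capability: bool,
              monthly_tokens: int,
              high_volume_threshold: int = 1_000_000_000) -> str:
    """Return a default choice for a workload described by its constraints."""
    # Hard constraints first: these leave no real alternative.
    if data_must_stay_on_prem or needs_full_finetune_control:
        return "open weights"
    # The hardest tasks still tend to favor the top proprietary models.
    if needs_peak_capability:
        return "proprietary API"
    # Otherwise volume decides: per-token economics dominate at scale.
    if monthly_tokens >= high_volume_threshold:
        return "open weights"
    return "proprietary API"


print(recommend(False, False, False, 5_000_000))  # proprietary API
print(recommend(True, False, True, 5_000_000))    # open weights
```

A team with several workloads would call this once per workload, which is how the mixed stacks described in the last rule tend to arise.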
The gap between closed and open is no longer a chasm. It is a moving frontier of months, sometimes weeks, that any serious LLM-using team will have to keep watching.
references
- Open Source Initiative. "The Open Source AI Definition 1.0." October 2024. https://opensource.org/ai/open-source-ai-definition
- Meta. "Llama 3 Community License Agreement." https://www.llama.com/llama3/license/
- Meta. "Llama 3.1 Community License Agreement." https://www.llama.com/llama3_1/license/
- Meta. "Introducing Llama 4: Maverick and Scout." April 5, 2025. https://www.llama.com/models/llama-4/
- DeepSeek. "DeepSeek-R1 Release." January 20, 2025. https://api-docs.deepseek.com/news/news250120
- DeepSeek-AI. "DeepSeek-V3 Technical Report." December 2024. https://arxiv.org/abs/2412.19437
- Qwen Team. "Qwen3: Think Deeper, Act Faster." April 28, 2025. https://qwenlm.github.io/blog/qwen3/
- Google DeepMind. "Gemma 3 Technical Report." March 2025. https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
- Google. "Introducing Gemini 3." November 18, 2025. https://blog.google/products/gemini/gemini-3/
- OpenAI. "Introducing GPT-5." August 7, 2025. https://openai.com/index/introducing-gpt-5/
- OpenAI. "Introducing gpt-oss." August 5, 2025. https://openai.com/index/introducing-gpt-oss/
- OpenAI. "Hello GPT-4o." May 13, 2024. https://openai.com/index/hello-gpt-4o/
- Anthropic. "Introducing Claude Opus 4.7." April 16, 2026. https://www.anthropic.com/news/claude-opus-4-7
- Mistral AI. "Introducing Mistral 3." December 2, 2025. https://mistral.ai/news/mistral-3
- Allen Institute for AI. "OLMo 2: The best fully open language model to date." https://allenai.org/blog/olmo2
- European Union. "Article 53: Obligations for Providers of General-Purpose AI Models." EU Artificial Intelligence Act. https://artificialintelligenceact.eu/article/53/
- European Commission. "Guidelines for providers of general-purpose AI models." https://digital-strategy.ec.europa.eu/en/policies/guidelines-gpai-providers
- Wikipedia. "LLaMA." https://en.wikipedia.org/wiki/llama
- The Register. "LLaMA drama as Meta's mega language model files leak." March 8, 2023. https://www.theregister.com/2023/03/08/meta_llama_ai_leak/
- LMArena. "Arena Leaderboard." https://huggingface.co/spaces/lmarena-ai/arena-leaderboard
- Hugging Face. "Welcome Llama 4 Maverick & Scout on Hugging Face." April 2025. https://huggingface.co/blog/llama4-release
- Simon Willison. "Meta AI release Llama 3.3." December 6, 2024. https://simonwillison.net/2024/Dec/6/llama-33/