See also: LLMs, GPT-4, Llama, Apache 2.0 License
The split between proprietary and open-source large language models (LLMs) is one of the defining structural choices in modern artificial intelligence. On one side are closed, API-only systems built and operated by companies like OpenAI, Anthropic, and Google DeepMind. On the other are downloadable models with publicly released weights, often produced by companies like Meta AI, Mistral AI, DeepSeek, and Alibaba's Qwen team, as well as nonprofit groups like Allen AI and EleutherAI. The two camps differ in licensing, deployment, cost structure, customization, transparency, and the kind of risk a user takes on. This article covers the differences in detail, surveys the major models in each category as of early 2026, and walks through the practical tradeoffs that engineering and product teams weigh when choosing between them.
A proprietary LLM is a model whose weights are held privately by the developer. The model is normally accessed through a paid API, a chat product, or a partner cloud (for example, Anthropic's Claude on Amazon Bedrock). The customer does not download the network parameters and cannot run the model on their own hardware. Use is governed by the provider's terms of service, which usually include rate limits, content policies, and per-token pricing.
An open-source LLM (more accurately, an open-weights LLM) is a model whose trained parameters are publicly downloadable, normally from a hub like Hugging Face. The user can run the model on their own infrastructure, fine-tune it, redistribute it (subject to its license), and inspect its size and architecture. Some open releases also include the training code, training data, intermediate checkpoints, or the alignment recipe; the most thorough releases publish all of these together.
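The practical difference shows up immediately in code. Below is a minimal sketch of both access modes, with model names chosen for illustration: the first call assumes an OpenAI API key in the environment and uses the official `openai` Python client; the second downloads open weights from Hugging Face with `transformers` and runs them locally, which requires adequate GPU or CPU resources.

```python
# Proprietary: the weights stay on the provider's servers; you send tokens over HTTPS.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize the EU AI Act in one sentence."}],
)
print(resp.choices[0].message.content)

# Open weights: the parameters are downloaded once and run on your own hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
inputs = tok("Summarize the EU AI Act in one sentence.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(tok.decode(out[0], skip_special_tokens=True))
```

Nearly everything that follows in this article is a consequence of that difference in where the weights live.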
The boundary is not always clean. A model can be:

- open-weights under a restrictive custom license (Llama, Gemma), downloadable but with conditions on use;
- open-weights under a permissive license, but with the training data and code withheld (Mistral, Qwen, most DeepSeek releases);
- fully open, with weights, training code, and data all published (OLMo, Pythia);
- downloadable but research-only, barred from commercial deployment (the original Llama 1, OPT).
The phrase "open source" is used loosely in the LLM space, in ways that would not pass muster for traditional software. The Open Source Initiative (OSI) addressed this in October 2024 with the Open Source AI Definition version 1.0. To qualify as open-source AI under the definition, a system must give users freedom to use, study, modify, and share it for any purpose, and must publish three things together: the model parameters (weights), the source code used to train and run the system, and sufficiently detailed information about the training data so that a skilled person could build a substantially equivalent system. Crucially, the definition does not require releasing the raw training data itself, partly because training corpora often include copyrighted, private, or personally identifiable material. It does require enough documentation that the data could in principle be reconstructed.
By this strict bar, very few "open" LLMs qualify. Llama, Qwen, Mistral's open releases, and DeepSeek typically publish the weights but withhold the training data and detailed data documentation. Allen AI's OLMo 2 and OLMo 3, along with EleutherAI's Pythia, are among the few well-known model families that publish the weights, code, training data, intermediate checkpoints, and training logs.
A more accurate label for most "open" LLMs is open-weights or source-available. The looser usage continues mostly because "open source" carries strong positive associations from the software world, and because most practitioners care about being able to download and run the model rather than about the philosophical purity of the release.
LLM licenses fall into a small number of recognizable buckets. The license matters: it determines whether you can use the model commercially, whether you can fine-tune it on your own data, whether you can use it to train a competing model, and whether your downstream users have any obligations. The table below summarizes the most common license categories.
| License type | Examples | Commercial use | Modification and fine-tuning | Notable restrictions |
|---|---|---|---|---|
| Permissive open-source (Apache 2.0, MIT, BSD) | Mistral 7B, Mixtral 8x7B, Falcon, Pythia, OLMo, gpt-oss, DeepSeek-V3.1 weights | Allowed | Allowed | Patent grant (Apache 2.0) and attribution requirements |
| Restricted source-available (Llama Community License, Gemma Terms of Use) | Llama 2, Llama 3, Llama 3.1, Llama 3.3, Llama 4, Gemma family | Allowed with conditions | Allowed | 700 million MAU cap (Llama); ban on training competing AI models with outputs; Acceptable Use Policy |
| Responsible AI (OpenRAIL family) | BLOOM, several Stability AI releases, original DeepSeek-V3 model weights | Allowed | Allowed | Behavioral restrictions on harmful uses, military uses, etc. |
| Research / non-commercial | Llama 1 (original release), OPT, several academic releases | Not allowed | Allowed for research | Output is for research only; cannot be deployed in products |
| Custom restricted | Various smaller releases | Varies | Varies | Read each license individually |
The Llama license is the most commercially important non-permissive license in the LLM ecosystem. It allows broad commercial use but contains three notable carve-outs: a threshold of 700 million monthly active users, above which a licensee must request a separate license that Meta may grant or withhold at its discretion; a competitor restriction that prohibits using Llama or its outputs to improve any other large language model (other than a Llama derivative); and adherence to Meta's Acceptable Use Policy. The 700-million-user cap is rarely binding in practice (it targets a handful of hyperscalers), but the no-train-competing-model clause has significant downstream effects: a company cannot legally use Llama outputs to distill into a non-Llama base model.
The proprietary tier is dominated by a small number of well-funded labs. As of April 2026, the most notable closed-weight models in production are:
| Model family | Provider | Latest release (date) | Modalities | Access |
|---|---|---|---|---|
| GPT-4, GPT-4o, GPT-4.1, o-series, GPT-5 | OpenAI | GPT-5 (Aug 7, 2025); GPT-4o (May 13, 2024) | Text, image, audio (GPT-4o); reasoning (o-series) | API; ChatGPT; Microsoft Copilot |
| Claude 3, 3.5, 4, 4.5, 4.6, Opus 4.7 | Anthropic | Claude Opus 4.7 (Apr 16, 2026) | Text, vision | API; Claude.ai; AWS Bedrock; Google Vertex AI; Microsoft Foundry |
| Gemini 1.5, 2.0, 2.5, 3 | Google DeepMind | Gemini 3 Pro (Nov 18, 2025) | Text, image, audio, video | Gemini app; AI Studio; Vertex AI |
| Grok | xAI | Grok 4 (2025) | Text, vision | X (Twitter); xAI API |
Proprietary models tend to lead the published frontier benchmarks, particularly on hard reasoning and agentic-coding tasks. As of April 6, 2026, Anthropic's Claude Opus 4.6 Thinking topped the LMArena (formerly Chatbot Arena) leaderboard at an Elo of 1504, with Gemini 3.1 Pro Preview at 1493 and GPT-5.4 High at 1484. The dispersion among the top closed models is small.
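To put those Elo gaps in perspective: LMArena's ratings come from a Bradley-Terry/Elo fit over blind pairwise votes, and a rating difference maps directly to an expected head-to-head win rate. A quick check using the standard Elo expectation formula, with the leaderboard figures quoted above:

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The 20-point gap between the #1 and #3 models above:
print(round(elo_win_probability(1504, 1484), 3))  # 0.529
```

A 20-point lead therefore means winning roughly 53 percent of head-to-head comparisons, barely better than a coin flip, which is why small dispersion at the top of the leaderboard matters less than marketing suggests.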
Open-weight releases multiplied rapidly after the Llama 1 leak in March 2023. The table below covers the most widely used open-weight families as of early 2026.
| Model family | Developer | Notable releases | License | Architecture notes |
|---|---|---|---|---|
| Llama | Meta AI | Llama 1 (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1, Llama 3.3 (Dec 6, 2024, 70B), Llama 4 Scout/Maverick (Apr 5, 2025) | Llama Community License | Llama 4 introduced Mixture-of-Experts: Scout has 17B active / 109B total / 16 experts; Maverick has 17B active / 400B total / 128 experts |
| Mistral | Mistral AI | Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Mistral Large 2, Mistral 3 family and Mistral Large 3 (Dec 2025, 41B active / 675B total) | Apache 2.0 (most open releases); Mistral Research License (some larger models) | Sliding window attention; sparse MoE in larger releases |
| DeepSeek | DeepSeek (China) | DeepSeek-V2, DeepSeek-V3 (Dec 26, 2024, 671B / 37B active), DeepSeek-R1 (Jan 20, 2025, MIT-licensed reasoning model), V3.1 | MIT (R1, V3.1 weights); custom OpenRAIL-derived (original V3 weights) | MoE with multi-head latent attention; R1 trained with reinforcement learning for chain-of-thought reasoning |
| Qwen | Alibaba | Qwen 2, Qwen 2.5, Qwen 3 (Apr 28, 2025) | Apache 2.0 | Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B) and MoE (30B-A3B, 235B-A22B); 36T training tokens; 119 languages |
| Gemma | Google DeepMind | Gemma, Gemma 2, Gemma 3 (Mar 12, 2025), Gemma 3 270M | Gemma Terms of Use (permissive but custom) | Gemma 3 sizes are 1B, 4B, 12B, 27B; 4B and up are vision-language; 128k context |
| Falcon | Technology Innovation Institute (UAE) | Falcon-7B, Falcon-40B, Falcon-180B | Apache 2.0 (most variants) | Early high-profile permissive open release in 2023 |
| OLMo | Allen Institute for AI | OLMo, OLMo 2 (7B, 13B, 32B), OLMo 3 | Apache 2.0 | Fully open: weights, training data, code, intermediate checkpoints, and training logs all released |
| Pythia / GPT-NeoX | EleutherAI | Pythia (70M to 12B suite) | Apache 2.0 | Designed primarily for interpretability and scaling research; full data and checkpoints |
| gpt-oss | OpenAI | gpt-oss-120b, gpt-oss-20b (Aug 5, 2025) | Apache 2.0 | OpenAI's first open-weight LLMs since GPT-2; 120B variant runs on a single 80 GB GPU |
The open-weight side has closed much of the quality gap with proprietary frontier models since late 2024. DeepSeek-V3 and DeepSeek-R1 in particular drew attention because R1 reached performance comparable to OpenAI's o1 on math, code, and reasoning benchmarks despite being trained on a fraction of the rumored proprietary budget. According to the LMArena leaderboard, the gap between the top open and the top closed model briefly narrowed to about 4 Elo points in February 2025 before widening again as proprietary labs shipped new versions.
The two delivery models price differently, and the choice has compounding effects on a product's unit economics.
| Cost item | Proprietary API | Self-hosted open weights |
|---|---|---|
| Per-token cost | Direct, posted price (for example, Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens) | No per-token fee; marginal cost is GPU time and power, amortized over throughput |
| Infrastructure capex | None (provider absorbs it) | GPU servers, networking, racks, or long reservation contracts on a cloud |
| Inference engineering | None required | Need to operate vLLM, TGI, SGLang, or similar; manage batching, KV cache, autoscaling |
| Idle cost | None | Reserved GPUs cost the same whether or not you serve requests |
| Scaling spikes | Handled by provider, subject to rate limits | You must provision for peak; bursting is harder |
| Compliance (SOC 2, etc.) | Covered by the provider's certifications | Your responsibility across the entire stack |
The right answer depends on volume. A thin product that handles a few million tokens per month almost always costs less on a proprietary API than on rented H100s. A heavy product with predictable, sustained throughput (an internal copilot used by 50,000 employees, a search reranker, a customer support classifier) often crosses over and becomes cheaper to self-host. Managed open-weight inference, offered by companies like Together AI, Anyscale, Fireworks AI, and Replicate, sits in between: per-token billing on open models, often at one-tenth or less of the proprietary price for similar quality tiers, with the operations handled by the provider.
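A rough break-even calculation makes the crossover concrete. Every number below is an illustrative assumption, not a vendor quote: a proprietary API at a blended $10 per million tokens, versus a reserved GPU node at $25,000 per month with enough capacity for the volumes shown.

```python
# Illustrative break-even between API pricing and a reserved GPU node.
# All figures are assumptions for the sketch, not quotes.

api_price_per_million = 10.00          # blended input+output, USD per 1M tokens
gpu_node_monthly_cost = 25_000.00      # reserved multi-GPU server, USD per month

def api_cost(tokens: float) -> float:
    return tokens / 1_000_000 * api_price_per_million

def self_host_cost(tokens: float) -> float:
    # Fixed cost regardless of volume, up to the node's serving capacity.
    return gpu_node_monthly_cost

break_even = gpu_node_monthly_cost / api_price_per_million * 1_000_000
print(f"Break-even: {break_even / 1e9:.1f}B tokens/month")  # 2.5B

for tokens in (100e6, 1e9, 5e9):
    print(f"{tokens / 1e9:>5.1f}B tokens: "
          f"API ${api_cost(tokens):>9,.0f}  self-host ${self_host_cost(tokens):>9,.0f}")
```

Under these assumptions the API wins below about 2.5 billion tokens a month; above it, the fixed GPU cost amortizes in your favor. Managed open-weight inference shifts the curve somewhere in between.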
In a 2025 analysis published after migrating its inference stack to vLLM, Stripe reported a 73 percent reduction in inference costs for the affected workloads. That kind of saving is real but only materializes at scale. For most teams the choice is less binary than it looks; running a permissive open model behind a managed inference provider gives most of the cost savings without the operational burden.
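Operating an open model is also less exotic than it once was. The sketch below, with an illustrative model name and the assumption of a GPU large enough for it, uses vLLM's offline batch API, which handles continuous batching and KV-cache management internally:

```python
# Minimal offline inference with vLLM; assumes `pip install vllm` and a
# GPU with enough memory for the chosen model.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Classify this support ticket: 'My invoice total is wrong.'",
    "Classify this support ticket: 'The app crashes on login.'",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```

For online serving, the same engine exposes an OpenAI-compatible HTTP server (`vllm serve <model>`), which is much of why switching between a provider API and a self-hosted endpoint often amounts to changing a base URL.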
Four evaluations dominate public discussion of LLM quality:

- LMArena (formerly Chatbot Arena), which converts blind pairwise human preference votes into Elo-style ratings;
- SWE-bench Verified, which measures agentic coding against real GitHub issues;
- GPQA Diamond, graduate-level science questions designed to resist lookup;
- competition-math sets such as AIME, a common proxy for frontier reasoning.
The pattern that has emerged is that proprietary models lead the absolute peak by a small margin, while the best open models trail by months rather than years. The frontier reasoning capabilities introduced by OpenAI's o1 in late 2024 were matched by DeepSeek-R1 within a few months, with permissively licensed weights. Coding agents based on Claude and Gemini 3 still lead their respective leaderboards, but the open side narrows the gap with each release cycle.
No summary fully captures every team's situation, but the table below covers the recurring tradeoffs.
| Dimension | Proprietary advantage | Open-weights advantage |
|---|---|---|
| Peak quality | Usually slightly ahead at the absolute frontier | Catching up; close enough for most workloads |
| Time to ship | Provider handles inference; you call an API | Requires GPU provisioning and a serving stack |
| Data residency | Sent to provider's servers; subject to their policies | Stays on your infrastructure; full control |
| Fine-tuning | Limited or unavailable; even when offered, restrictive | Full control: LoRA, QLoRA, full SFT, RLHF, distillation |
| Cost at low volume | Cheaper (no capex, no idle GPUs) | More expensive (you pay for GPUs whether you use them or not) |
| Cost at high volume | More expensive (per-token markup) | Cheaper (amortized GPU cost over many tokens) |
| Vendor lock-in | High; switching providers means re-prompting and re-evaluation | Low; you own the weights |
| Auditability | Limited; the provider's behavior can change without notice | High; you can inspect, freeze, and version the model |
| Safety and alignment | Provider-managed; behavior is shaped by their policies | Your responsibility; base models are often less aligned than chat models |
| Long-term availability | Subject to deprecation; older models get retired | Permanent; you keep the weights as long as you keep the storage |
| Operational burden | Low | High; you run the inference stack, monitor it, and patch it |
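The fine-tuning row deserves one concrete illustration. With open weights, parameter-efficient fine-tuning is a few lines with Hugging Face's `peft` library; the sketch below (model name and hyperparameters are illustrative assumptions) attaches LoRA adapters to a downloaded model, a level of control no proprietary API exposes:

```python
# LoRA adapter setup on open weights; assumes `pip install transformers peft`.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

Because only the small adapter matrices train, this runs on a fraction of the hardware full fine-tuning would need, and the adapters can be versioned and shipped separately from the base weights.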
The "sudden deprecation" risk is worth calling out. OpenAI, Anthropic, and Google all retire older models on published schedules, and applications calibrated to one version often need re-evaluation when forced to upgrade. With open weights, deprecation is your decision.
Most mature production stacks use both. A few patterns recur:

- Route by difficulty: a cheap open model handles the bulk of traffic, and hard or ambiguous queries escalate to a frontier API (see the sketch after this list).
- Split by sensitivity: regulated or customer-identifying data stays on self-hosted open weights; everything else may go to a proprietary API.
- Prototype closed, productionize open: validate a feature on a frontier API, then fine-tune a smaller open model once the task is well defined, subject to the provider's terms on using outputs for training.
- Keep a fallback: an open model behind the same interface as the primary API limits the blast radius of provider outages, price changes, and deprecations.
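A minimal version of the routing pattern, in which everything is hypothetical: the difficulty heuristic, the endpoints, and the model names. Both endpoints speak the OpenAI chat-completions protocol (which vLLM's server also implements), so a single client library covers both:

```python
# Hypothetical difficulty-based router; heuristic, endpoints, and names are illustrative.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # vLLM server
frontier = OpenAI()  # proprietary API; key read from the environment

def answer(question: str) -> str:
    # Toy heuristic: long or math-heavy questions go to the frontier model.
    hard = len(question) > 500 or any(w in question for w in ("prove", "integral"))
    client, model = (frontier, "gpt-4o") if hard else (local, "local-llama")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

Production routers replace the heuristic with a trained classifier or a confidence signal from the cheap model, but the shape of the code stays the same.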
The regulatory picture is starting to differentiate between proprietary and open models, in ways that often favor open releases.
The EU AI Act, whose obligations for general-purpose AI (GPAI) model providers became applicable on August 2, 2025, contains a partial exemption for open-source GPAI models. Providers that release a model under a free and open license, with weights, parameters, architecture, and usage information publicly available, are exempt from the technical documentation and downstream-information requirements that apply to closed providers. They still must publish a copyright policy and a sufficiently detailed summary of training data. The exemption does not apply to models with systemic risk, defined as those trained with more than 10^25 floating-point operations of compute. The largest open releases (such as Llama 3.1 405B and the largest Llama 4 variants) cross this threshold and so receive no exemption.
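The 10^25 FLOP threshold can be sanity-checked with the standard approximation that training compute is about 6 × parameters × training tokens. For Llama 3.1 405B (405 billion parameters, roughly 15 trillion training tokens per Meta's reporting), the sketch below shows why it lands above the line:

```python
# Training-compute estimate via the common 6 * N * D approximation.
params = 405e9   # Llama 3.1 405B parameters
tokens = 15e12   # approximate training tokens reported by Meta
flops = 6 * params * tokens
print(f"{flops:.2e} FLOPs")  # ~3.6e+25
print(flops > 1e25)          # True: above the EU AI Act systemic-risk threshold
```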
US export controls also affect the open-weights ecosystem. Restrictions on advanced GPU exports to China have not stopped Chinese labs from releasing capable open weights (DeepSeek, Qwen, Yi), but they shape what hardware can train them and where the resulting models can be served. Some US legislators have proposed restrictions on the open release of model weights above certain capability thresholds; as of early 2026, no such restriction has become law.
A few episodes shaped the current landscape:

- The March 2023 leak of Llama 1's weights, which seeded the open-weights ecosystem before Meta embraced open release with Llama 2.
- The OSI's October 2024 Open Source AI Definition, which drew a formal line that most "open" models still do not cross.
- DeepSeek-R1 in January 2025: an MIT-licensed reasoning model matching OpenAI's o1 on math, code, and reasoning benchmarks, which reset expectations about the cost of the frontier.
- OpenAI's August 2025 release of gpt-oss, its first open-weight models since GPT-2, a signal that even the most closed labs see strategic value in open releases.
The right answer depends on your constraints, not on philosophy. A few rules of thumb:

- At low or spiky volume, use a proprietary API; the absence of capex and idle GPU cost dominates everything else.
- At sustained, predictable throughput past the break-even point, self-host an open model or use a managed open-weight provider.
- If data cannot leave your infrastructure, open weights are the only option that removes the provider from the trust boundary.
- If you need the absolute frontier on hard reasoning or agentic coding, the proprietary labs still hold a small lead.
- If deep fine-tuning, auditability, or long-term reproducibility matters, own the weights.
The gap between closed and open is no longer a chasm. It is a moving frontier of months, sometimes weeks, that any serious LLM-using team will have to keep watching.