Proprietary vs. Open Source Large Language Models (LLMs)

See also: LLMs, GPT-4, Llama, Apache 2.0 License

The split between proprietary and open-source large language models (LLMs) is one of the defining structural choices in modern artificial intelligence. On one side are closed, API-only systems built and operated by companies like OpenAI, Anthropic, and Google DeepMind. On the other are downloadable models with publicly released weights, often produced by companies like Meta AI, Mistral AI, DeepSeek, and Alibaba's Qwen team, as well as nonprofit groups like Allen AI and EleutherAI. The two camps differ in licensing, deployment, cost structure, customization, transparency, and the kind of risk a user takes on. This article covers the differences in detail, surveys the major models in each category as of early 2026, and walks through the practical tradeoffs that engineering and product teams weigh when choosing between them.

what "proprietary" and "open source" actually mean

A proprietary LLM is a model whose weights are held privately by the developer. The model is normally accessed through a paid API, a chat product, or a partner cloud (for example, Anthropic's Claude on Amazon Bedrock). The customer does not download the network parameters and cannot run the model on their own hardware. Use is governed by the provider's terms of service, which usually include rate limits, content policies, and per-token pricing.

An open-source LLM (more accurately, an open-weights LLM) is a model whose trained parameters are publicly downloadable, normally from a hub like Hugging Face. The user can run the model on their own infrastructure, fine-tune it, redistribute it (subject to its license), and inspect its size and architecture. Some open releases also include the training code, training data, intermediate checkpoints, or the alignment recipe; the most thorough releases publish all of these together.

The boundary is not always clean. A model can be:

Open weights only: weights are downloadable, but the training code, data mix, and recipe are private. Most popular "open" models fall here, including all current Llama and Qwen releases.
Source-available: weights and code are published, but the license restricts who can use the model or for what purpose. Llama's license makes it the canonical example, even though the contents of its releases also place it in the open-weights-only group.
Fully open: weights, training code, training data, and detailed methodology are released under a permissive license. This is much rarer; Allen AI's OLMo family and EleutherAI's Pythia are the clearest examples.
Proprietary: nothing of substance is released except an API.
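The four categories can be sketched as a small classifier. This is an illustrative sketch only: the `Release` fields and the order of the checks are assumptions made for the example, and because the categories overlap in practice, a restrictively licensed release is simply labeled source-available here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """What a model release makes public (field names are illustrative)."""
    weights: bool
    training_code: bool
    training_data: bool
    permissive_license: bool  # e.g. Apache 2.0 or MIT, no use restrictions


def openness_category(r: Release) -> str:
    """Map a release onto the four categories described above."""
    if not r.weights:
        return "proprietary"
    if r.training_code and r.training_data and r.permissive_license:
        return "fully open"
    if not r.permissive_license:
        # License restricts who may use the model or for what purpose.
        return "source-available"
    return "open weights only"


# Hypothetical releases, one per category.
print(openness_category(Release(True, True, True, True)))
print(openness_category(Release(True, False, False, False)))
print(openness_category(Release(True, False, False, True)))
print(openness_category(Release(False, False, False, False)))
```

A real audit would read the license text itself; a boolean such as `permissive_license` hides most of the detail that matters.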

the terminology problem

The phrase "open source" is used loosely in the LLM space, in ways that would not pass muster for traditional software. The Open Source Initiative (OSI) addressed this in October 2024 with the Open Source AI Definition version 1.0. To qualify as open-source AI under the definition, a system must give users freedom to use, study, modify, and share it for any purpose, and must publish three things together: the model parameters (weights), the source code used to train and run the system, and sufficiently detailed information about the training data so that a skilled person could build a substantially equivalent system. Crucially, the definition does not require releasing the raw training data itself, partly because training corpora often include copyrighted, private, or personally identifiable material. It does require enough documentation that the data could in principle be reconstructed.

By this strict bar, very few "open" LLMs qualify. Llama, Qwen, Mistral's open releases, and DeepSeek typically publish the weights but withhold the training data and detailed data documentation. Allen AI's OLMo 2 and OLMo 3, along with EleutherAI's Pythia, are among the few well-known model families that publish the weights, code, training data, intermediate checkpoints, and training logs.

A more accurate label for most "open" LLMs is open-weights or source-available. The looser usage continues mostly because "open source" carries strong positive associations from the software world, and because most practitioners care about being able to download and run the model rather than about the philosophical purity of the release.

license taxonomy

LLM licenses fall into a small number of recognizable buckets. The license matters: it determines whether you can use the model commercially, whether you can fine-tune it on your own data, whether you can use it to train a competing model, and whether your downstream users have any obligations. The table below summarizes the most common license categories.

License type	Examples	Commercial use	Modification and fine-tuning	Notable restrictions
Permissive open-source (Apache 2.0, MIT, BSD)	Mistral 7B, Mixtral 8x7B, Falcon, Pythia, OLMo, gpt-oss, DeepSeek-V3.1 weights	Allowed	Allowed	Patent grant (Apache 2.0) and attribution requirements
Restricted source-available (Llama Community License, Gemma Terms of Use)	Llama 2, Llama 3, Llama 3.1, Llama 3.3, Llama 4, Gemma family	Allowed with conditions	Allowed	700 million MAU cap (Llama); limits on using outputs to train other models (a ban in Llama 2 and Llama 3, a naming requirement from Llama 3.1 onward); Acceptable Use Policy
Responsible AI (OpenRAIL family)	BLOOM, several Stability AI releases, original DeepSeek-V3 model weights	Allowed	Allowed	Behavioral restrictions on harmful uses, military uses, etc.
Research / non-commercial	Llama 1 (original release), OPT, several academic releases	Not allowed	Allowed for research	Output is for research only; cannot be deployed in products
Custom restricted	Various smaller releases	Varies	Varies	Read each license individually

The Llama license is the most commercially important non-permissive license in the LLM ecosystem. It allows broad commercial use but contains three notable carve-outs: a monthly active user cap of 700 million, above which the licensee must request a separate license that Meta may grant at its discretion; a clause governing the use of Llama or its outputs to improve other large language models; and adherence to Meta's Acceptable Use Policy. The 700-million-user cap is rarely binding in practice (it targets a handful of hyperscalers), but the output-use clause has significant downstream effects. Under the Llama 2 and Llama 3 licenses, a company cannot legally use Llama outputs to distill into a non-Llama base model. Starting with Llama 3.1, Meta relaxed the clause: outputs may be used to improve other models, but a distributed model trained this way must include "Llama" at the beginning of its name.
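As a rough illustration, the three carve-outs can be turned into a pre-deployment checklist. The function name, its parameters, and the flag messages below are hypothetical. The exact terms differ between Llama license versions, so this is a prompt for legal review rather than a compliance test.

```python
MAU_CAP = 700_000_000  # monthly active users, from the paragraph above


def llama_license_flags(monthly_active_users: int,
                        improves_non_llama_model: bool,
                        violates_acceptable_use: bool) -> list[str]:
    """Return the carve-outs that a planned deployment appears to touch."""
    flags = []
    if monthly_active_users > MAU_CAP:
        flags.append("MAU cap exceeded: separate license from Meta required")
    if improves_non_llama_model:
        # Terms here depend on the license version; read the actual text.
        flags.append("output-use clause applies: check the license version")
    if violates_acceptable_use:
        flags.append("Acceptable Use Policy violation")
    return flags


# A mid-sized product that plans to distill Llama outputs into another model.
print(llama_license_flags(2_000_000, True, False))
```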

major proprietary LLMs

The proprietary tier is dominated by a small number of well-funded labs. As of April 2026, the most notable closed-weight models in production are:

Model family	Provider	Latest release (date)	Modalities	Access
GPT-4, GPT-4o, GPT-4.1, o-series, GPT-5	OpenAI	GPT-5 (Aug 7, 2025); GPT-4o (May 13, 2024)	Text, image, audio (GPT-4o); reasoning (o-series)	API; ChatGPT; Microsoft Copilot
Claude 3, 3.5, 4, 4.5, 4.6, Opus 4.7	Anthropic	Claude Opus 4.7 (Apr 16, 2026)	Text, vision	API; Claude.ai; AWS Bedrock; Google Vertex AI; Microsoft Foundry
Gemini 1.5, 2.0, 2.5, 3	Google DeepMind	Gemini 3 Pro (Nov 18, 2025)	Text, image, audio, video	Gemini app; AI Studio; Vertex AI
Grok	xAI	Grok 4 (2025)	Text, vision	X (Twitter); xAI API

Proprietary models tend to lead the published frontier benchmarks, particularly on hard reasoning and agentic-coding tasks. As of April 6, 2026, Anthropic's Claude Opus 4.6 Thinking topped the LMArena (formerly Chatbot Arena) leaderboard at an Elo of 1504, with Gemini 3.1 Pro Preview at 1493 and GPT-5.4 High at 1484. The dispersion among the top closed models is small.
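To see how small that dispersion is, the ratings can be converted into head-to-head win probabilities. The sketch below uses the standard Elo expected-score formula with a 400-point scale. It assumes the leaderboard's reported ratings can be read on that scale, which is an approximation of however the leaderboard actually fits its scores.

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the paragraph above: first place versus third place.
first, third = 1504, 1484
p = elo_win_probability(first, third)
print(f"first beats third with probability {p:.3f}")  # roughly 0.53
```

A 20-point gap means the top model would be preferred in only about 53 percent of pairwise votes, close to a coin flip.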

major open-weight LLMs

Open-weight releases multiplied rapidly after the Llama 1 leak in March 2023. The table below covers the most widely used open-weight families as of early 2026.

Model family	Developer	Notable releases	License	Architecture notes
Llama	Meta AI	Llama 1 (Feb 2023), Llama 2 (Jul 2023), Llama 3 (Apr 2024), Llama 3.1, Llama 3.3 (Dec 6, 2024, 70B), Llama 4 Scout/Maverick (Apr 5, 2025)	Llama Community License	Llama 4 introduced Mixture-of-Experts: Scout has 17B active / 109B total / 16 experts; Maverick has 17B active / 400B total / 128 experts
Mistral	Mistral AI	Mistral 7B, Mixtral 8x7B, Mixtral 8x22B, Mistral Large 2, Mistral 3 family and Mistral Large 3 (Dec 2025, 41B active / 675B total)	Apache 2.0 (most open releases); Mistral Research License (some larger models)	Sliding window attention; sparse MoE in larger releases
DeepSeek	DeepSeek (China)	DeepSeek-V2, DeepSeek-V3 (Dec 26, 2024, 671B / 37B active), DeepSeek-R1 (Jan 20, 2025, MIT-licensed reasoning model), V3.1	MIT (R1, V3.1 weights); custom OpenRAIL-derived (original V3 weights)	MoE with multi-head latent attention; R1 trained with reinforcement learning for chain-of-thought reasoning
Qwen	Alibaba	Qwen 2, Qwen 2.5, Qwen 3 (Apr 28, 2025)	Apache 2.0	Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B) and MoE (30B-A3B, 235B-A22B); 36T training tokens; 119 languages
Gemma	Google	Gemma, Gemma 2, Gemma 3 (Mar 12, 2025), Gemma 3 270M	Gemma Terms of Use (permissive but custom)	Gemma 3 sizes are 1B, 4B, 12B, 27B; 4B and up are vision-language; 128k context
Falcon	Technology Innovation Institute (UAE)	Falcon-7B, Falcon-40B, Falcon-180B	Apache 2.0 (most variants)	Early high-profile permissive open release in 2023
OLMo	Allen Institute for AI	OLMo, OLMo 2 (7B, 13B, 32B), OLMo 3	Apache 2.0	Fully open: weights, training data, code, intermediate checkpoints, and training logs all released
Pythia / GPT-NeoX	EleutherAI	Pythia (70M to 12B suite)	Apache 2.0	Designed primarily for interpretability and scaling research; full data and checkpoints
gpt-oss	OpenAI	gpt-oss-120b, gpt-oss-20b (Aug 5, 2025)	Apache 2.0	OpenAI's first open-weight LLMs since GPT-2; 120B variant runs on a single 80 GB GPU

The open-weight side has closed much of the quality gap with proprietary frontier models since late 2024. DeepSeek-V3 and DeepSeek-R1 in particular drew attention because R1 reached performance comparable to OpenAI's o1 on math, code, and reasoning benchmarks despite being trained on a fraction of the rumored proprietary budget. According to the LMArena leaderboard, the gap between the top open and the top closed model briefly narrowed to about 4 Elo points in February 2025 before widening again as proprietary labs shipped new versions.
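The parameter counts in the table above translate directly into hardware requirements. The sketch below estimates memory for the weights alone and the share of parameters active per token in the Mixture-of-Experts models. The bit widths are assumptions chosen for illustration, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(total_params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for weights only, in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bits_per_param / 8.0


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_billion / total_billion


# (active, total) parameters in billions, taken from the table above.
moe_models = {
    "Llama 4 Scout": (17, 109),
    "Llama 4 Maverick": (17, 400),
    "DeepSeek-V3": (37, 671),
    "Mistral Large 3": (41, 675),
}

for name, (active, total) in moe_models.items():
    print(f"{name}: {active_fraction(active, total):.1%} active, "
          f"{weight_memory_gb(total, 16):.0f} GB at 16-bit, "
          f"{weight_memory_gb(total, 4):.0f} GB at 4-bit")

# A 120B-parameter model at an assumed 4 bits per weight needs about 60 GB,
# which is consistent with fitting on a single 80 GB GPU.
print(weight_memory_gb(120, 4))
```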

cost structure

The two delivery models price differently, and the choice has compounding effects on a product's unit economics.

Cost item	Proprietary API	Self-hosted open weights
Per-token cost	Direct, posted price (for example, Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens)	Effectively zero variable cost on top of GPU amortization
Infrastructure capex	None (provider absorbs it)	GPU servers, networking, racks, or long reservation contracts on a cloud
Inference engineering	None required	Need to operate vLLM, TGI, SGLang, or similar; manage batching, KV cache, autoscaling
Idle cost	None	Reserved GPUs cost the same whether or not you serve requests
Scaling spikes	Handled by provider, subject to rate limits	You must provision for peak; bursting is harder
Compliance and SOC2	Provided	You inherit the entire stack

The right answer depends on volume. A thin product that handles a few million tokens per month almost always costs less on a proprietary API than on rented H100s. A heavy product with predictable, sustained throughput (an internal copilot used by 50,000 employees, a search reranker, a customer support classifier) often crosses over and becomes cheaper to self-host. Managed open-weight inference, offered by companies like Together AI, Anyscale, Fireworks AI, and Replicate, sits in between: per-token billing on open models, often at one-tenth or less of the proprietary price for similar quality tiers, with the operations handled by the provider.
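The crossover can be sketched with simple arithmetic. The per-token prices below are the Claude Opus 4.7 figures quoted in the table. The GPU count, hourly rate, and monthly token volumes are assumptions picked for illustration, and the comparison only holds if the assumed GPUs can actually serve the heavy workload at acceptable quality.

```python
def api_monthly_cost(input_mtok: float, output_mtok: float,
                     input_price: float, output_price: float) -> float:
    """API bill in USD; volumes in millions of tokens, prices per million."""
    return input_mtok * input_price + output_mtok * output_price


def self_host_monthly_cost(gpu_count: int, gpu_hourly_usd: float,
                           hours_per_month: float = 730.0) -> float:
    """Reserved GPUs cost the same whether or not they serve requests."""
    return gpu_count * gpu_hourly_usd * hours_per_month


INPUT_PRICE, OUTPUT_PRICE = 5.0, 25.0  # USD per million tokens (from the table)
GPU_COUNT, GPU_HOURLY_USD = 8, 3.0     # assumed rental: 8 GPUs at $3 per hour

hosting = self_host_monthly_cost(GPU_COUNT, GPU_HOURLY_USD)
workloads = {
    "thin product": (5, 1),         # 5M input, 1M output tokens per month
    "heavy product": (5000, 1000),  # 5B input, 1B output tokens per month
}
for label, (inp, out) in workloads.items():
    api = api_monthly_cost(inp, out, INPUT_PRICE, OUTPUT_PRICE)
    cheaper = "API" if api < hosting else "self-hosting"
    print(f"{label}: API ${api:,.0f} vs hosting ${hosting:,.0f} -> {cheaper}")
```

Under these assumptions the thin product pays $50 per month on the API against $17,520 for idle-capable GPUs, while the heavy product pays $50,000 on the API and comes out ahead by self-hosting.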

In 2025, Stripe reported a 73 percent reduction in inference costs for the affected workloads after migrating its inference stack to vLLM. Savings of that size are real but materialize only at scale. For most teams the choice is less binary than it looks: running a permissively licensed open model behind a managed inference provider captures most of the cost savings without the operational burden.

benchmarks and quality

Four evaluations dominate public discussion of LLM quality:

LMArena (formerly Chatbot Arena) uses anonymous pairwise human preference voting and reports an Elo rating. It captures perceived helpfulness on open-ended chat better than fixed benchmarks.
MMLU measures multitask knowledge across 57 subjects, mostly at undergraduate level. Frontier models now score above 85 percent.
GPQA is a graduate-level science Q&A benchmark whose questions are written to be "Google-proof", so they cannot be answered by a quick web search or simple recall. Frontier reasoning models reached human-expert performance during 2024 and 2025.
SWE-bench evaluates a model's ability to resolve real GitHub issues in open-source repositories. It has become the headline benchmark for coding agents.

The pattern that has emerged is that proprietary models lead the absolute peak by a small margin, while the best open models trail by months rather than years. The frontier reasoning capabilities introduced by OpenAI's o1 in late 2024 were matched by DeepSeek-R1 within a few months, on permissive weights. Coding agents based on Claude and Gemini 3 still lead their respective leaderboards, but the open side narrows that gap with each release cycle.

pros and cons compared

No summary fully captures every team's situation, but the table below covers the recurring tradeoffs.

Dimension	Proprietary	Open weights
Peak quality	Usually slightly ahead at the absolute frontier	Catching up; close enough for most workloads
Time to first deployment	Fast: the provider handles inference; you call an API	Slower: requires GPU provisioning and a serving stack
Data residency	Sent to provider's servers; subject to their policies	Stays on your infrastructure; full control
Fine-tuning	Limited or unavailable; even when offered, restrictive	Full control: LoRA, QLoRA, full SFT, RLHF, distillation
Cost at low volume	Cheaper (no capex, no idle GPUs)	More expensive (you pay for GPUs whether you use them or not)
Cost at high volume	More expensive (per-token markup)	Cheaper (amortized GPU cost over many tokens)
Vendor lock-in	High; switching providers means re-prompting and re-evaluation	Low; you own the weights
Auditability	Limited; the provider's behavior can change without notice	High; you can inspect, freeze, and version the model
Safety and alignment	Provider-managed; behavior is shaped by their policies	Your responsibility; base models are often less aligned than chat models
Long-term availability	Subject to deprecation; older models get retired	Permanent; you keep the weights as long as you keep the storage
Operational burden	Low	High; you run the inference stack, monitor it, and patch it

Deprecation risk is worth calling out. OpenAI, Anthropic, and Google all retire older models on published schedules, and applications calibrated to one version often need re-evaluation when forced to upgrade. With open weights, deprecation is your decision.

hybrid strategies

Most mature production stacks use both. A few patterns recur:

Tiered routing. A cheaper model (often an open-weight one) handles routine requests; a frontier proprietary model handles hard ones. Routers can use simple heuristics, classifiers, or learned policies.
Proprietary as judge. Many teams use a top-tier closed model to grade outputs from cheaper models during evaluation, even if the cheap model serves production. This decouples evaluation quality from inference cost.
Distillation. A proprietary teacher generates training data for a smaller open student, which is then deployed. License terms matter here: the Llama 2 and Llama 3 licenses forbid using outputs to train non-Llama models, later Llama licenses attach naming conditions instead, and many proprietary providers prohibit using their API outputs to train competing models. Together these terms constrain who can do this legally.
Open base, proprietary fallback. Self-host an open model for the common case and call a proprietary API only when the open model abstains or expresses low confidence.
Sensitive-data isolation. Use an open model on-premises for any request that touches regulated or confidential data; use proprietary APIs for everything else.
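Two of the patterns above, sensitive-data isolation and open base with proprietary fallback, can be combined in one routing function. This is a minimal sketch: the `Request` type, the confidence threshold, and both backends are hypothetical stand-ins, not real client libraries, and a production router would need a better confidence signal than the toy heuristic used here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A backend takes a prompt and returns (answer, confidence in [0, 1]).
Backend = Callable[[str], Tuple[str, float]]


@dataclass(frozen=True)
class Request:
    prompt: str
    sensitive: bool = False  # touches regulated or confidential data


def route(req: Request, open_model: Backend, proprietary_model: Backend,
          min_confidence: float = 0.7) -> Tuple[str, str]:
    """Return (backend_name, answer) for one request."""
    answer, confidence = open_model(req.prompt)
    # Sensitive-data isolation: this prompt never leaves our infrastructure.
    if req.sensitive:
        return "open", answer
    # Open base, proprietary fallback: escalate only on low confidence.
    if confidence < min_confidence:
        answer, _ = proprietary_model(req.prompt)
        return "proprietary", answer
    return "open", answer


def fake_open(prompt: str) -> Tuple[str, float]:
    # Toy heuristic for the demo: pretend long prompts are hard.
    return "open answer", 0.9 if len(prompt) < 40 else 0.3


def fake_proprietary(prompt: str) -> Tuple[str, float]:
    return "frontier answer", 0.95


print(route(Request("Summarize this ticket"), fake_open, fake_proprietary))
print(route(Request("x" * 50), fake_open, fake_proprietary))
print(route(Request("x" * 50, sensitive=True), fake_open, fake_proprietary))
```

Checking the sensitivity flag before any escalation is the important design choice: a low-confidence answer on confidential data stays on the open model rather than leaking to an external API.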

regulatory considerations

The regulatory picture is starting to differentiate between proprietary and open models, in ways that often favor open releases.

The EU AI Act, whose obligations for general-purpose AI (GPAI) model providers became applicable on August 2, 2025, contains a partial exemption for open-source GPAI models. Providers that release a model under a free and open license, with weights, parameters, architecture, and usage information publicly available, are exempt from the technical documentation and downstream-information requirements that apply to closed providers. They must still publish a copyright policy and a sufficiently detailed summary of training data. The exemption does not apply to models with systemic risk, which the Act presumes for any model trained with more than 10^25 floating-point operations of cumulative compute. The largest open releases (such as Llama 3.1 405B and the largest Llama 4 variants) cross this threshold and so receive no exemption.
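A rough way to see which models fall above the 10^25 threshold is the common rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per token. The sketch below applies it to hypothetical models. The token counts are assumptions for illustration, the rule is only an approximation, and for Mixture-of-Experts models the active parameter count is the more relevant input.

```python
SYSTEMIC_RISK_FLOPS = 1e25  # threshold cited in the paragraph above


def estimate_training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def above_threshold(params: float, tokens: float) -> bool:
    return estimate_training_flops(params, tokens) > SYSTEMIC_RISK_FLOPS


# Hypothetical training runs (token counts are assumed, not reported figures).
runs = {
    "405B dense, 15T tokens": (405e9, 15e12),
    "7B dense, 2T tokens": (7e9, 2e12),
}
for label, (params, tokens) in runs.items():
    flops = estimate_training_flops(params, tokens)
    print(f"{label}: {flops:.2e} FLOPs, above threshold: "
          f"{above_threshold(params, tokens)}")
```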

US export controls also affect the open-weights ecosystem. Restrictions on advanced GPU exports to China have not stopped Chinese labs from releasing capable open weights (DeepSeek, Qwen, Yi), but they shape what hardware can train them and where the resulting models can be served. Some US legislators have proposed restrictions on the open release of model weights above certain capability thresholds; as of early 2026, no such restriction has become law.

notable historical events

A few episodes shaped the current landscape:

Llama 1 leak, March 2023. Meta released Llama 1 in February 2023 under a non-commercial research license, restricted to approved researchers. On March 3, 2023, the weights were posted as a torrent on 4chan and spread rapidly. Within weeks, Stanford released Alpaca (Llama 1 fine-tuned on instruction-following data) and the Vicuna team produced a chat model that approached GPT-3.5 quality. The leak is widely credited with catalyzing the open-weights ecosystem and pushed Meta to release Llama 2 in July 2023 under a commercial license.
Mistral 7B, September 2023. A small French startup released a 7B model under Apache 2.0 that outperformed Llama 2 13B on most benchmarks. It demonstrated that competitive permissive releases were possible from outside the US hyperscalers.
DeepSeek-V3 and R1, late 2024 to early 2025. DeepSeek-V3 (December 26, 2024) and DeepSeek-R1 (January 20, 2025) showed that open weights could match closed-model frontier performance on reasoning tasks. R1 was released under MIT license. The releases triggered a sharp drop in some US AI stocks in late January 2025 and prompted reassessments of the assumed lead held by US labs.
OpenAI's gpt-oss, August 2025. OpenAI released gpt-oss-120b and gpt-oss-20b under Apache 2.0 on August 5, 2025. These were OpenAI's first open-weight language models since GPT-2 in 2019. The 120B variant runs on a single 80 GB GPU and reaches near-parity with OpenAI's own o4-mini on core reasoning benchmarks, signaling that even the closed labs see strategic value in shipping some open weights.
Mistral 3 and Large 3, December 2025. Mistral released a frontier-tier open-weights model (Mistral Large 3, 41B active / 675B total parameters) and a family of small dense models (3B, 8B, 14B), all under Apache 2.0. The release demonstrated that competitive frontier models can ship under fully permissive licenses.

how to choose

The right answer depends on your constraints, not on philosophy. A few rules of thumb:

If your data cannot leave your infrastructure for legal, regulatory, or security reasons, you need open weights, period.
If your volume is small and you want to ship quickly, use a proprietary API. The dollar cost is rarely the binding constraint at low volume.
If your volume is large and your workload is predictable, run an open model on rented or owned GPUs. Per-token economics will dominate.
If you need maximum capability on the hardest tasks, the top proprietary models still tend to lead, though by less than they used to.
If you need full control over fine-tuning, alignment, or behavior, open weights are the only credible option.
For most teams, the best stack uses both: open weights for routine work and the most capable proprietary model for hard requests, evaluation, and synthesis.
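The rules of thumb can be written down as a small decision function. The parameter names, the order in which the rules are checked, and the returned labels are choices made for this sketch. Real decisions involve more inputs, such as team skills and compliance needs.

```python
def recommend_stack(data_must_stay_on_premises: bool,
                    needs_full_control: bool,
                    high_predictable_volume: bool,
                    needs_peak_capability: bool) -> str:
    """Apply the rules of thumb above, hard constraints first."""
    # Data residency and control are hard constraints, so they win outright.
    if data_must_stay_on_premises or needs_full_control:
        return "self-hosted open weights"
    # High volume plus hard tasks: split the traffic between both.
    if high_predictable_volume and needs_peak_capability:
        return "hybrid"
    if high_predictable_volume:
        return "self-hosted open weights"
    # Low volume, or peak capability without heavy volume.
    return "proprietary API"


print(recommend_stack(False, False, False, False))  # small, ship quickly
print(recommend_stack(True, False, False, True))    # regulated data
print(recommend_stack(False, False, True, True))    # heavy and demanding
```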

The gap between closed and open is no longer a chasm. It is a moving frontier of months, sometimes weeks, that any serious LLM-using team will have to keep watching.

references

Open Source Initiative. "The Open Source AI Definition 1.0." October 2024. https://opensource.org/ai/open-source-ai-definition
Meta. "Llama 3 Community License Agreement." https://www.llama.com/llama3/license/
Meta. "Llama 3.1 Community License Agreement." https://www.llama.com/llama3_1/license/
Meta. "Introducing Llama 4: Maverick and Scout." April 5, 2025. https://www.llama.com/models/llama-4/
DeepSeek. "DeepSeek-R1 Release." January 20, 2025. https://api-docs.deepseek.com/news/news250120
DeepSeek-AI. "DeepSeek-V3 Technical Report." December 2024. https://arxiv.org/abs/2412.19437
Qwen Team. "Qwen3: Think Deeper, Act Faster." April 28, 2025. https://qwenlm.github.io/blog/qwen3/
Google DeepMind. "Gemma 3 Technical Report." March 2025. https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
Google. "Introducing Gemini 3." November 18, 2025. https://blog.google/products/gemini/gemini-3/
OpenAI. "Introducing GPT-5." August 7, 2025. https://openai.com/index/introducing-gpt-5/
OpenAI. "Introducing gpt-oss." August 5, 2025. https://openai.com/index/introducing-gpt-oss/
OpenAI. "Hello GPT-4o." May 13, 2024. https://openai.com/index/hello-gpt-4o/
Anthropic. "Introducing Claude Opus 4.7." April 16, 2026. https://www.anthropic.com/news/claude-opus-4-7
Mistral AI. "Introducing Mistral 3." December 2, 2025. https://mistral.ai/news/mistral-3
Allen Institute for AI. "OLMo 2: The best fully open language model to date." https://allenai.org/blog/olmo2
European Union. "Article 53: Obligations for Providers of General-Purpose AI Models." EU Artificial Intelligence Act. https://artificialintelligenceact.eu/article/53/
European Commission. "Guidelines for providers of general-purpose AI models." https://digital-strategy.ec.europa.eu/en/policies/guidelines-gpai-providers
Wikipedia. "LLaMA." https://en.wikipedia.org/wiki/llama
The Register. "LLaMA drama as Meta's mega language model files leak." March 8, 2023. https://www.theregister.com/2023/03/08/meta_llama_ai_leak/
LMArena. "Arena Leaderboard." https://huggingface.co/spaces/lmarena-ai/arena-leaderboard
Hugging Face. "Welcome Llama 4 Maverick & Scout on Hugging Face." April 2025. https://huggingface.co/blog/llama4-release
Simon Willison. "Meta AI release Llama 3.3." December 6, 2024. https://simonwillison.net/2024/Dec/6/llama-33/


