Llama 3.3 is an instruction-tuned large language model developed by Meta and released on December 6, 2024. With 70 billion parameters, Llama 3.3 achieves benchmark performance comparable to Meta's own Llama 3.1 405B model on a range of reasoning, coding, and instruction-following tasks, while requiring a fraction of the hardware resources. It is text-only, supports eight languages, and operates with a 128,000-token context window. Llama 3.3 was Meta's final major model release of 2024, arriving roughly three months after the Llama 3.2 family introduced multimodal and lightweight variants.
The release attracted attention from the open-source AI community because it demonstrated that advances in post-training, rather than additional parameters or pre-training compute, could close much of the capability gap between a 70B model and one roughly six times its size. On the IFEval instruction-following benchmark, Llama 3.3 scored 92.1, exceeding both Llama 3.1 405B (88.6) and GPT-4o (84.6), establishing it as a new reference point for efficient, deployable open-weights language models.
Meta's public release of large language models under the LLaMA name began in February 2023. The original LLaMA models were research artifacts released to the scientific community; they were not instruction-tuned and arrived before Meta had established a framework for broad commercial licensing. Llama 2 followed in July 2023 with instruction-tuned variants and a license that permitted commercial use for most developers, though still with restrictions based on monthly active users.
Llama 3, announced in April 2024, marked a significant capability jump. The 8B and 70B models outperformed earlier Llama 2 variants by wide margins and showed strong performance relative to contemporary proprietary models. The pre-training corpus was substantially enlarged, reaching roughly 15 trillion tokens, and the vocabulary was expanded from 32,000 tokens (in Llama 2) to 128,256 tokens, improving coverage of code and non-English text.
Llama 3.1 arrived in July 2024 as the most significant expansion of the Llama 3 family. For the first time, Meta released a 405B-parameter model as an open-weights artifact, giving developers access to a frontier-scale model under the Llama 3.1 Community License. The 405B model also introduced support for a 128K context window and tool calling capabilities that brought it closer to proprietary API models in terms of agentic use. The 8B and 70B variants were refreshed alongside it, carrying over the expanded context window and improved multilingual support.
Llama 3.2, released in September 2024, expanded the family in two directions at once. Lightweight text models at 1B and 3B parameters targeted on-device deployment on smartphones and edge hardware. Vision models at 11B and 90B parameters added image understanding, making these the first Llama models with multimodal input capability.
Llama 3.3 arrived on December 6, 2024, as a focused, single-model release: a new 70B instruct checkpoint that incorporated improved post-training relative to Llama 3.1 70B, yielding performance that closely tracks Llama 3.1 405B at a deployable size.
Meta released only one model variant under the Llama 3.3 designation: the 70B Instruct checkpoint. There was no corresponding 8B variant, no Llama 3.3 405B, and no multimodal (vision) model. This single-variant strategy was deliberate rather than an omission.
The 70B parameter count occupies a practical middle ground in the open-weights landscape. It is large enough to handle complex reasoning and long-context tasks with high accuracy, and small enough to be served on infrastructure that does not require the largest GPU configurations. A 70B model does not fit in full precision on a single A100 80GB instance, but with 4-bit quantization it runs on a pair of consumer-grade GPUs such as two RTX 4090s (48GB combined), or on Apple Silicon Mac Studio configurations with 64GB or more of unified memory. The Llama 3.1 405B model, by contrast, requires multi-node deployments or large server-class GPU instances for reasonable inference throughput.
Meta's stated rationale was efficiency: advances in post-training had brought the 70B model close enough to 405B performance that the larger model was no longer necessary for most real-world applications. The 8B update was not released as Llama 3.3 because the smaller model did not achieve comparable performance gains relative to its predecessor. Vision capabilities were already covered by the Llama 3.2 11B and 90B Vision models, which had been released just three months earlier and remained the current multimodal offering in the Llama family.
Llama 3.3 70B uses the same base transformer decoder architecture as Llama 3.1 70B. The architectural design was not changed between the two releases; the improvements in Llama 3.3 come entirely from the post-training pipeline, not from modifications to the network structure.
The model is an auto-regressive decoder-only transformer with 80 layers. The hidden dimension is 8,192 and the feed-forward intermediate dimension is 28,672. The model uses Grouped-Query Attention (GQA) with 64 attention heads and 8 key-value heads, a configuration that substantially reduces the memory footprint of the key-value cache during inference compared to full multi-head attention, enabling more efficient long-context decoding.
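The effect of GQA on cache memory follows directly from these dimensions. A back-of-the-envelope sketch in Python (BF16 cache assumed; real deployments add allocator and framework overhead):

```python
# Dimensions from the Llama 3.3 70B configuration described above.
LAYERS = 80
Q_HEADS = 64
KV_HEADS = 8
HEAD_DIM = 8192 // Q_HEADS  # 128
BYTES_BF16 = 2

def kv_cache_gb(context_tokens: int, kv_heads: int) -> float:
    """Approximate KV-cache size in GB: 2 tensors (K and V) per layer."""
    per_token = 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_BF16
    return per_token * context_tokens / 1e9

print(kv_cache_gb(128_000, KV_HEADS))  # ~42 GB with GQA (8 KV heads)
print(kv_cache_gb(128_000, Q_HEADS))   # ~336 GB with full multi-head attention
```

The 8x reduction is exactly the ratio of query heads to key-value heads, which is what makes full 128K-context decoding tractable on a single server.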
The vocabulary size is 128,256 tokens, using a byte-level byte-pair encoding tokenizer (tiktoken-compatible). This vocabulary was chosen to provide good coverage of English, the seven other supported languages, and code across multiple programming languages without requiring excessive token splits for common programming keywords or non-ASCII characters.
Rotary Positional Embeddings (RoPE) are used for position encoding, with the base frequency scaled to support 128K context lengths. The context window of 128,000 tokens is native to the architecture and does not require custom extension techniques at inference time, though practical provider deployments sometimes cap available context at lower values.
Because the architecture is unchanged, any Llama 3.1 70B inference code, quantization pipeline, or serving infrastructure works identically with Llama 3.3 70B. The model uses the same prompt format as Llama 3.1, including the same special tokens and role delimiters, and prompts written for Llama 3.1 work without modification. This backward compatibility was an explicit design goal.
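For reference, the shared chat format can be assembled by hand. A minimal sketch using the special tokens documented in Meta's model cards (in practice, a library chat template does this for you):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn chat prompt with the Llama 3.x special tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# The identical string drives both Llama 3.1 70B and Llama 3.3 70B.
prompt = build_llama3_prompt("You are a concise assistant.", "What is GQA?")
```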
The differences between Llama 3.3 and Llama 3.1 at the 70B size are exclusively in the trained weights, which reflect improved post-training data and procedures.
Meta's post-training for Llama 3 models has been described in the Llama 3 technical report ("The Llama 3 Herd of Models," arXiv 2407.21783) as an iterative procedure combining supervised fine-tuning (SFT), rejection sampling (RS), and Direct Preference Optimization (DPO). For instruction-tuned models, the majority of SFT examples are generated synthetically, with human annotators involved in quality verification and feedback collection rather than primary data creation.
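For orientation, the DPO stage of that pipeline optimizes a simple preference objective. A didactic sketch of the loss on a single preference pair (not Meta's training code; `beta` is the usual KL-strength hyperparameter from the DPO paper):

```python
import math

def dpo_loss(pi_logp_chosen: float, pi_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair, given summed token
    log-probs under the policy (pi_*) and a frozen reference (ref_*)."""
    margin = beta * ((pi_logp_chosen - ref_logp_chosen)
                     - (pi_logp_rejected - ref_logp_rejected))
    return math.log(1 + math.exp(-margin))  # equals -log(sigmoid(margin))

# Loss shrinks as the policy raises the chosen answer's relative likelihood.
print(dpo_loss(-10.0, -14.0, -11.0, -12.0))  # margin = 0.1 * (1 - (-2)) = 0.3
```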
For Llama 3.3, Meta applied a refined version of this pipeline that increased the quality and diversity of post-training data, particularly in areas where Llama 3.1 70B lagged the 405B model. Specific improvements targeted mathematical reasoning, multilingual performance, coding, and tool use. The instruction-following improvements, which produced the notable IFEval score jump, came from expanded and more precisely calibrated SFT data for instruction-following tasks, combined with reinforcement learning from human feedback (RLHF) stages weighted toward accurately satisfying complex multi-constraint instructions.
The training was performed on H100-80GB GPUs. Meta reported a cumulative 39.3 million GPU hours of computation for Llama 3.3, a figure that covers the full pre-training (inherited from the Llama 3.1 base) and post-training pipeline.
Meta's headline claim for Llama 3.3 is that it achieves performance similar to Llama 3.1 405B on key benchmarks despite having roughly 83 percent fewer parameters. The comparison is most favorable in instruction following and coding, where Llama 3.3 meets or exceeds the 405B model, and least favorable in tasks requiring broad factual recall and deep reasoning chains, where the much larger model retains advantages.
| Benchmark | Llama 3.3 70B | Llama 3.1 70B | Llama 3.1 405B | GPT-4o | Notes |
|---|---|---|---|---|---|
| IFEval | 92.1 | 87.5 | 88.6 | 84.6 | Instruction following |
| MATH (0-shot, CoT) | 77.0 | 68.0 | 73.8 | 76.6 | Math competition problems |
| GPQA Diamond (0-shot, CoT) | 50.5 | 46.7 | 51.1 | 53.6 | Graduate-level science |
| HumanEval (0-shot) | 88.4 | 80.5 | 89.0 | 90.2 | Code generation |
| MBPP EvalPlus | 87.6 | 82.1 | 88.2 | n/a | Python coding |
| MMLU-Pro (5-shot, CoT) | 68.9 | 64.5 | 73.3 | 74.4 | Expert knowledge |
| MGSM (multilingual math) | 91.1 | 86.9 | 91.6 | n/a | 8-language math |
The IFEval result warrants particular attention. At 92.1, Llama 3.3 70B surpasses every model in the comparison, including the roughly 6x larger Llama 3.1 405B and GPT-4o. IFEval measures a model's ability to follow verifiable instructions, such as producing output in a specified format, avoiding certain words, or adhering to length constraints. This capability is directly relevant to production deployments where models must reliably conform to output schemas.
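The "verifiable" part means each constraint can be checked mechanically. An illustrative sketch of such checks (toy checkers, not the benchmark's actual implementation):

```python
import json

def check_max_words(output: str, limit: int) -> bool:
    """Length constraint: 'answer in at most N words'."""
    return len(output.split()) <= limit

def check_forbidden_words(output: str, banned: list[str]) -> bool:
    """Keyword constraint: 'do not use the word X'."""
    lowered = output.lower()
    return not any(word.lower() in lowered for word in banned)

def check_json_format(output: str) -> bool:
    """Format constraint: 'respond with valid JSON only'."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

response = '{"summary": "Llama 3.3 is a 70B instruct model."}'
assert check_json_format(response)
assert check_max_words(response, 50)
assert check_forbidden_words(response, ["multimodal"])
```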
On MATH, the improvement over Llama 3.1 70B is approximately 9 percentage points (77.0 versus 68.0), bringing the 70B model within 3 points of the 405B. On MGSM, which evaluates mathematical reasoning across eight languages, Llama 3.3 scores 91.1 against the 405B's 91.6, a near-identical result.
The areas where Llama 3.1 405B retains a clear lead include MMLU-Pro (73.3 versus 68.9), where the much larger model's broader factual coverage and deeper reasoning contribute meaningfully, and GPQA Diamond, where the 405B's 51.1 edges the 70B's 50.5.
Groq, an inference provider that builds custom accelerators it calls Language Processing Units (LPUs), was among the first cloud providers to support Llama 3.3 and published independent speed benchmarks. Groq's standard Llama 3.3 70B Versatile endpoint achieves approximately 280 to 315 output tokens per second in measured throughput, making it the fastest publicly available inference endpoint for the model. A speculative decoding variant (Llama 3.3 70B SpecDec) achieves measured speeds above 1,600 tokens per second in peak throughput benchmarks, though at the cost of potential accuracy reduction on complex tasks.
For comparison, most GPU-based cloud inference providers (AWS Bedrock, Together.ai, Fireworks) deliver Llama 3.3 70B at roughly 40 to 100 output tokens per second.
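Client-side throughput can be estimated by timing a streamed completion. A rough sketch against Groq's OpenAI-compatible endpoint (the model ID and base URL follow Groq's published conventions; the word-based token estimate and any measured rate are approximations that vary with load, prompt length, and network latency):

```python
import os
import time
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

start = time.perf_counter()
pieces = []
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain grouped-query attention in 200 words."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        pieces.append(chunk.choices[0].delta.content)
elapsed = time.perf_counter() - start

# Crude estimate: one token is roughly 0.75 English words.
approx_tokens = len("".join(pieces).split()) / 0.75
print(f"~{approx_tokens / elapsed:.0f} output tokens/sec over {elapsed:.1f}s")
```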
Llama 3.3's 128,000-token context window, inherited from Llama 3.1, was at the time of release among the largest available in an open-weights model. The window supports roughly 100,000 words of text, approximately the length of a full novel, enabling use cases such as analyzing long technical documents, reviewing extended codebases without segmentation, and maintaining conversation histories across lengthy multi-turn dialogues.
The extended context is supported natively by the model architecture through RoPE frequency scaling. However, Meta and independent researchers have noted that, in common with other large language models, Llama 3.3 does not uniformly maintain performance across the full context window. Content placed in the middle of very long contexts is processed less accurately than content near the beginning or end, a phenomenon known as the "lost-in-the-middle" effect. Effective retrieval-augmented generation (RAG) pipelines typically account for this by placing the most relevant retrieved passages at the beginning or end of the prompt.
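A common mitigation is to reorder retrieved passages so the strongest evidence sits at the edges of the prompt. A minimal sketch, assuming the retriever returns passages sorted best-first:

```python
def edge_order(passages: list[str]) -> list[str]:
    """Interleave passages so the most relevant land at the start and end
    of the context, leaving weaker passages in the middle."""
    front, back = [], []
    for i, passage in enumerate(passages):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

# Ranked best-first by the retriever.
ranked = ["passage A (best)", "passage B", "passage C", "passage D (weakest)"]
print(edge_order(ranked))
# ['passage A (best)', 'passage C', 'passage D (weakest)', 'passage B']
```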
In practice, many cloud API providers offering Llama 3.3 enforce lower default context limits (8K or 32K) and require explicit configuration or a higher service tier to access the full 128K window. This reflects both cost management (longer contexts are more expensive to serve) and infrastructure constraints on some hardware configurations.
Llama 3.3 was trained with explicit multilingual coverage for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. These eight languages were selected based on their global speaker populations and the availability of high-quality training data.
The multilingual improvement over Llama 3.1 70B is most clearly visible in the MGSM benchmark, where Llama 3.3 scores 91.1 compared to 86.9 for Llama 3.1 70B, a 4.2-point gain. This improvement was achieved through targeted multilingual SFT data expansion and better calibration of the post-training mix across languages.
Meta's model card explicitly notes that Llama 3.3 is not tested or evaluated for languages outside the eight supported ones, and that outputs in unsupported languages may be less accurate or inconsistent. Developers building applications for languages other than the eight supported ones are advised to fine-tune the model on domain-specific data in those languages and to implement system-level controls.
The eight-language scope is narrower than some competing open-weights models. Qwen 2.5 72B, released by Alibaba Cloud in September 2024, supports 29 languages. This difference reflects different research and product priorities between the two organizations, with Meta focusing on concentrated improvements in a smaller set of major world languages.
Llama 3.3 retains and improves upon the tool calling format introduced in Llama 3.1. The model can invoke user-defined functions by generating structured JSON payloads in a specific format, and it supports both single-tool and parallel tool calling (calling multiple functions in a single inference pass). The prompt format for tool calling is identical between Llama 3.1 and Llama 3.3, ensuring that agentic frameworks and orchestration tools built for Llama 3.1 (such as those using LangChain, LlamaIndex, or custom function-calling frameworks) work without modification.
The improvements in tool use in Llama 3.3 are primarily behavioral rather than architectural. The model more reliably identifies when a tool call is appropriate versus when to respond directly, reduces spurious tool calls on prompts that should be answered without tool use, and produces more accurately structured JSON payloads with fewer formatting errors. These improvements are the result of expanded and higher-quality tool-use training data in the post-training pipeline.
Llama 3.3's tool calling format is compatible with the Llama 3.2 format, providing a consistent interface across recent Llama generations. Developers using the model card prompt format can rely on a stable specification for tool invocation.
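A hedged sketch of the round trip, using the transformers chat template to render a tool schema into the prompt and then dispatching the model's JSON reply (the weather tool is a hypothetical example; the checkpoint is gated on Hugging Face, so the tokenizer download assumes accepted license terms):

```python
import json
from transformers import AutoTokenizer  # pip install transformers

def get_weather(city: str) -> str:
    """Hypothetical user-defined tool used for illustration."""
    return f"22C and clear in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
# ... send `prompt` to any Llama 3.3 inference backend ...

# The model answers a tool-call turn with a JSON payload such as:
raw_reply = '{"name": "get_weather", "parameters": {"city": "Lisbon"}}'
call = json.loads(raw_reply)
result = {"get_weather": get_weather}[call["name"]](**call["parameters"])
print(result)  # "22C and clear in Lisbon"
```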
Llama 3.3 70B is available through all major cloud and inference API providers. Pricing varies considerably across providers, reflecting different infrastructure, caching policies, and service tiers.
| Provider | Input (per 1M tokens) | Output (per 1M tokens) | Notes |
|---|---|---|---|
| Groq | ~$0.59 | ~$0.79 | Fastest throughput; LPU-based |
| AWS Bedrock | ~$0.72 | ~$0.72 | Managed service; enterprise compliance |
| Google Vertex AI | ~$0.36 | ~$0.36 | Competitive managed pricing |
| Together.ai | ~$0.88 | ~$0.88 | Developer-friendly; flexible batching |
| Fireworks.ai | ~$0.70 | ~$0.70 | Low latency; JSON mode support |
| DeepInfra | ~$0.23 | ~$0.40 | Budget option; strong throughput |
| OpenRouter (paid) | ~$0.20 | ~$0.20 | Routing service; free tier available |
Note: Cloud provider pricing changes frequently. Figures above represent approximate rates as of mid-2025 and may have changed.
Llama 3.3 70B is also available for free in limited quantities through providers including OpenRouter's free tier, Hugging Face Inference Endpoints, and GitHub Models (where it became generally available on December 13, 2024).
Relative to Llama 3.1 405B, Llama 3.3 70B is substantially cheaper to run. The 405B model requires proportionally more GPU memory and compute per token, making it roughly 3x to 5x more expensive on a per-token basis at most providers.
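A quick sketch of what the table's approximate rates imply for a hypothetical workload (illustrative numbers only; substitute current provider pricing):

```python
def monthly_cost(input_tokens: float, output_tokens: float,
                 in_rate: float, out_rate: float) -> float:
    """Cost in dollars given token volumes and per-1M-token rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

# Hypothetical workload: 2B input tokens, 500M output tokens per month.
workload = (2e9, 5e8)
print(f"DeepInfra: ${monthly_cost(*workload, 0.23, 0.40):,.0f}")  # ~$660
print(f"Groq:      ${monthly_cost(*workload, 0.59, 0.79):,.0f}")  # ~$1,575
print(f"Together:  ${monthly_cost(*workload, 0.88, 0.88):,.0f}")  # ~$2,200
```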
| | Llama 3.3 70B | Llama 3.1 70B | Llama 3.1 405B | Qwen 2.5 72B |
|---|---|---|---|---|
| Release date | Dec 2024 | Jul 2024 | Jul 2024 | Sep 2024 |
| Parameters | 70B | 70B | 405B | 72B |
| Context window | 128K | 128K | 128K | 128K |
| Languages (trained) | 8 | 8 | 8 | 29 |
| IFEval | 92.1 | 87.5 | 88.6 | ~88.0 |
| MATH | 77.0 | 68.0 | 73.8 | 83.1 |
| GPQA Diamond | 50.5 | 46.7 | 51.1 | 49.0 |
| HumanEval | 88.4 | 80.5 | 89.0 | 86.6 |
| MMLU-Pro | 68.9 | 64.5 | 73.3 | 71.1 |
| Tool calling | Yes | Yes | Yes | Yes |
| Vision | No | No | No | No |
| License | Llama 3.3 Community | Llama 3.1 Community | Llama 3.1 Community | Apache 2.0 |
Llama 3.3 is a strict improvement over Llama 3.1 70B across all major benchmarks. The gains are largest in instruction following (+4.6 on IFEval), mathematics (+9.0 on MATH), multilingual math (+4.2 on MGSM), and coding (+7.9 on HumanEval). Because the architecture and prompt format are identical, upgrading from Llama 3.1 70B to Llama 3.3 requires only a model weight swap, with no changes to inference infrastructure.
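In transformers terms, the upgrade is a one-line change of model identifier (the repo IDs below follow Meta's Hugging Face naming; both checkpoints are license-gated):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Before: model_id = "meta-llama/Llama-3.1-70B-Instruct"
model_id = "meta-llama/Llama-3.3-70B-Instruct"  # the only line that changes

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard across available GPUs
    torch_dtype="bfloat16",  # ~140 GB of weights at BF16
)
```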
Llama 3.3 70B exceeds Llama 3.1 405B on IFEval (92.1 versus 88.6) and performs within a few points on most other benchmarks. The 405B retains advantages on tasks requiring deep factual recall (MMLU-Pro: 73.3 versus 68.9) and complex multi-step reasoning. In practice, Meta's own documentation positions Llama 3.3 70B as the recommended model for most production deployments that were previously using 405B, with 405B reserved for tasks such as synthetic data generation or knowledge distillation where maximum capability matters more than serving cost.
Qwen 2.5 72B, from Alibaba Cloud's Qwen research team, is the most direct competitor at a similar parameter count. The two models are competitive, with no clear overall winner. Llama 3.3 outperforms Qwen 2.5 72B on IFEval and HumanEval. Qwen 2.5 72B leads significantly on MATH (83.1 versus 77.0) and modestly on MMLU-Pro (71.1 versus 68.9). Qwen 2.5 supports 29 languages versus Llama 3.3's 8. Llama 3.3 has broader Western enterprise adoption, with native integration into AWS, Azure, and Google Cloud. Qwen 2.5 is distributed under the Apache 2.0 license, which is more permissive than the Llama 3.3 Community License for applications involving large user bases.
Llama 3.3 is distributed under the Llama 3.3 Community License Agreement, a custom license published at llama.com/llama3_3/license/. This license is similar in structure to the Llama 3.1 and Llama 3.2 Community Licenses, with some modifications.
The license grants a non-exclusive, worldwide, royalty-free right to use, reproduce, distribute, copy, create derivative works of, and modify the Llama 3.3 model weights. Commercial use is permitted for most developers without a separate agreement.
Key restrictions include:
User threshold: Developers whose products or services built using Llama 3.3 exceed 700 million monthly active users must request a separate commercial license from Meta. In practice, this threshold affects only the largest global technology platforms.
Attribution requirement: Any distribution of the model or derivative works must include an attribution notice stating that the product uses Llama 3.3 and is licensed under the Llama 3.3 Community License.
Use restrictions: The license prohibits using Llama 3.3 for illegal activities, for actions that violate applicable law, or to train models intended to compete with Meta's own AI products.
EU geographic restriction: One notable addition in the Llama 3.3 license, compared to some earlier Llama licenses, is a clause stating that the license's rights do not automatically extend to individuals domiciled in or companies with a principal place of business in the European Union. This restriction does not apply to end users of products or services that incorporate the model; it applies to developers seeking to download, modify, or redistribute the weights. This clause appears to relate to EU AI Act compliance considerations, and developers operating primarily in the EU should review the license and seek legal guidance.
The Open Source Initiative (OSI) has stated that Meta's Llama licenses, including the Llama 3.3 Community License, do not meet the OSI definition of open source software, primarily because of the use restrictions and attribution requirements. The model is commonly described as "open weights" rather than "open source" in contexts where this distinction matters, such as license-compliance discussions in enterprise procurement.
Llama 3.3's combination of near-405B performance at 70B size makes it well-suited for several categories of applications.
Agentic workflows: The improved tool calling capabilities and high IFEval score make Llama 3.3 a strong backbone for agent frameworks that must reliably follow structured instructions, invoke tools in response to user queries, and format outputs according to schema requirements. Frameworks such as LangChain, CrewAI, and LlamaIndex added first-class support for Llama 3.3 promptly after its release.
Code generation and review: HumanEval performance of 88.4 and MBPP EvalPlus of 87.6 position Llama 3.3 as a capable coding assistant. The model can generate, review, debug, and explain code across major programming languages. Its open-weights nature makes it appealing for locally deployed developer tooling where data privacy is a concern.
Multilingual customer support: The eight supported languages and strong multilingual math reasoning (MGSM 91.1) make Llama 3.3 appropriate for customer-facing applications across English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Synthetic data generation: Meta explicitly lists synthetic data generation as an intended use case. Large-scale data generation pipelines can use Llama 3.3 to produce labeled training data for smaller, task-specific models. Because the 70B model is substantially cheaper to run than the 405B, synthetic data generation at scale is more cost-effective.
Knowledge distillation: Similarly, Llama 3.3 can serve as a teacher model for knowledge distillation into smaller student models (such as the Llama 3.2 1B and 3B checkpoints), allowing practitioners to produce specialized lightweight models with the benefit of Llama 3.3's improved instruction following.
RAG and document analysis: The 128K context window accommodates full-length documents in many professional domains, including legal contracts, financial filings, and technical manuals, without chunking. Combined with the model's strong instruction following, it is effective for structured information extraction and summarization from long documents.
Local deployment: With 4-bit quantization, Llama 3.3 70B can run on a dual-RTX 4090 workstation or a high-memory Apple Silicon Mac (M2 Ultra or M3 Ultra with 192GB, or M2/M3 Max with 64GB), enabling fully offline deployments for organizations with data sovereignty requirements.
Full-precision (BF16) Llama 3.3 70B requires approximately 140 GB of VRAM, which is achievable only with multi-GPU server configurations. In practice, quantized variants are used for most deployments.
| Precision | Approximate VRAM | Typical hardware |
|---|---|---|
| BF16 (full) | ~140 GB | 2x H100 80GB or similar |
| 8-bit (INT8) | ~70 GB | 1x H100 80GB |
| 4-bit (Q4_K_M) | ~40-43 GB | 2x RTX 4090, Apple M2/M3 Ultra |
| 4-bit (Q4_0) | ~35 GB | 2x RTX 3090/4090 |
| 2-bit quantized | ~20 GB | Single high-end consumer GPU |
Ollama supports Llama 3.3 70B natively and automatically splits layers across multiple GPUs when available. LM Studio and llama.cpp both provide GGUF-format quantized builds. Note that a single RTX 4090 24GB cannot hold even the 4-bit weights (roughly 35-43 GB per the table above); llama.cpp can offload the remaining layers to CPU RAM at a substantial speed penalty, so fully GPU-resident 4-bit inference realistically starts at two 24GB cards. Longer contexts at the same quantization require additional memory for the KV cache.
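As a sketch of a local round trip with the Ollama Python client (assumes the `ollama` package is installed and the model has already been pulled; Ollama's default tags serve 4-bit quantized weights, so expect roughly the 40 GB footprint from the table above):

```python
import ollama  # pip install ollama; assumes `ollama pull llama3.3:70b` already ran

response = ollama.chat(
    model="llama3.3:70b",  # default Ollama tags serve 4-bit quantized weights
    messages=[{"role": "user", "content": "Give three uses for a 128K context window."}],
)
print(response["message"]["content"])
```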
Llama 3.3's release on December 6, 2024, was covered by TechCrunch, InfoQ, SiliconAngle, and Simon Willison's technology blog, among others. The reaction from developers and researchers was generally positive, with the efficiency argument widely acknowledged: a 70B model that approaches 405B performance on most practical benchmarks represents a meaningful advance in the cost-effectiveness of open-weights AI.
Developer platform adoption was rapid. GitHub added Llama 3.3 70B Instruct to GitHub Models in general availability on December 13, 2024, one week after the model release. Microsoft Azure AI Foundry announced support shortly after. AWS Bedrock, Google Vertex AI, Groq, Fireworks, and Together.ai all had Llama 3.3 available within days of the release.
Groq published an analysis framing Llama 3.3 as a challenge to the "death of scaling laws" narrative, arguing that the model demonstrated continued capability growth from post-training refinement rather than pure parameter scaling. This framing resonated with ongoing industry debates about whether pre-training scaling alone remained the dominant lever for AI capability improvement.
The IFEval result, which showed Llama 3.3 70B exceeding Llama 3.1 405B on instruction following, was frequently cited in community discussions on Hacker News and AI Twitter/X as evidence that careful post-training could compensate for reduced model size in instruction-following-intensive use cases.
Some developers noted that while the benchmark numbers were compelling, real-world performance on complex multi-step tasks showed the 405B model still outperforming the 70B in practice, particularly on tasks requiring recall of obscure factual information and long chains of dependent reasoning steps. The benchmark results for MMLU-Pro (68.9 versus 73.3) and GPQA Diamond (50.5 versus 51.1) indicate that the 405B's advantage in knowledge-intensive tasks, while narrowed, was not eliminated.
Llama 3.3 carries limitations common to large instruction-tuned language models, as well as some that are specific to its scope and training.
Parameter count ceiling: Despite outperforming Llama 3.1 405B on some benchmarks, Llama 3.3 70B does not match the 405B on tasks that directly benefit from breadth of world knowledge or very long chains of multi-step reasoning. MMLU-Pro performance of 68.9 versus the 405B's 73.3 reflects this gap.
Hallucinations: The model generates incorrect factual claims at rates consistent with models in its capability class. Like all contemporary large language models, it can produce plausible-sounding but fabricated citations, statistics, or biographical details. Production deployments should implement retrieval-augmented generation or factual grounding where accuracy on specific factual claims is required.
Context degradation: While the model supports 128K tokens of context, accuracy on retrieval tasks degrades for content placed in the middle of very long contexts. Long-context tasks requiring precise recall of all information throughout a document should be approached with awareness of this limitation and with prompt engineering strategies that place critical content near context boundaries.
Limited language support: The eight supported languages are a subset of global language needs. Applications targeting languages outside English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai may encounter inconsistent output quality, and Meta explicitly advises against relying on Llama 3.3 for these languages without additional fine-tuning.
No vision capability: Llama 3.3 is a text-only model. Applications requiring image understanding must use a different model, such as Llama 3.2 11B Vision or 90B Vision from the Llama family, or a proprietary multimodal model.
Safety and alignment: Meta's responsible use guidelines note that Llama 3.3 may produce harmful, offensive, or biased outputs in response to adversarial or edge-case inputs. The model is aligned through RLHF but is not immune to jailbreak attempts or prompt injection in agentic settings. Meta provides Llama Guard and Prompt Guard as companion safety classifiers intended to be deployed alongside the model in production applications.
EU license restriction: The geographic restriction in the Llama 3.3 Community License may create compliance complexity for developers and organizations whose principal operations are in the European Union. End users in the EU can use applications built on Llama 3.3, but EU-based developers building, distributing, or modifying the model weights are subject to the license's explicit exclusion and need to obtain separate authorization.
Hardware demands for local deployment: Full-precision deployment requires enterprise-class GPU hardware. Even 4-bit quantized deployment at moderate context lengths requires high-end consumer hardware configurations, which limits genuinely local deployment to well-resourced developer environments.