Groq is an American artificial intelligence hardware company that designs and manufactures the Language Processing Unit (LPU), a custom ASIC built specifically for AI inference. Founded in 2016 by Jonathan Ross, a former Google engineer who helped design the original Tensor Processing Unit (TPU), Groq has differentiated itself through a deterministic computing architecture that delivers ultra-low-latency, predictable performance for large language model inference. The company gained widespread attention in early 2024 when public demos of its inference speed went viral, and it has since grown into a significant player in the AI infrastructure market.
Groq was founded in 2016 by Jonathan Ross along with several other former Google engineers. Ross had been one of the key architects behind Google's Tensor Processing Unit (TPU), the custom AI accelerator that Google developed internally to handle the computational demands of its machine learning workloads. The experience of building the TPU gave Ross insight into the limitations of existing processor architectures for AI workloads, particularly for inference, where latency and predictability matter more than raw training throughput.
Ross founded Groq with the thesis that inference workloads required a fundamentally different architectural approach than what GPUs or even Google's TPUs provided. While GPUs excel at parallel computation for training, their complex memory hierarchies, caches, and scheduling mechanisms introduce unpredictable latency during inference. Ross wanted to build a chip where execution time could be determined at compile time, not at runtime.
The company's name, Groq, is unrelated to Elon Musk's AI chatbot Grok (developed by xAI), which launched later. The similarity in names has been a source of occasional confusion, though the two companies operate in entirely different segments of the AI market.
The Language Processing Unit is Groq's custom-designed processor, originally called the Tensor Streaming Processor (TSP) before being rebranded to reflect its particular strengths in language model inference. The LPU represents a fundamental departure from both GPU and TPU architectures.
The LPU is built on the Tensor Streaming Processor (TSP) architecture, internally codenamed "Alan." The TSP was designed from scratch to eliminate the sources of latency variability found in conventional processors. Rather than optimizing the same general-purpose computing paradigm used by CPUs and GPUs, the TSP introduces a streaming execution model where data flows through the chip in a single direction, passing through computation units in sequence without backtracking to a central memory pool.
The TSP architecture achieves this through three core design principles: deterministic execution, SRAM-only on-chip memory, and a functionally sliced, single-core streaming design.
The most distinctive feature of the LPU is its deterministic architecture. Traditional processors use a variety of reactive hardware components to manage the unpredictability of program execution: branch predictors guess which code path will be taken, caches store frequently accessed data to hide memory latency, reordering buffers rearrange instructions for efficiency, and arbiters manage contention for shared resources. All of these components introduce variability in execution time.
The LPU eliminates all of these components. Instead, the compiler handles all scheduling decisions at compile time, producing a fully deterministic execution plan. Every memory access, every computation, and every data movement is predetermined before the program runs. This means that the execution time of any given workload is known exactly before it begins, enabling guaranteed latency and predictable throughput.
Because the hardware is software-controlled, the compiler knows exactly when and where each operation will occur and how long it will take. This determinism extends beyond individual chips: the compiler pre-computes the entire execution graph, including inter-chip communication patterns, down to individual clock cycles.
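The contrast with runtime scheduling can be made concrete with a toy sketch (this is an illustration of the idea, not Groq's actual compiler): when every functional unit has a fixed, data-independent latency and there are no caches or arbiters, a compile-time pass can assign every operation an exact start cycle, so total execution time is known before the program runs.

```python
# Hypothetical per-operation latencies in clock cycles (illustrative only).
OP_CYCLES = {"load_weights": 4, "matmul": 12, "activation": 2, "store": 3}

def compile_schedule(program):
    """Assign each op a fixed start cycle; total latency is known statically."""
    schedule, cycle = [], 0
    for op in program:
        schedule.append((op, cycle))   # this op starts exactly at `cycle`
        cycle += OP_CYCLES[op]         # fixed latency, never data-dependent
    return schedule, cycle             # `cycle` is the guaranteed total time

program = ["load_weights", "matmul", "activation", "store"]
schedule, total = compile_schedule(program)
print(schedule)  # every start cycle is fixed before execution begins
print(total)     # 21 cycles, identical on every invocation
```

On real deterministic hardware the same property holds at full scale: the schedule, not the runtime, decides when each operation fires, which is why tail latency cannot diverge from median latency.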
A defining characteristic of the LPU is its exclusive use of on-chip SRAM as primary memory. Unlike GPUs, which rely on off-chip High Bandwidth Memory (HBM) stacks, or CPUs, which use DRAM with multi-level cache hierarchies, the LPU integrates hundreds of megabytes of SRAM directly alongside its compute units. This SRAM serves as the primary storage for model weights and activations, not as a cache.
The architectural implications are significant:
| Memory characteristic | GPU (HBM-based) | Groq LPU (SRAM-based) |
|---|---|---|
| Memory technology | HBM2e/HBM3/HBM3e | On-chip SRAM |
| Bandwidth per chip | ~3-8 TB/s | 80 TB/s |
| Latency | ~100-400 ns | ~1-5 ns |
| Capacity per chip | 80-288 GB | 230 MB (GroqChip1) / 500 MB (LP30) |
| Access pattern | Variable (cache-dependent) | Fixed (compiler-determined) |
| Power per access | Higher (off-chip) | Lower (on-chip) |
The 80 TB/s of on-chip SRAM bandwidth is roughly 24x the bandwidth of an NVIDIA H100's HBM3 (3.35 TB/s). This bandwidth advantage is the fundamental driver of the LPU's inference speed: during autoregressive token generation, the bottleneck is typically reading model weights from memory for each token, and SRAM delivers those weights to the compute units well over an order of magnitude faster than HBM.
The trade-off is capacity. At 230 MB per GroqChip1, a single chip cannot hold the weights of even a small language model. For large models, hundreds of LPUs are connected together, with model weights distributed across the SRAM of many chips. This is why Groq deploys racks of LPUs working in concert.
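Both sides of this trade-off can be put in rough numbers. The sketch below uses the classic memory-bandwidth roofline for decoding (each generated token must stream all model weights past the compute units once) together with a ceiling division for chip count; the byte counts assume 8-bit weights and are illustrative, not vendor figures.

```python
def roofline_tokens_per_s(model_bytes, bandwidth_bytes_per_s):
    # Memory-bound decode: each token requires reading every weight once.
    return bandwidth_bytes_per_s / model_bytes

def chips_needed(model_bytes, sram_per_chip):
    # Ceiling division: chips required just to hold the weights in SRAM.
    return -(-model_bytes // sram_per_chip)

MB, GB, TB = 10**6, 10**9, 10**12
model = 70 * GB   # a 70B-parameter model at 8 bits per weight

# Single GPU bound: 70 GB streamed at 3.35 TB/s HBM3 bandwidth.
print(round(roofline_tokens_per_s(model, 3.35 * TB)))  # ~48 tokens/s

# SRAM capacity side: 230 MB per GroqChip1.
print(chips_needed(model, 230 * MB))  # ~305 chips to hold the weights
```

The arithmetic illustrates why the design needs racks: a single chip cannot hold the model, but once weights are sharded across hundreds of chips, each shard streams out of local SRAM at far higher bandwidth than any off-chip memory could provide.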
The LPU uses a functionally sliced microarchitecture where memory units are interleaved with vector and matrix computation units across the chip. This design exploits the dataflow locality inherent in AI compute graphs. Data flows through the chip in a streaming fashion, moving from one functional unit to the next without needing to be written back to a central memory and re-fetched. This eliminates the memory bandwidth bottleneck that limits GPU-based inference.
Unlike GPUs, which contain thousands of small cores, or TPUs, which use a systolic array architecture, the LPU is fundamentally a single-core processor. This simplifies programming and eliminates the need for complex inter-core communication and synchronization, further contributing to deterministic execution.
For workloads that span multiple LPUs, Groq uses a plesiochronous chip-to-chip protocol to cancel natural clock drift and align hundreds of LPUs to act as a single logical core. Periodic software synchronization adjusts for crystal-based clock drift, enabling not just compute scheduling but also network scheduling across the entire system. The compiler can predict exactly when data will arrive at each chip, allowing developers to reason about timing across the full system.
Groq's first-generation chip, the GroqChip1, provides the following capabilities:
| Specification | GroqChip1 | LP30 (Groq 3, 2026) |
|---|---|---|
| INT8 performance | Up to 750 TOPS | TBD |
| FP16 performance | 188 TFLOPS (at 900 MHz) | TBD |
| On-chip SRAM | 230 MB | 500 MB |
| Memory bandwidth | Up to 80 TB/s | 150 TB/s |
| External HBM | None (SRAM only) | None (SRAM only) |
| Fabrication | GlobalFoundries | Samsung 4nm |
A notable aspect of the GroqChip1 is that it uses no high-bandwidth memory at all: it relies entirely on on-chip SRAM, which provides extremely high bandwidth but limits total memory capacity per chip.
Groq gained massive public attention in February 2024 when demonstrations of its inference speed went viral on social media. Users reported receiving responses from large language models at speeds that felt instantaneous, with tokens appearing faster than they could be read.
Groq has published and demonstrated inference speeds across multiple popular open-source models:
| Model | Tokens/second | Date | Notes |
|---|---|---|---|
| Llama 2 70B Chat | 241 | Feb 2024 | Early viral demos |
| Mixtral 8x7B | 500+ | Feb 2024 | Mixture-of-experts model |
| Llama 3 8B | 800+ | Apr 2024 | Day-zero launch support |
| Llama 3 70B | 300+ | Apr 2024 | Standard decoding |
| Llama 3 70B (speculative) | 1,660+ | Late 2024 | With speculative decoding |
| Llama 3.3 70B | Record-setting | Jan 2025 | New speed benchmark |
Groq topped the first independent LLM inference benchmark conducted by ArtificialAnalysis.ai, outperforming both GPU-based and competing ASIC-based providers on throughput and latency metrics.
The LPU's deterministic architecture provides several advantages for inference:
| Metric | LPU Advantage |
|---|---|
| Token generation latency | Predictable, sub-millisecond per token |
| Time-to-first-token | Near-instantaneous |
| Throughput consistency | No variance between requests |
| Tail latency | Identical to median latency |
The consistent latency is particularly important for production AI systems. With GPU-based inference, tail latency (the worst-case response time) can be several times higher than median latency due to cache misses, memory contention, and scheduling delays. With Groq's LPU, the tail latency equals the median latency because execution is fully deterministic.
GroqCloud is Groq's cloud-based API platform that provides developers with access to LPU-powered inference. The platform supports a range of open-source models and offers an API that is compatible with OpenAI's API format for ease of integration.
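Because the API follows the OpenAI chat-completions format, calling it requires only standard HTTP. The sketch below builds such a request with the Python standard library; the base URL and model name are assumptions for illustration, so check GroqCloud's documentation for current values.

```python
import json
from urllib import request

# Assumed OpenAI-compatible endpoint; verify against GroqCloud's docs.
BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(api_key, model, user_message):
    """Construct a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("YOUR_API_KEY", "llama-3.3-70b-versatile", "Hello!")
print(req.full_url)  # https://api.groq.com/openai/v1/chat/completions
# To send: json.load(request.urlopen(req))["choices"][0]["message"]["content"]
```

Existing OpenAI SDK clients can typically be pointed at the same endpoint by overriding the base URL, which is what makes migration low-friction.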
As of early 2026, GroqCloud supports the following model families:
| Model Family | Variants | Context Length |
|---|---|---|
| Llama 3.x | 8B, 70B (Llama 3.1, 3.3) | Up to 128K |
| OpenAI gpt-oss | 20B, 120B | 128K |
| DeepSeek | Various | Model-dependent |
| Qwen 3 | Various | Model-dependent |
| Mistral | Various | Model-dependent |
| Whisper (speech-to-text) | large-v3, large-v3-turbo | Audio input |
GroqCloud uses a pay-as-you-go pricing model with three tiers: Free, Developer, and Enterprise.
| Model | Input price (per M tokens) | Output price (per M tokens) |
|---|---|---|
| gpt-oss-120B | $0.15 | $0.75 |
| gpt-oss-20B | $0.10 | $0.50 |
| Whisper large-v3 | $0.111 per hour of audio | - |
| Whisper large-v3-turbo | $0.04 per hour of audio | - |
GroqCloud also offers batch processing at 50% lower cost for asynchronous workloads, and prompt caching provides an additional 50% discount on cached input tokens. The platform serves over two million developers and multiple Fortune 500 companies.
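The interaction of the listed per-token prices with the batch and caching discounts can be sketched as a small cost calculator. How the two discounts stack is an assumption here (caching halves the price of cached input tokens; batch halves the final total), so treat the numbers as illustrative.

```python
# Per-million-token prices from the table above: (input, output) in USD.
PRICES = {"gpt-oss-120B": (0.15, 0.75), "gpt-oss-20B": (0.10, 0.50)}

def cost_usd(model, input_tokens, output_tokens,
             cached_fraction=0.0, batch=False):
    """Estimate request cost; discount stacking is an assumption."""
    in_price, out_price = PRICES[model]
    m = 1_000_000
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    total = (uncached * in_price + cached * in_price * 0.5) / m \
            + output_tokens * out_price / m
    return total * (0.5 if batch else 1.0)

# 10M input tokens (half cached) + 2M output tokens on gpt-oss-120B:
print(cost_usd("gpt-oss-120B", 10 * 10**6, 2 * 10**6, cached_fraction=0.5))
# ≈ 2.625: $0.75 uncached input + $0.375 cached input + $1.50 output
```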
In 2025, Groq launched Compound, its first agent and compound AI system, on GroqCloud. Compound integrates agentic AI capabilities with server-side tool use, allowing developers to build systems that can conduct research, execute code, control browsers, and navigate the web. All tool calls run server-side on Groq's inference fleet, keeping latency low. The orchestration layer determines which tools (web search, Wolfram Alpha, code execution, browsers) are needed and manages iterative reasoning loops where the model consumes tool outputs and refines its responses.
Compound moved to general availability on October 1, 2025, delivering approximately 25% higher accuracy and roughly 50% fewer errors across evaluation benchmarks compared to its preview version.
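The iterative reasoning loop described above, where an orchestrator dispatches tool calls and feeds results back to the model until it produces an answer, can be sketched with stubbed components. Compound runs this orchestration server-side; everything below (the stub model, the tool registry, the step budget) is a schematic, not Groq's implementation.

```python
def stub_model(prompt, observations):
    """Pretend model: request a tool until it has evidence, then answer."""
    if not observations:
        return {"tool": "web_search", "args": "Groq LPU"}
    return {"answer": f"Based on {len(observations)} tool result(s): done"}

# Hypothetical server-side tool registry.
TOOLS = {"web_search": lambda q: f"search results for {q!r}"}

def agent_loop(prompt, max_steps=5):
    observations = []
    for _ in range(max_steps):
        step = stub_model(prompt, observations)
        if "answer" in step:                     # model is done reasoning
            return step["answer"]
        tool = TOOLS[step["tool"]]               # orchestrator picks the tool
        observations.append(tool(step["args"]))  # feed result back to model
    return "step budget exhausted"

print(agent_loop("What is an LPU?"))
# → Based on 1 tool result(s): done
```

Keeping every iteration of this loop on the inference fleet, rather than round-tripping tool results through the client, is what keeps end-to-end latency low.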
Groq planned to deploy over 108,000 LPUs manufactured by GlobalFoundries by the end of Q1 2025, which would represent the largest AI inference compute deployment by any non-hyperscaler. The company has built data centers across North America, Europe, and the Middle East.
In February 2025, Groq announced that it had secured a $1.5 billion commitment from the Kingdom of Saudi Arabia (through HUMAIN) to expand its LPU-based AI inference infrastructure, including a new GroqCloud data center in Dammam, Saudi Arabia. This partnership reflected the growing interest from Middle Eastern sovereign wealth entities in building domestic AI compute capacity.
In December 2025, NVIDIA and Groq announced a landmark agreement reportedly valued at approximately $20 billion. The deal involved NVIDIA licensing Groq's AI inference technology through a non-exclusive licensing agreement signed on December 24, 2025. The agreement was structured to deliver $17 billion in cash payments across three installments by the end of 2026, with several senior Groq executives, including founder Jonathan Ross and president Sunny Madra, transferring to NVIDIA as part of the arrangement.
The deal was widely interpreted as an acknowledgment from NVIDIA that Groq's deterministic inference architecture offered capabilities that NVIDIA's GPU-based approach could not easily replicate. For Groq, the deal provided substantial capital while allowing the company to continue operating independently and licensing its technology non-exclusively.
The first tangible result of the NVIDIA partnership emerged at GTC 2026 in March, just three months after the licensing agreement. NVIDIA unveiled the Groq 3 LPU (designated LP30), along with the LPX server node:
| Specification | GroqChip1 | Groq 3 (LP30) |
|---|---|---|
| On-chip SRAM | 230 MB | 500 MB |
| SRAM bandwidth | 80 TB/s | 150 TB/s |
| Fabrication | GlobalFoundries | Samsung 4nm |
| Integration | Standalone | Pairs with Vera Rubin GPU platform |
The Groq 3 LPX server rack packs 128 LPUs and, when paired with NVIDIA's Vera Rubin CPU-GPU super-rack, promises 35x higher throughput per megawatt than previous-generation inference solutions. Industry analysts expect NVIDIA to integrate Groq's deterministic inference logic into its upcoming Vera Rubin architecture, creating a hybrid chip that combines the massive parallel processing of a traditional GPU with a dedicated inference engine powered by Groq's SRAM-based IP.
Groq has raised significant capital across multiple funding rounds:
| Round | Date | Amount | Valuation | Key Investors |
|---|---|---|---|---|
| Series A | 2017 | $10M | - | Social Capital |
| Series B | 2018 | $52M | - | Social Capital, D1 Capital |
| Series C | 2021 | $300M | ~$1B | Tiger Global, D1 Capital |
| Series D | August 2024 | $640M | $2.8B | BlackRock Private Equity Partners |
| Series E | September 2025 | $750M | $6.9B | Disruptive, BlackRock, Neuberger Berman, DTCP |
The rapid growth in valuation from $2.8 billion in August 2024 to $6.9 billion by September 2025 reflected the surging demand for inference infrastructure and investor confidence in Groq's differentiated technology. Including the NVIDIA licensing deal, Groq's total capital base grew dramatically through 2025 and 2026.
Groq competes in the AI inference accelerator market against several players, each with different architectural approaches:
| Competitor | Architecture | Focus | Key Differentiator |
|---|---|---|---|
| NVIDIA | GPU (H100, Blackwell) | Training and inference | Ecosystem breadth, CUDA |
| Cerebras | Wafer-scale engine | Training and inference | On-chip SRAM bandwidth |
| Google | TPU | Training and inference | Vertical integration |
| AMD | GPU (MI300X) | Training and inference | Price-performance ratio |
| Amazon | Inferentia/Trainium | Cloud inference | AWS integration |
| SambaNova | Reconfigurable dataflow | Enterprise AI | Dataflow architecture |
Groq's primary differentiator is its focus on inference-only workloads. While competitors like NVIDIA and Google design chips that handle both training and inference, Groq has optimized its architecture exclusively for inference, betting that the inference market will grow substantially larger than the training market as deployed AI models serve billions of users. The company's deterministic latency guarantee is particularly valuable for real-time applications and agentic AI systems that require predictable response times.
Groq's strategic bet rests on the observation that while training a model happens once (or a few times), inference happens billions of times as the model serves users. As AI moves from the research and training phase into mass deployment, the ratio of inference compute to training compute is expected to shift dramatically in favor of inference. Groq estimates that inference will eventually consume 10x or more compute than training, making inference-specialized hardware increasingly valuable.
The LPU architecture offers distinct power efficiency characteristics compared to GPU-based inference:
| Metric | Groq LPU Rack | NVIDIA H100 8-GPU Node |
|---|---|---|
| Inference throughput (Llama 70B) | Higher per-token throughput | Lower per-token throughput |
| Power per token | Lower (SRAM is more power-efficient than HBM) | Higher (HBM access dominates power budget) |
| Utilization predictability | Near-100% (deterministic scheduling) | Variable (depends on batching efficiency) |
| Idle power waste | Minimal (no speculative execution hardware) | Higher (caches, predictors consume power when idle) |
The elimination of caches, branch predictors, and reorder buffers from the LPU design reduces transistor count dedicated to control logic. In a traditional GPU, these reactive components can consume 30-40% of the chip's power budget. The LPU redirects that silicon area and power toward compute and SRAM, improving the ratio of useful computation to total power consumption.
For organizations running inference at scale, the total cost of ownership calculation favors the LPU for latency-sensitive workloads. The deterministic performance means capacity planning is straightforward: operators can predict exactly how many tokens per second a given number of LPUs will deliver, without the variability that makes GPU-based capacity planning more complex.
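That capacity-planning claim reduces to a single ceiling division: if a pod of LPUs delivers a fixed, guaranteed token rate, the pod count for a target load follows directly. The pod rate below is a hypothetical figure for illustration, not a published specification.

```python
def pods_needed(target_tokens_per_s, tokens_per_s_per_pod):
    """Ceiling division: pods required to meet an aggregate token rate."""
    return -(-target_tokens_per_s // tokens_per_s_per_pod)

# Suppose each pod sustains a guaranteed 300 tokens/s and the service
# must deliver an aggregate 10,000 tokens/s:
print(pods_needed(10_000, 300))  # 34 pods, with no variance to buffer for
```

With GPU-based serving, the same calculation would need headroom for batching efficiency and tail-latency variance; determinism removes that safety margin from the equation.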
Groq's low-latency inference is particularly well-suited to latency-sensitive application categories such as real-time conversational AI and agentic systems.
Given that Groq's founder came from the Google TPU team, the comparison between the LPU and TPU is frequently drawn. While both are custom ASICs designed for AI workloads, they differ fundamentally:
| Feature | Google TPU | Groq LPU |
|---|---|---|
| Primary focus | Training and inference | Inference only |
| Architecture | Systolic array | Functionally sliced streaming |
| Memory | HBM-based | SRAM only |
| Execution model | Dynamic scheduling | Fully deterministic |
| Availability | Google Cloud only | GroqCloud and on-premises |
| Core design | Multi-core | Single-core |
| Compiler role | Standard (runtime scheduling) | Central (compile-time scheduling) |
| Tail latency | Variable | Equal to median |
The TPU optimizes for flexibility across both training and inference, while the LPU sacrifices training capability entirely to achieve superior inference latency and determinism.
As of early 2026, Groq has established itself as a leading inference infrastructure provider. The company powers over two million developers, operates data centers on three continents, and has secured partnerships with major sovereign entities and technology companies. The NVIDIA licensing deal validated the value of Groq's deterministic architecture, while continued funding rounds have provided capital for expansion.
The unveiling of the Groq 3 LPU at GTC 2026 marks a new chapter for the technology, with NVIDIA's manufacturing and distribution capabilities potentially bringing Groq's inference architecture to a far wider audience than the company could reach independently. GroqCloud continues to add model support and features, with the Compound AI system enabling more sophisticated agentic applications.
Groq's bet on inference as the dominant AI compute workload appears to be paying off, as the industry shifts from a training-focused phase to a deployment and scaling phase where inference costs and latency become the primary concerns.