Groq
Last reviewed
Sources
22 citations
Review status
Source-backed
Revision
v3 ยท 2,259 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Sources
22 citations
Review status
Source-backed
Revision
v3 ยท 2,259 words
Add missing citations, update stale details, or suggest a clearer explanation.
Groq, Inc. is an American semiconductor and cloud computing company, headquartered in Mountain View, California, that builds processors and data center infrastructure for AI inference, the serving of trained models rather than their training. Founded in 2016 by Jonathan Ross, one of the creators of Google's tensor processing unit (TPU), Groq designed the language processing unit (LPU), a deterministic, SRAM-only chip that Groq calls "a new category of processor" built "from the ground up to meet the unique needs of AI" [4]. The LPU became known in 2024 for serving large language models at record speeds, more than 300 tokens per second per user on Llama 2 70B, through its GroqCloud platform [1][2][4]. In December 2025, Nvidia agreed to pay approximately $20 billion in cash, the largest transaction in its history, for a non-exclusive license to Groq's inference technology; Ross and most of Groq's senior leadership joined Nvidia, while Groq itself remained an independent company operating GroqCloud [14][15]. Groq is unrelated to Grok, the chatbot from Elon Musk's xAI, a similarity the company has publicly contested [22].
Groq's pitch inverted the usual accelerator playbook: instead of chasing peak training throughput, it optimized a chip and compiler stack for low-latency, predictable token generation. The company claims its LPU-based systems run large language models substantially faster and up to 10 times more energy-efficiently, on an architectural level, than GPU-based serving [4][6]. After pivoting from hardware sales to a tokens-as-a-service cloud business in 2024, Groq grew from roughly 356,000 registered developers in late 2024 to more than two million by September 2025, alongside Fortune 500 customers [6][7]. Its valuation rose in step: $2.8 billion in August 2024, $6.9 billion in September 2025, and an effective price near $20 billion in the Nvidia deal three months later [5][6][14]. "Inference is defining this era of AI," founder Jonathan Ross said when Groq raised $750 million in September 2025, "and we're building the American infrastructure that delivers it with high speed and low cost" [7].
Ross began the project that became Google's TPU as a 20 percent side effort and helped deploy it across Google's data centers before leaving in 2016 to start Groq with Douglas Wightman, a former Google X engineer [1][2]. Chamath Palihapitiya's Social Capital provided early backing of about $10 million, disclosed in 2017, and the startup spent its first years in relative stealth [1].
Groq's first chip, described in a 2020 paper at the ISCA computer architecture conference, was called the Tensor Streaming Processor (TSP); the company rebranded the architecture as the language processing unit after ChatGPT made LLM serving the dominant accelerator workload [1][2]. Along the way Groq acquired Maxeler Technologies, a dataflow computing firm, in March 2022, and announced in August 2023 that its next-generation chip would be fabricated on Samsung's 4 nm process at the foundry's Taylor, Texas plant [1].
The breakout came in early 2024, when public demos of Llama 2 70B generating more than 300 tokens per second per user went viral and the company soft-launched GroqCloud [3][4]. In March 2024 Groq acquired Definitive Intelligence, whose co-founder Sunny Madra went on to run GroqCloud and later became Groq's president [1][6].
The LPU departs from GPU design in two linked ways: determinism and memory. The chip is a single large core with a "functionally sliced" layout, in which memory units are interleaved with vector and matrix units, and it omits the speculative machinery of conventional processors: there are no branch predictors, caches, or reorder buffers [2][3]. Because the hardware's timing is fully predictable, Groq's compiler statically schedules every operation, memory access, and inter-chip packet down to the clock cycle, which the company argues removes the tail latency that batching and dynamic scheduling create on GPUs [3][4].
Instead of external high-bandwidth memory, each first-generation LPU carries 230 MB of on-chip SRAM as its primary weight storage, delivering on the order of 80 TB/s of internal bandwidth, far more than HBM-based GPU stacks [3]. The trade-off is capacity: a single chip holds only a fraction of a modern model's weights, so production deployments shard one model across hundreds of interconnected LPUs acting as a synchronized assembly line [3][21]. That makes individual answers very fast but requires large fleets per model. (Groq's chip and rack engineering is covered in more detail at Groq hardware.)
| First-generation GroqChip (TSP) | Specification |
|---|---|
| Process node | GlobalFoundries 14 nm [2] |
| Die size / transistors | 25 mm by 29 mm, 26.8 billion transistors [2] |
| Nominal clock | 900 MHz [2] |
| Peak compute | 820 teraoperations per second [2] |
| On-chip SRAM | 230 MB at roughly 80 TB/s [3] |
| External DRAM/HBM | None [3] |
| Early benchmark | ResNet-50 at 20,400 images per second, batch size 1 [2] |
On LLM workloads, Groq reported over 300 tokens per second on Llama 2 70B and about 480 tokens per second on Mixtral 8x7B in early 2024, several times faster than contemporary GPU endpoints [4][20]. A planned second-generation LPU moves from 14 nm to Samsung's 4 nm node [1].
GroqCloud, launched in February 2024, exposes LPU clusters through an OpenAI-compatible API and a self-serve console with free and paid tiers, selling inference as metered tokens rather than hardware [1][4][6]. The catalog focuses on open-weight models, including Meta's Llama family, Mistral's Mixtral, Google's Gemma, Whisper for speech recognition, and OpenAI's gpt-oss models [4][21].
The platform became the company's growth engine: its $640 million Series D was explicitly raised to expand tokens-as-a-service capacity [5]. By May 2025 Groq operated data centers across North America, Europe, and the Middle East, including new sites in Houston (with DataBank) and Dallas (with Equinix), and said its network could serve more than 20 million tokens per second [12]. TechCrunch reported more than two million developers on the platform as of September 2025 [6].
| Date | Round | Amount | Valuation | Lead / notable investors |
|---|---|---|---|---|
| 2017 | Early venture | ~$10M | n/a | Social Capital [1] |
| April 2021 | Series C | $300M | >$1B | Tiger Global Management, D1 Capital Partners [1] |
| August 2024 | Series D | $640M | $2.8B | BlackRock Private Equity Partners; Neuberger Berman, Type One Ventures, Cisco Investments, KDDI's Global Brain fund, Samsung Catalyst Fund [5] |
| September 2025 | Growth round | $750M | $6.9B post-money | Disruptive; BlackRock, Neuberger Berman, Deutsche Telekom Capital Partners, Samsung, Cisco, D1, Altimeter, 1789 Capital, Infinitum [6][7] |
| May 2026 | Internal round | up to $650M | undisclosed | Existing investors pro rata, backstopped by Disruptive and Infinitum [18][19] |
Separately, the Kingdom of Saudi Arabia announced a $1.5 billion commitment at the LEAP 2025 conference in February 2025 to fund expanded delivery of Groq's inference infrastructure in the country [8].
Growth was not linear. In July 2025, The Information reported that Groq had cut its 2025 revenue projection from more than $2 billion to a bit over $500 million, citing delays in securing data center capacity, shortly after sharing the higher figure with investors [13].
Groq's flagship international deployment is in Saudi Arabia. After signing a memorandum of understanding with Aramco Digital at LEAP 2024, the companies announced progress that September on what they described as the world's largest AI inferencing data center [9]. In December 2024, Groq airlifted racks to Dammam and brought a cluster of about 19,000 LPUs online in eight days, which the partners called the largest AI compute hub in the EMEA region and Groq's second GroqCloud region globally [10]. The February 2025 announcement of $1.5 billion in Saudi backing extended that buildout, which also supports the Saudi Data and Artificial Intelligence Authority's Arabic language model ALLaM [8]. In May 2025, HUMAIN, the AI holding company backed by Saudi Arabia's Public Investment Fund and chaired by Crown Prince Mohammed bin Salman, selected Groq as its inference provider [11].
In Canada, Groq became the exclusive inference provider for Bell Canada's "Bell AI Fabric" sovereign AI network, announced in May 2025: a planned six-site buildout targeting 500 MW of hydro-powered capacity, beginning with a 7 MW Groq facility in Kamloops, British Columbia [12].
On December 24, 2025, CNBC reported that Nvidia would acquire assets from Groq for about $20 billion in cash, by far the largest deal in Nvidia's history and roughly three times Groq's September valuation [14]. Both companies framed the transaction as a non-exclusive licensing agreement for Groq's inference technology rather than an acquisition: Nvidia said it was "not an acquisition of the company," and Groq retained the right to keep using and licensing its own technology [15][16]. Ross, president Sunny Madra, and other senior leaders moved to Nvidia, while Groq said it would continue as an independent company with chief financial officer Simon Edwards stepping up as CEO and GroqCloud operating without interruption [15][16]. Analysts widely read the structure as an acqui-hire that removed a competitor while sidestepping merger review [16].
The aftermath reshaped the remaining company. Edwards departed in April 2026 to become CFO of Bloom Energy [17]. By late May 2026, Groq was led by interim CEO Adam Winter and interim CFO Matt Eng and was raising up to $650 million from existing investors, with Disruptive and Infinitum committed to backstopping the round, to fund a "second act" as an inference neocloud, a managed cloud built on its existing LPU fleet rather than on new chip development [18][19].
Groq competed most directly with two other US inference-chip startups, Cerebras Systems, whose wafer-scale engines take the opposite approach of one enormous chip, and SambaNova Systems, whose reconfigurable dataflow units pair SRAM with HBM and DRAM to fit larger models per node, as well as with the Nvidia GPU clouds that dominate the market [20]. The three challengers fought a public "token war" on Artificial Analysis leaderboards: in late 2024 measurements on Llama 3.1 8B, Groq served about 750 tokens per second against SambaNova's 1,084 and Cerebras's 1,800, while Nvidia H100-based clouds ranged from 72 to 257; on Llama 3.1 70B the three were closely matched, with Groq at 544 tokens per second [20]. Cerebras has since claimed multi-fold speed advantages over Groq on newer models such as gpt-oss-120B, citing the same benchmarking firm [21].
Groq's counterarguments centered on cost per token, deterministic latency, and deployment speed, exemplified by the eight-day Dammam installation [10]. The December 2025 Nvidia license was widely interpreted as validation that the LPU's deterministic, SRAM-centric design mattered enough for the GPU incumbent to pay a record sum for it [14][16].
Groq's name, like that of xAI's Grok chatbot released in November 2023, derives from the verb "to grok" coined by science-fiction author Robert A. Heinlein. Groq registered its trademark when it was founded in 2016, and in November 2023 it published a tongue-in-cheek public cease-and-desist blog post titled "Hey Elon: It's Time To Cease & De-Grok," asserting its prior claim to the name [22]. The two companies remain unaffiliated, and no public resolution of the dispute has been reported.