Taalas

Updated Jun 8, 2026

Taalas is a Toronto-based AI chip startup that builds "hardwired," model-specific silicon, in which an entire trained neural network is etched directly into a custom chip rather than executed as software on a general-purpose processor. The company describes its approach as "casting intelligence directly into silicon": the weights and architecture of one specific model, such as Meta's Llama 3.1 8B, are baked into the hardware, eliminating most data movement and external memory and, the company claims, delivering very large gains in throughput, cost, and energy efficiency for that model. Taalas was founded in 2023 by Ljubisa Bajic, a co-founder and former chief executive of Tenstorrent and earlier an architect at AMD and ATI. ^[1]^[2]^[3] In February 2026 the company announced a $169 million funding round, bringing its total raised to roughly $219 million, and unveiled its first product, the HC1, a chip purpose-built to run Llama 3.1 8B. ^[1]^[4]

Overview

Taalas pursues what observers have called a "model-as-hardware" or "model-in-silicon" strategy, an extreme version of the model-specific application-specific integrated circuit (ASIC). Conventional AI accelerators, including NVIDIA GPUs, are programmable: they load a model's parameters from memory and stream them through compute units at inference time. Taalas instead fixes a single model permanently into the transistors of a chip, so that the parameters are physically part of the circuit and never have to be fetched from external high-bandwidth memory (HBM). The company argues that this removes the "memory wall," the bottleneck created by moving weights between memory and compute, which dominates the cost and power of modern AI inference. ^[3]^[5]
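
The bandwidth argument can be illustrated with a back-of-envelope bound. In single-stream decoding on a programmable accelerator, essentially every weight is read from memory once per generated token, so memory bandwidth divided by model size caps the token rate. The Python sketch below is a rough estimate, not a benchmark: the 16-bit weight format and the 4.8 TB/s bandwidth figure (NVIDIA's published H200 specification) are assumptions that do not come from the sources cited here, and batching, caching, and compute limits are ignored.

```python
def bandwidth_bound_tokens_per_sec(params: float,
                                   bits_per_weight: float,
                                   bandwidth_bytes_per_sec: float) -> float:
    """Upper bound on single-stream decode rate when every weight
    must be read from external memory once per generated token."""
    model_bytes = params * bits_per_weight / 8
    return bandwidth_bytes_per_sec / model_bytes


PARAMS = 8e9      # Llama 3.1 8B, rounded to 8 billion parameters
BITS = 16         # assumption: weights held in a 16-bit format
HBM_BW = 4.8e12   # assumption: H200 memory bandwidth in bytes per second

bound = bandwidth_bound_tokens_per_sec(PARAMS, BITS, HBM_BW)
print(f"bandwidth-bound decode rate: {bound:.0f} tokens/s per stream")
```

Under these assumptions the bound is about 300 tokens per second for a single user, which illustrates why removing weight traffic altogether is the lever a hardwired design targets.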

The central trade-off is flexibility. A Taalas chip can run only the one model it was manufactured for, so the approach is suited to high-volume, stable workloads where a popular model is served at scale, rather than to research or rapidly changing deployments. The company mitigates this somewhat: it says the HC1 retains a configurable context-window size and supports fine-tuning through low-rank adapters (LoRAs), so the underlying weights are fixed but limited customization remains possible. ^[5]^[6]
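
The general idea behind low-rank adapters can be sketched in a few lines: a frozen weight matrix W is left untouched, and the output of a small trainable product B·A is added to it. The Python example below is a generic illustration of LoRA arithmetic, not a description of how the HC1 implements it; the matrices, their sizes, and the `scale` parameter are made-up values chosen for readability.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen base weight matrix (standing in for hardwired weights);
    A and B are the small adapter matrices that remain adjustable.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical toy values: a 3x3 frozen matrix with a rank-1 adapter.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0],
     [3.0, 0.0, 1.0]]
A = [[1.0, 1.0, 1.0]]          # down-projection, shape 1x3
B = [[0.5], [0.0], [-0.5]]     # up-projection, shape 3x1
x = [1.0, 2.0, 3.0]

y = lora_forward(W, A, B, x)
print(y)
```

Setting `scale` to zero recovers the base model's output exactly, which is the sense in which the underlying weights stay fixed while behavior is still adjustable.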

Founding (Ljubisa Bajic)

Taalas was founded in 2023 and emerged from stealth on March 5, 2024. ^[2] Its CEO and lead founder is Ljubisa Bajic, who in 2016 co-founded the AI hardware company Tenstorrent and served as its chief executive, and who earlier worked as a chip architect at AMD and its graphics predecessor ATI. ^[1]^[2] At Taalas he is joined by two co-founders who were also early engineering leaders at Tenstorrent and have backgrounds at AMD and NVIDIA: Lejla Bajic, who serves as chief operating officer, and Drago Ignjatovic, who serves as chief technology officer. ^[2]^[3]

The company's founding thesis is captured in a statement Bajic made at launch: "We should not be simulating intelligence on general purpose computers, but casting intelligence directly into silicon." ^[2] As of early 2026 Taalas was a small team of roughly 25 people and reported having filed 14 patents covering its approach. ^[3] It is headquartered in Toronto. ^[1]

Technology (hardcoded model silicon)

Taalas describes an automated design flow that converts a deep-learning model into a working chip. Rather than designing a new chip from scratch for each model, the company starts from a largely pre-fabricated base wafer, built in partnership with TSMC, and performs the final, model-specific customization on only a small number of layers. According to reporting, the chip has on the order of 100 layers, and Taalas customizes roughly two of the metal layers to encode a given model; the company says this allows TSMC to complete a model-specific chip in about two months, compared with roughly six months for a conventional processor design. ^[3]^[7]^[8]

To store the model's weights on-chip at high density, the design uses a fixed-function memory fabric (described in coverage as a "mask ROM recall fabric") in which weight values are encoded directly into the silicon, with compute and storage unified on the same die at close to DRAM-level density. ^[4]^[6] Because the weights are physically present where the computation happens, the chip does not require the external HBM stacks and surrounding support circuitry that GPUs depend on. To fit a model into a single chip, Taalas applies aggressive quantization: for the HC1, reporting describes a roughly 3-bit base format for most weights, with selected weights stored at higher precision such as 6 bits. ^[5]
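
To make the quantization figures concrete, the following Python sketch estimates weight storage for a mixed 3-bit and 6-bit scheme and compares it with a 16-bit baseline. The parameter count is rounded to 8 billion, and the 10 percent share of weights kept at 6 bits is a purely hypothetical value; the cited coverage does not report the actual split.

```python
def weight_storage_gb(params: float,
                      base_bits: float,
                      high_bits: float,
                      high_fraction: float) -> float:
    """Storage in gigabytes when a fraction of weights uses a wider format."""
    avg_bits = base_bits * (1 - high_fraction) + high_bits * high_fraction
    return params * avg_bits / 8 / 1e9


PARAMS = 8e9            # Llama 3.1 8B, rounded
HIGH_FRACTION = 0.10    # hypothetical: share of weights stored at 6 bits

fp16_gb = weight_storage_gb(PARAMS, 16, 16, 0.0)
mixed_gb = weight_storage_gb(PARAMS, 3, 6, HIGH_FRACTION)

print(f"16-bit baseline:   {fp16_gb:.1f} GB")
print(f"3-bit/6-bit mixed: {mixed_gb:.1f} GB")
```

With this assumed split the weights shrink from about 16 GB to a little over 3 GB, roughly a fivefold reduction, which gives a sense of why aggressive quantization matters when the whole model has to fit on one die.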

The HC1 chip

The HC1 is Taalas's first product and is hardwired to run Meta's Llama 3.1 8B. ^[1]^[4] It is manufactured on TSMC's 6-nanometer (N6) process and is a large die, reported at about 815 square millimeters with roughly 53 billion transistors. ^[6] Taalas presented the HC1 publicly in February 2026 as a working technology demonstrator, including a live online chatbot, rather than as a generally available commercial product. ^[6]

Taalas makes strong performance claims for the HC1. Independent reviewers caution that these claims apply to a single small model under favorable, low-concurrency conditions and have not been independently benchmarked. ^[9] The headline claim is up to about 17,000 output tokens per second per user on Llama 3.1 8B, which the company states is roughly 73 times the throughput of an NVIDIA H200 GPU while drawing about one-tenth the power; demonstrations showed sustained real-world rates of roughly 14,000 to 16,000 tokens per second and end-to-end response latencies under 200 milliseconds. ^[1]^[4]^[5]^[6] Taalas also says the HC1 is about 10 times faster than a Cerebras wafer-scale system, costs roughly 20 times less to build than comparable offerings from NVIDIA, Groq, SambaNova, and Cerebras, and consumes around one-tenth the power, with a single HC1 board drawing on the order of 200 watts to a few kilowatts versus the tens to hundreds of kilowatts of a GPU rack. ^[3]^[6]

The table below summarizes the figures Taalas has reported for the HC1; the comparisons are the company's own and should be treated as vendor claims.

Attribute	Reported value
Model hardwired	Meta Llama 3.1 8B
Foundry / process	TSMC, 6 nm (N6)
Die size / transistors	~815 mm² / ~53 billion
Peak throughput	up to ~17,000 tokens/sec per user
Claimed vs NVIDIA H200	~73x throughput at ~1/10 the power
Response latency	under 200 ms
Quantization	~3-bit base, selected weights at 6-bit
Customization	~2 of ~100 layers; ~2-month TSMC turnaround
Status (early 2026)	technology demonstrator / beta
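
A few quantities implied by the reported figures can be derived with simple arithmetic. The Python sketch below takes the company's claims at face value and assumes the throughput and power ratios refer to the same H200 comparison; the derived numbers are therefore only as reliable as the vendor claims themselves.

```python
CLAIMED_TOKENS_PER_SEC = 17_000    # peak per-user rate claimed for the HC1
CLAIMED_SPEEDUP_VS_H200 = 73       # claimed throughput ratio versus an H200
CLAIMED_POWER_RATIO = 0.1          # claimed HC1 power relative to an H200

# Average time available to produce each token at the claimed peak rate.
per_token_microseconds = 1e6 / CLAIMED_TOKENS_PER_SEC

# H200 per-user rate that the 73x comparison implies.
implied_h200_tokens_per_sec = CLAIMED_TOKENS_PER_SEC / CLAIMED_SPEEDUP_VS_H200

# Tokens per joule improve by speedup divided by relative power.
implied_energy_efficiency_gain = CLAIMED_SPEEDUP_VS_H200 / CLAIMED_POWER_RATIO

print(f"time per token:       {per_token_microseconds:.1f} microseconds")
print(f"implied H200 rate:    {implied_h200_tokens_per_sec:.0f} tokens/s")
print(f"implied tokens/joule: {implied_energy_efficiency_gain:.0f}x")
```

Read this way, the claims imply roughly 59 microseconds per token, an H200 baseline of about 233 tokens per second per user, and an energy-per-token advantage on the order of 730 times.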

Roadmap

Taalas has outlined a roadmap that moves from small models toward frontier-scale systems. In the second quarter of 2026 (described as "this summer"), the company said it expected to bring a larger Llama model with roughly 20 billion parameters, and a mid-sized reasoning model, onto its existing HC1-generation silicon. ^[1]^[4]^[6] It also described a second-generation architecture, HC2, with higher density, aimed at running frontier-class large language models across a collection of HC cards, with deployments targeted to begin by the end of 2026; company statements have cited the ability to host a model of GPT-5.2 scale as the goal for that generation. ^[7]^[8]

Funding

Taalas has raised approximately $219 million in total. ^[3]^[4] The company first emerged from stealth on March 5, 2024 with $50 million raised across two early rounds, led by veteran semiconductor investor Pierre Lamond and the venture firm Quiet Capital. ^[2] On February 19, 2026 it announced a further $169 million, with investors including Quiet Capital, Fidelity, and Pierre Lamond. ^[1]^[4] As of early 2026 the company reported having spent only about $30 million of its capital and holding reserves of more than $170 million. ^[3]

Figures sometimes cited for Taalas, a roughly $169 million round and a "$200 million-plus" total, are broadly correct, but the most precise reported total is approximately $219 million: the $50 million raised across the two early rounds plus the $169 million raised in 2026. ^[3]^[4]

Competition and significance

Taalas occupies one end of a spectrum of companies trying to displace general-purpose GPUs for AI inference. The closest comparison is Etched, a startup whose Sohu chip bakes the transformer architecture in general into an ASIC for high inference throughput; Taalas goes a step further by hardwiring one specific trained model, making its approach more specialized still. Other inference-focused challengers take less extreme positions: Groq and Cerebras build deterministic or wafer-scale architectures that remain programmable across many models, MatX is building a programmable, LLM-first accelerator (and raised a much larger round, reported at $500 million, for that approach), and companies such as SambaNova and d-Matrix pursue their own inference silicon. NVIDIA remains the dominant incumbent and the baseline against which all of these are measured. ^[3]^[5]^[9]^[10]

Commentators frame the contrast as a strategic bet: programmable accelerators like MatX preserve flexibility and avoid one-model-per-chip commitments, while Taalas wagers that for a small number of extremely high-volume models, fixing the model into silicon yields order-of-magnitude advantages in cost and energy that a programmable chip cannot match. ^[10] The significance of Taalas lies less in any single product than in testing how far the "model-in-silicon" idea can be pushed. If popular open models are served at enough scale to justify a dedicated chip, and the two-month customization cycle holds, the approach could reshape commodity AI inference; if models change too quickly or the performance claims do not survive independent scrutiny, the loss of flexibility may confine it to niche workloads. As of mid-2026 the HC1 stood as a demonstrated but not yet broadly commercialized proof of concept, and its more ambitious claims remained company-attributed and independently unverified. ^[6]^[9]

References

1. Robert Hof, "Taalas raises $169M in funding to develop model-specific AI chips," SiliconANGLE, February 19, 2026. https://siliconangle.com/2026/02/19/taalas-raises-169m-funding-develop-model-specific-ai-chips/
2. "Taalas emerges from stealth with $50 million in funding and a groundbreaking silicon AI technology," PR Newswire, March 5, 2024. https://www.prnewswire.com/news-releases/taalas-emerges-from-stealth-with-50-million-in-funding-and-a-groundbreaking-silicon-ai-technology-302079053.html
3. "Chip designer Taalas bets on hard-wired AI chips," SDxCentral, March 2026. https://www.sdxcentral.com/news/chip-designer-taalas-bets-on-hard-wired-ai-chips/
4. "AI chip startup Taalas raises $169m, unveils HC1 processor optimized for Llama 3.1 8B," Data Center Dynamics, February 2026. https://www.datacenterdynamics.com/en/news/ai-chip-startup-taalas-raises-169m-unveils-hc1-processor-optimized-for-llama-31-8b/
5. "AI 101: The Inference Chip Wars - MatX, Taalas, and the Cracks in the GPU Era," Turing Post, 2026. https://www.turingpost.com/p/taalas
6. "Taalas HC1 hardwired Llama-3.1 8B AI accelerator delivers up to 17,000 tokens/s," CNX Software, February 22, 2026. https://www.cnx-software.com/2026/02/22/taalas-hc1-hardwired-llama-3-1-8b-ai-accelerator-delivers-up-to-17000-tokens-s/
7. "Chip startup Taalas raises $169 million to help build AI chips to take on Nvidia," Reuters via Yahoo Finance, February 19, 2026. https://finance.yahoo.com/news/chip-startup-taalas-raises-169-160249219.html
8. "Toronto Startup Taalas Raises $169 Million For Hard-Wired AI Chips," Techgolly, February 2026. https://techgolly.com/toronto-startup-taalas-raises-169-million-for-hard-wired-ai-chips
9. "Taalas is replacing programmable GPUs with hardwired AI chips to achieve 17,000 tokens per second for ubiquitous inference," MarkTechPost, February 22, 2026. https://www.marktechpost.com/2026/02/22/taalas-is-replacing-programmable-gpus-with-hardwired-ai-chips-to-achieve-17000-tokens-per-second-for-ubiquitous-inference/
10. "Taalas raises $169m to tailor chips for AI models," Electronics Weekly, February 2026. https://www.electronicsweekly.com/news/business/taalas-raises-169m-to-tailor-chips-for-ai-models-2026-02/
