Taalas
Last reviewed
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,795 words
Taalas is a Toronto-based AI chip startup that builds "hardwired," or model-specific, silicon, in which an entire trained neural network is etched directly into a custom chip rather than executed as software on a general-purpose processor. The company describes its approach as "casting intelligence directly into silicon": the weights and architecture of one specific model, such as Meta's Llama 3.1 8B, are baked into the hardware, eliminating most data movement and external memory and, the company claims, delivering very large gains in throughput, cost, and energy efficiency for that model. Taalas was founded in 2023 by Ljubisa Bajic, a co-founder and former chief executive of Tenstorrent and earlier an architect at AMD and ATI. [1][2][3] In February 2026 the company announced a $169 million funding round, bringing its total raised to roughly $219 million, and unveiled its first product, the HC1, a chip purpose-built to run Llama 3.1 8B. [1][4]
Taalas pursues what observers have called a "model-as-hardware" or "model-in-silicon" strategy, an extreme version of the model-specific application-specific integrated circuit (ASIC). Conventional AI accelerators, including NVIDIA GPUs, are programmable: they load a model's parameters from memory and stream them through compute units at inference time. Taalas instead fixes a single model permanently into the transistors of a chip, so that the parameters are physically part of the circuit and never have to be fetched from external high-bandwidth memory (HBM). The company argues that this removes the "memory wall," the bottleneck created by moving weights between memory and compute, which dominates the cost and power of modern AI inference. [3][5]
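The memory-wall argument can be made concrete with a back-of-the-envelope bound. The sketch below assumes a programmable accelerator that must read every weight from external memory once per generated token for a single user, so the decode rate cannot exceed memory bandwidth divided by model size. The function name is hypothetical, and the parameter count, weight precision, and bandwidth are illustrative assumptions rather than figures from the cited sources.

```python
def bandwidth_bound_tokens_per_sec(
    n_params: float,
    bytes_per_param: float,
    mem_bandwidth_bytes_per_sec: float,
) -> float:
    """Upper bound on single-user decode rate when every weight is
    read from external memory once per generated token."""
    model_bytes = n_params * bytes_per_param
    return mem_bandwidth_bytes_per_sec / model_bytes


# Illustrative assumptions (not from the cited sources):
#   8e9 parameters, 16-bit (2-byte) weights, 4.8 TB/s of HBM bandwidth.
upper_bound = bandwidth_bound_tokens_per_sec(8e9, 2.0, 4.8e12)
print(f"{upper_bound:.0f} tokens/sec per user (upper bound)")
```

Under these assumptions the bound is about 300 tokens per second for one user, however much compute is available. Batching amortizes each weight read across many users and raises aggregate throughput, but it does not lift this per-user ceiling, which is the limit a design with weights fixed in the circuit is meant to avoid.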
The central trade-off is flexibility. A Taalas chip can run only the one model it was manufactured for, so the approach is suited to high-volume, stable workloads where a popular model is served at scale, rather than to research or rapidly changing deployments. The company mitigates this somewhat: it says the HC1 retains a configurable context-window size and supports fine-tuning through low-rank adapters (LoRAs), so the underlying weights are fixed but limited customization remains possible. [5][6]
Taalas was founded in 2023 and emerged from stealth on March 5, 2024. [2] Its CEO and lead founder is Ljubisa Bajic, who in 2016 co-founded the AI hardware company Tenstorrent and served as its chief executive, and who earlier worked as a chip architect at AMD and its graphics predecessor ATI. [1][2] At Taalas he is joined by two co-founders who were also early engineering leaders at Tenstorrent and have backgrounds at AMD and NVIDIA: Lejla Bajic, who serves as chief operating officer, and Drago Ignjatovic, who serves as chief technology officer. [2][3]
The company's founding thesis is captured in a statement Bajic made at launch: "We should not be simulating intelligence on general purpose computers, but casting intelligence directly into silicon." [2] As of early 2026 Taalas was a small team of roughly 25 people and reported having filed 14 patents covering its approach. [3] It is headquartered in Toronto. [1]
Taalas describes an automated design flow that converts a deep-learning model into a working chip. Rather than designing a new chip from scratch for each model, the company starts from a largely pre-fabricated base wafer, built in partnership with TSMC, and performs the final, model-specific customization on only a small number of layers. According to reporting, the chip has on the order of 100 layers, and Taalas customizes roughly two of the metal layers to encode a given model; the company says this allows TSMC to complete a model-specific chip in about two months, compared with roughly six months for a conventional processor design. [3][7][8]
To store the model's weights on-chip at high density, the design uses a fixed-function memory fabric (described in coverage as a "mask ROM recall fabric") in which weight values are encoded directly into the silicon, with compute and storage unified on the same die at close to DRAM-level density. [4][6] Because the weights are physically present where the computation happens, the chip does not require the external HBM stacks and surrounding support circuitry that GPUs depend on. To fit a model into a single chip, Taalas applies aggressive quantization: for the HC1, reporting describes a roughly 3-bit base format for most weights, with selected weights stored at higher precision such as 6 bits. [5]
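The effect of this quantization on storage can be illustrated with simple arithmetic. The sketch below estimates the weight footprint of a mixed-precision model; the function name is hypothetical, the 8-billion parameter count is taken from the model's name, and the 10% share of weights kept at 6 bits is an assumption chosen only for illustration, since the reporting does not state that share.

```python
def weight_footprint_gb(
    n_params: float,
    base_bits: float,
    high_bits: float,
    high_fraction: float,
) -> float:
    """Weight storage in gigabytes (1e9 bytes) when a fraction of the
    weights is kept at a higher precision than the base format."""
    if not 0.0 <= high_fraction <= 1.0:
        raise ValueError("high_fraction must be between 0 and 1")
    avg_bits = base_bits * (1.0 - high_fraction) + high_bits * high_fraction
    return n_params * avg_bits / 8 / 1e9


N_PARAMS = 8e9          # assumed from the "8B" in the model name
HIGH_FRACTION = 0.10    # hypothetical; not reported in the sources

fp16_gb = weight_footprint_gb(N_PARAMS, 16, 16, 0.0)
mixed_gb = weight_footprint_gb(N_PARAMS, 3, 6, HIGH_FRACTION)
print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"3-bit base with some 6-bit weights: {mixed_gb:.1f} GB")
```

With these inputs the 16-bit footprint is 16 GB, while the mixed 3-bit and 6-bit layout comes to a little over 3 GB, roughly a fivefold reduction in the amount of weight data that has to fit on the die.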
The HC1 is Taalas's first product and is hardwired to run Meta's Llama 3.1 8B. [1][4] It is manufactured on TSMC's 6-nanometer (N6) process and is a large die, reported at about 815 square millimeters with roughly 53 billion transistors. [6] Taalas presented the HC1 publicly in February 2026 as a working technology demonstrator, including a live online chatbot, rather than as a generally available commercial product. [6]
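Two derived figures help put the reported die size and transistor count in context. The sketch below computes them from the numbers quoted above; the helper names are hypothetical, the 8-billion parameter count is assumed from the model's name, and the results are rough arithmetic rather than figures published by the company.

```python
def density_mtr_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Average transistor density in millions of transistors per mm^2."""
    return transistors / die_area_mm2 / 1e6


def transistors_per_param(transistors: float, n_params: float) -> float:
    """Whole-chip transistor budget divided by the model's parameter count."""
    return transistors / n_params


TRANSISTORS = 53e9     # reported, approximate
DIE_AREA_MM2 = 815     # reported, approximate
N_PARAMS = 8e9         # assumed from the "8B" in the model name

density = density_mtr_per_mm2(TRANSISTORS, DIE_AREA_MM2)
budget = transistors_per_param(TRANSISTORS, N_PARAMS)
print(f"~{density:.0f} million transistors per mm^2")
print(f"~{budget:.1f} transistors per model parameter")
```

On these inputs the whole chip averages fewer than seven transistors per parameter, a budget that also has to cover compute, control, and I/O, which gives a sense of how compact the weight encoding must be.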
Taalas makes strong, company-attributed performance claims for the HC1. Independent reviewers caution that the claims apply to a single small model under favorable, low-concurrency conditions and that they have not been independently benchmarked. [9] The headline claim is up to about 17,000 output tokens per second per user on Llama 3.1 8B, which the company states is roughly 73 times the throughput of an NVIDIA H200 GPU while drawing about one-tenth the power; demonstrations showed sustained real-world rates of roughly 14,000 to 16,000 tokens per second and end-to-end response latencies under 200 milliseconds. [1][4][5][6] Taalas also says the HC1 is about 10 times faster than a Cerebras wafer-scale system and costs roughly 20 times less to build than comparable offerings from NVIDIA, Groq, SambaNova, and Cerebras. On power, it cites a single HC1 board drawing on the order of 200 watts to a few kilowatts, versus the tens to hundreds of kilowatts of a GPU rack. [3][6] All of these figures originate with the company.
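The headline numbers imply a few other quantities that are easy to work out. The sketch below derives them from the company's stated figures; the helper names are hypothetical, and the last calculation is a simplification that ignores prompt processing, networking, and any other fixed overhead in a real response.

```python
def implied_baseline_rate(claimed_rate: float, claimed_speedup: float) -> float:
    """Per-user rate of the comparison system implied by a speedup claim."""
    return claimed_rate / claimed_speedup


def per_token_time_us(rate_tokens_per_sec: float) -> float:
    """Average time per generated token, in microseconds."""
    return 1e6 / rate_tokens_per_sec


def tokens_within(rate_tokens_per_sec: float, seconds: float) -> float:
    """Tokens generated in a time budget, ignoring all fixed overhead."""
    return rate_tokens_per_sec * seconds


CLAIMED_RATE = 17_000   # tokens/sec per user, company claim
CLAIMED_SPEEDUP = 73    # versus an NVIDIA H200, company claim

h200_rate = implied_baseline_rate(CLAIMED_RATE, CLAIMED_SPEEDUP)
token_us = per_token_time_us(CLAIMED_RATE)
budget_tokens = tokens_within(CLAIMED_RATE, 0.200)
print(f"Implied H200 rate: ~{h200_rate:.0f} tokens/sec per user")
print(f"Time per token at the claimed rate: ~{token_us:.0f} microseconds")
print(f"Tokens that fit in 200 ms: ~{budget_tokens:.0f}")
```

Taken at face value, the claim implies an H200 baseline of roughly 230 tokens per second per user and under 60 microseconds per token on the HC1, so a response of a few thousand tokens would fit within the quoted 200-millisecond latency.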
The table below summarizes the figures Taalas has reported for the HC1; the comparisons are the company's own and should be treated as vendor claims.
| Attribute | Reported value |
|---|---|
| Model hardwired | Meta Llama 3.1 8B |
| Foundry / process | TSMC, 6 nm (N6) |
| Die size / transistors | ~815 mm² / ~53 billion |
| Peak throughput | up to ~17,000 tokens/sec per user |
| Claimed vs NVIDIA H200 | ~73x throughput at ~1/10 the power |
| Response latency | under 200 ms |
| Quantization | ~3-bit base, selected weights at 6-bit |
| Customization | ~2 of ~100 layers; ~2-month TSMC turnaround |
| Status (early 2026) | technology demonstrator / beta |
Taalas has outlined a roadmap that moves from small models toward frontier-scale systems. In the second quarter of 2026 (described as "this summer"), the company said it expected to bring a larger Llama model with roughly 20 billion parameters, and a mid-sized reasoning model, onto its existing HC1-generation silicon. [1][4][6] It also described a second-generation architecture, HC2, with higher density, aimed at running frontier-class large language models across a collection of HC cards, with deployments targeted to begin by the end of 2026; company statements have cited the ability to host a model of GPT-5.2 scale as the goal for that generation. [7][8]
Taalas has raised approximately $219 million in total. [3][4] The company first emerged from stealth on March 5, 2024 with $50 million raised across two early rounds, led by veteran semiconductor investor Pierre Lamond and the venture firm Quiet Capital. [2] On February 19, 2026 it announced a further $169 million, with investors including Quiet Capital, Fidelity, and Pierre Lamond. [1][4] As of early 2026 the company reported having spent only about $30 million of its capital and holding reserves of more than $170 million. [3]
Figures sometimes cited for Taalas, a roughly $169 million round and a "$200 million-plus" total, are broadly correct, but the most precise reported total is approximately $219 million: the $50 million raised across its two early rounds plus the $169 million round announced in 2026. [3][4]
Taalas occupies one end of a spectrum of companies trying to displace general-purpose GPUs for AI inference. The closest comparison is Etched, a startup whose Sohu chip bakes the transformer architecture, rather than any single model, into an ASIC for high inference throughput; Taalas goes a step further by hardwiring one specific trained model, making its approach more specialized still. Other inference-focused challengers take less extreme positions: Groq and Cerebras build deterministic or wafer-scale architectures that remain programmable across many models, MatX is building a programmable, LLM-first accelerator (and raised a much larger round, reported at $500 million, for that approach), and companies such as SambaNova and d-Matrix pursue their own inference silicon. NVIDIA remains the dominant incumbent and the baseline against which all of these are measured. [3][5][9][10]
Commentators frame the contrast as a strategic bet: programmable accelerators like MatX preserve flexibility and avoid one-model-per-chip commitments, while Taalas wagers that for a small number of extremely high-volume models, fixing the model into silicon yields order-of-magnitude advantages in cost and energy that a programmable chip cannot match. [10] The significance of Taalas lies less in any single product than in testing how far the "model-in-silicon" idea can be pushed. If popular open models are served at enough scale to justify a dedicated chip, and the two-month customization cycle holds, the approach could reshape commodity AI inference; if models change too quickly or the performance claims do not survive independent scrutiny, the loss of flexibility may confine it to niche workloads. As of mid-2026 the HC1 stood as a demonstrated but not yet broadly commercialized proof of concept, and its more ambitious claims remained company-attributed and independently unverified. [6][9]