Untether AI
Last reviewed
May 21, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,848 words
Untether AI Corp. was a Toronto-based fabless semiconductor startup that designed energy-efficient inference accelerators based on a proprietary "at-memory compute" architecture, in which large arrays of small processing elements are placed directly inside on-chip SRAM banks to minimize the energy cost of moving weights between memory and arithmetic units.[^1][^2] Founded in 2018 by Martin Snelgrove, Darrick Wiebe, and Raymond Chik with seed funding led by Intel Capital, the company built two generations of inference silicon: the runAI200 (TSMC 16 nm, INT8, shipping inside the tsunAImi tsn200 PCIe card) and the speedAI240 (TSMC 7 nm, FP8, codenamed "Boqueria"), the latter delivering roughly 2 PetaFLOPS of FP8 throughput and approximately 30 TFLOPS per watt at sample stage.[^3][^4] Untether AI raised approximately US$152 million in cumulative equity funding from investors including Intel Capital, GM Ventures, Tracker Capital Management, Canada Pension Plan Investment Board (CPPIB), and Radical Ventures.[^5][^6] In June 2025 the company shut down its hardware business: AMD hired the engineering team and licensed selected intellectual property in an acquihire-style strategic transaction, while the speedAI processors and the imAIgine software development kit (SDK) were discontinued and Untether's separate corporate entity wound down sales and support.[^7][^8][^9] The shutdown was widely covered as one of the more visible failures of the 2018-2022 generation of AI-inference silicon startups, and the deal followed a similar pattern AMD had used a week earlier with Brium and Enosemi.[^8][^10]
| Field | Value |
|---|---|
| Company | Untether AI Corp. |
| Headquarters | Toronto, Ontario, Canada |
| Founded | 2018 |
| Founders | Martin Snelgrove (CTO), Darrick Wiebe, Raymond Chik |
| CEOs | Arun Iyengar (2019-2024); Chris Walker (Jan 2024-May 2025) |
| Total funding | approximately US$152 million across five rounds[^5][^6] |
| Key investors | Intel Capital, GM Ventures, Tracker Capital Management, CPP Investments, Radical Ventures[^5][^6] |
| Generation 1 chip | runAI200 (TSMC 16 nm, 502 INT8 TOPS, 200+ MB SRAM)[^3][^11] |
| Generation 1 card | tsunAImi concept card (4 runAI200, 2 PetaOPS INT8); shipping tsunAImi tsn200 (1 runAI200, about 500 INT8 TOPS)[^11][^12] |
| Generation 2 chip | speedAI240 / "Boqueria" (TSMC 7 nm, 2 PFLOPS FP8, 238 MB SRAM, 1,458 RISC-V cores)[^2][^4] |
| Generation 2 card | speedAI240 Slim (October 2024 broad availability)[^13] |
| Software | imAIgine SDK (compiler + simulator + runtime)[^14][^15] |
| Shutdown | June 5, 2025: engineering team acquihired by AMD; products discontinued[^7][^8][^9] |
Untether AI was founded in early 2018 in Toronto by Martin Snelgrove, a long-time analog and digital circuits researcher and former professor at Carleton University and the University of Toronto; Darrick Wiebe, a software architect; and Raymond Chik, an analog and mixed-signal circuit designer.[^16][^17] The trio aimed to build a domain-specific processor for neural network inference whose key constraint was energy spent on data movement rather than peak arithmetic throughput. Snelgrove later articulated the motivation publicly at Hot Chips 2022: in a typical graphics processing unit or general-purpose accelerator, the dominant share of energy in convolutional and transformer inference workloads is consumed not in multiplying weights and activations but in moving those weights from off-chip datacenter-scale DRAM into on-chip caches and registers, with later analyses citing roughly 90% of total energy spent on data movement.[^4][^16]
The company emerged from stealth in April 2019 with an announcement that it had raised approximately C$13 million in seed financing led by Intel Capital, with participation from GM Ventures, Radical Ventures and angel investors.[^17][^18] In a separate add-on later in 2019, Radical Ventures and Intel Capital added a further US$7 million, lifting Untether's Series A to approximately US$20 million; the Series A round was officially announced on November 5, 2019, together with the appointment of Arun Iyengar (a former Altera, AMD, and Xilinx executive) as Untether's first CEO, with Snelgrove transitioning into the CTO role.[^16][^18][^19] At the time of the Series A announcement Untether described itself as having built a working prototype of its at-memory architecture in under five months following an initial seed investment.[^16][^18] The early board added Tomi Poutanen of Radical Ventures.[^16][^18]
Untether AI's next funding milestone arrived on July 20, 2021, when the company announced an oversubscribed US$125 million Series B led by Tracker Capital Management and Intel Capital, with new investor CPP Investments and prior investor Radical Ventures participating. Untether had originally targeted US$100 million, so the final round was 25% larger than planned. Dr. Shaygan Kheradpir of Tracker Capital joined the board as part of the round.[^5][^20] The Series B brought cumulative announced funding into the same range as several other independent inference-silicon startups of the period, including Tenstorrent, SambaNova Systems, and Groq.
In September 2023, Untether announced that former Intel corporate vice president and general manager Chris Walker had joined as president; he had previously led Intel's Mobile Client Platforms organization through Project Athena and the Intel Evo platform.[^21][^22] In January 2024, the company elevated Walker to CEO, while Arun Iyengar transitioned out of the role after roughly four years. Dr. Amir Salek, a former head of silicon for Google Technical Infrastructure and Google Cloud, joined the board.[^21][^22] EE Times described the leadership turnover as occurring at "a transition point" in the company's growth from chip development to commercial deployment.[^22]
Untether AI's central architectural claim was that placing weight storage adjacent to compute elements inside SRAM cells (rather than streaming weights from external DRAM or from large shared on-chip caches) could reduce the energy cost of inference by an order of magnitude versus conventional graphics processing units and GPU-style accelerators.[^2][^16] The company labelled this approach "at-memory compute" to distinguish it from "near-memory" designs (which keep memory physically close to compute but still move operands), and from "in-memory compute" approaches that perform analog multiply-accumulate operations inside memory arrays themselves, an approach pursued by competitors such as Mythic and EnCharge AI.[^2][^4] By contrast, Untether stored weights in standard digital SRAM cells and placed digital arithmetic units immediately beside each bank of SRAM so that weights never needed to traverse a long bus to reach the multipliers.[^2][^4]
The core building block of Untether's architecture is a "memory bank." In the first-generation runAI200 design, each bank contained approximately 385 kilobytes of SRAM organized as a two-dimensional array of 512 small processing elements (PEs), each holding its own portion of weights and operating in a single-instruction multiple-data (SIMD) fashion under the control of a per-bank RISC-V processor. Each runAI200 chip integrated 511 such banks, yielding roughly 200 megabytes of total on-die SRAM and approximately 261,000 processing elements per chip.[^2][^11][^23] The processing elements were optimized for low-precision integer (INT8) multiply-accumulate operations characteristic of neural network inference, and the per-bank RISC core handled control flow, layer fusion, and inter-bank movement.
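The per-chip totals quoted above follow directly from the per-bank figures. The short check below is only an arithmetic sketch: the constants come from the text, while the function name and the choice of decimal units (1 MB = 1,000 KB) are assumptions made for illustration.

```python
# Per-bank figures quoted in the text for the runAI200.
BANKS_PER_CHIP = 511
PES_PER_BANK = 512
SRAM_PER_BANK_KB = 385  # approximate value from the text


def chip_totals(banks, pes_per_bank, sram_per_bank_kb):
    """Return (total processing elements, total SRAM in MB) for one chip.

    Assumes decimal units (1 MB = 1,000 KB); this is an illustrative choice.
    """
    total_pes = banks * pes_per_bank
    total_sram_mb = banks * sram_per_bank_kb / 1000
    return total_pes, total_sram_mb


total_pes, total_sram_mb = chip_totals(BANKS_PER_CHIP, PES_PER_BANK, SRAM_PER_BANK_KB)
print(f"{total_pes:,} PEs, {total_sram_mb:.1f} MB SRAM")  # 261,632 PEs, 196.7 MB SRAM
```

The result matches the "approximately 261,000 processing elements" and "roughly 200 megabytes" figures once rounding is taken into account.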
For the second-generation speedAI240 "Boqueria" chip, presented at Hot Chips 2022 and built on a TSMC 7 nm process, Untether retained the bank concept but doubled the per-bank RISC-V control density to two cores per bank, both clocked at approximately 1.35 GHz, yielding 1,458 RISC-V cores in total across the die. The chip carried 238 megabytes of SRAM at roughly 1 petabyte per second of aggregate on-chip memory bandwidth, four 1 MB scratchpads, and two 64-bit-wide LPDDR5 ports for up to 32 gigabytes of external DRAM. The package measured approximately 35 mm by 35 mm, and the chip used a PCIe Gen5 interface both for connectivity to a CPU host and for chip-to-chip communication.[^2][^4][^24]
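As a back-of-envelope sketch, the speedAI240 figures above imply a bank count and rough per-bank resources. The derivation assumes that SRAM and bandwidth are spread evenly across banks, which the text does not state; the variable names are illustrative only.

```python
# Figures quoted in the text for the speedAI240.
RISCV_CORES = 1458
CORES_PER_BANK = 2
TOTAL_SRAM_BYTES = 238e6          # 238 MB, decimal units assumed
AGGREGATE_BANDWIDTH_BPS = 1e15    # roughly 1 PB/s
CLOCK_HZ = 1.35e9                 # approximately 1.35 GHz

# Implied number of memory banks.
banks = RISCV_CORES // CORES_PER_BANK

# Assumption: resources are divided evenly across banks.
sram_per_bank_kb = TOTAL_SRAM_BYTES / banks / 1e3
bandwidth_per_bank_tbps = AGGREGATE_BANDWIDTH_BPS / banks / 1e12
bytes_per_cycle_per_bank = AGGREGATE_BANDWIDTH_BPS / banks / CLOCK_HZ

print(f"banks: {banks}")
print(f"SRAM per bank: {sram_per_bank_kb:.1f} KB")
print(f"bandwidth per bank: {bandwidth_per_bank_tbps:.2f} TB/s")
print(f"bytes per cycle per bank: {bytes_per_cycle_per_bank:.0f}")
```

Under these assumptions the chip has 729 banks, each holding a little over 300 KB of SRAM, slightly less than the approximately 385 KB per bank quoted for the runAI200.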
While runAI200 targeted INT8 inference, the speedAI240 generation added native support for low-precision floating-point types. Untether described two FP8 variants: FP8p ("precision"), with four mantissa bits and three exponent bits, tuned for accuracy; and FP8r ("range"), with three mantissa bits and four exponent bits, tuned for the dynamic range required by attention layers in transformer networks. Untether reported that across a representative inference workload mix, FP8 mode yielded less than one-tenth of one percent of accuracy loss versus BF16, while delivering roughly a four-fold improvement in throughput and energy efficiency over BF16.[^4][^24] The chip also retained BF16 capability for layers that required higher dynamic range, with reported peak performance of 2 PetaFLOPS FP8 and 1 PetaFLOPS BF16.[^4][^24]
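The precision-versus-range trade-off between the two FP8 variants can be illustrated with a generic minifloat calculation. This is a sketch under stated assumptions, not Untether's published encoding: it assumes an IEEE-754-style layout with one sign bit, a bias of 2^(e-1) - 1, subnormals, and the top exponent code reserved for infinities and NaNs. The real FP8p and FP8r formats may differ in any of these details.

```python
def minifloat_range(exp_bits, man_bits):
    """Return (max finite value, smallest positive subnormal, relative step).

    Assumptions (IEEE-754-style, not confirmed for Untether's formats):
    bias = 2**(exp_bits - 1) - 1, subnormals supported, and the all-ones
    exponent code reserved for inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    max_exponent = (2 ** exp_bits - 2) - bias
    max_finite = (2 - 2.0 ** -man_bits) * 2.0 ** max_exponent
    min_subnormal = 2.0 ** (1 - bias - man_bits)
    relative_step = 2.0 ** -man_bits  # spacing between values in [1, 2)
    return max_finite, min_subnormal, relative_step


# Bit splits taken from the text; everything else is assumed.
fp8p = minifloat_range(exp_bits=3, man_bits=4)  # "precision" variant
fp8r = minifloat_range(exp_bits=4, man_bits=3)  # "range" variant

for name, (hi, lo, step) in (("FP8p", fp8p), ("FP8r", fp8r)):
    print(f"{name}: max={hi}, min_subnormal={lo}, relative_step={step}")
```

Under these assumptions the four-exponent-bit variant spans a much wider range of magnitudes, while the four-mantissa-bit variant halves the relative spacing between representable values.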
A distinguishing feature of Untether's design relative to GPU-style accelerators was strong performance at batch size 1, the regime characteristic of latency-sensitive inference such as real-time video analytics, autonomous driving perception stacks, and chatbot-style transformer serving.[^16][^25] Because each PE held its own slice of weights and operated independently, a batch-1 workload could keep the array well-utilized without the wide batching needed to amortize weight-fetch costs on an Nvidia A100 or H100. Untether published BERT-base inference performance figures of approximately 750 queries per second per watt for the speedAI240, which it characterized as roughly fifteen times the energy efficiency of a then-current state-of-the-art GPU on the same workload.[^2][^4]
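The batching argument can be made concrete with a toy energy model. In the sketch below, the cost of fetching a layer's weights is shared by every input in a batch, while compute cost is paid per input. The energy values are hypothetical units chosen for illustration and are not measurements of any GPU or Untether device.

```python
def energy_per_inference(weight_fetch, compute, batch):
    """Energy per input when one weight fetch is shared across `batch` inputs."""
    if batch < 1:
        raise ValueError("batch must be >= 1")
    return weight_fetch / batch + compute


def movement_share(weight_fetch, compute, batch):
    """Fraction of per-input energy spent on moving weights."""
    amortized_fetch = weight_fetch / batch
    return amortized_fetch / (amortized_fetch + compute)


# Hypothetical units: weight movement costs 9x the arithmetic at batch 1.
WEIGHT_FETCH = 9.0
COMPUTE = 1.0

for batch in (1, 8, 64):
    total = energy_per_inference(WEIGHT_FETCH, COMPUTE, batch)
    share = movement_share(WEIGHT_FETCH, COMPUTE, batch)
    print(f"batch={batch:>2}: energy={total:.3f}, weight-movement share={share:.1%}")
```

In this model a conventional accelerator only approaches its compute-bound efficiency at large batch sizes. A design that keeps weights resident next to the arithmetic units aims to make the fetch term small even at batch size 1.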
Untether's at-memory approach occupied a distinct point in the design space versus several other inference-silicon startups of the same era. Cerebras Systems pursued wafer-scale integration with the WSE-3, placing 900,000 cores and 44 GB of SRAM on a single wafer-sized die for training and inference of very large models. Groq built a deterministic, software-scheduled tensor streaming processor (the Groq LPU) targeted at low-latency LLM serving with on-chip SRAM rather than HBM. Tenstorrent adopted an array of RISC-V-controlled tensor cores connected via a flexible network on chip, with separate Wormhole and Blackhole generations targeting both training and inference. Mythic pursued analog in-memory compute using flash cells; EnCharge AI pursued capacitor-based analog matrix multiplication in SRAM. Untether's at-memory architecture was entirely digital, which Snelgrove argued offered superior numerical predictability and process portability versus the analog approaches, at some cost in theoretical density.[^2][^4][^16]
The runAI200 was Untether's first commercial silicon, fabricated on a TSMC 16-nanometer process. Each chip delivered up to 502 INT8 tera-operations per second and approximately 8 TOPS per watt of typical operating efficiency. The architecture employed 511 banks of approximately 385 KB SRAM each, integrating into roughly 200 megabytes of on-die SRAM.[^3][^11][^23]
In March 2021 Untether disclosed the tsunAImi accelerator card concept publicly: a single PCIe card carrying four runAI200 chips, delivering approximately 2 PetaOPS of aggregate INT8 inference throughput.[^11][^23] The shipping version of the card, branded "tsunAImi tsn200," was announced on September 12, 2023 in a low-profile PCIe form factor: a single-chip card targeting 500 INT8 TOPS at approximately 40 watts for real-time video analytics and edge inference workloads. Untether emphasized its compute density per watt and its small physical footprint, suitable for cost-constrained edge deployments rather than only datacenter racks.[^12][^25]
The speedAI240 chip (internal codename "Boqueria") was unveiled at Hot Chips 34 in August 2022 as Untether's second-generation at-memory compute device.[^4][^24] Fabricated on TSMC 7 nm, the chip targeted approximately 2 PetaFLOPS of FP8 peak performance and approximately 30 TFLOPS per watt, with 238 MB of SRAM, 1,458 RISC-V cores, and 32 GB of LPDDR5 external memory.[^2][^4]
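The throughput and efficiency figures quoted for the two generations can be cross-checked with simple division. The sketch below is only an arithmetic illustration: peak throughput and efficiency may have been quoted at different operating points, so the implied power values are rough indications rather than specifications. The function names are illustrative.

```python
def implied_power_w(peak_tera_ops, tera_ops_per_watt):
    """Power implied by dividing peak throughput by quoted efficiency."""
    return peak_tera_ops / tera_ops_per_watt


def implied_efficiency(peak_tera_ops, power_w):
    """Efficiency (tera-ops per watt) implied by throughput and power."""
    return peak_tera_ops / power_w


# runAI200: 502 INT8 TOPS at about 8 TOPS/W (figures from the text).
runai200_w = implied_power_w(502, 8)

# speedAI240: about 2 PFLOPS (2,000 TFLOPS) FP8 at about 30 TFLOPS/W.
speedai240_w = implied_power_w(2000, 30)

# tsunAImi tsn200 card: 500 INT8 TOPS at approximately 40 W.
tsn200_tops_per_w = implied_efficiency(500, 40)

print(f"runAI200 implied power: {runai200_w:.2f} W")
print(f"speedAI240 implied power: {speedai240_w:.2f} W")
print(f"tsn200 implied efficiency: {tsn200_tops_per_w:.1f} TOPS/W")
```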
Untether announced that sampling of speedAI240 devices to early-access customers would begin in the first half of 2023.[^4][^24] In June 2023 the IEEE Solid-State Circuits Society's Journal published a peer-reviewed paper, "speedAI240: A 2-Petaflop, 30-Teraflops/W At-Memory Inference Acceleration Device With 1456 RISC-V Cores," describing the second-generation chip.[^26] (Untether's press materials and the IEEE paper differ by two on the RISC-V core count: the press materials cite 1,458 cores, while the paper's title cites 1,456.)
The speedAI family was packaged as PCIe accelerator cards in two SKUs that Untether branded for distinct power and performance envelopes: a "Preview" card running the speedAI240 chip at higher power for datacenter deployments, and the "speedAI240 Slim" card targeting a constrained power envelope (typically described as a 75-watt-class PCIe form factor) for edge AI and embedded server deployments.[^13][^27] Untether announced broad availability of the speedAI240 Slim in October 2024.[^13]
The imAIgine SDK was Untether's developer-facing software stack. It accepted neural network models exported from PyTorch or TensorFlow (and, through external paths, ONNX), and performed a sequence of compiler passes including model quantization, graph lowering, layer fusion, kernel mapping, and physical allocation across the chip's banks. It included a cycle-accurate simulator, a kernel-level compiler, a code profiler, and a runtime API. Untether emphasized multi-chip partitioning so that networks too large for a single chip's 200-to-238 MB SRAM could be split across multiple speedAI or runAI devices on a card.[^14][^15]
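Multi-chip partitioning of the kind described above can be sketched as a greedy, order-preserving split of layers across devices by weight footprint. This is not the imAIgine SDK's algorithm, which the source does not describe. The function name and the layer sizes are hypothetical, and the sketch ignores activations, scratch space, and inter-chip traffic.

```python
def partition_layers(layer_sizes_mb, sram_per_chip_mb):
    """Assign consecutive layers to chips so each chip's weights fit in SRAM.

    Greedy first-fit in network order: start a new chip whenever the next
    layer would overflow the current one. Returns a list of layer-index lists.
    """
    chips = [[]]
    used_mb = 0.0
    for index, size_mb in enumerate(layer_sizes_mb):
        if size_mb > sram_per_chip_mb:
            raise ValueError(f"layer {index} ({size_mb} MB) exceeds one chip's SRAM")
        if used_mb + size_mb > sram_per_chip_mb:
            chips.append([])   # move on to the next chip
            used_mb = 0.0
        chips[-1].append(index)
        used_mb += size_mb
    return chips


# Hypothetical per-layer weight footprints in MB.
layers_mb = [90, 80, 70, 60, 100, 30]

# 238 MB is the per-chip SRAM figure quoted for the speedAI240.
plan = partition_layers(layers_mb, sram_per_chip_mb=238)
print(plan)  # [[0, 1], [2, 3, 4], [5]]
```

A production compiler would also have to balance compute load and minimize the activations crossing chip boundaries; the greedy split only shows the capacity constraint.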
In March 2025 the company announced what it called a "generative compiler" feature inside the imAIgine SDK, claiming roughly four-fold expansion in the number of supported neural network models. The generative compiler automatically synthesized new low-level kernels for layer types that did not match existing hand-written kernels, reducing developer time-to-deployment from days to minutes for new networks.[^28][^15] (Following the AMD transaction in June 2025, Untether discontinued support for the imAIgine SDK; the IP was reported to be among the assets transferred to AMD in the strategic agreement.[^7][^8])
In August 2024 Untether AI submitted results to the MLPerf Inference v4.1 benchmark suite, the principal industry-standard third-party benchmark for inference systems coordinated by MLCommons. The MLCommons release on August 28, 2024 included 964 verified performance results from 22 submitting organizations including Nvidia, AMD, Intel, Google Cloud, Cisco Systems, and Dell Technologies. Untether submitted speedAI240 Slim results on the ResNet-50 v1.5 image classification workload in both the Datacenter and Edge categories; in the Edge category Untether reported the highest performance per accelerator, lowest latency, and best energy efficiency among submitted single-accelerator results on that workload.[^29][^13][^30] Untether subsequently used MLPerf results as one of its primary marketing artifacts in its 2024-2025 commercial push toward edge AI customers.[^13]
Untether did not submit results on the generative AI workloads in MLPerf Inference v4.1 (Llama-2 70B, Mixtral 8x7B, Stable Diffusion XL), reflecting both the speedAI240's relatively limited per-chip memory (238 MB SRAM plus 32 GB LPDDR5) and its lack of high-bandwidth memory or high-bandwidth chip-to-chip interconnect that would be needed to host large language model weights spanning multiple devices.[^29][^31]
Untether's commercial strategy after the Series B emphasized edge AI and embedded inference rather than direct competition with Nvidia's datacenter GPUs for LLM training and serving. The two most public partnerships were with General Motors and Arm Holdings.
On April 28, 2022, Untether and General Motors jointly announced a collaboration to develop next-generation perception systems for autonomous vehicles based on Untether's at-memory computation. The project was supported by approximately C$1 million from the Ontario Vehicle Innovation Network (OVIN) R&D Partnership Fund, with an additional C$2.09 million of in-kind industry contribution. GM Ventures had been a strategic investor since 2018, and under the collaboration GM agreed to share neural network models from its existing AV perception stack so Untether could demonstrate equivalent or better inference performance at lower power.[^32][^33]
In 2023, Untether announced a separate collaboration with Arm Holdings aimed at integrating Untether's accelerators with Arm-based automotive system-on-chip platforms for advanced driver assistance systems (ADAS) and smart vehicle workloads. The collaboration positioned the speedAI family as a candidate AI accelerator for automotive customers building Arm-based ECUs, although no production design wins from the partnership were publicly disclosed before the company's wind-down.[^34]
Beyond automotive, Untether marketed the runAI200 / tsunAImi family to real-time video analytics customers (smart city camera deployments, retail analytics, industrial machine vision), and marketed the speedAI240 family to a mix of datacenter inference and embedded server applications including financial services, telecommunications edge, and defense. The "Tracker Capital" investment relationship was widely interpreted in the trade press as opening defense-adjacent customers, given Tracker Capital's stated focus on national security technology, but no specific defense customers were publicly disclosed.[^5][^20]
| Round | Date | Amount | Lead investor(s) | Other participants |
|---|---|---|---|---|
| Seed | April 2019 | approx C$13 million | Intel Capital | GM Ventures, Radical Ventures, angels[^17] |
| Series A | Nov 5, 2019 | US$20 million (cumulative) | Intel Capital, Radical Ventures | Existing investors[^18][^19] |
| Series A add-on | 2020 | C$9.4 million | Radical Ventures, Intel Capital | (extension)[^35] |
| Series B | July 20, 2021 | US$125 million (oversubscribed) | Tracker Capital Management, Intel Capital | CPP Investments (new), Radical Ventures[^5][^20] |
| Cumulative | Through 2024 | approximately US$152 million | | Five rounds, four lead/strategic investors[^6] |
By the time the company entered the AMD transaction in June 2025, no further public funding round had been announced after the 2021 Series B, and trade-press reporting cited the company's inability to raise additional funding earlier in 2025 as a proximate cause of the wind-down.[^7][^8]
On June 5, 2025, AMD and Untether AI separately announced a "strategic agreement" under which AMD acquired Untether's engineering team and selected intellectual property, but did not acquire Untether's corporate entity, its product lines, or its commercial obligations. Untether simultaneously announced that it would discontinue sales and support of the speedAI processor family and the imAIgine SDK.[^7][^8][^9]
Reporting by Reuters, EE Times, SiliconANGLE, HPCwire, BetaKit, Tom's Hardware, and Data Center Dynamics consistently characterized the transaction as an acquihire of "an unknown number" of Untether's roughly 145 employees (per LinkedIn at the time of the announcement), focused on AI compiler and kernel engineers, RISC-V-class SoC designers, design-verification engineers, and product-integration staff. AMD's press statement said the transaction "brings a world-class team of engineers to AMD, focused on advancing the company's AI compiler and kernel development capabilities as well as enhancing our digital and SoC design, design verification, and product integration capabilities."[^7][^8][^9][^10][^36]
Several specific details about the deal emerged from contemporaneous reporting:
The Untether shutdown drew commentary from industry analysts characterizing it as one of the first high-profile failures in the 2018-2022 cohort of inference-silicon startups: companies that had launched with substantial venture capital and credible technical teams but had been overtaken by Nvidia's pace on the transformer generation and by the abrupt shift in customer demand toward large language model training and serving rather than the convolutional vision workloads many startups had originally targeted.[^31][^7]
Several limitations of Untether's architecture and commercial strategy were noted both in contemporaneous reviews and in retrospective analyses after the June 2025 shutdown:
Industry analysts and at least one technical blog post noted that the speedAI240's memory architecture, while ideal for vision inference and batch-1 BERT-class transformer inference, was poorly suited for large language model training and serving. Specifically, the chip had approximately 238 MB of on-chip SRAM and 32 GB of LPDDR5 external memory but lacked HBM, had constrained chip-to-chip bandwidth versus contemporary GPUs with NVLink-class interconnects, and offered only PCIe Gen5 connectivity for multi-chip scale-out.[^31] Hosting a 70-billion-parameter LLM at typical 8-bit precision required roughly 70 GB of weight memory plus KV-cache, well beyond what a single speedAI240 card could support, and the lack of high-bandwidth chip-to-chip links made the multi-card scaling required for serving large transformers slow relative to NVLink or Infinity Fabric-class fabrics on Nvidia H100 and AMD MI300X systems.[^31]
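The memory argument above reduces to a short calculation. The sketch below counts weights only, using the parameter count and memory sizes from the text; the function names are illustrative, and the KV-cache, activations, and runtime overhead are deliberately excluded, so real requirements would be higher.

```python
import math


def weight_memory_gb(params_billion, bytes_per_param):
    """Weight footprint in GB (decimal) for a model of the given size."""
    return params_billion * bytes_per_param


def min_devices(required_gb, per_device_gb):
    """Smallest number of devices whose combined memory holds the weights."""
    return math.ceil(required_gb / per_device_gb)


# 70-billion-parameter model at 8-bit precision (1 byte per parameter).
weights_gb = weight_memory_gb(params_billion=70, bytes_per_param=1)

# Per-device capacities quoted in the text for the speedAI240.
LPDDR5_GB = 32
SRAM_GB = 0.238

print(f"weights: {weights_gb} GB")
print(f"devices needed if weights live in LPDDR5: {min_devices(weights_gb, LPDDR5_GB)}")
print(f"devices needed if weights live in SRAM only: {min_devices(weights_gb, SRAM_GB)}")
```

Even in the LPDDR5 case the weights span at least three devices, so every generated token depends on traffic over the PCIe Gen5 links that the text identifies as the scaling bottleneck.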
Untether had been founded in 2018, several years before the November 2022 release of ChatGPT and the subsequent industry-wide shift in inference-silicon demand toward LLM serving. The company's original product hypothesis (vision inference and edge AI for autonomous vehicles, smart cities, and industrial automation) was articulated when the prevailing inference workloads were ResNet-class CNNs and BERT-class encoders rather than autoregressive decoder transformers. As a retrospective analysis published after the shutdown observed, the company's roadmap and its memory architecture were "frozen" in a pre-LLM era at exactly the moment customer demand pivoted.[^31]
Building two generations of leading-edge silicon (TSMC 16 nm and TSMC 7 nm) plus a custom compiler stack consumed Untether's roughly US$152 million in funding without a path to substantial commercial revenue by 2024-2025. Trade-press accounts reported that Untether sought additional funding in early 2025 and was unable to close a round in a rapidly tightening funding environment for AI inference startups, in which Nvidia's market share and AMD's MI300X ramp had compressed valuations and customer interest in unproven silicon alternatives.[^7][^8]
Untether's most public customer-facing relationships (General Motors for automotive perception, Arm for ADAS reference platforms) were development collaborations rather than committed production design wins, with no public disclosure of bill-of-materials inclusion or production volumes. The company therefore lacked an anchor production customer to absorb at-scale silicon volume, a recurring difficulty for inference-silicon startups in the 2018-2022 cohort.[^31][^9]
Untether operated in an unusually crowded competitive landscape during 2018-2025. Its closest peers, by architecture or customer overlap, included:
| Competitor | Approach | Status (mid-2025) |
|---|---|---|
| Cerebras Systems | Wafer-scale digital (WSE-3) | Independent; pivoted to inference cloud and IPO track |
| Groq | Software-scheduled deterministic tensor streaming (LPU) | Independent; LLM inference cloud focus |
| Tenstorrent | RISC-V-controlled tensor cores (Wormhole/Blackhole) | Independent; raised significant funding 2024-2025 |
| SambaNova Systems | Reconfigurable dataflow units (RDU; SN40L) | Independent; enterprise model serving |
| Mythic | Analog in-memory compute (flash) | Multiple restructurings; reduced footprint by 2025 |
| EnCharge AI | Analog in-memory compute (SRAM, capacitor-based) | Independent; defense and edge focus |
| Untether AI | Digital at-memory compute (SRAM-adjacent PEs) | Hardware business wound down June 2025; team to AMD |
In retrospective coverage, several analysts characterized 2024-2025 as the beginning of consolidation in the inference-silicon market, with Untether's June 2025 wind-down as one of the first prominent exits among VC-backed startup peers. AMD's strategy of acquiring engineering talent and selective IP from declining peers (Brium, Enosemi, Untether) within a single eight-day window in late May and early June 2025 was widely interpreted as a deliberate acquihire campaign to close engineering gaps in AMD's AI inference stack versus Nvidia.[^8][^10]
Despite its wind-down, Untether AI's seven-year run left several artifacts of technical significance for the broader history of AI accelerator design: