Etched Sohu

Updated Jun 27, 2026

Etched Sohu
General information
Manufacturer	Etched
Country of origin	United States
Announced	June 25, 2024
Status	Pre-production (not yet shipping as of May 2026)
Architecture	Transformer-only ASIC
Process node	TSMC 4nm (N4 family)
Memory	144 GB HBM3E
Memory bandwidth	Approximately 4,800 GB/s (claimed)
Server configuration	8x Sohu chips per node
Claimed throughput	500,000+ tokens/sec on Llama-3 70B (8-chip server)
Performance claim vs H100	One 8xSohu server replaces 160 H100s (~20x faster on transformer inference, Etched estimate)
Software stack	Open-source compiler, drivers, kernels, serving stack
Website	etched.com

Sohu is a transformer-specialized application-specific integrated circuit (ASIC) built by Etched, a Silicon Valley AI hardware startup founded in 2022 by Harvard dropouts Gavin Uberti, Chris Zhu, and Robert Wachen.^[1]^[2] Announced on June 25, 2024, Sohu is the first commercially marketed chip designed to run only one neural network architecture, the transformer, and Etched claims that a single server with eight Sohu chips delivers more than 500,000 tokens per second on Llama-3 70B and "replaces 160 H100s."^[1]^[24] Because it hardwires the transformer into silicon instead of offering general-purpose programmable compute, Etched estimates Sohu runs transformer inference roughly 20 times faster than an NVIDIA H100 GPU while using significantly less energy, though as of mid-2026 these figures are vendor claims that no independent third party has verified and the chip has not shipped in volume.^[1]^[4]^[5]^[11]^[19]

The chip is fabricated on TSMC 4 nanometer process technology and pairs the compute die with 144 GB of HBM3E memory.^[1]^[4] According to Etched's published claims, one server fitted with eight Sohu chips can sustain more than 500,000 Llama 70B tokens per second, compared with roughly 23,000 tokens per second for an eight-GPU H100 server and approximately 45,000 tokens per second for an eight-GPU Blackwell B200 server.^[24] Etched argues that this performance comes from achieving over 90 percent floating-point unit utilization, against approximately 30 percent for general-purpose GPUs running attention-heavy workloads.^[1]^[5]

Despite the headline numbers, as of May 2026 Sohu has not shipped in volume to external customers, and no independent third-party benchmarks have been published.^[11]^[19] A Manifold prediction market on whether Sohu would ship to customers within a year of announcement resolved "no" on July 2, 2025.^[19] The company has, however, attracted significant attention and capital. Etched raised a $120 million Series A in June 2024 led by Primary Venture Partners and Positive Sum Ventures with participation from Peter Thiel, GitHub CEO Thomas Dohmke, and former Coinbase CTO Balaji Srinivasan.^[2]^[3] In January 2026 it closed a $500 million round led by Stripes with participation from Peter Thiel, Positive Sum, and Ribbit Capital, valuing the company at approximately $5 billion and bringing total funding close to $1 billion.^[11]^[12]^[13]^[14]

What is Sohu?

Sohu is a piece of silicon that does exactly one thing: run the forward pass of transformer neural networks for inference. A conventional GPU is a programmable parallel processor that can execute any computation a compiler can map onto it, which is why the same NVIDIA chip can train a large language model, render graphics, or run a physics simulation. Sohu gives up all of that generality. It cannot run convolutional networks, classical recommender models, state space models, or non-transformer recurrent networks.^[4] In exchange, Etched claims it can devote almost its entire transistor budget to the specific arithmetic and memory-movement patterns of transformer attention and feed-forward layers, which the company says is what enables the headline throughput. The bet is that the transformer, introduced in Google's 2017 paper "Attention Is All You Need," is so dominant across large language models, diffusion models, and multimodal generation that a chip running nothing else has a large enough market to be worth building.^[1]

Who founded Etched and why?

Etched was founded in 2022 by Gavin Uberti, Chris Zhu, and Robert Wachen, three undergraduates who left Harvard University to build a dedicated transformer accelerator.^[2]^[15] Uberti, the company's chief executive officer, had previously worked at OctoML and Xnor.ai on inference optimization.^[2] Chris Zhu studied mathematics and computer science at Harvard and serves as the company's chief technology officer.^[8] Robert Wachen had co-founded a startup accelerator before Etched. The founders are alumni of the Thiel Fellowship and built early prototypes of Sohu before raising institutional capital.^[15]

Etched's design and verification efforts are led in part by Mark Ross, a former chief technology officer of Cypress Semiconductor, with additional engineering hires drawn from Broadcom, Apple, and NVIDIA.^[2] In June 2024, the company had approximately 35 employees.^[2]

The core thesis behind Sohu is that the transformer architecture, introduced by Google researchers in the 2017 paper "Attention Is All You Need," has become so dominant for large language models, diffusion models, and increasingly for vision and video generation, that there is enough scale to justify a chip that runs nothing else.^[1] Uberti framed the bet bluntly in an interview at the time of the Series A: "We're making the biggest bet in AI. If transformers go away, we'll die. But if they stick around, we're the biggest company of all time."^[8] On the day of the announcement Uberti put the same wager on X: "We're taking the biggest bet in AI, a chip that can only run transformers, but does so orders of magnitude faster than GPUs. Maybe attention is all you need."^[1] Etched's argument is that general-purpose accelerators waste most of their transistor budget on flexibility the market no longer needs, since modern frontier AI workloads are almost entirely transformer-based.^[1]

Etched is headquartered in Cupertino and San Jose, California.^[11] The company partners with TSMC's Emerging Businesses Group for fabrication and with Rambus on HBM controller IP, which Etched has credited with reducing the complexity of integrating a high-bandwidth memory subsystem.^[11]^[15] Sohu was unveiled on June 25, 2024, alongside Etched's $120 million Series A, with the marketing line "Meet Sohu, the fastest AI chip of all time."^[24] The announcement drew coverage from TechCrunch, CNBC, Tom's Hardware, The Register, and The Wall Street Journal.^[2]^[3]^[4]

How does a transformer ASIC work?

A conventional GPU such as the H100 or B200 is a programmable parallel processor with thousands of generic compute units (CUDA cores, tensor cores) plus a fixed-function memory hierarchy. The compiler maps any neural network onto those units at runtime. By contrast, Sohu's compute fabric is laid out as a pipeline of dedicated transformer blocks.^[1]^[5] Each block contains hardware tailored to one stage of the transformer forward pass:

Token embedding and positional encoding
Multi-head attention with QKV projection, scaled dot product, and output projection
Softmax over attention scores
KV-cache read and write paths
Feed-forward network with GELU or SiLU activation
Layer normalization or RMSNorm
Residual connections
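To make the stages above concrete, here is a minimal software sketch of one decoder block processing a single new token. It is an illustration only, not Etched's design: it uses one attention head, plain Python lists, an ungated SiLU feed-forward layer, and RMSNorm without a learned gain, and it omits token embedding and positional encoding. The function names and the toy weights are hypothetical.

```python
import math

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain (simplification)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def softmax(scores):
    """Numerically stable softmax over attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def silu(v):
    return v / (1.0 + math.exp(-v))

def decode_step(x, weights, kv_cache):
    """Apply one single-head decoder block to the newest token vector x."""
    h = rms_norm(x)
    q = matvec(weights["wq"], h)              # QKV projection
    k = matvec(weights["wk"], h)
    v = matvec(weights["wv"], h)
    kv_cache.append((k, v))                   # KV-cache write path
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, past_k)) / math.sqrt(d)
              for past_k, _ in kv_cache]      # scaled dot product over cached keys
    probs = softmax(scores)
    attn = [sum(p * past_v[i] for p, (_, past_v) in zip(probs, kv_cache))
            for i in range(d)]                # weighted sum of cached values
    x = [a + b for a, b in zip(x, matvec(weights["wo"], attn))]        # output projection + residual
    h = rms_norm(x)
    hidden = [silu(u) for u in matvec(weights["w_up"], h)]            # feed-forward, SiLU
    return [a + b for a, b in zip(x, matvec(weights["w_down"], hidden))]  # residual

def toy_matrix(rows, cols):
    """Deterministic small weights, purely illustrative."""
    return [[((r * cols + c) % 5 - 2) / 10 for c in range(cols)] for r in range(rows)]

d_model, d_ff = 4, 8
weights = {name: toy_matrix(d_model, d_model) for name in ("wq", "wk", "wv", "wo")}
weights["w_up"] = toy_matrix(d_ff, d_model)
weights["w_down"] = toy_matrix(d_model, d_ff)

cache = []
for token_vec in ([1.0, -2.0, 0.5, 3.0], [0.0, 1.0, 1.0, -1.0]):
    out = decode_step(token_vec, weights, cache)
print(len(cache), out)
```

On a GPU each of these steps is dispatched as a kernel onto general compute units; the claim described in this section is that Sohu instead lays them out as fixed hardware stages.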

Because these stages are fixed in hardware, Etched does not need to emit instructions to schedule them or pay for the area of programmable issue logic. The control path is dramatically simpler than that of a GPU, and the same transistor budget can host far more useful arithmetic units. Etched claims that this is what allows Sohu to reach more than 90 percent floating-point unit utilization on transformer workloads, against roughly 30 percent for GPUs where unit utilization is gated by memory bandwidth, kernel launch overheads, and attention sparsity.^[1]^[5]

A consequence Etched emphasizes is the chip's tolerance for very large inference batches. Where typical GPU serving paths see performance degrade beyond batch sizes of roughly 32 to 64, Etched claims Sohu can run "batch sizes in the thousands without any performance degradation," which is intended as a direct enabler of mixture of experts and speculative decoding workloads.^[4]^[5]
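A rough back-of-envelope model shows why large batches matter for the headline figure. The sketch below assumes decode only, one full read of the FP8 weights per decode step, weights sharded evenly across the eight chips, and no KV-cache or activation traffic; these are simplifying assumptions, not published details of Sohu. It uses the claimed ~4,800 GB/s per chip and about 70 GB of FP8 weights for a 70-billion-parameter model. Whether Etched's tokens-per-second figure counts prompt tokens is not stated, so the result is only indicative.

```python
import math

def max_decode_tokens_per_s(batch, weight_bytes, chips, bandwidth_bytes_per_s):
    """Upper bound on decode throughput when every step streams all weights once."""
    steps_per_s = chips * bandwidth_bytes_per_s / weight_bytes
    return batch * steps_per_s  # each step yields one token per sequence in the batch

def min_batch_for_target(target_tokens_per_s, weight_bytes, chips, bandwidth_bytes_per_s):
    """Smallest batch for which the bandwidth bound reaches the target."""
    steps_per_s = chips * bandwidth_bytes_per_s / weight_bytes
    return math.ceil(target_tokens_per_s / steps_per_s)

WEIGHT_BYTES = 70e9   # ~70 GB: 70B parameters at 1 byte each (FP8)
CHIPS = 8
BANDWIDTH = 4.8e12    # claimed ~4,800 GB/s per chip

print(round(max_decode_tokens_per_s(64, WEIGHT_BYTES, CHIPS, BANDWIDTH)))
print(min_batch_for_target(500_000, WEIGHT_BYTES, CHIPS, BANDWIDTH))
```

Under these assumptions a batch of 64 is capped at roughly 35,000 tokens per second by weight streaming alone, and a batch of about 900 or more is needed before 500,000 tokens per second is even possible, which is consistent with the emphasis on batch sizes in the thousands.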

Precision and physical layout

Sohu's compute path is optimized for FP8.^[1]^[5] The headline 500,000 tokens per second figure is reported for Llama-3 70B running in FP8 with 2,048 input tokens and 128 output tokens.^[5]^[24] The chip also supports INT8 for quantized models. Etched has not published a peak TFLOPS number, instead reporting tokens-per-second on specific reference models, which has drawn criticism from analysts who prefer architecture-neutral metrics.^[5]^[7]

Sohu is described as a reticle-limit die fabricated on a 4 nanometer process node from TSMC.^[1]^[5] The die is connected via interposer to six HBM3E stacks totaling 144 GB, roughly 0.75x the capacity of an NVIDIA B200 (192 GB) and approximately 1.8x the capacity of an H100 (80 GB).^[5]^[24] At the claimed ~4,800 GB/s, its memory bandwidth is about 0.6x that of a B200 (8 TB/s).

Specification	Value
Process node	TSMC 4nm (N4 family)
Die size	Reticle limit (~800 mm²)
On-package memory	144 GB HBM3E (six stacks)
Memory bandwidth	~4,800 GB/s (claimed)
Primary numeric format	FP8 (also INT8)
Claimed FLOPS utilization	>90% on transformer inference
Server configuration	8x Sohu per node
Headline benchmark	500,000+ tokens/sec on Llama-3 70B (8x Sohu)

The 144 GB memory budget is enough to fit a 70 billion parameter model at FP8 weights (about 70 GB) with substantial room for the KV cache and activations, which is essential for serving long context windows.^[5] An 8-chip server can hold a 400-billion to 600-billion parameter model with tensor parallelism, making Sohu well suited to the mixture of experts variants now dominant in frontier deployments.^[5]
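The memory arithmetic can be sketched as follows. The Llama-3 70B shape used here (80 layers, 8 key/value heads under grouped-query attention, head dimension 128) comes from the published model configuration rather than from this article, and storing the KV cache at one byte per value (FP8) is an assumption. Activations and runtime overheads are ignored, so the result is an upper bound.

```python
GB = 10**9

def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache one token occupies: a key and a value per layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

def kv_token_capacity(hbm_gb, params_billion, weight_bytes_per_param, per_token_bytes):
    """Tokens of KV cache that fit in the HBM left over after the weights."""
    weight_bytes = params_billion * 10**9 * weight_bytes_per_param
    free_bytes = hbm_gb * GB - weight_bytes
    return free_bytes // per_token_bytes

per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, bytes_per_value=1)
capacity = kv_token_capacity(hbm_gb=144, params_billion=70,
                             weight_bytes_per_param=1, per_token_bytes=per_token)
server_hbm_gb = 8 * 144  # eight chips per server

print(per_token)       # bytes of KV cache per token
print(capacity)        # cached tokens that fit beside the weights on one chip
print(server_hbm_gb)   # total HBM in an 8-chip server, in GB
```

With these inputs each token needs 163,840 bytes of KV cache, so the roughly 74 GB left after the weights holds on the order of 450,000 cached tokens across all concurrent requests. The 1,152 GB in an eight-chip server likewise exceeds the roughly 600 GB that a 600-billion-parameter model needs at FP8.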

What software does Sohu run?

Because Sohu only runs transformers, Etched's software is much narrower than that of a general accelerator. There is no equivalent of CUDA and no general kernel programming model.^[1] Instead, the stack consists of:

A transformer compiler that takes a model graph from PyTorch, Hugging Face Transformers, or ONNX and emits a fixed-pipeline configuration for Sohu's blocks.
Drivers and runtime that manage memory placement, KV-cache lifecycle, and request batching.
Optimized kernels for the fixed stages of attention, feed forward, normalization, and embedding.
A serving stack that supports continuous batching, speculative decoding, parallel reasoning, and mixture of experts routing.
Frontend bindings for popular inference engines, including drop-in adapters for serving frameworks such as vLLM, TensorRT-LLM, and Hugging Face's Text Generation Inference (TGI).

Etched has emphasized that the stack is open source, which the company hopes will accelerate adoption by allowing frontier labs to audit and modify the compiler.^[1] The stack natively handles modern transformer variants, including grouped-query attention, multi-query attention, sliding window attention, rotary position embeddings, parallel attention and feed-forward layers, and mixture-of-experts routing.^[1] Etched has stated that future architectures that remain within the transformer family will be supported via firmware and compiler updates, but that a successor chip will be required for any architecturally distinct successor to the transformer, such as a pure state space model or recurrent style network.^[5]

In parallel with chip development, the company announced the Sohu Developer Cloud, an online preview environment intended to let prospective customers run transformer workloads against an emulated Sohu pipeline ahead of silicon availability.^[2]

How fast is Sohu?

Etched's headline performance figure is 500,000 tokens per second for Llama-3 70B running on a single 8x Sohu server in FP8 precision, with 2,048 input tokens and 128 output tokens per request.^[5]^[24] The same benchmark on equivalent eight-GPU servers yields, by Etched's measurements, roughly:

Platform	Llama-3 70B (FP8) tokens/sec	Ratio to Sohu
8x Sohu (server)	~500,000	1.0x
8x NVIDIA B200 (Blackwell)	~45,000	~0.09x
8x NVIDIA H100 (Hopper)	~23,000	~0.046x
8x NVIDIA A100	~9,000	~0.018x

Sources: Etched announcement, June 25, 2024.^[1]^[24]
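The ratios in the table, and the "160 H100s" equivalence, follow from simple division. The sketch below recomputes them from Etched's figures. The final step, which splits the speedup into a utilization factor and a residual, is an illustrative assumption (speedup treated as utilization ratio times everything else) and is not a breakdown Etched has published.

```python
# Tokens per second for Llama-3 70B (FP8), as reported by Etched.
THROUGHPUT = {
    "8x Sohu": 500_000,
    "8x B200": 45_000,
    "8x H100": 23_000,
    "8x A100": 9_000,
}

def speedup_over(baseline, reference="8x Sohu"):
    """How many times faster the reference server is than the baseline server."""
    return THROUGHPUT[reference] / THROUGHPUT[baseline]

def h100_equivalents(speedup, gpus_per_server=8):
    """Number of H100 GPUs one reference server stands in for at a given speedup."""
    return speedup * gpus_per_server

measured = speedup_over("8x H100")
print(round(measured, 1))                  # speedup implied by the table
print(h100_equivalents(20))                # using the rounded "20x" claim
print(round(h100_equivalents(measured)))   # using the unrounded table figures

# Assumed decomposition: claimed utilization of 90% versus about 30% for GPUs.
utilization_gain = 0.90 / 0.30
residual_gain = measured / utilization_gain
print(round(utilization_gain, 1), round(residual_gain, 1))
```

The rounded 20x figure gives exactly 160 H100s per server, while the table's own numbers imply about 21.7x, or roughly 174. Under the assumed decomposition, utilization accounts for about 3x and the remaining factor of about 7x would have to come from other sources, such as more arithmetic units per die.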

Etched concludes that a single 8x Sohu server is equivalent to about 160 H100 GPUs on transformer inference.^[1]^[24] In its launch post the company stated plainly: "With over 500,000 tokens per second running Llama 70B, Sohu lets you build products that are impossible on GPUs. One 8xSohu server replaces 160 H100s."^[24] Uberti has described Sohu as "an order of magnitude faster and cheaper than even Nvidia's next generation of Blackwell GB200 GPUs."^[2]

Critics have raised several concerns: the H100 numbers are for unoptimized stock paths while Sohu uses its own optimized stack; the benchmark uses input and output lengths favorable to high prefill throughput; no third party has measured Sohu in physical form; and Etched has not released power numbers.^[5]^[7] LessWrong analysts have noted that the chip's claimed memory bandwidth, while larger than the H100's, falls behind the H200, MI300X, and B200, which complicates the "20x" framing because decode-phase performance is typically bandwidth-bound.^[5] Independent commentators have nonetheless observed that even if Sohu delivers half the claimed throughput, it would still represent a meaningful architectural advance, since the 90 percent FLOPS utilization figure is consistent with what a fixed transformer pipeline could in principle achieve.^[5]

At announcement, Etched disclosed that the chip existed only as an FPGA emulation, with first silicon "less than two years away from tapeout."^[4] That timeline, taken literally, places tapeout no later than mid-2026, with production silicon following only after that.

How does Sohu compare with other AI accelerators?

Sohu enters a crowded field of AI accelerator startups, each of which has made a different bet about the right architectural specialization point.

Chip / system	Vendor	Strategy	Specialization
H100, B200, GB200	NVIDIA	General-purpose GPU plus tensor cores	Wide: training and inference, all model classes
TPU v5p / v6	Google	Systolic-array matrix engine	Training and inference, internal use plus cloud
Trainium 2 / Inferentia 2	AWS	Custom AI ASIC	Training and inference inside AWS
MTIA	Meta	Custom inference ASIC	Internal recommendation and language model inference
Groq LPU	Groq	Deterministic streaming compute with on-chip SRAM	Ultra-low-latency LLM inference
Cerebras WSE-3	Cerebras	Wafer-scale, in-memory compute	Training and inference for very large models
SambaNova SN40L	SambaNova	Reconfigurable dataflow architecture	Training and inference for foundation models
Taalas hardcoded models	Taalas	Each chip is a single trained model in silicon	One model per chip, extreme specialization
Sohu	Etched	Hardcoded transformer architecture (not model)	Transformer inference only

Several distinctions are worth highlighting:

Sohu versus general GPUs: Sohu trades flexibility for throughput. It cannot run convolutional networks, classical recommender models, state space models, or non-transformer recurrent networks.^[4] If a non-transformer architecture displaces transformers, Sohu becomes obsolete in a way that GPUs do not.^[5]
Sohu versus Groq LPU: Both target inference, but Groq's LPU uses a deterministic streaming dataflow with hundreds of megabytes of on-chip SRAM as primary weight storage to minimize latency. Groq excels at low latency and serves users through its own cloud service, while Sohu aims for raw throughput per server and is sold as silicon to be deployed by customers in their own datacenters.^[16]
Sohu versus Cerebras WSE-3: Cerebras's wafer scale engine is the largest chip ever built and supports both training and inference, with very high single-job throughput on models that fit entirely on-die. Sohu is a conventional reticle-size die packaged with HBM and is inference only.^[16]
Sohu versus Taalas: Taalas goes a step further than Etched by hardcoding individual trained models into silicon, not just the architecture. Etched argues that this is too rigid because frontier models are updated faster than custom silicon can be respun.^[16]
Sohu versus TPU and Trainium: Google TPU and AWS Trainium are AI ASICs developed by hyperscalers for their own workloads and their cloud customers. They are broader than Sohu architecturally (they handle training as well as inference) but less specialized for the transformer attention path specifically.

How much funding has Etched raised?

Etched has raised close to $1 billion in total, according to press reports.^[11]^[13] The three rounds itemized below sum to roughly $625 million.

Round	Date	Amount	Lead investor	Notable participants
Seed	March 2023	~$5.4 million	(undisclosed)	Multiple early-stage investors^[8]
Series A	June 25, 2024	$120 million	Primary Venture Partners, Positive Sum Ventures	Peter Thiel, Thomas Dohmke (GitHub), Balaji Srinivasan, Amjad Masad (Replit), Kyle Vogt (Cruise), Charlie Cheever (Quora)^[2]^[3]^[8]
Growth round	January 2026	~$500 million	Stripes	Peter Thiel, Positive Sum, Ribbit Capital^[11]^[12]^[13]^[14]

The January 2026 round, reported by The Information and corroborated by Reuters and Bloomberg, valued Etched at roughly $5 billion.^[11]^[13]^[14] At the time of the round, the company had been operating for about three and a half years and had not yet shipped Sohu to external customers.^[11] Investors cited as their primary rationale Etched's ability to lock in TSMC 4 nanometer capacity, the strength of its founder team, and the strategic value of an alternative to NVIDIA in transformer inference.^[11]^[13] The round brought Etched's total raised close to $1 billion.^[11] Peter Thiel participated personally in both the Series A and the January 2026 round. Primary Venture Partners has positioned Etched as the centerpiece of its AI hardware portfolio.^[11]

At the time of the Series A announcement, Etched stated that customers had already reserved "tens of millions of dollars" worth of Sohu hardware, although none of the prospective customers were named publicly.^[2]

What is Oasis, the Decart partnership?

On October 31, 2024, Etched and Decart, an Israeli AI startup, jointly released Oasis, an interactive, playable world model that generates a Minecraft-style 3D environment frame by frame from keyboard and mouse inputs.^[9]^[10] Oasis is a diffusion transformer that performs next-frame prediction, treating each frame of the game as a token to be predicted given a short history of previous frames and the player's most recent inputs. There is no game engine and no procedural world; the world exists only in the model's predictions.^[10]

The model was trained on millions of hours of Minecraft gameplay and corresponding actions and runs at 20 FPS at 360p in its public demo.^[10]^[17] The architecture combines a vision transformer (ViT) based autoencoder with a DiT (Diffusion Transformer) backbone.^[9] The public demo ran on NVIDIA H100 GPUs and at launch had queues of more than 400 users with 15-minute waits, illustrating the cost-to-scale problem the companies argue Sohu addresses.^[17] Etched described Oasis as a demonstration of the kind of workload Sohu is meant to accelerate, with future Sohu-hosted versions targeting models exceeding 100 billion parameters at 4K resolution and supporting "more than an order of magnitude" additional concurrent users.^[10]^[17] Independent commentary characterized Oasis as a strategic exercise in market creation: rather than searching for workloads to fit Sohu, Etched is constructing a workload class (real-time, interactive multimodal generation) that is uneconomical without Sohu.^[17]
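The next-frame loop described in this section can be sketched in a few lines. The predictor here is a hypothetical stand-in (a one-dimensional "frame" that moves by the player's input), not the Oasis model, and the history length of four frames is an arbitrary illustrative choice. The only figure taken from the article is the 20 FPS target, which fixes the time available to generate each frame.

```python
from collections import deque

def frame_budget_ms(fps):
    """Time available to generate each frame at a given frame rate."""
    return 1000.0 / fps

def play(predict_next_frame, first_frame, inputs, history_len=4):
    """Autoregressive rollout: each frame is predicted from a short window
    of previous frames plus the player's most recent input."""
    history = deque([first_frame], maxlen=history_len)  # old frames fall out of the window
    frames = [first_frame]
    for action in inputs:
        frame = predict_next_frame(list(history), action)
        history.append(frame)
        frames.append(frame)
    return frames

def toy_predictor(history, action):
    """Hypothetical stand-in model: shift the last frame by the input."""
    return history[-1] + action

print(frame_budget_ms(20))
print(play(toy_predictor, first_frame=0, inputs=[1, 1, -1]))
```

At 20 FPS the model has 50 ms to produce each frame, and because every frame depends on the one before it, the frames of a single session cannot be generated in parallel. That per-user latency constraint is part of why the companies present the workload as expensive to serve at scale.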

Has Sohu shipped to customers yet?

As of May 2026, Etched has not publicly confirmed that Sohu has shipped in volume to external customers.^[11]^[19] The company has discussed reservation commitments with major AI labs and cloud providers and has stated that early reference units are in the hands of select partners, but it has not named those partners or published deployment dates.^[11] Public commentary, including from Etched's own investors and the Manifold Markets prediction market, suggests that volume shipment slipped from the original "late 2024 to early 2025" target into 2026.^[19]

A Manifold prediction market titled "Will the Sohu AI chip ship to customers within a year?" resolved "no" on July 2, 2025; the resolution criterion required only that any chip called Sohu ship, not that it meet the headline performance claims.^[19]

Key gaps in the public record include:

Tapeout status: Etched has not publicly confirmed first or second silicon tapeouts. At the June 2024 announcement, the company stated that first chips were "less than two years away from tapeout," implying tapeout by mid-2026 and production silicon only after that.^[4]
Sampling dates: Etched has not published a sampling schedule for evaluation units.
Customer announcements: No named customer deployment has been publicly disclosed.
Third-party benchmarks: No independent benchmarks of Sohu exist; all published figures come from Etched.^[5]^[11]
Power consumption: Etched has not disclosed Sohu's TDP, board power, or rack-level power numbers.^[5]

The absence of these data points has fueled both skepticism (Etched is selling a bet on silicon that may not ship) and excitement (whoever is using early Sohu units is presumably under non-disclosure with Etched and is one of the largest hyperscalers or AI labs).^[7]^[11] Etched has publicly stated that it is prioritizing volume readiness over early demonstrations, and has cited the example of Groq, which scaled deployment slowly to ensure reliability, as a precedent.

What are the risks and criticisms of Sohu?

Risk	Description
Architectural	Sohu only runs transformer-family models. If the dominant architecture shifts to a state space model such as Mamba, or to a hybrid successor, Sohu's value collapses.^[4]^[5] Etched argues the transformer has been dominant for nearly a decade and that even rumored hybrids still spend most of their compute inside transformer layers.^[1] Uberti has stated explicitly: "If transformers go away, we'll die."^[8]
Execution	Shipping a reticle-limit chip on TSMC 4nm with HBM3E typically requires hundreds of engineers and $200-300 million in NRE. Etched had roughly 35 employees at the time of the Series A.^[2] Slippage has historically killed AI chip startups (Wave Computing, others). The January 2026 round was widely interpreted as runway insurance.^[11]
Benchmark verification	Until independent third parties measure Sohu, 500,000 tokens per second is a marketing claim. Comparisons against unoptimized GPU baselines rather than current TensorRT-LLM or vLLM configurations are a recurring objection.^[5]
Bandwidth versus compute	LessWrong analysts noted Sohu's claimed memory bandwidth is below the H200, MI300X, and B200, which complicates the "20x" framing because decode-phase inference is typically bandwidth-bound.^[5]
Software porting	Customers must port inference workloads from CUDA to Etched's compiler and serving stack. Adoption of any non-NVIDIA accelerator has been slow at hyperscale.
Supply chain	Sohu depends on TSMC 4nm capacity, HBM3E supply from SK Hynix or Micron, and advanced CoWoS packaging. All three are in tight supply.^[7]

What is next for Etched and Sohu?

Reception of Sohu has been polarized. Bullish coverage from venture capital and AI infrastructure outlets has framed Etched as the most credible attempt to date to break the NVIDIA monopoly on AI inference.^[13]^[14] Skeptical coverage has focused on three themes: that Etched is selling a chip that has not shipped against benchmarks the company controls; that the transformer architecture, while dominant, is not necessarily permanent; and that the founders' age and lack of prior chip industry track record raise execution risk.^[5]^[7] Jon Peddie Research observed that competition for HBM3E and successor memory supply with NVIDIA's Rubin generation is a structural challenge for any merchant AI chip startup.^[7] The AI safety community on LessWrong and similar venues has noted that an order-of-magnitude reduction in inference cost would accelerate deployment of autonomous agents, reasoning systems, and multimodal applications, raising downstream questions about cheap-inference safety.^[5]

Etched has publicly stated that Sohu is the first chip in a multi-generation roadmap. The company has discussed a second-generation chip on a more advanced node targeting both inference and prefill-heavy training workloads, as well as a smaller, lower-power inference chip for edge and on-device transformer inference. Etched executives have suggested that the addressable market for transformer-specific silicon will exceed $100 billion annually by the end of the decade if inference volumes grow as expected.^[22]

References

1. "Etched is Making the Biggest Bet in AI." Etched, June 25, 2024. https://www.etched.com/announcing-etched
2. "Etched is building an AI chip that only runs transformer models." TechCrunch, June 25, 2024. https://techcrunch.com/2024/06/25/etched-is-building-an-ai-chip-that-only-runs-transformer-models/
3. "Etched raises $120 million to build chip to take on Nvidia in AI." CNBC, June 25, 2024. https://www.cnbc.com/2024/06/25/etched-raises-120-million-to-build-chip-to-take-on-nvidia-in-ai.html
4. "Etched scores $120M for an ASIC built for transformer models." The Register, June 26, 2024. https://www.theregister.com/2024/06/26/etched_asic_ai/
5. "New fast transformer inference ASIC: Sohu by Etched." LessWrong, July 2024. https://www.lesswrong.com/posts/qhpB9NjcCHjdNDsMG/new-fast-transformer-inference-asic-sohu-by-etched
6. "Etched Sohu: Transformer-Only Inference ASIC." Awesome Agents Hardware Directory. https://awesomeagents.ai/hardware/etched-sohu/
7. "Welcome to the club, Etched." Jon Peddie Research, 2024. https://www.jonpeddie.com/news/welcome-to-the-club-etched/
8. "Transformer model chipmaker Etched.ai raises $120M to challenge Nvidia's market dominance." SiliconAngle, June 25, 2024. https://siliconangle.com/2024/06/25/transformer-model-chipmaker-etched-ai-raises-120m-challenge-nvidias-market-dominance/
9. "Etched and Decart Release Oasis, a New AI Model Transforming Gaming Worlds." InfoQ, November 2024. https://www.infoq.com/news/2024/11/decart-etched-oasis/
10. "This AI-generated Minecraft may represent the future of real-time video generation." MIT Technology Review, October 31, 2024. https://www.technologyreview.com/2024/10/31/1106461/this-ai-generated-minecraft-may-represent-the-future-of-real-time-video-generation/
11. "Etched.ai raises $500m for a $5bn valuation - report." DatacenterDynamics, January 2026. https://www.datacenterdynamics.com/en/news/etchedai-raises-500m-for-a-5bn-valuation-report/
12. "Etched Raises $500M, Valued at $5B for AI Chip Sohu." Asia Business Outlook, January 2026. https://www.asiabusinessoutlook.com/news/etched-raises-500m-valued-at-5b-for-ai-chip-sohu-nwid-11065.html
13. "Harvard dropouts' Etched raises $500M at $5B valuation to challenge Nvidia." Tech Funding News, January 2026. https://techfundingnews.com/nvidia-rival-ai-chip-maker-etched-founded-by-harvard-dropouts-lands-500m-at-5b-valuation/
14. "AI Chip Startup Etched Raises $500 Million to Take on Nvidia." Bloomberg, January 13, 2026. https://www.bloomberg.com/news/articles/2026-01-13/ai-chip-startup-etched-raises-500-million-to-take-on-nvidia
15. "From Dorm Room Beginnings to a Pioneer in the AI Chip Revolution." Rambus, 2024. https://www.rambus.com/blogs/from-dorm-room-beginnings-to-a-pioneer-in-the-ai-chip-revolution-how-etched-is-collaborating-with-rambus-to-achieve-their-vision/
16. "The AI Inference Wars: Comparing Taalas, Cerebras, Groq, Etched, and NVIDIA." The Menon Lab Blog, 2025. https://themenonlab.blog/blog/ai-inference-accelerators-compared
17. "Etched's Oasis: Creating a Market For Sohu." Chipstrat, 2024. https://www.chipstrat.com/p/etcheds-oasis-creating-a-market-for
18. "Etched (company)." Wikipedia. https://en.wikipedia.org/wiki/Etched.ai
19. "Will the Sohu AI chip ship to customers within a year?" Manifold Markets, June 2024 to July 2025 (resolved no). https://manifold.markets/ahalekelly/will-the-sohu-ai-chip-ship-to-custo
20. "Sohu AI chip claimed to run models 20x faster and cheaper than Nvidia H100 GPUs." Tom's Hardware, June 26, 2024. https://www.tomshardware.com/tech-industry/artificial-intelligence/sohu-ai-chip-claimed-to-run-models-20x-faster-and-cheaper-than-nvidia-h100-gpus
21. "Etched: specialized Sohu chips for AI transformer architecture." Aurorean Horizon, 2024. https://blog.auroreanhorizon.com/p/etched
22. "Etched's $500 Million Raise: A Blueprint for Enterprise AI Inference in 2026." AI 2 Work, 2026. https://ai2.work/technology/etcheds-500-million-raise-a-blueprint-for-enterprise-ai-inference-in-2026/
23. "Etched chip 10x faster than Blackwell." Electronics Weekly, June 2024. https://www.electronicsweekly.com/news/business/etched-chip-an-order-of-magnitude-faster-than-blackwell-2024-06/
24. "Etched: Meet Sohu, the fastest AI chip of all time" (X/Twitter announcement). Etched on X, June 25, 2024. https://x.com/Etched/status/1805625693113663834
