# Etched Sohu

> Source: https://aiwiki.ai/wiki/etched_sohu
> Updated: 2026-06-27
> Categories: AI Hardware, AI Inference
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

| Etched Sohu | |
| --- | --- |
| General information | |
| **Manufacturer** | [Etched](/wiki/etched) |
| **Country of origin** | United States |
| **Announced** | June 25, 2024 |
| **Status** | Pre-production (not yet shipping as of May 2026) |
| **Architecture** | Transformer-only ASIC |
| **Process node** | TSMC 4nm (N4 family) |
| **Memory** | 144 GB HBM3E |
| **Memory bandwidth** | Approximately 4,800 GB/s (claimed) |
| **Server configuration** | 8x Sohu chips per node |
| **Claimed throughput** | 500,000+ tokens/sec on Llama-3 70B (8-chip server) |
| **Performance claim vs H100** | One 8xSohu server replaces 160 H100s (~20x faster on transformer inference, Etched estimate) |
| **Software stack** | Open-source compiler, drivers, kernels, serving stack |
| **Website** | [etched.com](https://www.etched.com) |

**Sohu** is a transformer-specialized [application-specific integrated circuit](/wiki/asic) (ASIC) built by [Etched](/wiki/etched), a Silicon Valley [AI hardware](/wiki/ai_hardware) startup founded in 2022 by Harvard dropouts Gavin Uberti, Chris Zhu, and Robert Wachen.[^1][^2] Announced on June 25, 2024, Sohu is the first commercially marketed chip designed to run only one neural network architecture, the [transformer](/wiki/transformer), and Etched claims that a single server with eight Sohu chips delivers more than 500,000 tokens per second on [Llama-3 70B](/wiki/llama_3) and "replaces 160 H100s."[^1][^24] Because it hardwires the transformer into silicon instead of offering general-purpose programmable compute, Etched estimates Sohu runs transformer [inference](/wiki/inference) roughly 20 times faster than an [NVIDIA H100](/wiki/nvidia_h100) GPU while using significantly less energy, though as of mid-2026 these figures are vendor claims that no independent third party has verified and the chip has not shipped in volume.[^1][^4][^5][^11][^19]

The chip is fabricated on [TSMC](/wiki/tsmc) 4 nanometer process technology and pairs the compute die with 144 GB of HBM3E memory.[^1][^4] According to Etched's published claims, one server fitted with eight Sohu chips can sustain more than 500,000 Llama 70B tokens per second, compared with roughly 23,000 tokens per second for an eight-GPU H100 server and approximately 45,000 tokens per second for an eight-GPU [Blackwell B200](/wiki/nvidia_blackwell) server.[^24] Etched argues that this performance comes from achieving over 90 percent floating-point unit utilization, against approximately 30 percent for general-purpose [GPUs](/wiki/gpu) running attention-heavy workloads.[^1][^5]

Despite the headline numbers, as of May 2026 Sohu has not shipped in volume to external customers, and no independent third-party benchmarks have been published.[^11][^19] A Manifold prediction market on whether Sohu would ship to customers within a year of announcement resolved "no" on July 2, 2025.[^19] The company has, however, attracted significant attention and capital. Etched raised a $120 million [Series A](/wiki/series_a) in June 2024 led by Primary Venture Partners and Positive Sum Ventures with participation from [Peter Thiel](/wiki/peter_thiel), [GitHub](/wiki/github) CEO Thomas Dohmke, and former [Coinbase](/wiki/coinbase) CTO Balaji Srinivasan.[^2][^3] In January 2026 it closed a $500 million round led by Stripes with participation from Peter Thiel, Positive Sum, and Ribbit Capital, valuing the company at approximately $5 billion and bringing total funding close to $1 billion.[^11][^12][^13][^14]

## What is Sohu?

Sohu is a piece of silicon that does exactly one thing: run the forward pass of [transformer](/wiki/transformer) neural networks for [inference](/wiki/inference). A conventional [GPU](/wiki/gpu) is a programmable parallel processor that can execute any computation a compiler can map onto it, which is why the same NVIDIA chip can train a [large language model](/wiki/large_language_model), render graphics, or run a physics simulation. Sohu gives up all of that generality. It cannot run convolutional networks, classical recommender models, [state space models](/wiki/state_space_model), or non-transformer recurrent networks.[^4] In exchange, Etched claims it can devote almost its entire transistor budget to the specific arithmetic and memory-movement patterns of transformer attention and feed-forward layers, which the company says is what enables the headline throughput. The bet is that the transformer, introduced in Google's 2017 paper "Attention Is All You Need," is so dominant across [large language models](/wiki/large_language_model), [diffusion models](/wiki/diffusion_models), and multimodal generation that a chip running nothing else has a large enough market to be worth building.[^1]

## Who founded Etched and why?

Etched was founded in 2022 by Gavin Uberti, Chris Zhu, and Robert Wachen, three undergraduates who left [Harvard University](/wiki/harvard_university) to build a dedicated transformer accelerator.[^2][^15] Uberti, the company's chief executive officer, had previously worked at [OctoML](/wiki/octoml) and Xnor.ai on inference optimization.[^2] Chris Zhu holds degrees in mathematics and computer science from Harvard and serves as the company's chief technology officer.[^8] Robert Wachen, prior to Etched, had co-founded a startup accelerator. The founders are alumni of the [Thiel Fellowship](/wiki/thiel_fellowship) and built early prototypes of Sohu before raising institutional capital.[^15]

Etched's design and verification efforts are led in part by Mark Ross, a former chief technology officer of [Cypress Semiconductor](/wiki/cypress_semiconductor), with additional engineering hires drawn from [Broadcom](/wiki/broadcom), [Apple](/wiki/apple), and [NVIDIA](/wiki/nvidia).[^2] In June 2024, the company had approximately 35 employees.[^2]

The core thesis behind Sohu is that the [transformer architecture](/wiki/transformer), introduced by Google researchers in the 2017 paper "Attention Is All You Need," has become so dominant for [large language models](/wiki/large_language_model), [diffusion models](/wiki/diffusion_models), and increasingly for vision and video generation, that there is enough scale to justify a chip that runs nothing else.[^1] Uberti framed the bet bluntly in an interview at the time of the Series A: "We're making the biggest bet in AI. If transformers go away, we'll die. But if they stick around, we're the biggest company of all time."[^8] On the day of the announcement Uberti put the same wager on X: "We're taking the biggest bet in AI, a chip that can only run transformers, but does so orders of magnitude faster than GPUs. Maybe attention *is* all you need."[^1] Etched's argument is that general-purpose accelerators waste most of their transistor budget on flexibility the market no longer needs, since modern frontier AI workloads are almost entirely transformer-based.[^1]

Etched is headquartered in Cupertino and San Jose, California.[^11] The company partners with [TSMC](/wiki/tsmc)'s Emerging Businesses Group for fabrication and with [Rambus](/wiki/rambus) on HBM controller IP, which Etched has credited with reducing the complexity of integrating a high-bandwidth memory subsystem.[^11][^15] Sohu was unveiled on June 25, 2024, alongside Etched's $120 million Series A, with the marketing line "Meet Sohu, the fastest AI chip of all time."[^24] The announcement drew coverage from [TechCrunch](/wiki/techcrunch), [CNBC](/wiki/cnbc), Tom's Hardware, [The Register](/wiki/the_register), and [The Wall Street Journal](/wiki/wall_street_journal).[^2][^3][^4]

## How does a transformer ASIC work?

A conventional [GPU](/wiki/gpu) such as the [H100](/wiki/nvidia_h100) or [B200](/wiki/nvidia_blackwell) is a programmable parallel processor with thousands of generic compute units (CUDA cores, tensor cores) plus a fixed-function memory hierarchy. Compilers and kernel libraries map any neural network onto those units in software. By contrast, Sohu's compute fabric is laid out as a pipeline of dedicated transformer blocks.[^1][^5] Each block contains hardware tailored to one stage of the transformer forward pass:

- Token embedding and positional encoding
- Multi-head attention with QKV projection, scaled dot product, and output projection
- Softmax over attention scores
- [KV-cache](/wiki/kv_cache) read and write paths
- Feed-forward network with [GELU](/wiki/gelu) or [SiLU](/wiki/silu) activation
- Layer normalization or RMSNorm
- Residual connections

Because these stages are fixed in hardware, Etched does not need to emit instructions to schedule them or pay for the area of programmable issue logic. The control path is dramatically simpler than that of a GPU, and the same transistor budget can host far more useful arithmetic units. Etched claims that this is what allows Sohu to reach more than 90 percent floating-point unit utilization on transformer workloads, against roughly 30 percent for [GPUs](/wiki/gpu) where unit utilization is gated by memory bandwidth, kernel launch overheads, and attention sparsity.[^1][^5]
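
The stages listed above can be sketched in software as one fixed sequence per layer. The following is a minimal pure-Python illustration of a single-head, pre-norm transformer block processing one token at a time. It is not Etched's design: the function names, the pre-norm ordering, the gain-free RMSNorm, and the identity weights in the usage lines are all assumptions chosen for brevity.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned gain (an assumption, for brevity)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def silu(v):
    return v / (1.0 + math.exp(-v))

def attention(x, Wq, Wk, Wv, Wo, kv_cache):
    """Single-head attention for one new token, appending to the KV cache."""
    q, k, v = matvec(Wq, x), matvec(Wk, x), matvec(Wv, x)  # QKV projection
    kv_cache.append((k, v))                                # KV-cache write
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k_t)) / math.sqrt(d)
              for k_t, _ in kv_cache]                      # scaled dot product
    weights = softmax(scores)
    mixed = [sum(w * v_t[i] for w, (_, v_t) in zip(weights, kv_cache))
             for i in range(d)]                            # KV-cache read
    return matvec(Wo, mixed)                               # output projection

def feed_forward(x, W_up, W_down):
    return matvec(W_down, [silu(h) for h in matvec(W_up, x)])

def transformer_block(x, params, kv_cache):
    """Pre-norm block: the order of stages is the same on every call."""
    a = attention(rms_norm(x), params["Wq"], params["Wk"],
                  params["Wv"], params["Wo"], kv_cache)
    x = [xi + ai for xi, ai in zip(x, a)]                  # residual
    f = feed_forward(rms_norm(x), params["W_up"], params["W_down"])
    return [xi + fi for xi, fi in zip(x, f)]               # residual

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Usage with hypothetical identity weights and two one-hot "tokens".
d = 4
params = {name: identity(d)
          for name in ("Wq", "Wk", "Wv", "Wo", "W_up", "W_down")}
cache = []
for token in ([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]):
    out = transformer_block(token, params, cache)
```

Nothing in `transformer_block` branches on the model being run: the same operations execute in the same order for every token and every layer, which is the property a fixed-function pipeline is built around.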

A consequence Etched emphasizes is the chip's tolerance for very large inference batches. Where typical GPU serving paths see performance degrade beyond batch sizes of roughly 32 to 64, Etched claims Sohu can run "batch sizes in the thousands without any performance degradation," which is intended as a direct enabler of [mixture of experts](/wiki/mixture_of_experts) and [speculative decoding](/wiki/speculative_decoding) workloads.[^4][^5]

### Precision and physical layout

Sohu's compute path is optimized for [FP8](/wiki/fp8).[^1][^5] The headline 500,000 tokens per second figure is reported for Llama-3 70B running in FP8 with 2,048 input tokens and 128 output tokens.[^5][^24] The chip also supports [INT8](/wiki/int8) for quantized models. Etched has not published a peak TFLOPS number, instead reporting tokens-per-second on specific reference models, which has drawn criticism from analysts who prefer architecture-neutral metrics.[^5][^7]

Sohu is described as a reticle-limit die fabricated on a 4 nanometer process node from [TSMC](/wiki/tsmc).[^1][^5] The die is connected via interposer to six HBM3E stacks totaling 144 GB of capacity, providing roughly 0.75x the capacity and bandwidth of an [NVIDIA B200](/wiki/nvidia_blackwell) and approximately 1.8x the capacity of an [H100](/wiki/nvidia_h100).[^5][^24]

| Specification | Value |
| --- | --- |
| Process node | TSMC 4nm (N4 family) |
| Die size | Reticle limit (~800 mm²) |
| On-package memory | 144 GB HBM3E (six stacks) |
| Memory bandwidth | ~4,800 GB/s (claimed) |
| Primary numeric format | FP8 (also INT8) |
| Claimed FLOPS utilization | >90% on transformer inference |
| Server configuration | 8x Sohu per node |
| Headline benchmark | 500,000+ tokens/sec on Llama-3 70B (8x Sohu) |

The 144 GB memory budget is enough to fit a 70 billion parameter model at FP8 weights (about 70 GB) with substantial room for the [KV cache](/wiki/kv_cache) and activations, which is essential for serving long [context windows](/wiki/context_window).[^5] An 8-chip server can hold a 400-billion to 600-billion parameter model with tensor parallelism, making Sohu well suited to the [mixture of experts](/wiki/mixture_of_experts) variants now dominant in frontier deployments.[^5]
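
The capacity arithmetic in this paragraph can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, not an Etched tool: the helper names are hypothetical, decimal gigabytes are assumed, and the Llama-3 70B shape (80 layers, 8 key-value heads, head dimension 128) is taken from the public model configuration, not from Etched. It also assumes the KV cache is stored in FP8 and ignores activations.

```python
GB = 10**9  # decimal gigabytes (assumption)

def weight_bytes(params_billion, bytes_per_param=1):
    """Weight footprint; FP8 stores one byte per parameter."""
    return params_billion * 10**9 * bytes_per_param

def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=1):
    """Keys plus values cached for one token across all layers."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

HBM_PER_CHIP_GB = 144   # from the specification table
CHIPS_PER_SERVER = 8

weights_gb = weight_bytes(70) / GB             # 70B parameters at FP8
headroom_gb = HBM_PER_CHIP_GB - weights_gb     # left for KV cache and activations

# Assumed Llama-3 70B shape (public model config, not an Etched figure).
per_token = kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128)
cache_tokens = int(headroom_gb * GB // per_token)

server_gb = HBM_PER_CHIP_GB * CHIPS_PER_SERVER
fits_600b = weight_bytes(600) / GB <= server_gb

print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
print(f"KV cache per token: {per_token} bytes, tokens that fit: {cache_tokens}")
print(f"server HBM: {server_gb} GB, 600B FP8 model fits: {fits_600b}")
```

Under these assumptions, the roughly 74 GB left after the weights would hold on the order of 450,000 cached tokens on a single chip, and an 8-chip server has 1,152 GB in total, which is why a 400 to 600 billion parameter model at FP8 fits with room to spare.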

## What software does Sohu run?

Because Sohu only runs transformers, Etched's software is much narrower than that of a general accelerator. There is no equivalent of [CUDA](/wiki/cuda) and no general kernel programming model.[^1] Instead, the stack consists of:

- A **transformer compiler** that takes a model graph from [PyTorch](/wiki/pytorch), [Hugging Face Transformers](/wiki/hugging_face), or [ONNX](/wiki/onnx) and emits a fixed-pipeline configuration for Sohu's blocks.
- **Drivers and runtime** that manage memory placement, KV-cache lifecycle, and request batching.
- **Optimized kernels** for the fixed stages of attention, feed forward, normalization, and embedding.
- A **serving stack** that supports continuous batching, [speculative decoding](/wiki/speculative_decoding), [parallel reasoning](/wiki/parallel_reasoning), and [mixture of experts](/wiki/mixture_of_experts) routing.
- **Frontend bindings** for popular inference engines, including drop-in adapters for serving frameworks such as vLLM, TensorRT-LLM, and Hugging Face's text generation inference.

Etched has emphasized that the stack is open source, which the company hopes will accelerate adoption by allowing frontier labs to audit and modify the compiler.[^1] The stack natively handles modern transformer variants, including grouped-query attention, multi-query attention, sliding window attention, [rotary position embeddings](/wiki/rotary_position_embedding), parallel attention and feed-forward layers, and mixture-of-experts routing.[^1] Etched has stated that future architectures that remain within the transformer family will be supported via firmware and compiler updates, but that a successor chip will be required for any architecturally distinct successor to the transformer, such as a pure [state space model](/wiki/state_space_model) or recurrent style network.[^5]

In parallel with chip development, the company announced the **Sohu Developer Cloud**, an online preview environment intended to let prospective customers run transformer workloads against an emulated Sohu pipeline ahead of silicon availability.[^2]

## How fast is Sohu?

Etched's headline performance figure is **500,000 tokens per second** for Llama-3 70B running on a single 8x Sohu server in FP8 precision, with 2,048 input tokens and 128 output tokens per request.[^5][^24] The same benchmark on equivalent eight-GPU servers yields, by Etched's measurements, roughly:

| Platform | Llama-3 70B (FP8) tokens/sec | Ratio to Sohu |
| --- | --- | --- |
| 8x Sohu (server) | ~500,000 | 1.0x |
| 8x NVIDIA B200 (Blackwell) | ~45,000 | ~0.09x |
| 8x NVIDIA H100 (Hopper) | ~23,000 | ~0.046x |
| 8x NVIDIA A100 | ~9,000 | ~0.018x |

Sources: Etched announcement, June 25, 2024.[^1][^24]

Etched concludes that a single 8x Sohu server is equivalent to about 160 H100 GPUs on transformer inference.[^1][^24] In its launch post the company stated plainly: "With over 500,000 tokens per second running Llama 70B, Sohu lets you build products that are impossible on GPUs. One 8xSohu server replaces 160 H100s."[^24] Uberti has described Sohu as "an order of magnitude faster and cheaper than even Nvidia's next generation of Blackwell GB200 GPUs."[^2]

Critics have raised several concerns: the H100 numbers are for unoptimized stock paths while Sohu uses its own optimized stack; the benchmark uses input and output lengths favorable to high prefill throughput; no third party has measured Sohu in physical form; and Etched has not released power numbers.[^5][^7] LessWrong analysts have noted that the chip's claimed memory bandwidth, while larger than the H100's, falls behind the [H200](/wiki/nvidia_h200), [MI300X](/wiki/amd_mi300x), and B200, which complicates the "20x" framing because decode-phase performance is typically bandwidth-bound.[^5] Independent commentators have nonetheless observed that even if Sohu delivers half the claimed throughput, it would still represent a meaningful architectural advance, since the 90 percent FLOPS utilization figure is consistent with what a fixed transformer pipeline could in principle achieve.[^5]
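
The relationship between the throughput table and the "160 H100s" claim is simple arithmetic, sketched below in Python. The figures are Etched's claims as quoted in this article, not measurements, and the function names are illustrative.

```python
# Etched's claimed Llama-3 70B throughput for 8-accelerator servers.
CLAIMED_TOKENS_PER_SEC = {
    "8x Sohu": 500_000,
    "8x B200": 45_000,
    "8x H100": 23_000,
    "8x A100": 9_000,
}

def speedup_over(platform, table=CLAIMED_TOKENS_PER_SEC):
    """Server-to-server speedup of the Sohu server over another platform."""
    return table["8x Sohu"] / table[platform]

def gpu_equivalents(platform, gpus_per_server=8, table=CLAIMED_TOKENS_PER_SEC):
    """How many individual GPUs one 8x Sohu server would match."""
    return speedup_over(platform, table) * gpus_per_server

h100_speedup = speedup_over("8x H100")
h100_equiv = gpu_equivalents("8x H100")
rounded_equiv = 20 * 8  # the rounded "~20x" times eight GPUs per server

for name in ("8x B200", "8x H100", "8x A100"):
    print(f"{name}: {speedup_over(name):.1f}x slower than 8x Sohu (claimed)")
print(f"H100 equivalents from the table: {h100_equiv:.0f}")
print(f"H100 equivalents from '~20x': {rounded_equiv}")
```

Taking the table's rounded figures at face value gives about 21.7x over an 8x H100 server, or roughly 174 H100 equivalents; the 160 figure corresponds to the rounded "~20x" multiplied by eight GPUs per server.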

At announcement, Etched disclosed that the chip existed only as an FPGA emulation, with first silicon "less than two years away from tapeout."[^4] That timeline, taken literally, places production silicon in 2026.

## How does Sohu compare with other AI accelerators?

Sohu enters a crowded field of AI accelerator startups, each of which has made a different bet about the right architectural specialization point.

| Chip / system | Vendor | Strategy | Specialization |
| --- | --- | --- | --- |
| [H100, B200, GB200](/wiki/nvidia_blackwell) | [NVIDIA](/wiki/nvidia) | General-purpose GPU plus tensor cores | Wide: training and inference, all model classes |
| [TPU v5p / v6](/wiki/tpu) | [Google](/wiki/google) | Systolic-array matrix engine | Training and inference, internal use plus cloud |
| [Trainium 2 / Inferentia 2](/wiki/aws_trainium) | [AWS](/wiki/aws) | Custom AI ASIC | Training and inference inside AWS |
| [MTIA](/wiki/meta_mtia) | [Meta](/wiki/meta) | Custom inference ASIC | Internal recommendation and language model inference |
| [Groq LPU](/wiki/groq_lpu) | [Groq](/wiki/groq) | Deterministic streaming compute with on-chip SRAM | Ultra-low-latency LLM inference |
| [Cerebras WSE-3](/wiki/cerebras_wse_3) | [Cerebras](/wiki/cerebras) | Wafer-scale, in-memory compute | Training and inference for very large models |
| [SambaNova SN40L](/wiki/sambanova_sn40l) | [SambaNova](/wiki/sambanova) | Reconfigurable dataflow architecture | Training and inference for foundation models |
| Taalas hardcoded models | [Taalas](/wiki/taalas) | Each chip is a single trained model in silicon | One model per chip, extreme specialization |
| **Sohu** | **Etched** | **Hardcoded transformer architecture (not model)** | **Transformer inference only** |

Several distinctions are worth highlighting:

- **Sohu versus general GPUs:** Sohu trades flexibility for throughput. It cannot run convolutional networks, classical recommender models, [state space models](/wiki/state_space_model), or non-transformer recurrent networks.[^4] If a non-transformer architecture displaces transformers, Sohu becomes obsolete in a way that GPUs do not.[^5]
- **Sohu versus Groq LPU:** Both target inference, but [Groq's LPU](/wiki/groq_lpu) uses a deterministic streaming dataflow with hundreds of megabytes of on-chip [SRAM](/wiki/sram) as primary weight storage to minimize latency. Groq excels at low latency and serves users through its own cloud service, while Sohu aims for raw throughput per server and is sold as silicon to be deployed by customers in their own datacenters.[^16]
- **Sohu versus Cerebras WSE-3:** Cerebras's wafer scale engine is the largest chip ever built and supports both training and inference, with very high single-job throughput on models that fit entirely on-die. Sohu is a conventional reticle-size die packaged with HBM and is inference only.[^16]
- **Sohu versus Taalas:** [Taalas](/wiki/taalas) goes a step further than Etched by hardcoding individual trained models into silicon, not just the architecture. Etched argues that this is too rigid because frontier models are updated faster than custom silicon can be respun.[^16]
- **Sohu versus TPU and Trainium:** [Google TPU](/wiki/tpu) and [AWS Trainium](/wiki/aws_trainium) are general AI ASICs developed by hyperscalers for their own datacenters and cloud services. They are architecturally broader than Sohu (they handle training as well as inference) but less specialized for the transformer attention path.

## How much funding has Etched raised?

Etched has raised approximately $1 billion across publicly disclosed rounds.[^11][^13]

| Round | Date | Amount | Lead investor | Notable participants |
| --- | --- | --- | --- | --- |
| Seed | March 2023 | ~$5.4 million | (undisclosed) | Multiple early-stage investors[^8] |
| Series A | June 25, 2024 | $120 million | Primary Venture Partners, Positive Sum Ventures | [Peter Thiel](/wiki/peter_thiel), Thomas Dohmke (GitHub), Balaji Srinivasan, Amjad Masad (Replit), Kyle Vogt (Cruise), Charlie Cheever (Quora)[^2][^3][^8] |
| Growth round | January 2026 | ~$500 million | Stripes | Peter Thiel, Positive Sum, Ribbit Capital[^11][^12][^13][^14] |

The January 2026 round, reported by [The Information](/wiki/the_information) and corroborated by [Reuters](/wiki/reuters) and [Bloomberg](/wiki/bloomberg), valued Etched at roughly $5 billion.[^11][^13][^14] At the time of the round, the company had been operating for about three and a half years and had not yet shipped Sohu to external customers.[^11] Investors cited three main reasons for backing the company: Etched's ability to lock in TSMC 4 nanometer capacity, the strength of its founding team, and the strategic value of an alternative to [NVIDIA](/wiki/nvidia) in transformer inference.[^11][^13] The round brought Etched's total raised close to $1 billion.[^11] Peter Thiel participated personally in both the Series A and the January 2026 round. Primary Venture Partners has positioned Etched as the centerpiece of its AI hardware portfolio.[^11]

At the time of the Series A announcement, Etched stated that customers had already reserved "tens of millions of dollars" worth of Sohu hardware, although none of the prospective customers were named publicly.[^2]

## What is Oasis, the Decart partnership?

On October 31, 2024, Etched and [Decart](/wiki/decart), an Israeli AI startup, jointly released **Oasis**, an interactive, playable world model that generates a [Minecraft](/wiki/minecraft)-style 3D environment frame by frame from keyboard and mouse inputs.[^9][^10] Oasis is a diffusion transformer that performs next-frame prediction, treating each frame of the game as a token to be predicted given a short history of previous frames and the player's most recent inputs. There is no game engine and no procedural world; the world exists only in the model's predictions.[^10]
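
The next-frame loop described above can be sketched as follows. This illustrates autoregressive rollout in general and is not Oasis's code: `rollout`, `toy_model`, the four-frame history, and the representation of a frame as an (x, y) position are all stand-ins, whereas the real system predicts image frames with a diffusion transformer.

```python
from collections import deque

def rollout(predict_next_frame, first_frame, actions, history_len=4):
    """Generate frames one at a time from a bounded history of past frames."""
    history = deque([first_frame], maxlen=history_len)
    frames = [first_frame]
    for action in actions:
        frame = predict_next_frame(list(history), action)
        history.append(frame)  # the oldest frame drops out once the window is full
        frames.append(frame)
    return frames

# Stand-in "model": a frame is just a position moved by the player's input.
MOVES = {"W": (0, 1), "S": (0, -1), "A": (-1, 0), "D": (1, 0)}

def toy_model(history, action):
    x, y = history[-1]
    dx, dy = MOVES[action]
    return (x + dx, y + dy)

frames = rollout(toy_model, (0, 0), ["W", "W", "D"])
print(frames)
```

Because each new frame depends only on the recent history and the latest input, there is no persistent world state outside the model, which matches the article's description that the world exists only in the model's predictions.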

The model was trained on millions of hours of Minecraft gameplay and corresponding actions and runs at 20 FPS at 360p in its public demo.[^10][^17] The architecture combines a vision transformer (ViT) based autoencoder with a [DiT](/wiki/dit) (Diffusion Transformer) backbone.[^9] The public demo ran on [NVIDIA H100](/wiki/nvidia_h100) GPUs and at launch had queues of more than 400 users with 15-minute waits, illustrating the cost-to-scale problem the companies argue Sohu addresses.[^17] Etched described Oasis as a demonstration of the kind of workload Sohu is meant to accelerate, with future Sohu-hosted versions targeting models exceeding 100 billion parameters at 4K resolution and supporting "more than an order of magnitude" additional concurrent users.[^10][^17] Independent commentary characterized Oasis as a strategic exercise in market creation: rather than searching for workloads to fit Sohu, Etched is constructing a workload class (real-time, interactive multimodal generation) that is uneconomical without Sohu.[^17]

## Has Sohu shipped to customers yet?

As of May 2026, Etched has not publicly confirmed that Sohu has shipped in volume to external customers.[^11][^19] The company has discussed reservation commitments with major AI labs and cloud providers and has stated that early reference units are in the hands of select partners, but it has not named those partners or published deployment dates.[^11] Public commentary, including from Etched's own investors and the [Manifold Markets](/wiki/manifold_markets) prediction market, suggests that volume shipment slipped from the original "late 2024 to early 2025" target into 2026.[^19]

A Manifold prediction market titled "Will the Sohu AI chip ship to customers within a year?" resolved "no" on July 2, 2025; the resolution criterion required only that any chip called Sohu ship, not that it meet the headline performance claims.[^19]

Key gaps in the public record include:

- **Tapeout status:** Etched has not publicly confirmed first or second silicon tapeouts. At the June 2024 announcement, the company stated that first chips were "less than two years away from tapeout," implying production silicon in 2026.[^4]
- **Sampling dates:** Etched has not published a sampling schedule for evaluation units.
- **Customer announcements:** No named customer deployment has been publicly disclosed.
- **Third-party benchmarks:** No independent benchmarks of Sohu exist; all published figures come from Etched.[^5][^11]
- **Power consumption:** Etched has not disclosed Sohu's TDP, board power, or rack-level power numbers.[^5]

The absence of these data points has fueled both skepticism (Etched is selling a bet on silicon that may not ship) and excitement (whoever is using early Sohu units is presumably under non-disclosure with Etched and is one of the largest [hyperscalers](/wiki/hyperscaler) or AI labs).[^7][^11] Etched has publicly stated that it is prioritizing volume readiness over early demonstrations, and has cited the example of [Groq](/wiki/groq), which scaled deployment slowly to ensure reliability, as a precedent.

## What are the risks and criticisms of Sohu?

| Risk | Description |
| --- | --- |
| Architectural | Sohu only runs transformer-family models. If the dominant architecture shifts to a [state space model](/wiki/state_space_model), [Mamba](/wiki/mamba), or a hybrid successor, Sohu's value collapses.[^4][^5] Etched argues the transformer has been dominant for nearly a decade and that even rumored hybrids still spend most of their compute inside transformer layers.[^1] Uberti has stated explicitly: "If transformers go away, we'll die."[^8] |
| Execution | Shipping a reticle-limit chip on TSMC 4nm with HBM3E typically requires hundreds of engineers and $200 million to $300 million in NRE. Etched had roughly 35 employees at the time of the Series A.[^2] Slippage has historically killed AI chip startups (Wave Computing, among others). The January 2026 round was widely interpreted as runway insurance.[^11] |
| Benchmark verification | Until independent third parties measure Sohu, 500,000 tokens per second is a marketing claim. Comparisons against unoptimized GPU baselines rather than current TensorRT-LLM or vLLM configurations are a recurring objection.[^5] |
| Bandwidth versus compute | LessWrong analysts noted Sohu's claimed memory bandwidth is below the H200, MI300X, and B200, which complicates the "20x" framing because decode-phase inference is typically bandwidth-bound.[^5] |
| Software porting | Customers must port inference workloads from [CUDA](/wiki/cuda) to Etched's compiler and serving stack. Adoption of any non-NVIDIA accelerator has been slow at hyperscale. |
| Supply chain | Sohu depends on [TSMC](/wiki/tsmc) 4nm capacity, HBM3E supply from [SK Hynix](/wiki/sk_hynix) or [Micron](/wiki/micron), and advanced CoWoS packaging. All three are in tight supply.[^7] |
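
The bandwidth-versus-compute row can be made concrete with an idealized memory-traffic bound. The sketch below assumes decode is limited only by memory bandwidth, that each decode step reads all weights once for the whole batch, and that bandwidth aggregates perfectly across eight chips. The function names are hypothetical, the Sohu figures are Etched's claims, and the 3,350 GB/s H100 figure is NVIDIA's published SXM specification and does not come from this article's sources.

```python
import math

def decode_tokens_per_sec(bandwidth_gb_s, weights_gb, batch, kv_gb_per_seq=0.0):
    """Upper bound on decode throughput when memory traffic is the only limit.

    Each step reads every weight once (shared by the batch) plus each
    sequence's own KV cache, and produces one token per sequence.
    """
    gb_per_step = weights_gb + batch * kv_gb_per_seq
    steps_per_sec = bandwidth_gb_s / gb_per_step
    return batch * steps_per_sec

def min_batch_for(target_tokens_per_sec, bandwidth_gb_s, weights_gb):
    """Smallest batch that reaches the target when only weight reads count."""
    return math.ceil(target_tokens_per_sec * weights_gb / bandwidth_gb_s)

SOHU_BW_GB_S = 4_800   # claimed, per chip
H100_BW_GB_S = 3_350   # assumption: H100 SXM datasheet value, not from the article
WEIGHTS_GB = 70        # Llama-3 70B at FP8
server_bw = 8 * SOHU_BW_GB_S

batch1 = decode_tokens_per_sec(server_bw, WEIGHTS_GB, batch=1)
needed = min_batch_for(500_000, server_bw, WEIGHTS_GB)
bandwidth_ratio = SOHU_BW_GB_S / H100_BW_GB_S

print(f"batch-1 bound: {batch1:.0f} tokens/sec")
print(f"batch needed for 500,000 tokens/sec (weights only): {needed}")
print(f"per-chip bandwidth vs H100: {bandwidth_ratio:.2f}x")
```

In this idealized model the per-chip bandwidth advantage over an H100 is about 1.4x, which is the core of the critics' objection: a 20x gain cannot come from memory speed alone. Reaching 500,000 tokens per second on weight traffic alone would take a batch of roughly 900 sequences, and per-sequence KV-cache reads (the `kv_gb_per_seq` term) push the real requirement higher.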

## What is next for Etched and Sohu?

Reception of Sohu has been polarized. Bullish coverage from venture capital and AI infrastructure outlets has framed Etched as the most credible attempt to date to break the [NVIDIA](/wiki/nvidia) monopoly on AI inference.[^13][^14] Skeptical coverage has focused on three themes: that Etched is selling a chip that has not shipped against benchmarks the company controls; that the transformer architecture, while dominant, is not necessarily permanent; and that the founders' age and lack of prior chip industry track record raise execution risk.[^5][^7] Jon Peddie Research observed that competition for HBM3E and successor memory supply with NVIDIA's Rubin generation is a structural challenge for any merchant AI chip startup.[^7] The AI safety community on LessWrong and similar venues has noted that an order-of-magnitude reduction in inference cost would accelerate deployment of [autonomous agents](/wiki/autonomous_agents), reasoning systems, and multimodal applications, raising downstream questions about cheap-inference safety.[^5]

Etched has publicly stated that Sohu is the first chip in a multi-generation roadmap. The company has discussed a second-generation chip on a more advanced node targeting both inference and prefill-heavy training workloads, as well as a smaller, lower-power inference chip for edge and on-device transformer inference. Etched executives have suggested that the addressable market for transformer-specific silicon will exceed $100 billion annually by the end of the decade if inference volumes grow as expected.[^22]

## See also

- [Etched](/wiki/etched)
- [AI accelerator](/wiki/ai_accelerator)
- [Transformer](/wiki/transformer)
- [Large language model](/wiki/large_language_model)
- [Inference](/wiki/inference)
- [NVIDIA H100](/wiki/nvidia_h100)
- [NVIDIA Blackwell](/wiki/nvidia_blackwell)
- [Groq LPU](/wiki/groq_lpu)
- [Cerebras WSE-3](/wiki/cerebras_wse_3)
- [SambaNova SN40L](/wiki/sambanova_sn40l)
- [TPU](/wiki/tpu)
- [AWS Trainium](/wiki/aws_trainium)
- [Mixture of experts](/wiki/mixture_of_experts)

## References

[^1]: "Etched is Making the Biggest Bet in AI." Etched, June 25, 2024. https://www.etched.com/announcing-etched
[^2]: "Etched is building an AI chip that only runs transformer models." TechCrunch, June 25, 2024. https://techcrunch.com/2024/06/25/etched-is-building-an-ai-chip-that-only-runs-transformer-models/
[^3]: "Etched raises $120 million to build chip to take on Nvidia in AI." CNBC, June 25, 2024. https://www.cnbc.com/2024/06/25/etched-raises-120-million-to-build-chip-to-take-on-nvidia-in-ai.html
[^4]: "Etched scores $120M for an ASIC built for transformer models." The Register, June 26, 2024. https://www.theregister.com/2024/06/26/etched_asic_ai/
[^5]: "New fast transformer inference ASIC: Sohu by Etched." LessWrong, July 2024. https://www.lesswrong.com/posts/qhpB9NjcCHjdNDsMG/new-fast-transformer-inference-asic-sohu-by-etched
[^6]: "Etched Sohu: Transformer-Only Inference ASIC." Awesome Agents Hardware Directory. https://awesomeagents.ai/hardware/etched-sohu/
[^7]: "Welcome to the club, Etched." Jon Peddie Research, 2024. https://www.jonpeddie.com/news/welcome-to-the-club-etched/
[^8]: "Transformer model chipmaker Etched.ai raises $120M to challenge Nvidia's market dominance." SiliconAngle, June 25, 2024. https://siliconangle.com/2024/06/25/transformer-model-chipmaker-etched-ai-raises-120m-challenge-nvidias-market-dominance/
[^9]: "Etched and Decart Release Oasis, a New AI Model Transforming Gaming Worlds." InfoQ, November 2024. https://www.infoq.com/news/2024/11/decart-etched-oasis/
[^10]: "This AI-generated Minecraft may represent the future of real-time video generation." MIT Technology Review, October 31, 2024. https://www.technologyreview.com/2024/10/31/1106461/this-ai-generated-minecraft-may-represent-the-future-of-real-time-video-generation/
[^11]: "Etched.ai raises $500m for a $5bn valuation - report." DatacenterDynamics, January 2026. https://www.datacenterdynamics.com/en/news/etchedai-raises-500m-for-a-5bn-valuation-report/
[^12]: "Etched Raises $500M, Valued at $5B for AI Chip Sohu." Asia Business Outlook, January 2026. https://www.asiabusinessoutlook.com/news/etched-raises-500m-valued-at-5b-for-ai-chip-sohu-nwid-11065.html
[^13]: "Harvard dropouts' Etched raises $500M at $5B valuation to challenge Nvidia." Tech Funding News, January 2026. https://techfundingnews.com/nvidia-rival-ai-chip-maker-etched-founded-by-harvard-dropouts-lands-500m-at-5b-valuation/
[^14]: "AI Chip Startup Etched Raises $500 Million to Take on Nvidia." Bloomberg, January 13, 2026. https://www.bloomberg.com/news/articles/2026-01-13/ai-chip-startup-etched-raises-500-million-to-take-on-nvidia
[^15]: "From Dorm Room Beginnings to a Pioneer in the AI Chip Revolution." Rambus, 2024. https://www.rambus.com/blogs/from-dorm-room-beginnings-to-a-pioneer-in-the-ai-chip-revolution-how-etched-is-collaborating-with-rambus-to-achieve-their-vision/
[^16]: "The AI Inference Wars: Comparing Taalas, Cerebras, Groq, Etched, and NVIDIA." The Menon Lab Blog, 2025. https://themenonlab.blog/blog/ai-inference-accelerators-compared
[^17]: "Etched's Oasis: Creating a Market For Sohu." Chipstrat, 2024. https://www.chipstrat.com/p/etcheds-oasis-creating-a-market-for
[^18]: "Etched (company)." Wikipedia. https://en.wikipedia.org/wiki/Etched.ai
[^19]: "Will the Sohu AI chip ship to customers within a year?" Manifold Markets, June 2024 to July 2025 (resolved no). https://manifold.markets/ahalekelly/will-the-sohu-ai-chip-ship-to-custo
[^20]: "Sohu AI chip claimed to run models 20x faster and cheaper than Nvidia H100 GPUs." Tom's Hardware, June 26, 2024. https://www.tomshardware.com/tech-industry/artificial-intelligence/sohu-ai-chip-claimed-to-run-models-20x-faster-and-cheaper-than-nvidia-h100-gpus
[^21]: "Etched: specialized Sohu chips for AI transformer architecture." Aurorean Horizon, 2024. https://blog.auroreanhorizon.com/p/etched
[^22]: "Etched's $500 Million Raise: A Blueprint for Enterprise AI Inference in 2026." AI 2 Work, 2026. https://ai2.work/technology/etcheds-500-million-raise-a-blueprint-for-enterprise-ai-inference-in-2026/
[^23]: "Etched chip 10x faster than Blackwell." Electronics Weekly, June 2024. https://www.electronicsweekly.com/news/business/etched-chip-an-order-of-magnitude-faster-than-blackwell-2024-06/
[^24]: "Etched: Meet Sohu, the fastest AI chip of all time" (X/Twitter announcement). Etched on X, June 25, 2024. https://x.com/Etched/status/1805625693113663834