CANN (Huawei)
Last reviewed: Jun 3, 2026
Sources: 10 citations
Review status: Source-backed
Revision: v1 · 1,658 words
CANN, short for Compute Architecture for Neural Networks, is the heterogeneous computing architecture and software stack that Huawei provides for its Ascend line of AI processors. It sits between high-level AI frameworks and Ascend hardware, turning the computational graphs and operators that a framework emits into instructions that an Ascend neural processing unit (NPU) can execute. In the Ascend ecosystem CANN occupies the role that CUDA occupies for NVIDIA GPUs: it is the layer that makes the chips programmable, and its maturity is central to whether Huawei can offer a credible alternative to NVIDIA's software ecosystem.[1][2]
Huawei's own documentation describes CANN as "a heterogeneous computing architecture launched by Ascend for AI scenarios" and "a crucial platform for improving the computing efficiency of the Ascend AI processors." It exposes a set of hierarchical programming interfaces so that developers can work at different levels of abstraction, from calling pre-built operators to writing custom kernels.[1] The stack supports three main training and inference frameworks: Huawei's own MindSpore, PyTorch through an adapter, and TensorFlow through an adapter.[1]
CANN is released in versioned editions. According to the CANN Commercial Edition documentation, the stack is in the 8.x series (for example, versions 8.0 and 8.5.0), with new releases tracking the capabilities of successive Ascend chips such as the 910B and the 910C.[1][3]
CANN is organized as a set of layers that progressively lower a model from a framework graph down to hardware instructions. The main pieces, drawn from Huawei's documentation, are summarized below.
| Layer | Component | Role |
|---|---|---|
| Application interface | AscendCL (Ascend Computing Language) | Unified C and Python style APIs for system configuration, runtime management, and model execution |
| Graph | Graph Engine (GE) | Builds and optimizes the computational graph into a form executable on Ascend chips |
| Operator development | Ascend C, TBE, AI CPU | Languages and frameworks for writing custom operators |
| Operator libraries | AOL, ATB | Pre-tuned high-performance operators, including a Transformer-focused library |
| Communication | HCCL | Collective communication across multiple Ascend processors for distributed training |
| Media | DVPP | Hardware-accelerated digital vision pre-processing for image and video data |
| Compiler and tuning | BiSheng compiler, AOE | Compiles operator code to binaries and auto-tunes performance |
| Runtime and driver | Runtime, driver | Task scheduling, memory management, and the kernel-mode interface to the hardware |
At the top, AscendCL provides the programming surface that applications and frameworks call into. The Graph Engine (GE) is the module that takes a whole-model graph and adapts it into a more efficient graph that can run directly on Ascend chips; in the MindSpore source tree GE is the linkage between the framework front end and the hardware.[1][4] Below the operator level, the Ascend Operator Library (AOL) supplies high-performance operators tuned for the hardware, and the Ascend Transformer Boost (ATB) library targets the attention and matrix patterns common in large language models. HCCL (Huawei Collective Communication Library) handles cross-device data exchange such as all-reduce in distributed training, and DVPP (Digital Vision Pre-Processing) offloads image decoding and resizing to dedicated hardware. The BiSheng compiler lowers operator code to binary, and the AOE (Ascend Optimization Engine) tool performs automatic performance tuning. The runtime schedules tasks onto the NPU and manages memory, while the driver provides the kernel-level interface to the silicon.[1]
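To make the all-reduce operation mentioned above concrete, the following is a minimal sketch in plain Python of what a sum all-reduce computes. It does not call HCCL or any CANN API; the function name `all_reduce_sum` and the sample gradients are hypothetical and exist only to show the semantics: each device contributes a buffer, and every device ends up with the element-wise sum.

```python
from typing import List


def all_reduce_sum(per_device: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce (illustrative only, not the HCCL API).

    Each inner list stands in for one device's buffer. After the call,
    every device holds the element-wise sum over all devices.
    """
    if not per_device:
        return []
    length = len(per_device[0])
    if any(len(buf) != length for buf in per_device):
        raise ValueError("all device buffers must have the same length")
    # Reduce step: element-wise sum across devices.
    total = [sum(values) for values in zip(*per_device)]
    # Broadcast step: each device receives its own copy of the result.
    return [list(total) for _ in per_device]


# Hypothetical gradients held by three devices.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_sum(grads)
```

In distributed training this pattern is what keeps model replicas in step: each replica computes gradients on its own data shard, and the collective leaves every replica holding the same combined gradients. A real collective library would perform the exchange over device interconnects rather than in a single process.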
Ascend C is the operator programming language that Huawei offers for writing custom kernels on Ascend hardware. It is the rough counterpart to writing CUDA C++ kernels, and it is the layer developers reach for when an existing library operator does not cover their needs. An Ascend C operator is typically split into two parts: a host-side tiling program that decides how to partition data and orchestrate movement, and a device-side kernel program that schedules and pipelines the actual compute instructions on the NPU.[5] Older operator development paths, including TBE (Tensor Boost Engine) and AI CPU operators, also exist within CANN, but Ascend C is the path Huawei has promoted for newer development. The associated toolchain includes utilities for generating, testing, debugging, and profiling operators.[1]
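The host-side and device-side split described above can be sketched in plain Python. This is an illustration of the idea only, not Ascend C code and not its API: `plan_tiles`, `add_tiled`, the core count, and the tile length are all hypothetical. The "host" function decides how to partition the data across cores and into fixed-size tiles, and the "kernel" function then walks those tiles to do the arithmetic.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (offset, length), both measured in elements


def plan_tiles(total: int, num_cores: int, tile_len: int) -> List[List[Tile]]:
    """Host-side stand-in: split `total` elements across cores, then into tiles."""
    if total < 0 or num_cores <= 0 or tile_len <= 0:
        raise ValueError("total must be >= 0; num_cores and tile_len must be > 0")
    base, extra = divmod(total, num_cores)
    plan: List[List[Tile]] = []
    offset = 0
    for core in range(num_cores):
        # The first `extra` cores take one additional element each.
        core_len = base + (1 if core < extra else 0)
        end = offset + core_len
        tiles: List[Tile] = []
        while offset < end:
            length = min(tile_len, end - offset)
            tiles.append((offset, length))
            offset += length
        plan.append(tiles)
    return plan


def add_tiled(x: List[float], y: List[float], plan: List[List[Tile]]) -> List[float]:
    """Device-side stand-in: add two vectors one tile at a time."""
    out = [0.0] * len(x)
    for tiles in plan:  # one entry per core; run sequentially in this sketch
        for offset, length in tiles:
            for i in range(offset, offset + length):
                out[i] = x[i] + y[i]
    return out


plan = plan_tiles(total=10, num_cores=3, tile_len=2)
result = add_tiled([float(i) for i in range(10)], [1.0] * 10, plan)
```

On real hardware the cores would run concurrently and each tile would be copied into on-chip memory, computed, and copied back in a pipeline; the sketch keeps only the partitioning logic so the division of responsibility is visible.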
Most practitioners do not call CANN directly; they reach it through a framework. PyTorch support is provided by torch_npu, the Ascend Extension for PyTorch. It registers an "npu" device through PyTorch's PrivateUse1 backend mechanism and bridges PyTorch's front end to the CANN runtime and libraries underneath. CANN must be installed first, because torch_npu depends on its runtime and operator libraries; the extension's version numbers follow a "{PyTorch version}-{Ascend version}" convention so that a given build is matched to a specific CANN release (for example a torch_npu build aligned with PyTorch 2.1.0 and a particular CANN 8.x version).[6] MindSpore, Huawei's first-party framework, integrates with CANN natively through GE rather than through a separate adapter, and a TensorFlow adapter is also available.[1] Beyond training frameworks, CANN is used as a backend by inference and serving software in the Ascend ecosystem, and there is a community-maintained CANN execution provider for ONNX Runtime.[2]
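Because torch_npu only works when a matching CANN installation is present, application code often guards the import. The sketch below is one way to do that, under stated assumptions: the helper name `pick_device` is hypothetical, and the call `torch.npu.is_available()` is assumed to be provided once `torch_npu` is imported, as in the extension's published usage examples. If either package is missing or fails to load, the helper falls back to the CPU.

```python
import importlib.util


def pick_device() -> str:
    """Return "npu" if torch_npu loads and reports a device, else "cpu".

    Hypothetical helper for illustration; not part of torch_npu.
    """
    # Check for the packages without importing them.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the "npu" device)

        # Assumption: torch_npu exposes torch.npu.is_available().
        return "npu" if torch.npu.is_available() else "cpu"
    except Exception:
        # A missing or mismatched CANN install can make the import fail;
        # fall back to the CPU instead of crashing.
        return "cpu"


device = pick_device()
```

With a device string in hand, existing PyTorch code can usually be pointed at Ascend hardware by passing that string wherever a device is expected, for example when creating tensors or moving a model.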
The clearest way to understand CANN is by analogy to NVIDIA's stack, with the caveat that the analogy is approximate rather than a one-to-one mapping.
| NVIDIA / CUDA | Huawei / CANN | Purpose |
|---|---|---|
| CUDA C++ kernels | Ascend C | Writing custom device kernels |
| CUDA Runtime API | AscendCL and runtime | Application-level device and execution control |
| cuDNN, cuBLAS | AOL, ATB | Pre-optimized math and neural-network operators |
| NCCL | HCCL | Multi-device collective communication |
| NVCC | BiSheng compiler | Compiling kernels to device binaries |
| nvJPEG and related | DVPP | Hardware media pre-processing |
The functional pieces line up reasonably well, but the ecosystems are not equivalent in maturity. Press coverage of Huawei's plans noted that CUDA has been developed and refined for close to two decades, and that CANN would likely take years to approach that level of polish, tooling, and third-party library coverage.[2] Huawei has itself acknowledged friction: at Huawei Connect 2025, Eric Xu said customer feedback after the open-sourcing of DeepSeek's models surfaced "many issues and expectations they've had with Ascend," which was part of the motivation for opening the stack.[7]
Huawei first announced in August 2025 that it would open-source CANN, with rotating chairman Eric Xu framing the move as a way to "speed up innovation from developers" and "make Ascend easier to use," and positioning it as an alternative to NVIDIA's proprietary platform.[8] The plan was laid out in more detail at Huawei Connect 2025, held in Shanghai from 18 to 20 September 2025.[7][9]
Xu described a tiered approach for CANN: Huawei would open the interfaces for the compiler and the virtual instruction set, and fully open-source the rest of the software, based on the existing Ascend 910B and 910C designs, by 31 December 2025. He paired this with commitments to fully open-source the Mind series of application enablement kits and toolchains and the openPangu foundation models on the same deadline, and said future versions would have their open-source plans synchronized with product launches.[7] Zhang Dixuan, president of Huawei's Ascend Computing Business, gave the staged schedule: all CANN operators were to be open-sourced on the GitCode platform by late September 2025, with core components including domain-specific libraries, GE, Ascend C, and the MindIE inference engine to follow by December 2025.[9] A CANN Technical Steering Committee was set up to govern contributions and roadmaps, and Huawei pledged supporting resources reported as 1,500 PFLOPS of computing power and 30,000 development boards per year for the open-source community.[9]
The open-sourcing is partial rather than total. Keeping the compiler and the virtual instruction set as opened interfaces, rather than fully open implementations, lets developers see and target the compilation path while Huawei retains some lower-level detail. Commentators also noted at the time that licensing terms, governance specifics, documentation quality, and the durability of community support remained to be proven.[10]
CANN is the focal point of a broader question about whether Chinese AI accelerators can build a durable software ecosystem against CUDA. The hardware side has advanced, with Ascend parts and the CloudMatrix systems built around them positioned against NVIDIA's high-end accelerators, but the software experience has lagged.[2][8] The practical gaps are familiar ones for any challenger to an entrenched platform: a smaller body of tuned operators and third-party libraries, fewer developers fluent in the tools, and a documentation and debugging experience that reviewers describe as still behind CUDA's. Opening the stack is Huawei's attempt to close those gaps by letting external developers, Chinese AI companies, universities, and research institutions contribute operators and fixes directly, and by leaning on adapters such as torch_npu and compatibility with widely used serving software to lower the cost of moving existing PyTorch workloads onto Ascend. Whether that converts into the kind of self-sustaining ecosystem CUDA enjoys is the open question that CANN's trajectory will answer.[2][7][8]