The rank of a tensor, also referred to as its order or degree, is the number of dimensions (axes or indices) needed to describe the tensor. A scalar has rank 0, a vector has rank 1, a matrix has rank 2, and arrays with three or more axes are higher-rank tensors. This concept is fundamental to linear algebra, deep learning, and scientific computing, where data is routinely organized and manipulated as multi-dimensional arrays.
It is worth noting that the term "tensor rank" carries two distinct meanings depending on context. In machine learning frameworks like TensorFlow and PyTorch, "rank" almost always refers to the number of axes (the order). In multilinear algebra and applied mathematics, "rank" can instead refer to the minimum number of rank-1 tensors needed to express a given tensor as their sum, analogous to the concept of matrix factorization rank. This article covers both meanings.
Imagine you have a box of colored blocks arranged in rows, columns, and layers: to point at one particular block you must give three numbers, so that arrangement has rank 3. In the same spirit, the rank of a tensor is simply how many numbers you need to locate a single entry inside it.
In computational contexts, the rank of a tensor equals the number of indices required to uniquely address each element. This is also called the tensor's order, degree, or number of dimensions (ndim). Different communities have adopted different preferred terms:
| Community | Preferred term | Notes |
|---|---|---|
| Physics / differential geometry | Order or rank | "Rank" also used for type (p,q) counting |
| Pure mathematics | Order or degree | "Rank" reserved for decomposition rank |
| Machine learning / computer science | Rank or ndim | Follows TensorFlow and NumPy conventions |
| Signal processing | Order or way | A "3-way array" is order 3 |
The following table summarizes tensors of various ranks, their mathematical names, shapes, and common uses in machine learning and science.
| Rank (order) | Mathematical name | Example shape | ML / science example |
|---|---|---|---|
| 0 | Scalar | () | Learning rate, loss value, temperature reading |
| 1 | Vector | (n,) | Word embedding, bias vector, time series |
| 2 | Matrix | (m, n) | Weight matrix in a neural network, grayscale image, adjacency matrix |
| 3 | 3rd-order tensor | (m, n, p) | RGB color image (height x width x 3), word embeddings for a sentence |
| 4 | 4th-order tensor | (b, c, h, w) | Mini-batch of images in a CNN, video (frames x height x width x channels) |
| 5 | 5th-order tensor | (b, t, c, h, w) | Batch of video clips (batch x time x channels x height x width) |
| n | nth-order tensor | (d1, d2, ..., dn) | General multi-dimensional data, higher-order statistics |
A scalar is a single numerical value with no axes. In machine learning, scalars represent quantities like a model's loss value, the learning rate, or a single pixel intensity. In physics, temperature at a single point or electric charge are scalar quantities.
import tensorflow as tf
import torch
# TensorFlow
scalar_tf = tf.constant(3.14)
print(scalar_tf.ndim) # 0
# PyTorch
scalar_pt = torch.tensor(3.14)
print(scalar_pt.ndim) # 0
A vector is an ordered list of numbers, requiring one index to access each element. Vectors are used to represent feature vectors in datasets, bias vectors in neural networks, and one-dimensional signals.
# TensorFlow
vector_tf = tf.constant([1.0, 2.0, 3.0, 4.0])
print(vector_tf.ndim) # 1
print(vector_tf.shape) # (4,)
# PyTorch
vector_pt = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(vector_pt.ndim) # 1
print(vector_pt.shape) # torch.Size([4])
A matrix is a two-dimensional rectangular array of numbers organized into rows and columns. In deep learning, weight matrices connect layers of a neural network. A grayscale image is stored as a rank-2 tensor where each element represents a pixel intensity.
# TensorFlow
matrix_tf = tf.constant([[1, 2, 3],
[4, 5, 6]])
print(matrix_tf.ndim) # 2
print(matrix_tf.shape) # (2, 3)
# PyTorch
matrix_pt = torch.tensor([[1, 2, 3],
[4, 5, 6]])
print(matrix_pt.ndim) # 2
print(matrix_pt.shape) # torch.Size([2, 3])
Tensors with three or more axes represent complex, multi-dimensional data. A color image is a rank-3 tensor with dimensions for height, width, and color channel. A batch of color images, common in convolutional neural networks, is a rank-4 tensor (batch size x channels x height x width, or batch size x height x width x channels depending on the framework's data format). Video data adds a temporal axis, producing rank-5 tensors.
# A batch of 32 RGB images, each 224x224 pixels (PyTorch NCHW format)
images = torch.randn(32, 3, 224, 224)
print(images.ndim) # 4
print(images.shape) # torch.Size([32, 3, 224, 224])
# A batch of 8 video clips, each 16 frames, 3 channels, 112x112 pixels
videos = torch.randn(8, 16, 3, 112, 112)
print(videos.ndim) # 5
Three related but distinct properties describe the structure of a tensor. They are often confused by beginners.
| Property | Definition | Example for a (3, 4, 5) tensor |
|---|---|---|
| Rank (order / ndim) | Number of axes | 3 |
| Shape | Tuple listing the size of each axis | (3, 4, 5) |
| Size (number of elements) | Total count of scalar entries (product of shape) | 60 |
Two tensors can share the same rank but have completely different shapes. For instance, a (2, 3) matrix and a (100, 100) matrix both have rank 2, but they differ in shape and size.
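As a quick check in code, the following minimal PyTorch sketch prints all three properties for the (3, 4, 5) example used in the table above:

import torch
# A tensor with 3 axes, shape (3, 4, 5), and 3 * 4 * 5 = 60 elements
t = torch.zeros(3, 4, 5)
print(t.ndim)     # 3 -> rank (order / number of axes)
print(t.shape)    # torch.Size([3, 4, 5]) -> shape
print(t.numel())  # 60 -> size (total number of elements)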
The major deep learning frameworks all expose tensor rank through similar APIs, though the naming conventions vary slightly.
| Framework | Property / function | Returns |
|---|---|---|
| TensorFlow | tensor.ndim or tf.rank(tensor) | Python int or scalar tensor |
| PyTorch | tensor.ndim or tensor.dim() | Python int |
| NumPy | array.ndim | Python int |
| JAX | array.ndim | Python int |
An important distinction in TensorFlow: .ndim returns a Python integer (useful during eager execution), while tf.rank() returns a tf.Tensor (required inside tf.function graphs and Keras functional models).
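The difference is easy to see in a short eager-mode sketch:

import tensorflow as tf
x = tf.zeros([2, 3, 4])
print(x.ndim)      # 3 (a plain Python int)
print(tf.rank(x))  # tf.Tensor(3, shape=(), dtype=int32) (a scalar tensor)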
In multilinear algebra, the rank of a tensor has a different, more technical meaning. The rank of an order-k tensor T is the smallest integer r such that T can be written as a sum of r rank-1 tensors (outer products of vectors):
T = sum from i=1 to r of (a_i^(1) ⊗ a_i^(2) ⊗ ... ⊗ a_i^(k))
where ⊗ denotes the outer product. This generalizes the concept of matrix rank: for a matrix (order-2 tensor), the decomposition rank equals the familiar linear algebra rank, which can be computed efficiently using Gaussian elimination or singular value decomposition.
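The matrix case is easy to verify numerically. The following NumPy sketch builds a matrix as a sum of two outer products and checks that its linear-algebra rank is 2 (two independent random rank-1 terms almost surely do not collapse to a lower rank):

import numpy as np
a1, b1 = np.random.randn(5), np.random.randn(6)
a2, b2 = np.random.randn(5), np.random.randn(6)
# A sum of two rank-1 terms (outer products of vectors)
M = np.outer(a1, b1) + np.outer(a2, b2)
print(np.linalg.matrix_rank(M))  # 2 (with probability 1 for random vectors)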
Although tensor rank generalizes matrix rank, the two concepts behave very differently in practice.
| Property | Matrix rank | Tensor rank (order 3+) |
|---|---|---|
| Computational complexity | Polynomial time (Gaussian elimination) | NP-hard in general |
| Field dependence | Same rank over reals and complex numbers | Rank can differ between real and complex fields |
| Generic uniqueness of decomposition | Non-unique for rank > 1 (e.g., SVD has rotational freedom) | Often essentially unique (up to permutation and scaling) |
| Best low-rank approximation | Always exists (Eckart-Young theorem) | May not exist; the approximation problem can be ill-posed |
| Border rank vs. rank | Always equal | Can differ for order 3 and higher |
The fact that computing tensor rank is NP-hard was proven by Johan Håstad in 1990. Håstad showed that determining the rank of an order-3 tensor over any finite field is NP-complete, and that over the rational numbers (and by extension the real and complex numbers) the problem is NP-hard [1].
The border rank of a tensor T is the smallest integer r such that T can be approximated arbitrarily closely by tensors of rank r. Border rank is always less than or equal to rank, and for matrices the two always coincide. However, for tensors of order 3 or higher, border rank can be strictly less than rank. For example, the tensor u ⊗ u ⊗ v + u ⊗ v ⊗ u + v ⊗ u ⊗ u has rank 3 but border rank 2, because it can be approximated to arbitrary precision by rank-2 tensors [2].
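A small numerical sketch makes this concrete: the tensors T_eps below are each a difference of two rank-1 terms (hence rank at most 2), yet they approach the rank-3 tensor above as eps shrinks:

import numpy as np
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
def outer3(x, y, z):
    # Outer product of three vectors: result[i, j, k] = x[i] * y[j] * z[k]
    return np.einsum('i,j,k->ijk', x, y, z)
# The rank-3 target: u ⊗ u ⊗ v + u ⊗ v ⊗ u + v ⊗ u ⊗ u
W = outer3(u, u, v) + outer3(u, v, u) + outer3(v, u, u)
for eps in [1e-1, 1e-3, 1e-5]:
    # A difference of two rank-1 tensors, so rank at most 2
    T_eps = (outer3(u + eps * v, u + eps * v, u + eps * v) - outer3(u, u, u)) / eps
    print(eps, np.abs(T_eps - W).max())  # approximation error shrinks with eps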
Border rank plays a role in algebraic complexity theory, particularly in the study of fast matrix multiplication algorithms.
Tensor decompositions factorize a higher-order tensor into simpler components, extending the idea of matrix factorization (such as SVD or eigendecomposition) to multi-dimensional arrays. The two most widely used decompositions are CP and Tucker.
The CP decomposition expresses a tensor as a sum of K rank-1 tensors:
T ≈ sum from i=1 to K of (lambda_i * a_i ⊗ b_i ⊗ c_i)
where lambda_i are scalar weights and a_i, b_i, c_i are vectors along each mode. When K equals the true rank, the decomposition is exact.
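In practice CP decompositions are fitted numerically rather than computed exactly. A minimal sketch using the TensorLy library (an assumption here; TensorLy is not used elsewhere in this article) might look like this:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
# A random 4 x 5 x 6 tensor to decompose
X = tl.tensor(np.random.randn(4, 5, 6))
# Fit a K = 3 CP model: X is approximated by a sum of 3 weighted rank-1 tensors
cp = parafac(X, rank=3)
X_hat = tl.cp_to_tensor(cp)
print(tl.norm(X - X_hat) / tl.norm(X))  # relative approximation error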
The concept was first introduced by Frank L. Hitchcock in 1927 [3]. It was independently rediscovered in psychometrics by Carroll and Chang (CANDECOMP, 1970) and by Harshman (PARAFAC, 1970), and is now commonly known as CP decomposition [4].
Key properties of CP decomposition:
- The number of parameters grows only linearly with the tensor's order.
- Under mild conditions the decomposition is essentially unique, up to permutation and scaling of the rank-1 terms.
- Computing the exact CP rank is NP-hard in general, so in practice K is chosen by the user and the factors are fitted approximately, typically by alternating least squares.
- The best rank-K approximation may not exist, because the approximation problem can be ill-posed for tensors of order 3 and higher.
The Tucker decomposition factorizes a tensor into a smaller core tensor G multiplied by factor matrices along each mode:
T ≈ G x_1 A x_2 B x_3 C
where x_n denotes the mode-n product, G is the core tensor (typically smaller than T), and A, B, C are factor matrices. Tucker decomposition can be seen as a higher-order generalization of principal component analysis (PCA).
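The mode-n product is just an ordinary matrix multiplication applied along one axis of the tensor. A minimal NumPy sketch (mode_n_product is a hypothetical helper written for this example, and modes are numbered from 0 here rather than from 1 as in the formula above):

import numpy as np
def mode_n_product(T, A, n):
    # Multiply tensor T by matrix A along axis n:
    # result[..., j, ...] = sum_i A[j, i] * T[..., i, ...]
    return np.moveaxis(np.tensordot(A, T, axes=(1, n)), 0, n)
G = np.random.randn(2, 3, 4)   # core tensor
A = np.random.randn(5, 2)      # factor matrix for mode 0
B = np.random.randn(6, 3)      # factor matrix for mode 1
C = np.random.randn(7, 4)      # factor matrix for mode 2
# T = G x_1 A x_2 B x_3 C, a 5 x 6 x 7 tensor
T = mode_n_product(mode_n_product(mode_n_product(G, A, 0), B, 1), C, 2)
print(T.shape)  # (5, 6, 7)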
CP decomposition is a special case of Tucker decomposition where the core tensor G is super-diagonal (nonzero only when all indices are equal) [4].
Developed by Ivan Oseledets in 2011, the tensor train decomposition represents a high-order tensor as a chain of lower-order tensors [5]:
T(i_1, i_2, ..., i_n) = G_1(i_1) * G_2(i_2) * ... * G_n(i_n)
where each G_k is a three-dimensional tensor (or a matrix for the boundary terms). TT decomposition is particularly effective for very high-dimensional tensors where Tucker decomposition becomes impractical due to the exponential growth of the core tensor.
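To make the chain structure concrete, the following sketch builds a small order-3 tensor from hypothetical TT cores and reads off one entry as a product of matrices:

import numpy as np
# TT cores for an order-3 tensor of shape (4, 5, 6) with TT-ranks (1, 2, 3, 1)
G1 = np.random.randn(1, 4, 2)
G2 = np.random.randn(2, 5, 3)
G3 = np.random.randn(3, 6, 1)
def tt_entry(cores, idx):
    # T(i_1, ..., i_n) = G_1(i_1) @ G_2(i_2) @ ... @ G_n(i_n), each G_k(i_k) a matrix
    result = np.eye(1)
    for G, i in zip(cores, idx):
        result = result @ G[:, i, :]
    return result.item()
# Contract the full tensor for comparison (the size-1 boundary modes are summed out)
T = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)
print(np.isclose(T[1, 2, 3], tt_entry([G1, G2, G3], (1, 2, 3))))  # True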
| Method | Core structure | Parameter scaling | Uniqueness | Typical algorithm |
|---|---|---|---|---|
| CP | No core (sum of rank-1 terms) | Linear in order | Essentially unique | Alternating least squares |
| Tucker | Dense core tensor | Exponential in order | Not unique (rotational freedom) | Higher-order SVD (HOSVD) |
| Tensor Train | Chain of 3rd-order cores | Linear in order | Unique under canonical form | TT-SVD |
In physics and differential geometry, tensors carry additional structure beyond their order. A tensor of type (p, q) has p contravariant (upper) indices and q covariant (lower) indices. The total order of the tensor is p + q.
Contravariant and covariant indices transform differently under changes of coordinate system. Contravariant components transform inversely to the basis vectors, while covariant components transform in the same way as the basis vectors. The metric tensor allows raising and lowering indices, converting between the two types [6].
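As a small worked example, lowering a contravariant index with the Minkowski metric (signature (+, -, -, -) assumed here; the four-vector components are arbitrary) is a single contraction:

import numpy as np
# Minkowski metric eta_{mu nu} with signature (+, -, -, -)
eta = np.diag([1.0, -1.0, -1.0, -1.0])
# A contravariant four-vector v^mu
v_upper = np.array([2.0, 1.0, 0.5, 3.0])
# Lower the index: v_mu = eta_{mu nu} v^nu
v_lower = np.einsum('mn,n->m', eta, v_upper)
print(v_lower)  # [ 2. -1. -0.5 -3. ]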
Tensors of various ranks appear throughout physics. The rank of a tensor determines what kind of physical quantity it can represent.
| Rank | Physical example | Description |
|---|---|---|
| 0 (scalar) | Temperature, mass, electric charge | Single-valued quantities invariant under coordinate transformations |
| 1 (vector) | Force, velocity, electric field | Directed quantities with magnitude and direction |
| 2 (matrix) | Stress tensor, moment of inertia tensor, metric tensor | Linear relationships between pairs of vectors; the stress tensor, for example, describes how a material responds to applied forces |
| 3 | Piezoelectric tensor | Relates stress (rank 2) to electric polarization (rank 1) |
| 4 | Riemann curvature tensor, elasticity tensor | Describes spacetime curvature in general relativity; relates stress and strain in elasticity |
The Cauchy stress tensor is a symmetric rank-2 tensor that completely specifies the state of stress at a point inside a material. The Maxwell stress tensor, also rank 2, describes the interaction between electromagnetic forces and mechanical momentum in electrodynamics [7].
Modern machine learning operates almost entirely on tensors. The rank of a tensor determines how many structural dimensions the data has:
- Rank 1: feature vectors, word embeddings, bias vectors.
- Rank 2: weight matrices, grayscale images, adjacency matrices.
- Rank 3: RGB images, sequences of word embeddings.
- Rank 4: mini-batches of images in a CNN.
- Rank 5: batches of video clips.
Tensor decompositions are used to compress neural network weight tensors. The weight tensor of a convolutional layer in a CNN is a rank-4 tensor (output_channels x input_channels x kernel_height x kernel_width). Applying CP or Tucker decomposition to this tensor can reduce the number of parameters and speed up inference with minimal loss of accuracy [8]. Research has shown that convolution layers decomposed using Tucker or CP formats can maintain performance even at relatively low ranks.
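A back-of-the-envelope calculation shows where the savings come from. For a hypothetical convolutional layer with 256 output channels, 128 input channels, and a 3x3 kernel, a rank-R CP factorization stores one small factor matrix per mode instead of the full rank-4 weight tensor:

c_out, c_in, kh, kw = 256, 128, 3, 3
R = 64  # CP rank, a modeling choice rather than a value derived from data
full_params = c_out * c_in * kh * kw      # 294,912 weights in the original layer
cp_params = R * (c_out + c_in + kh + kw)  # 24,960 weights across the four factor matrices
print(full_params, cp_params, round(full_params / cp_params, 1))  # roughly 11.8x fewer parameters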
Tensor networks, originally developed for quantum physics, have been applied to machine learning. Key architectures include:
- Matrix product states (MPS), the quantum-physics name for the tensor train format described above.
- Tree tensor networks, which arrange the component tensors hierarchically.
- Projected entangled pair states (PEPS), which generalize MPS to two-dimensional lattices.
These tensor network architectures have been integrated with transformers, recurrent neural networks, and other modern architectures [9].
Google's Tensor Processing Unit (TPU) is a custom ASIC designed to accelerate tensor operations, particularly matrix multiplications and convolutions that dominate deep learning workloads. NVIDIA's Tensor Cores, introduced with the Volta GPU architecture in 2017, perform mixed-precision matrix multiply-and-accumulate operations at high throughput [10].
The computational difficulty of problems related to tensor rank stands in sharp contrast to the corresponding matrix problems.
| Problem | Matrices (order 2) | Tensors (order 3+) |
|---|---|---|
| Computing rank | O(n^3) via Gaussian elimination | NP-hard (Håstad, 1990) |
| Best rank-r approximation | Polynomial time (truncated SVD) | NP-hard even for r = 1 (Hillar and Lim, 2013) |
| Checking if rank equals 1 | O(n^2) | Polynomial time (higher-order SVD) |
| Uniqueness of rank decomposition | Not unique for rank > 1 | Generically unique under mild conditions |
Hillar and Lim (2013) proved that most tensor problems are NP-hard, including computing the rank, the best rank-1 approximation, and the spectral norm of a tensor of order 3 or higher [11].
Despite these worst-case complexity results, practical algorithms like alternating least squares, gradient-based optimization, and the higher-order SVD work well for many real-world tensor decomposition tasks.
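For illustration, here is a minimal alternating-least-squares sketch for fitting a rank-R CP model to an order-3 tensor in plain NumPy (no convergence checks, normalization, or regularization; a production implementation would add all three):

import numpy as np

def cp_als(T, R, n_iter=50):
    # Fit T[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # Update A: unfold T along mode 0 and solve a least-squares problem for A
        KR = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)  # Khatri-Rao product of B and C
        A = T.reshape(I, J * K) @ KR @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        # Update B: unfold along mode 1
        KR = np.einsum('ir,kr->ikr', A, C).reshape(I * K, R)
        B = np.moveaxis(T, 1, 0).reshape(J, I * K) @ KR @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        # Update C: unfold along mode 2
        KR = np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)
        C = np.moveaxis(T, 2, 0).reshape(K, I * J) @ KR @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover the factors of a synthetic tensor built from known rank-3 structure
rng = np.random.default_rng(1)
A0 = rng.standard_normal((6, 3))
B0 = rng.standard_normal((7, 3))
C0 = rng.standard_normal((8, 3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # small relative error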
The mathematical study of tensors dates back to the 19th century. Bernhard Riemann and Gregorio Ricci-Curbastro developed the foundations of tensor calculus in differential geometry. Tullio Levi-Civita and Ricci-Curbastro published a systematic treatment of tensor calculus in 1900, and Albert Einstein later used this framework extensively in his general theory of relativity (1915).
The concept of tensor rank as a decomposition property was introduced by Frank L. Hitchcock in 1927, who defined the "rank of a form" as the minimum number of rank-1 terms in its expression [3]. This idea was largely forgotten until the 1970s, when it was rediscovered independently by J.D. Carroll and J.J. Chang (CANDECOMP) and by R.A. Harshman (PARAFAC) in the context of psychometric data analysis.
The modern era of tensor computation was catalyzed by Kolda and Bader's comprehensive 2009 survey "Tensor Decompositions and Applications" in SIAM Review, which unified notation and terminology across disciplines [4].