The rank of a tensor, also referred to as its order or degree, is the number of dimensions (axes or indices) needed to describe the tensor. A scalar has rank 0, a vector has rank 1, a matrix has rank 2, and arrays with three or more axes are higher-rank tensors. This concept is fundamental to linear algebra, deep learning, and scientific computing, where data is routinely organized and manipulated as multi-dimensional arrays.
It is worth noting that the term "tensor rank" carries two distinct meanings depending on context. In machine learning frameworks like TensorFlow and PyTorch, "rank" almost always refers to the number of axes (the order). In multilinear algebra and applied mathematics, "rank" can instead refer to the minimum number of rank-1 tensors needed to express a given tensor as their sum, analogous to the concept of matrix factorization rank. This article covers both meanings.
Imagine you have a box of colored blocks arranged in rows, columns, and layers: to point at one particular block you must give three numbers, so that arrangement has rank 3. In the same spirit, the rank of a tensor is simply how many numbers you need to locate a single entry inside it.
In computational contexts, the rank of a tensor equals the number of indices required to uniquely address each element. This is also called the tensor's order, degree, or number of dimensions (ndim). Different communities have adopted different preferred terms:
| Community | Preferred term | Notes |
|---|---|---|
| Physics / differential geometry | Order or rank | "Rank" also used for type (p,q) counting |
| Pure mathematics | Order or degree | "Rank" reserved for decomposition rank |
| Machine learning / computer science | Rank or ndim | Follows TensorFlow and NumPy conventions |
| Signal processing | Order or way | A "3-way array" is order 3 |
The following table summarizes tensors of various ranks, their mathematical names, shapes, and common uses in machine learning and science.
| Rank (order) | Mathematical name | Example shape | ML / science example |
|---|---|---|---|
| 0 | Scalar | () | Learning rate, loss value, temperature reading |
| 1 | Vector | (n,) | Word embedding, bias vector, time series |
| 2 | Matrix | (m, n) | Weight matrix in a neural network, grayscale image, adjacency matrix |
| 3 | 3rd-order tensor | (m, n, p) | RGB color image (height x width x 3), word embeddings for a sentence |
| 4 | 4th-order tensor | (b, c, h, w) | Mini-batch of images in a CNN, video (frames x height x width x channels) |
| 5 | 5th-order tensor | (b, t, c, h, w) | Batch of video clips (batch x time x channels x height x width) |
| n | nth-order tensor | (d1, d2, ..., dn) | General multi-dimensional data, higher-order statistics |
A scalar is a single numerical value with no axes. In machine learning, scalars represent quantities like a model's loss value, the learning rate, or a single pixel intensity. In physics, temperature at a single point or electric charge are scalar quantities.
import tensorflow as tf
import torch
# TensorFlow
scalar_tf = tf.constant(3.14)
print(scalar_tf.ndim) # 0
# PyTorch
scalar_pt = torch.tensor(3.14)
print(scalar_pt.ndim) # 0
A vector is an ordered list of numbers, requiring one index to access each element. Vectors are used to represent feature vectors in datasets, bias vectors in neural networks, and one-dimensional signals.
# TensorFlow
vector_tf = tf.constant([1.0, 2.0, 3.0, 4.0])
print(vector_tf.ndim) # 1
print(vector_tf.shape) # (4,)
# PyTorch
vector_pt = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(vector_pt.ndim) # 1
print(vector_pt.shape) # torch.Size([4])
A matrix is a two-dimensional rectangular array of numbers organized into rows and columns. In deep learning, weight matrices connect layers of a neural network. A grayscale image is stored as a rank-2 tensor where each element represents a pixel intensity.
# TensorFlow
matrix_tf = tf.constant([[1, 2, 3],
[4, 5, 6]])
print(matrix_tf.ndim) # 2
print(matrix_tf.shape) # (2, 3)
# PyTorch
matrix_pt = torch.tensor([[1, 2, 3],
[4, 5, 6]])
print(matrix_pt.ndim) # 2
print(matrix_pt.shape) # torch.Size([2, 3])
Tensors with three or more axes represent complex, multi-dimensional data. A color image is a rank-3 tensor with dimensions for height, width, and color channel. A batch of color images, common in convolutional neural networks, is a rank-4 tensor (batch size x channels x height x width, or batch size x height x width x channels depending on the framework's data format). Video data adds a temporal axis, producing rank-5 tensors.
# A batch of 32 RGB images, each 224x224 pixels (PyTorch NCHW format)
images = torch.randn(32, 3, 224, 224)
print(images.ndim) # 4
print(images.shape) # torch.Size([32, 3, 224, 224])
# A batch of 8 video clips, each 16 frames, 3 channels, 112x112 pixels
videos = torch.randn(8, 16, 3, 112, 112)
print(videos.ndim) # 5
Three related but distinct properties describe the structure of a tensor. They are often confused by beginners.
| Property | Definition | Example for a (3, 4, 5) tensor |
|---|---|---|
| Rank (order / ndim) | Number of axes | 3 |
| Shape | Tuple listing the size of each axis | (3, 4, 5) |
| Size (number of elements) | Total count of scalar entries (product of shape) | 60 |
Two tensors can share the same rank but have completely different shapes. For instance, a (2, 3) matrix and a (100, 100) matrix both have rank 2, but they differ in shape and size.
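As a quick check in code, the following minimal PyTorch sketch prints all three properties for the (3, 4, 5) example used in the table above:

import torch
# A tensor with 3 axes, shape (3, 4, 5), and 3 * 4 * 5 = 60 elements
t = torch.zeros(3, 4, 5)
print(t.ndim)     # 3 -> rank (order / number of axes)
print(t.shape)    # torch.Size([3, 4, 5]) -> shape
print(t.numel())  # 60 -> size (total number of elements)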
The major deep learning frameworks all expose tensor rank through similar APIs, though the naming conventions vary slightly.
| Framework | Property / function | Returns |
|---|---|---|
| TensorFlow | tensor.ndim or tf.rank(tensor) | Python int or scalar tensor |
| PyTorch | tensor.ndim or tensor.dim() | Python int |
| NumPy | array.ndim | Python int |
| JAX | array.ndim | Python int |
An important distinction in TensorFlow: .ndim returns a Python integer (useful during eager execution), while tf.rank() returns a tf.Tensor (required inside tf.function graphs and Keras functional models).
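The difference is easy to see in a short eager-mode sketch:

import tensorflow as tf
x = tf.zeros([2, 3, 4])
print(x.ndim)      # 3 (a plain Python int)
print(tf.rank(x))  # tf.Tensor(3, shape=(), dtype=int32) (a scalar tensor)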
In multilinear algebra, the rank of a tensor has a different, more technical meaning. The rank of an order-k tensor T is the smallest integer r such that T can be written as a sum of r rank-1 tensors (outer products of vectors):
T = sum from i=1 to r of (a_i^(1) ⊗ a_i^(2) ⊗ ... ⊗ a_i^(k))
where ⊗ denotes the outer product. This generalizes the concept of matrix rank: for a matrix (order-2 tensor), the decomposition rank equals the familiar linear algebra rank, which can be computed efficiently using Gaussian elimination or singular value decomposition.
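The matrix case is easy to verify numerically. The following NumPy sketch builds a matrix as a sum of two outer products and checks that its linear-algebra rank is 2 (two independent random rank-1 terms almost surely do not collapse to a lower rank):

import numpy as np
a1, b1 = np.random.randn(5), np.random.randn(6)
a2, b2 = np.random.randn(5), np.random.randn(6)
# A sum of two rank-1 terms (outer products of vectors)
M = np.outer(a1, b1) + np.outer(a2, b2)
print(np.linalg.matrix_rank(M))  # 2 (with probability 1 for random vectors)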
Although tensor rank generalizes matrix rank, the two concepts behave very differently in practice.
| Property | Matrix rank | Tensor rank (order 3+) |
|---|---|---|
| Computational complexity | Polynomial time (Gaussian elimination) | NP-hard in general |
| Field dependence | Same rank over reals and complex numbers | Rank can differ between real and complex fields |
| Generic uniqueness of decomposition | Non-unique for rank > 1 (e.g., SVD has rotational freedom) | Often essentially unique (up to permutation and scaling) |
| Best low-rank approximation | Always exists (Eckart-Young theorem) | May not exist; the approximation problem can be ill-posed |
| Border rank vs. rank | Always equal | Can differ for order 3 and higher |
The fact that computing tensor rank is NP-hard was proven by Johan Håstad in 1990. Håstad showed that determining the rank of an order-3 tensor over any finite field is NP-complete, and that over the rational numbers (and by extension the real and complex numbers) the problem is NP-hard [1].
The border rank of a tensor T is the smallest integer r such that T can be approximated arbitrarily closely by tensors of rank r. Border rank is always less than or equal to rank, and for matrices the two always coincide. However, for tensors of order 3 or higher, border rank can be strictly less than rank. For example, the tensor u ⊗ u ⊗ v + u ⊗ v ⊗ u + v ⊗ u ⊗ u has rank 3 but border rank 2, because it can be approximated to arbitrary precision by rank-2 tensors [2].
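A small numerical sketch makes this concrete: the tensors T_eps below are each a difference of two rank-1 terms (hence rank at most 2), yet they approach the rank-3 tensor above as eps shrinks:

import numpy as np
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
def outer3(x, y, z):
    # Outer product of three vectors: result[i, j, k] = x[i] * y[j] * z[k]
    return np.einsum('i,j,k->ijk', x, y, z)
# The rank-3 target: u ⊗ u ⊗ v + u ⊗ v ⊗ u + v ⊗ u ⊗ u
W = outer3(u, u, v) + outer3(u, v, u) + outer3(v, u, u)
for eps in [1e-1, 1e-3, 1e-5]:
    # A difference of two rank-1 tensors, so rank at most 2
    T_eps = (outer3(u + eps * v, u + eps * v, u + eps * v) - outer3(u, u, u)) / eps
    print(eps, np.abs(T_eps - W).max())  # approximation error shrinks with eps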
Border rank plays a role in algebraic complexity theory, particularly in the study of fast matrix multiplication algorithms.
Tensor decompositions factorize a higher-order tensor into simpler components, extending the idea of matrix factorization (such as SVD or eigendecomposition) to multi-dimensional arrays. The two most widely used decompositions are CP and Tucker.
The CP decomposition expresses a tensor as a sum of K rank-1 tensors:
T ≈ sum from i=1 to K of (lambda_i * a_i ⊗ b_i ⊗ c_i)
where lambda_i are scalar weights and a_i, b_i, c_i are vectors along each mode. When K equals the true rank, the decomposition is exact.
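In practice CP decompositions are fitted numerically rather than computed exactly. A minimal sketch using the TensorLy library (an assumption here; TensorLy is not used elsewhere in this article) might look like this:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
# A random 4 x 5 x 6 tensor to decompose
X = tl.tensor(np.random.randn(4, 5, 6))
# Fit a K = 3 CP model: X is approximated by a sum of 3 weighted rank-1 tensors
cp = parafac(X, rank=3)
X_hat = tl.cp_to_tensor(cp)
print(tl.norm(X - X_hat) / tl.norm(X))  # relative approximation error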
The concept was first introduced by Frank L. Hitchcock in 1927 [3]. It was independently rediscovered in psychometrics by Carroll and Chang (CANDECOMP, 1970) and by Harshman (PARAFAC, 1970), and is now commonly known as CP decomposition [4].
Key properties of CP decomposition:
- The number of parameters grows only linearly with the tensor's order.
- Under mild conditions the decomposition is essentially unique, up to permutation and scaling of the rank-1 terms.
- Computing the exact CP rank is NP-hard in general, so in practice K is chosen by the user and the factors are fitted approximately, typically by alternating least squares.
- The best rank-K approximation may not exist, because the approximation problem can be ill-posed for tensors of order 3 and higher.
The Tucker decomposition factorizes a tensor into a smaller core tensor G multiplied by factor matrices along each mode:
T ≈ G x_1 A x_2 B x_3 C
where x_n denotes the mode-n product, G is the core tensor (typically smaller than T), and A, B, C are factor matrices. Tucker decomposition can be seen as a higher-order generalization of principal component analysis (PCA).
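The mode-n product is just an ordinary matrix multiplication applied along one axis of the tensor. A minimal NumPy sketch (mode_n_product is a hypothetical helper written for this example, and modes are numbered from 0 here rather than from 1 as in the formula above):

import numpy as np
def mode_n_product(T, A, n):
    # Multiply tensor T by matrix A along axis n:
    # result[..., j, ...] = sum_i A[j, i] * T[..., i, ...]
    return np.moveaxis(np.tensordot(A, T, axes=(1, n)), 0, n)
G = np.random.randn(2, 3, 4)   # core tensor
A = np.random.randn(5, 2)      # factor matrix for mode 0
B = np.random.randn(6, 3)      # factor matrix for mode 1
C = np.random.randn(7, 4)      # factor matrix for mode 2
# T = G x_1 A x_2 B x_3 C, a 5 x 6 x 7 tensor
T = mode_n_product(mode_n_product(mode_n_product(G, A, 0), B, 1), C, 2)
print(T.shape)  # (5, 6, 7)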
CP decomposition is a special case of Tucker decomposition where the core tensor G is super-diagonal (nonzero only when all indices are equal) [4].
Developed by Ivan Oseledets in 2011, the tensor train decomposition represents a high-order tensor as a chain of lower-order tensors [5]:
T(i_1, i_2, ..., i_n) = G_1(i_1) * G_2(i_2) * ... * G_n(i_n)
where each G_k is a three-dimensional tensor (or a matrix for the boundary terms). TT decomposition is particularly effective for very high-dimensional tensors where Tucker decomposition becomes impractical due to the exponential growth of the core tensor.
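To make the chain structure concrete, the following sketch builds a small order-3 tensor from hypothetical TT cores and reads off one entry as a product of matrices:

import numpy as np
# TT cores for an order-3 tensor of shape (4, 5, 6) with TT-ranks (1, 2, 3, 1)
G1 = np.random.randn(1, 4, 2)
G2 = np.random.randn(2, 5, 3)
G3 = np.random.randn(3, 6, 1)
def tt_entry(cores, idx):
    # T(i_1, ..., i_n) = G_1(i_1) @ G_2(i_2) @ ... @ G_n(i_n), each G_k(i_k) a matrix
    result = np.eye(1)
    for G, i in zip(cores, idx):
        result = result @ G[:, i, :]
    return result.item()
# Contract the full tensor for comparison (the size-1 boundary modes are summed out)
T = np.einsum('aib,bjc,ckd->ijk', G1, G2, G3)
print(np.isclose(T[1, 2, 3], tt_entry([G1, G2, G3], (1, 2, 3))))  # True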
| Method | Core structure | Parameter scaling | Uniqueness | Typical algorithm |
|---|---|---|---|---|
| CP | No core (sum of rank-1 terms) | Linear in order | Essentially unique | Alternating least squares |
| Tucker | Dense core tensor | Exponential in order | Not unique (rotational freedom) | Higher-order SVD (HOSVD) |
| Tensor Train | Chain of 3rd-order cores | Linear in order | Unique under canonical form | TT-SVD |
In physics and differential geometry, tensors carry additional structure beyond their order. A tensor of type (p, q) has p contravariant (upper) indices and q covariant (lower) indices. The total order of the tensor is p + q.
Contravariant and covariant indices transform differently under changes of coordinate system. Contravariant components transform inversely to the basis vectors, while covariant components transform in the same way as the basis vectors. The metric tensor allows raising and lowering indices, converting between the two types [6].
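As a small worked example, lowering a contravariant index with the Minkowski metric (signature (+, -, -, -) assumed here; the four-vector components are arbitrary) is a single contraction:

import numpy as np
# Minkowski metric eta_{mu nu} with signature (+, -, -, -)
eta = np.diag([1.0, -1.0, -1.0, -1.0])
# A contravariant four-vector v^mu
v_upper = np.array([2.0, 1.0, 0.5, 3.0])
# Lower the index: v_mu = eta_{mu nu} v^nu
v_lower = np.einsum('mn,n->m', eta, v_upper)
print(v_lower)  # [ 2. -1. -0.5 -3. ]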
Tensors of various ranks appear throughout physics. The rank of a tensor determines what kind of physical quantity it can represent.
| Rank | Physical example | Description |
|---|---|---|
| 0 (scalar) | Temperature, mass, electric charge | Single-valued quantities invariant under coordinate transformations |
| 1 (vector) | Force, velocity, electric field | Directed quantities with magnitude and direction |
| 2 (matrix) | Stress tensor, moment of inertia tensor, metric tensor | Linear relationships between pairs of vectors; the stress tensor, for example, describes how a material responds to applied forces |
| 3 | Piezoelectric tensor | Relates stress (rank 2) to electric polarization (rank 1) |
| 4 | Riemann curvature tensor, elasticity tensor | Describes spacetime curvature in general relativity; relates stress and strain in elasticity |
The Cauchy stress tensor is a symmetric rank-2 tensor that completely specifies the state of stress at a point inside a material. The Maxwell stress tensor, also rank 2, describes the interaction between electromagnetic forces and mechanical momentum in electrodynamics [7].
Modern machine learning operates almost entirely on tensors. The rank of a tensor determines how many structural dimensions the data has:
- Rank 1: feature vectors, word embeddings, bias vectors.
- Rank 2: weight matrices, grayscale images, adjacency matrices.
- Rank 3: RGB images, sequences of word embeddings.
- Rank 4: mini-batches of images in a CNN.
- Rank 5: batches of video clips.
Tensor decompositions are used to compress neural network weight tensors. The weight tensor of a convolutional layer in a CNN is a rank-4 tensor (output_channels x input_channels x kernel_height x kernel_width). Applying CP or Tucker decomposition to this tensor can reduce the number of parameters and speed up inference with minimal loss of accuracy [8]. Research has shown that convolution layers decomposed using Tucker or CP formats can maintain performance even at relatively low ranks.
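A back-of-the-envelope calculation shows where the savings come from. For a hypothetical convolutional layer with 256 output channels, 128 input channels, and a 3x3 kernel, a rank-R CP factorization stores one small factor matrix per mode instead of the full rank-4 weight tensor:

c_out, c_in, kh, kw = 256, 128, 3, 3
R = 64  # CP rank, a modeling choice rather than a value derived from data
full_params = c_out * c_in * kh * kw      # 294,912 weights in the original layer
cp_params = R * (c_out + c_in + kh + kw)  # 24,960 weights across the four factor matrices
print(full_params, cp_params, round(full_params / cp_params, 1))  # roughly 11.8x fewer parameters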
Tensor networks, originally developed for quantum physics, have been applied to machine learning. Key architectures include:
- Matrix product states (MPS), the quantum-physics name for the tensor train format described above.
- Tree tensor networks, which arrange the component tensors hierarchically.
- Projected entangled pair states (PEPS), which generalize MPS to two-dimensional lattices.
These tensor network architectures have been integrated with transformers, recurrent neural networks, and other modern architectures [9].
Google's Tensor Processing Unit (TPU) is a custom ASIC designed to accelerate tensor operations, particularly matrix multiplications and convolutions that dominate deep learning workloads. NVIDIA's Tensor Cores, introduced with the Volta GPU architecture in 2017, perform mixed-precision matrix multiply-and-accumulate operations at high throughput [10].
The computational difficulty of problems related to tensor rank stands in sharp contrast to the corresponding matrix problems.
| Problem | Matrices (order 2) | Tensors (order 3+) |
|---|---|---|
| Computing rank | O(n^3) via Gaussian elimination | NP-hard (Håstad, 1990) |
| Best rank-r approximation | Polynomial time (truncated SVD) | NP-hard even for r = 1 (Hillar and Lim, 2013) |
| Checking if rank equals 1 | O(n^2) | Polynomial time (higher-order SVD) |
| Uniqueness of rank decomposition | Not unique for rank > 1 | Generically unique under mild conditions |
Hillar and Lim (2013) proved that most tensor problems are NP-hard, including computing the rank, the best rank-1 approximation, and the spectral norm of a tensor of order 3 or higher [11].
Despite these worst-case complexity results, practical algorithms like alternating least squares, gradient-based optimization, and the higher-order SVD work well for many real-world tensor decomposition tasks.
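For illustration, here is a minimal alternating-least-squares sketch for fitting a rank-R CP model to an order-3 tensor in plain NumPy (no convergence checks, normalization, or regularization; a production implementation would add all three):

import numpy as np

def cp_als(T, R, n_iter=50):
    # Fit T[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares
    I, J, K = T.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    for _ in range(n_iter):
        # Update A: unfold T along mode 0 and solve a least-squares problem for A
        KR = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)  # Khatri-Rao product of B and C
        A = T.reshape(I, J * K) @ KR @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        # Update B: unfold along mode 1
        KR = np.einsum('ir,kr->ikr', A, C).reshape(I * K, R)
        B = np.moveaxis(T, 1, 0).reshape(J, I * K) @ KR @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        # Update C: unfold along mode 2
        KR = np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)
        C = np.moveaxis(T, 2, 0).reshape(K, I * J) @ KR @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Recover the factors of a synthetic tensor built from known rank-3 structure
rng = np.random.default_rng(1)
A0 = rng.standard_normal((6, 3))
B0 = rng.standard_normal((7, 3))
C0 = rng.standard_normal((8, 3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, R=3)
T_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # small relative error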
The mathematical study of tensors dates back to the 19th century. Bernhard Riemann and Gregorio Ricci-Curbastro developed the foundations of tensor calculus in differential geometry. Tullio Levi-Civita and Ricci-Curbastro published a systematic treatment of tensor calculus in 1900, and Albert Einstein later used this framework extensively in his general theory of relativity (1915).
The concept of tensor rank as a decomposition property was introduced by Frank L. Hitchcock in 1927, who defined the "rank of a form" as the minimum number of rank-1 terms in its expression [3]. This idea was largely forgotten until the 1970s, when it was rediscovered independently by J.D. Carroll and J.J. Chang (CANDECOMP) and by R.A. Harshman (PARAFAC) in the context of psychometric data analysis.
The modern era of tensor computation was catalyzed by Kolda and Bader's comprehensive 2009 survey "Tensor Decompositions and Applications" in SIAM Review, which unified notation and terminology across disciplines [4].