Rank (Tensor)

Introduction

In machine learning and deep learning, the rank of a tensor is the number of dimensions (also called axes) the tensor has. It is the count of indices you need to specify in order to pick out a single scalar element. A scalar is rank 0, a vector is rank 1, a matrix is rank 2, and so on. This sense of rank is sometimes called the order or degree of the tensor, and in practical code it is what NumPy exposes as ndarray.ndim, what PyTorch exposes as Tensor.ndim or Tensor.dim(), and what TensorFlow returns from tf.rank.

This definition is the one used almost universally in neural network frameworks and tutorials. It is, however, distinct from a different and older notion of tensor rank from multilinear algebra (the minimum number of rank-1 terms needed to write the tensor as a sum, also known as the CP rank). The two ideas share a name but mean different things, and confusion between them is one of the more common terminology traps in the field.

What rank means precisely

A tensor is a generalization of scalars, vectors, and matrices. In deep learning libraries, the word tensor is essentially a synonym for a multidimensional array of numerical values held in CPU or GPU memory, usually as 32-bit or 16-bit floats, sometimes as integers, booleans, or complex numbers.

The rank of a tensor in this dimensional sense is simply the length of its shape tuple. If a tensor has shape (2, 3, 4), then its rank is 3, because the shape has three entries. Each entry in the shape gives the length of one axis, and the product of those lengths is the total number of scalar elements stored.

A useful way to think about it: rank answers the question "how many indices do I need to fully specify one element?" For a matrix M you write M[i, j], so the rank is 2. For a 3D tensor T you write T[i, j, k], so the rank is 3. For a scalar s you need no indices, so the rank is 0.
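The index-counting view can be checked directly. The following is a minimal NumPy sketch; the variable names are illustrative only.

```python
import numpy as np

# A tensor of shape (2, 3, 4), filled with 0..23 so each element is distinct.
t = np.arange(24).reshape(2, 3, 4)

rank = t.ndim                 # number of axes
assert rank == len(t.shape)   # rank is the length of the shape tuple
assert t.size == 2 * 3 * 4    # total elements is the product of the shape

# Three indices pick out exactly one scalar element.
element = t[1, 2, 3]
print(rank, t.shape, t.size, element)  # 3 (2, 3, 4) 24 23

# A scalar has an empty shape and needs no indices.
s = np.array(4.0)
print(s.ndim, s.shape)  # 0 ()
```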

Rank, axes, and shape

Four related terms appear together often enough to deserve a side by side comparison.

Term	What it is	Example for shape (32, 64, 128)
Rank	Number of axes (a single integer)	3
Axes	The individual dimensions, indexed 0, 1, 2, ...	axis 0, axis 1, axis 2
Shape	Tuple giving the length of each axis	(32, 64, 128)
Size	Total scalar elements (product of shape)	262,144

The rank is the count of dimensions; the shape is the list of sizes. A tensor of shape (5,) has rank 1, while a tensor of shape (5, 1) has rank 2. The two look similar on paper but they are different objects, and operations that depend on broadcasting or matrix multiplication treat them differently.
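As a rough illustration of why the distinction matters, this NumPy sketch shows the same five values stored at rank 1 and rank 2, and how broadcasting and matrix multiplication respond to each. The names are illustrative.

```python
import numpy as np

v = np.arange(5.0)          # shape (5,), rank 1
col = v.reshape(5, 1)       # shape (5, 1), rank 2

print(v.ndim, col.ndim)     # 1 2

# Broadcasting: (5,) against (5, 1) expands to a (5, 5) grid, not 5 sums.
grid = v + col
print(grid.shape)           # (5, 5)

# Matrix multiplication: rank-1 inputs give a scalar, rank-2 inputs a matrix.
dot = v @ v                 # shape ()
gram = col.T @ col          # shape (1, 1)
print(np.ndim(dot), gram.shape)  # 0 (1, 1)
```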

Common ranks in machine learning

Most real workloads use tensors with rank between 0 and 5. Higher ranks appear in research but are less common in everyday training code.

Rank	Common name	Typical example	Example shape
0	Scalar	A loss value, learning rate, prediction probability	`()`
1	Vector	A feature vector, embedding, or 1D signal	`(768,)`
2	Matrix	Weight matrix, grayscale image, batch of feature vectors	`(32, 768)`
3	3-tensor	An RGB image, or a batch of token embeddings	`(224, 224, 3)` or `(32, 128, 768)`
4	4-tensor	A batch of RGB images, a convolutional feature map	`(32, 224, 224, 3)`
5	5-tensor	A batch of videos, volumetric medical data	`(8, 30, 224, 224, 3)`
6	6-tensor	Batches of video clips with multiple camera views	`(N, V, T, H, W, C)`

Rank 0 through rank 3

A scalar is still a tensor object in modern frameworks, just one with an empty shape: tf.constant(4) produces a tensor with shape () and rank 0. A vector with n entries has shape (n,), which is how word embeddings in language models, bias terms, and the output of a single softmax classifier are stored. Dense layer weights are rank 2 tensors: Keras stores them with shape (in_features, out_features), while PyTorch's nn.Linear stores the transpose, (out_features, in_features). Grayscale images and tabular datasets are also rank 2. A single RGB image jumps to rank 3 with shape (H, W, 3) (channels last, the TensorFlow default) or (3, H, W) (channels first, the PyTorch default). Sequence models such as Transformers take rank 3 inputs of shape (batch, sequence_length, embedding_dim).
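The low ranks can be sketched with NumPy placeholders. The sizes below (768, 3072, 224) are arbitrary choices for illustration and are not tied to any particular model.

```python
import numpy as np

loss = np.float32(0.25)                 # rank 0: a single loss value
embedding = np.zeros(768)               # rank 1: one embedding vector
weights = np.zeros((768, 3072))         # rank 2: a dense weight matrix
image_hwc = np.zeros((224, 224, 3))     # rank 3: channels-last RGB image

# Reordering axes converts between layouts without changing the rank.
image_chw = np.transpose(image_hwc, (2, 0, 1))

print(np.ndim(loss), embedding.ndim, weights.ndim, image_hwc.ndim)  # 0 1 2 3
print(image_chw.shape)  # (3, 224, 224)
```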

Rank 4 and higher

For image classification, the workhorse input is a rank 4 tensor of shape (N, H, W, C) or (N, C, H, W). A 2D convolutional layer expects an input of this rank, and its weights are also a rank 4 tensor: (out_channels, in_channels, kernel_h, kernel_w) in PyTorch, or (kernel_h, kernel_w, in_channels, out_channels) in TensorFlow. Video models add a temporal axis to reach rank 5 with shape (N, T, H, W, C). 3D convolutions and volumetric medical imaging models that process MRI or CT volumes also operate on rank 5 tensors. Attention-based architectures briefly produce higher rank intermediates, up to rank 5 or rank 6, when they reshape into multi-head form or split a sequence axis into blocks.
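A minimal sketch of the multi-head reshape, using NumPy and made-up sizes. The simplest split already raises the rank from 3 to 4; packing queries, keys, and values together or splitting the sequence into blocks would add further axes. This is an illustration of the reshape only, not any specific library's attention implementation.

```python
import numpy as np

# Illustrative sizes, chosen so that d_model divides evenly into heads.
batch, seq_len, d_model, n_heads = 2, 10, 64, 8
head_dim = d_model // n_heads

x = np.zeros((batch, seq_len, d_model))          # rank 3

# Split the embedding axis into (heads, head_dim): rank 3 -> rank 4.
heads = x.reshape(batch, seq_len, n_heads, head_dim)
# Move the head axis next to the batch axis; a transpose leaves rank unchanged.
heads = heads.transpose(0, 2, 1, 3)              # (batch, heads, seq, head_dim)

print(x.ndim, heads.ndim, heads.shape)  # 3 4 (2, 8, 10, 8)
```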

How frameworks expose rank

Deep learning libraries all provide a quick way to query the rank of a tensor, though names vary.

Library	Get rank as integer	Get rank as tensor	Get shape
NumPy	`a.ndim`	not applicable	`a.shape`
PyTorch	`t.ndim`, `t.dim()`, `t.ndimension()`	not applicable	`t.shape`, `t.size()`
TensorFlow	`t.ndim`	`tf.rank(t)`	`t.shape`, `tf.shape(t)`
JAX	`a.ndim`	not applicable	`a.shape`

A short example for a tensor with shape (3, 2, 4, 5):

import numpy as np
import torch
import tensorflow as tf

np.zeros((3, 2, 4, 5)).ndim      # 4
torch.zeros(3, 2, 4, 5).ndim     # 4 (also .dim())
tf.zeros([3, 2, 4, 5]).ndim      # 4
tf.rank(tf.zeros([3, 2, 4, 5]))  # scalar Tensor with value 4

There is one small TensorFlow distinction worth noting. t.ndim returns a Python integer, while tf.rank(t) returns a scalar Tensor. The former is fine for control flow in eager mode; the latter is what you want inside a tf.function graph when the rank might not be known until runtime.

Distinction from matrix rank

In linear algebra, the rank of a matrix is the dimension of the vector space spanned by its rows (or equivalently its columns), bounded above by the smaller of the number of rows and columns. A 100 by 100 matrix has matrix rank at most 100, and might have rank 7 if only seven rows are linearly independent.

The tensor rank on this page is a completely different quantity. A matrix has tensor rank 2 simply because it is a 2D array. It can simultaneously have matrix rank 7, 99, or 100 depending on its entries. The matrix numpy.zeros((100, 100)) has tensor rank 2 and matrix rank 0. The identity numpy.eye(100) has tensor rank 2 and matrix rank 100. When a linear algebra textbook says "the rank of A," it almost always means linearly independent rows or columns. When a TensorFlow or PyTorch tutorial says "the rank of A," it almost always means ndim.
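The two quantities can be printed side by side. This sketch uses `numpy.linalg.matrix_rank`, which estimates matrix rank numerically from singular values; the `seven` matrix is a constructed example with exactly seven independent rows.

```python
import numpy as np

zeros = np.zeros((100, 100))
eye = np.eye(100)

# Seven linearly independent rows; every other row is zero.
seven = np.zeros((100, 100))
seven[:7, :7] = np.eye(7)

for name, m in [("zeros", zeros), ("eye", eye), ("seven", seven)]:
    # ndim is always 2 here; matrix rank depends on the entries.
    print(name, m.ndim, np.linalg.matrix_rank(m))
```

All three arrays report ndim 2, while the matrix ranks are 0, 100, and 7 respectively.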

Distinction from tensor decomposition rank

Multilinear algebra extends matrix rank to tensors in a way that has nothing to do with ndim. The CP rank (canonical polyadic rank, also called tensor rank in the older mathematical literature) of a tensor is the minimum number of rank-1 tensors that sum to it. A rank-1 tensor is one that can be written as an outer product of vectors, for example u ⊗ v ⊗ w for a tensor with three axes, whose entry at position (i, j, k) is the product u[i] v[j] w[k].
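A rank-1 tensor and a sum of rank-1 terms can be built with `numpy.einsum`. This is a sketch with random vectors and arbitrary sizes; it constructs tensors of bounded CP rank but does not compute the CP rank of a given tensor.

```python
import numpy as np

rng = np.random.default_rng(0)
u, v, w = rng.normal(size=2), rng.normal(size=3), rng.normal(size=4)

# Outer product of three vectors: entry [i, j, k] equals u[i] * v[j] * w[k].
rank_one = np.einsum("i,j,k->ijk", u, v, w)
print(rank_one.shape, rank_one.ndim)  # (2, 3, 4) 3

# A sum of R rank-1 terms has CP rank at most R (possibly less).
R = 2
A = rng.normal(size=(2, R))
B = rng.normal(size=(3, R))
C = rng.normal(size=(4, R))
cp_sum = np.einsum("ir,jr,kr->ijk", A, B, C)
print(cp_sum.shape)  # (2, 3, 4)
```

Note that both tensors have ndim 3 regardless of how many rank-1 terms went into them.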

A related idea is Tucker rank (also called n-rank or multilinear rank), which is a tuple rather than a single integer. The n-th entry of the Tucker rank is the matrix rank of the n-th mode unfolding of the tensor (flattening all axes except the n-th into a long matrix). Tucker decomposition gives per axis control over approximation quality, while CP uses a single scalar.
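The mode unfoldings can be sketched in a few lines of NumPy. The helper names `unfold` and `multilinear_rank` are hypothetical, and the column ordering of this unfolding may differ from the convention in a given paper, which does not affect the resulting matrix ranks.

```python
import numpy as np

def unfold(t, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def multilinear_rank(t):
    """Tuple of numerical matrix ranks, one per mode unfolding."""
    return tuple(int(np.linalg.matrix_rank(unfold(t, n))) for n in range(t.ndim))

rng = np.random.default_rng(0)
u, v, w = rng.normal(size=2), rng.normal(size=3), rng.normal(size=4)
rank_one = np.einsum("i,j,k->ijk", u, v, w)   # an outer product of three vectors
dense = rng.normal(size=(2, 3, 4))            # a generic random tensor

print(multilinear_rank(rank_one))  # (1, 1, 1)
print(multilinear_rank(dense))     # full rank in every mode for generic entries
```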

These are not the same idea as ndim. Computing the CP rank of a general tensor is NP-hard, while ndim is just the length of the shape tuple. The CP rank also depends on the underlying field: a real tensor can have a strictly larger CP rank over the reals than over the complex numbers, which never happens with matrix rank. For matrices, CP rank coincides with ordinary matrix rank, but for tensors of order 3 or higher the CP rank can differ from the border rank (the smallest rank of tensors that approximate it arbitrarily closely), a phenomenon with no matrix analogue.

In practice, CP and Tucker ranks show up in tensor decomposition methods used for model compression, recommendation systems, neural network weight factorization, and signal processing. When a paper says it uses a "low rank tensor approximation," it almost always means low CP or Tucker rank, not a low number of axes.

Common operations that change rank

Many standard tensor operations change the rank by adding or removing axes.

Operation	Effect on rank	Example
`reshape`	Rank can change to any compatible value	`(24,) -> (2, 3, 4)` goes from rank 1 to rank 3
`squeeze`	Removes axes of length 1	`(1, 3, 1, 5) -> (3, 5)` goes from rank 4 to rank 2
`unsqueeze` or `expand_dims`	Adds an axis of length 1	`(3, 5) -> (1, 3, 5)` goes from rank 2 to rank 3
`flatten`	Collapses several axes into one	`(2, 3, 4) -> (24,)` goes from rank 3 to rank 1
Reduction like `sum(axis=k)`	Removes one axis	`(2, 3, 4).sum(axis=1) -> (2, 4)`
Stacking	Adds one axis	`stack([(3,), (3,), (3,)]) -> (3, 3)`

Adding a missing batch dimension with unsqueeze(0) or tf.expand_dims(x, axis=0) is so common that it has become a reflex for many practitioners.
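The operations in the table can be sketched with their NumPy equivalents. The variable names are illustrative, and `keepdims=True` is included to show a reduction that preserves rank.

```python
import numpy as np

reshaped = np.zeros(24).reshape(2, 3, 4)               # rank 1 -> rank 3
squeezed = np.squeeze(np.zeros((1, 3, 1, 5)))          # rank 4 -> rank 2
expanded = np.expand_dims(np.zeros((3, 5)), axis=0)    # rank 2 -> rank 3
flat = np.zeros((2, 3, 4)).flatten()                   # rank 3 -> rank 1
summed = np.zeros((2, 3, 4)).sum(axis=1)               # rank 3 -> rank 2
kept = np.zeros((2, 3, 4)).sum(axis=1, keepdims=True)  # rank stays 3
stacked = np.stack([np.zeros(3), np.zeros(3), np.zeros(3)])  # rank 1 -> rank 2

results = {
    "reshape": reshaped, "squeeze": squeezed, "expand_dims": expanded,
    "flatten": flat, "sum": summed, "sum keepdims": kept, "stack": stacked,
}
for name, arr in results.items():
    print(f"{name:13s} shape={arr.shape} rank={arr.ndim}")
```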

A note on terminology

Several words are used for what this article calls rank, and all of them appear in active code and papers: rank (common in TensorFlow), order (more common in physics and older mathematics literature), degree (occasionally seen in tensor calculus), number of dimensions or ndim (common in NumPy and PyTorch), and number of axes (used in the official TensorFlow tensor guide). When a tensor has shape (2, 3, 4), someone might describe it as "rank 3," "three dimensional," "a 3-tensor," "order 3," or simply as having "three axes." These are all the same statement. To avoid confusion with matrix rank, some authors prefer the word order when they want to be unambiguous, especially in papers that also discuss CP or Tucker rank.

Practical tips for debugging rank

A mismatch between expected and actual rank is one of the most common causes of beginner errors in deep learning. A few habits help. Print the rank and shape of every input tensor when a layer raises a shape mismatch error, since most framework error messages mention the expected rank in their text. Add explicit assert tensor.ndim == k checks in custom training loops, especially around the boundary between dataloader output and model input. Consider einops-style notation such as rearrange(x, 'b h w c -> b c h w') for transposes and reshapes, since the pattern strings document the rank and meaning of each axis in code. Remember that (N, 1) and (N,) hold the same values and are easy to mistake for one another, but they have different ranks; combining them in an elementwise operation broadcasts to (N, N) without raising an error, and operations like binary cross entropy and softmax are sensitive to this difference.
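A minimal sketch of an explicit rank check, together with the (N, 1) versus (N,) pitfall in a mean squared error. The helper `check_rank` is hypothetical and the numbers are made up; NumPy stands in for whichever framework is in use.

```python
import numpy as np

def check_rank(x, expected, name="tensor"):
    """Hypothetical helper: fail early and report the actual shape."""
    if x.ndim != expected:
        raise ValueError(
            f"{name}: expected rank {expected}, got rank {x.ndim} with shape {x.shape}"
        )
    return x

preds = np.array([[0.9], [0.2], [0.7]])   # shape (3, 1), e.g. model output
labels = np.array([1.0, 0.0, 1.0])        # shape (3,), e.g. dataloader output

# Silent bug: (3, 1) - (3,) broadcasts to (3, 3), so the mean covers 9 terms.
diff = preds - labels
wrong = np.mean(diff ** 2)

# Fix: drop the trailing length-1 axis so both operands are rank 1.
preds_1d = check_rank(preds.squeeze(-1), 1, "preds")
right = np.mean((preds_1d - labels) ** 2)

print(diff.shape)           # (3, 3)
print(wrong, right)         # two different loss values
```

Calling `check_rank(preds, 1, "preds")` on the unsqueezed array raises a ValueError that names the offending shape, which is usually easier to act on than a loss that is silently wrong.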

Explain like I'm 5 (ELI5)

Imagine boxes you need to organize. One box on its own is like a single number. A row of boxes is like a list. Stack rows into a wall and you get a table. Stack walls together and you have a cube, and you need three pieces of information (row, column, and layer) to point at any one box inside it. The rank of a tensor is just the number of directions you have to follow to find a particular box. In machine learning, tensors help organize huge amounts of information in a way that computers can process quickly during training and inference.

References

Introduction to Tensors, TensorFlow Core Guide
tf.rank, TensorFlow API Documentation
numpy.ndarray.ndim, NumPy Documentation
Tensor rank decomposition, Wikipedia
TensorFlow Tensor Ranks, Shapes, and Types (r0.7 docs)
Tensor Rank VS Matrix Rank, Lei Mao's Log Book
Rank, Axes, and Shape Explained, deeplizard
Goodfellow, Bengio, and Courville. *Deep Learning*. MIT Press, 2016. Online edition
Kolda, T. G., and Bader, B. W. (2009). "Tensor Decompositions and Applications." *SIAM Review* 51(3): 455-500.
