Downsampling is a broad technique used across signal processing, image processing, and machine learning to reduce the size or resolution of data. Depending on the context, it can mean reducing the sampling rate of a signal, shrinking the spatial dimensions of an image or feature map, or removing examples from a majority class to address class imbalance in a dataset. The central motivation behind downsampling is to lower computational costs and memory requirements while preserving as much useful information as possible.
The term appears in several distinct but related settings. In digital signal processing, downsampling (also called decimation) reduces the number of samples per unit of time. In convolutional neural network architectures, downsampling reduces the height and width of feature maps through pooling layers or strided convolutions. In data science and statistical learning, downsampling (also called undersampling) refers to reducing the number of instances in the majority class of a class-imbalanced dataset. Each of these uses shares a common thread: systematically discarding some portion of data in a controlled way to achieve a practical benefit.
In signal processing, downsampling means reducing the sampling rate of a discrete-time signal. If a signal is originally sampled at rate f_s, downsampling by a factor of M produces a new signal sampled at rate f_s / M by retaining only every M-th sample and discarding the rest.
A critical concern when downsampling a signal is aliasing. According to the Nyquist-Shannon sampling theorem, a signal must be sampled at a rate at least twice its highest frequency component in order to be reconstructed without distortion. Reducing the sampling rate without first removing high-frequency content can cause those frequencies to "fold" into lower frequency bands, producing artifacts known as aliasing.
To prevent aliasing, the standard approach is to apply a low-pass anti-aliasing filter before the sample-rate reduction step. This filter attenuates frequency components above the new Nyquist frequency, f_s / (2M), ensuring that the remaining samples faithfully represent the signal at the lower rate. The combined process of low-pass filtering followed by sample dropping is formally called decimation.
| Term | Definition |
|---|---|
| Sampling rate | Number of samples taken per second from a continuous signal |
| Nyquist frequency | Half the sampling rate; the maximum frequency that can be represented without aliasing |
| Aliasing | Distortion caused when high-frequency components are misrepresented as lower frequencies after subsampling |
| Anti-aliasing filter | A low-pass filter applied before downsampling to remove frequencies above the new Nyquist limit |
| Decimation | The full process of low-pass filtering followed by sample-rate reduction |
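As a rough sketch of the difference between naive sample dropping and proper decimation (assuming NumPy and SciPy are available), the following example keeps every M-th sample directly and also uses scipy.signal.decimate, which applies an anti-aliasing filter before subsampling:

```python
import numpy as np
from scipy import signal

fs = 1000           # original sampling rate (Hz)
M = 4               # downsampling factor -> new rate of 250 Hz, new Nyquist limit 125 Hz
t = np.arange(0, 1.0, 1.0 / fs)

# A 50 Hz tone plus a 300 Hz tone; 300 Hz lies above the new Nyquist limit
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

# Naive downsampling: keep every M-th sample (no filtering, so the 300 Hz tone aliases)
x_naive = x[::M]

# Decimation: low-pass filter below the new Nyquist frequency, then subsample
x_decimated = signal.decimate(x, M, zero_phase=True)

print(x.shape, x_naive.shape, x_decimated.shape)   # (1000,) (250,) (250,)
```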
In image processing and computer vision, downsampling reduces the spatial resolution of an image. A 1024x1024 image downsampled by a factor of 2 becomes a 512x512 image. This is useful for reducing storage and computation, generating image pyramids for multi-scale analysis, and preparing thumbnails or previews.
Several interpolation methods are commonly used for image downsampling, including nearest-neighbor (fast but prone to aliasing), bilinear, bicubic, area averaging (box filtering), and Lanczos resampling; higher-order methods generally preserve more detail at a higher computational cost.
Just as in signal processing, downsampling an image without appropriate filtering can introduce aliasing artifacts such as moiré patterns or staircase effects along diagonal edges. Applying a Gaussian blur or box filter before subsampling mitigates these artifacts.
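One minimal way to do this in Python, assuming SciPy and NumPy and a grayscale image, is to blur with a Gaussian filter and then subsample; real pipelines would typically rely on a library resize routine instead:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def downsample_image(img, factor=2, sigma=None):
    """Gaussian blur (anti-aliasing), then keep every `factor`-th pixel in each dimension."""
    if sigma is None:
        sigma = factor / 2.0            # heuristic: stronger blur for larger reduction factors
    blurred = gaussian_filter(img, sigma=sigma)
    return blurred[::factor, ::factor]

image = np.random.rand(1024, 1024)      # stand-in for a grayscale image
small = downsample_image(image, factor=2)
print(small.shape)                       # (512, 512)
```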
Modern deep learning architectures, especially convolutional neural networks (CNNs), use downsampling as a core building block. Reducing the spatial dimensions of feature maps at successive layers serves several purposes: it increases the receptive field of later layers, reduces the number of parameters and floating-point operations, and encourages the network to learn progressively more abstract representations.
Pooling is the most traditional form of downsampling in CNNs. A pooling layer slides a fixed-size window (typically 2x2 or 3x3) across each feature map with a stride of 2 or more, summarizing each window with a single value.
| Pooling type | Operation | Characteristics |
|---|---|---|
| Max pooling | Takes the maximum value in each window | Preserves the most activated feature; widely used in classification networks such as VGGNet and ResNet |
| Average pooling | Computes the mean of values in each window | Produces smoother output; preserves more spatial context |
| Global average pooling | Averages the entire spatial extent of each feature map into a single value | Used as a replacement for fully connected layers before the output; reduces overfitting |
Pooling operations are parameter-free, meaning they introduce no learnable weights. This makes them computationally cheap but also inflexible, since the summarization rule is fixed regardless of the input content.
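For example, in PyTorch a 2x2 max pool with stride 2 halves the spatial dimensions and contributes no learnable weights (a brief sketch with an assumed feature-map size):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 56, 56)                      # (batch, channels, height, width)
y = pool(x)

print(y.shape)                                      # torch.Size([1, 64, 28, 28])
print(sum(p.numel() for p in pool.parameters()))    # 0 -> pooling has no learnable weights
```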
An alternative to pooling is the strided convolution, which performs convolution with a stride greater than 1. Instead of convolving at every spatial position and then pooling, a strided convolution directly produces a lower-resolution feature map in a single step. Because the convolution filter weights are learned during training, strided convolutions can adapt their downsampling behavior to the data.
Springenberg et al. (2014) demonstrated in "Striving for Simplicity: The All Convolutional Net" that max pooling can be replaced by convolutional layers with increased stride without sacrificing accuracy on image recognition benchmarks including CIFAR-10, CIFAR-100, and ImageNet. This finding influenced later architectures that rely primarily or exclusively on strided convolutions for spatial reduction.
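The following PyTorch sketch (with assumed channel sizes) performs the same factor-of-2 spatial reduction as the pooling example above, but with learned filter weights:

```python
import torch
import torch.nn as nn

# 3x3 convolution with stride 2: halves the spatial resolution with learned filter weights
downsample = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 64, 56, 56)
y = downsample(x)

print(y.shape)                                           # torch.Size([1, 128, 28, 28])
print(sum(p.numel() for p in downsample.parameters()))   # 73,856 learnable parameters
```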
Classical CNNs that use max pooling or strided convolutions violate the sampling theorem because they subsample feature maps without first applying a low-pass filter. Richard Zhang showed in "Making Convolutional Networks Shift-Invariant Again" (ICML 2019) that this causes modern CNNs to lack shift invariance, meaning small translations of the input can produce dramatically different outputs.
The proposed fix is simple and borrows directly from signal processing: insert a low-pass (blur) filter before every downsampling operation. In practice, this means decomposing a stride-2 max-pool into two steps: (1) compute the dense max (stride 1), and (2) apply a blur filter followed by stride-2 subsampling. The same principle applies to strided convolutions and average pooling.
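A minimal PyTorch sketch of this decomposition (not the paper's reference implementation, and with an assumed channel count) might look as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed binomial low-pass filter applied depthwise with stride 2."""
    def __init__(self, channels):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(k, k)
        kernel = kernel / kernel.sum()                  # normalize the 3x3 blur kernel
        # One copy of the kernel per channel (depthwise filtering)
        self.register_buffer("kernel", kernel.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=2, padding=1, groups=self.channels)

channels = 64
antialiased_pool = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=1),   # (1) dense max, no subsampling yet
    BlurPool2d(channels),                    # (2) blur, then stride-2 subsampling
)

x = torch.randn(1, channels, 56, 56)
print(antialiased_pool(x).shape)             # torch.Size([1, 64, 28, 28])
```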
Experimental results showed that anti-aliased versions of ResNet, DenseNet, and MobileNet achieved higher ImageNet classification accuracy and significantly improved shift-equivariance of internal feature representations, leading to more robust and stable predictions.
Feature Pyramid Networks (FPNs), introduced by Lin et al. (2017), exploit the natural downsampling that occurs in a CNN backbone to build multi-scale feature representations. As an input image passes through successive convolutional blocks (the "bottom-up pathway"), spatial resolution is halved at each stage while the number of channels and semantic richness increases.
FPN adds a "top-down pathway" that upsamples the deepest, most semantically rich features and merges them with higher-resolution features from earlier stages via lateral connections. This produces a pyramid of feature maps, each combining strong semantics with fine spatial detail. The architecture is foundational in modern object detection (for example, Faster R-CNN with FPN) and instance segmentation frameworks.
The key insight is that downsampling in the backbone is not merely a cost-saving step; it creates a hierarchy of representations at different scales that, when combined appropriately, enable the network to detect objects ranging from small to large.
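A compact sketch of the top-down pathway in PyTorch is shown below; the channel widths and feature-map sizes are illustrative assumptions rather than the exact configuration from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """Merge backbone features (ordered high-resolution to low-resolution) into a pyramid."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # Lateral 1x1 convolutions project every backbone stage to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions smooth each merged map
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1) for _ in in_channels
        )

    def forward(self, features):
        laterals = [lat(f) for lat, f in zip(self.lateral, features)]
        # Top-down pathway: upsample the deeper map and add it to the next lateral connection
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(laterals[i + 1], scale_factor=2, mode="nearest")
        return [smooth(l) for smooth, l in zip(self.smooth, laterals)]

# Backbone feature maps for a 256x256 input (strides 4, 8, 16, 32)
c2 = torch.randn(1, 256, 64, 64)
c3 = torch.randn(1, 512, 32, 32)
c4 = torch.randn(1, 1024, 16, 16)
c5 = torch.randn(1, 2048, 8, 8)

pyramid = FPNTopDown()([c2, c3, c4, c5])
print([tuple(p.shape) for p in pyramid])   # all maps have 256 channels, resolutions 64, 32, 16, 8
```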
In machine learning classification tasks, datasets often exhibit class imbalance, where one class (the majority class) has far more examples than another (the minority class). Training a model directly on such data tends to produce a classifier biased toward the majority class. Downsampling (also called undersampling) addresses this by reducing the number of majority class instances so that the class distribution becomes more balanced.
The following table summarizes widely used undersampling techniques:
| Technique | Description | Key property |
|---|---|---|
| Random undersampling | Randomly removes instances from the majority class until a desired ratio is reached | Simple and fast; may discard informative examples |
| Tomek links | Identifies pairs of nearest neighbors from different classes (Tomek links) and removes the majority class member of each pair | Cleans the decision boundary region; often used as a preprocessing step |
| Edited Nearest Neighbors (ENN) | Removes majority class samples whose nearest neighbors mostly belong to a different class | Smooths the decision boundary; removes noisy or ambiguous examples |
| Condensed Nearest Neighbors (CNN) | Iteratively builds a consistent subset by adding samples only when they are misclassified by the current subset | Produces a minimal subset that preserves classification accuracy |
| NearMiss-1 | Selects majority class samples with the smallest average distance to their three closest minority class neighbors | Retains majority examples near the decision boundary |
| NearMiss-2 | Selects majority class samples with the smallest average distance to their three farthest minority class neighbors | Selects majority examples that are globally close to the minority class |
| NearMiss-3 | For each minority instance, keeps a set number of closest majority neighbors | Surrounds each minority instance with nearby majority examples |
| Cluster centroids | Applies clustering (e.g., k-means) to the majority class and replaces clusters with their centroids | Produces synthetic representative points rather than selecting originals |
These techniques are available in the imbalanced-learn library for Python, which provides a consistent API compatible with scikit-learn.
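For example, random undersampling with imbalanced-learn might look like the following sketch, using a synthetic dataset for illustration:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# Synthetic binary dataset with roughly 95% majority / 5% minority class
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))

# Randomly drop majority-class samples until the classes are balanced 1:1
sampler = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(Counter(y_res))
```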
Downsampling and oversampling represent opposite strategies for addressing class imbalance.
| Aspect | Downsampling (undersampling) | Oversampling |
|---|---|---|
| Approach | Removes majority class instances | Adds minority class instances (duplicates or synthetic) |
| Dataset size | Decreases | Increases |
| Training speed | Faster (less data) | Slower (more data) |
| Risk of information loss | Higher; informative majority samples may be discarded | Lower; all original data is retained |
| Risk of overfitting | Lower; fewer duplicate patterns | Higher; especially with simple duplication |
| Best suited for | Large datasets where majority class has redundant samples | Small datasets or severely imbalanced distributions |
Research by Amin et al. (2023) comparing undersampling, oversampling, and SMOTE on educational data mining tasks found that oversampling tends to outperform undersampling on extremely imbalanced datasets, while both approaches perform comparably on moderately imbalanced data. In practice, combining undersampling of the majority class with oversampling of the minority class (hybrid methods) often yields the best results.
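imbalanced-learn provides such hybrid resamplers, for example SMOTETomek; a brief sketch on synthetic data:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek

X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)

# Hybrid resampling: SMOTE oversamples the minority class,
# then Tomek links removes overlapping majority-class examples
resampler = SMOTETomek(random_state=0)
X_res, y_res = resampler.fit_resample(X, y)

print(Counter(y), Counter(y_res))
```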
All forms of downsampling involve a trade-off between efficiency and information preservation: reducing a signal's sampling rate discards high-frequency content, shrinking feature maps sacrifices spatial precision, and undersampling a dataset may remove informative majority-class examples.
Choosing the right amount and method of downsampling depends on the specific task, the nature of the data, and the acceptable level of information loss. In neural network design, architects often balance downsampling with skip connections, feature pyramids, or attention mechanisms to recover lost spatial information at later stages.
Downsampling operations are supported natively in all major deep learning frameworks:
| Framework | Pooling | Strided convolution | Image resize |
|---|---|---|---|
| PyTorch | torch.nn.MaxPool2d, torch.nn.AvgPool2d | torch.nn.Conv2d(stride=2) | torch.nn.functional.interpolate |
| TensorFlow/Keras | tf.keras.layers.MaxPool2D, tf.keras.layers.AveragePooling2D | tf.keras.layers.Conv2D(strides=2) | tf.image.resize |
| JAX/Flax | flax.linen.max_pool, flax.linen.avg_pool | flax.linen.Conv(kernel_size, strides=2) | jax.image.resize |
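As one concrete example from the table, a sketch of image downsampling with PyTorch's functional resize:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 1024, 1024)   # a batch of one RGB image

# Downsample to half resolution with bilinear interpolation; antialias=True applies
# a low-pass filter first (supported in recent PyTorch versions)
y = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False, antialias=True)
print(y.shape)                       # torch.Size([1, 3, 512, 512])
```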
For class imbalance undersampling, the Python library imbalanced-learn provides implementations of all the techniques described above through classes such as RandomUnderSampler, TomekLinks, EditedNearestNeighbours, CondensedNearestNeighbour, and NearMiss.
Imagine you have a huge box of crayons with every color you can think of. But your little backpack can only hold 24 crayons. Downsampling is like carefully picking 24 crayons that still give you all the main colors you need to draw a nice picture. You lose some of the rarer shades, but you can still color almost anything. In computers, downsampling works the same way: it takes a big pile of data and makes it smaller so that programs can work with it faster, while keeping the most important parts.