Downsampling is a broad technique used across signal processing, image processing, and machine learning to reduce the size or resolution of data. Depending on the context, it can mean reducing the sampling rate of a signal, shrinking the spatial dimensions of an image or feature map, or removing examples from a majority class to address class imbalance in a dataset. The central motivation behind downsampling is to lower computational costs and memory requirements while preserving as much useful information as possible.
The term appears in several distinct but related settings. In digital signal processing, downsampling (also called decimation) reduces the number of samples per unit of time. In convolutional neural network architectures, downsampling reduces the height and width of feature maps through pooling layers or strided convolutions. In data science and statistical learning, downsampling (also called undersampling) refers to reducing the number of instances in the majority class of a class-imbalanced dataset. Each of these uses shares a common thread: systematically discarding some portion of data in a controlled way to achieve a practical benefit.
In signal processing, downsampling means reducing the sampling rate of a discrete-time signal. If a signal is originally sampled at rate f_s, downsampling by a factor of M produces a new signal sampled at rate f_s / M by retaining only every M-th sample and discarding the rest.
A critical concern when downsampling a signal is aliasing. According to the Nyquist-Shannon sampling theorem, a signal must be sampled at a rate at least twice its highest frequency component in order to be reconstructed without distortion. Reducing the sampling rate without first removing high-frequency content can cause those frequencies to "fold" into lower frequency bands, producing artifacts known as aliasing.
To prevent aliasing, the standard approach is to apply a low-pass anti-aliasing filter before the sample-rate reduction step. This filter attenuates frequency components above the new Nyquist frequency, f_s / (2M), ensuring that the remaining samples faithfully represent the signal at the lower rate. The combined process of low-pass filtering followed by sample dropping is formally called decimation.
| Term | Definition |
|---|---|
| Sampling rate | Number of samples taken per second from a continuous signal |
| Nyquist frequency | Half the sampling rate; the maximum frequency that can be represented without aliasing |
| Aliasing | Distortion caused when high-frequency components are misrepresented as lower frequencies after subsampling |
| Anti-aliasing filter | A low-pass filter applied before downsampling to remove frequencies above the new Nyquist limit |
| Decimation | The full process of low-pass filtering followed by sample-rate reduction |
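As a rough sketch of the difference between naive sample dropping and proper decimation (assuming NumPy and SciPy are available), the following example keeps every M-th sample directly and also uses scipy.signal.decimate, which applies an anti-aliasing filter before subsampling:

```python
import numpy as np
from scipy import signal

fs = 1000           # original sampling rate (Hz)
M = 4               # downsampling factor -> new rate of 250 Hz, new Nyquist limit 125 Hz
t = np.arange(0, 1.0, 1.0 / fs)

# A 50 Hz tone plus a 300 Hz tone; 300 Hz lies above the new Nyquist limit
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

# Naive downsampling: keep every M-th sample (no filtering, so the 300 Hz tone aliases)
x_naive = x[::M]

# Decimation: low-pass filter below the new Nyquist frequency, then subsample
x_decimated = signal.decimate(x, M, zero_phase=True)

print(x.shape, x_naive.shape, x_decimated.shape)   # (1000,) (250,) (250,)
```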
In image processing and computer vision, downsampling reduces the spatial resolution of an image. A 1024x1024 image downsampled by a factor of 2 becomes a 512x512 image. This is useful for reducing storage and computation, generating image pyramids for multi-scale analysis, and preparing thumbnails or previews.
Several interpolation methods are commonly used for image downsampling, including nearest-neighbor (fast but prone to aliasing), bilinear, bicubic, area averaging (box filtering), and Lanczos resampling; higher-order methods generally preserve more detail at a higher computational cost.
Just as in signal processing, downsampling an image without appropriate filtering can introduce aliasing artifacts such as moiré patterns or staircase effects along diagonal edges. Applying a Gaussian blur or box filter before subsampling mitigates these artifacts.
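One minimal way to do this in Python, assuming SciPy and NumPy and a grayscale image, is to blur with a Gaussian filter and then subsample; real pipelines would typically rely on a library resize routine instead:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def downsample_image(img, factor=2, sigma=None):
    """Gaussian blur (anti-aliasing), then keep every `factor`-th pixel in each dimension."""
    if sigma is None:
        sigma = factor / 2.0            # heuristic: stronger blur for larger reduction factors
    blurred = gaussian_filter(img, sigma=sigma)
    return blurred[::factor, ::factor]

image = np.random.rand(1024, 1024)      # stand-in for a grayscale image
small = downsample_image(image, factor=2)
print(small.shape)                       # (512, 512)
```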
Modern deep learning architectures, especially convolutional neural networks (CNNs), use downsampling as a core building block. Reducing the spatial dimensions of feature maps at successive layers serves several purposes: it increases the receptive field of later layers, reduces the number of parameters and floating-point operations, and encourages the network to learn progressively more abstract representations.
Pooling is the most traditional form of downsampling in CNNs. A pooling layer slides a fixed-size window (typically 2x2 or 3x3) across each feature map with a stride of 2 or more, summarizing each window with a single value.
| Pooling type | Operation | Characteristics |
|---|---|---|
| Max pooling | Takes the maximum value in each window | Preserves the most activated feature; widely used in classification networks such as VGGNet and ResNet |
| Average pooling | Computes the mean of values in each window | Produces smoother output; preserves more spatial context |
| Global average pooling | Averages the entire spatial extent of each feature map into a single value | Used as a replacement for fully connected layers before the output; reduces overfitting |
Pooling operations are parameter-free, meaning they introduce no learnable weights. This makes them computationally cheap but also inflexible, since the summarization rule is fixed regardless of the input content.
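For example, in PyTorch a 2x2 max pool with stride 2 halves the spatial dimensions and contributes no learnable weights (a brief sketch with an assumed feature-map size):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 56, 56)                      # (batch, channels, height, width)
y = pool(x)

print(y.shape)                                      # torch.Size([1, 64, 28, 28])
print(sum(p.numel() for p in pool.parameters()))    # 0 -> pooling has no learnable weights
```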
An alternative to pooling is the strided convolution, which performs convolution with a stride greater than 1. Instead of convolving at every spatial position and then pooling, a strided convolution directly produces a lower-resolution feature map in a single step. Because the convolution filter weights are learned during training, strided convolutions can adapt their downsampling behavior to the data.
Springenberg et al. (2014) demonstrated in "Striving for Simplicity: The All Convolutional Net" that max pooling can be replaced by convolutional layers with increased stride without sacrificing accuracy on image recognition benchmarks including CIFAR-10, CIFAR-100, and ImageNet. This finding influenced later architectures that rely primarily or exclusively on strided convolutions for spatial reduction.
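The following PyTorch sketch (with assumed channel sizes) performs the same factor-of-2 spatial reduction as the pooling example above, but with learned filter weights:

```python
import torch
import torch.nn as nn

# 3x3 convolution with stride 2: halves the spatial resolution with learned filter weights
downsample = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 64, 56, 56)
y = downsample(x)

print(y.shape)                                           # torch.Size([1, 128, 28, 28])
print(sum(p.numel() for p in downsample.parameters()))   # 73,856 learnable parameters
```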
Classical CNNs that use max pooling or strided convolutions violate the sampling theorem because they subsample feature maps without first applying a low-pass filter. Richard Zhang showed in "Making Convolutional Networks Shift-Invariant Again" (ICML 2019) that this causes modern CNNs to lack shift invariance, meaning small translations of the input can produce dramatically different outputs.
The proposed fix is simple and borrows directly from signal processing: insert a low-pass (blur) filter before every downsampling operation. In practice, this means decomposing a stride-2 max-pool into two steps: (1) compute the dense max (stride 1), and (2) apply a blur filter followed by stride-2 subsampling. The same principle applies to strided convolutions and average pooling.
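A minimal PyTorch sketch of this decomposition (not the paper's reference implementation, and with an assumed channel count) might look as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Fixed binomial low-pass filter applied depthwise with stride 2."""
    def __init__(self, channels):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(k, k)
        kernel = kernel / kernel.sum()                  # normalize the 3x3 blur kernel
        # One copy of the kernel per channel (depthwise filtering)
        self.register_buffer("kernel", kernel.view(1, 1, 3, 3).repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=2, padding=1, groups=self.channels)

channels = 64
antialiased_pool = nn.Sequential(
    nn.MaxPool2d(kernel_size=2, stride=1),   # (1) dense max, no subsampling yet
    BlurPool2d(channels),                    # (2) blur, then stride-2 subsampling
)

x = torch.randn(1, channels, 56, 56)
print(antialiased_pool(x).shape)             # torch.Size([1, 64, 28, 28])
```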
Experimental results showed that anti-aliased versions of ResNet, DenseNet, and MobileNet achieved higher ImageNet classification accuracy and significantly improved shift-equivariance of internal feature representations, leading to more robust and stable predictions.
Feature Pyramid Networks (FPNs), introduced by Lin et al. (2017), exploit the natural downsampling that occurs in a CNN backbone to build multi-scale feature representations. As an input image passes through successive convolutional blocks (the "bottom-up pathway"), spatial resolution is halved at each stage while the number of channels and semantic richness increases.
FPN adds a "top-down pathway" that upsamples the deepest, most semantically rich features and merges them with higher-resolution features from earlier stages via lateral connections. This produces a pyramid of feature maps, each combining strong semantics with fine spatial detail. The architecture is foundational in modern object detection (for example, Faster R-CNN with FPN) and instance segmentation frameworks.
The key insight is that downsampling in the backbone is not merely a cost-saving step; it creates a hierarchy of representations at different scales that, when combined appropriately, enable the network to detect objects ranging from small to large.
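A compact sketch of the top-down pathway in PyTorch is shown below; the channel widths and feature-map sizes are illustrative assumptions rather than the exact configuration from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNTopDown(nn.Module):
    """Merge backbone features (ordered high-resolution to low-resolution) into a pyramid."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # Lateral 1x1 convolutions project every backbone stage to a common channel width
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions smooth each merged map
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1) for _ in in_channels
        )

    def forward(self, features):
        laterals = [lat(f) for lat, f in zip(self.lateral, features)]
        # Top-down pathway: upsample the deeper map and add it to the next lateral connection
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(laterals[i + 1], scale_factor=2, mode="nearest")
        return [smooth(l) for smooth, l in zip(self.smooth, laterals)]

# Backbone feature maps for a 256x256 input (strides 4, 8, 16, 32)
c2 = torch.randn(1, 256, 64, 64)
c3 = torch.randn(1, 512, 32, 32)
c4 = torch.randn(1, 1024, 16, 16)
c5 = torch.randn(1, 2048, 8, 8)

pyramid = FPNTopDown()([c2, c3, c4, c5])
print([tuple(p.shape) for p in pyramid])   # all maps have 256 channels, resolutions 64, 32, 16, 8
```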
In machine learning classification tasks, datasets often exhibit class imbalance, where one class (the majority class) has far more examples than another (the minority class). Training a model directly on such data tends to produce a classifier biased toward the majority class. Downsampling (also called undersampling) addresses this by reducing the number of majority class instances so that the class distribution becomes more balanced.
The following table summarizes widely used undersampling techniques:
| Technique | Description | Key property |
|---|---|---|
| Random undersampling | Randomly removes instances from the majority class until a desired ratio is reached | Simple and fast; may discard informative examples |
| Tomek links | Identifies pairs of nearest neighbors from different classes (Tomek links) and removes the majority class member of each pair | Cleans the decision boundary region; often used as a preprocessing step |
| Edited Nearest Neighbors (ENN) | Removes majority class samples whose nearest neighbors mostly belong to a different class | Smooths the decision boundary; removes noisy or ambiguous examples |
| Condensed Nearest Neighbors (CNN) | Iteratively builds a consistent subset by adding samples only when they are misclassified by the current subset | Produces a minimal subset that preserves classification accuracy |
| NearMiss-1 | Selects majority class samples with the smallest average distance to their three closest minority class neighbors | Retains majority examples near the decision boundary |
| NearMiss-2 | Selects majority class samples with the smallest average distance to their three farthest minority class neighbors | Selects majority examples that are globally close to the minority class |
| NearMiss-3 | For each minority instance, keeps a set number of closest majority neighbors | Surrounds each minority instance with nearby majority examples |
| Cluster centroids | Applies clustering (e.g., k-means) to the majority class and replaces clusters with their centroids | Produces synthetic representative points rather than selecting originals |
These techniques are available in the imbalanced-learn library for Python, which provides a consistent API compatible with scikit-learn.
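For example, random undersampling with imbalanced-learn might look like the following sketch, using a synthetic dataset for illustration:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# Synthetic binary dataset with roughly 95% majority / 5% minority class
X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=0)
print(Counter(y))

# Randomly drop majority-class samples until the classes are balanced 1:1
sampler = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
X_res, y_res = sampler.fit_resample(X, y)
print(Counter(y_res))
```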
Downsampling and oversampling represent opposite strategies for addressing class imbalance.
| Aspect | Downsampling (undersampling) | Oversampling |
|---|---|---|
| Approach | Removes majority class instances | Adds minority class instances (duplicates or synthetic) |
| Dataset size | Decreases | Increases |
| Training speed | Faster (less data) | Slower (more data) |
| Risk of information loss | Higher; informative majority samples may be discarded | Lower; all original data is retained |
| Risk of overfitting | Lower; fewer duplicate patterns | Higher; especially with simple duplication |
| Best suited for | Large datasets where majority class has redundant samples | Small datasets or severely imbalanced distributions |
Research by Amin et al. (2023) comparing undersampling, oversampling, and SMOTE on educational data mining tasks found that oversampling tends to outperform undersampling on extremely imbalanced datasets, while both approaches perform comparably on moderately imbalanced data. In practice, combining undersampling of the majority class with oversampling of the minority class (hybrid methods) often yields the best results.
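imbalanced-learn provides such hybrid resamplers, for example SMOTETomek; a brief sketch on synthetic data:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek

X, y = make_classification(n_samples=10_000, weights=[0.9, 0.1], random_state=0)

# Hybrid resampling: SMOTE oversamples the minority class,
# then Tomek links removes overlapping majority-class examples
resampler = SMOTETomek(random_state=0)
X_res, y_res = resampler.fit_resample(X, y)

print(Counter(y), Counter(y_res))
```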
All forms of downsampling involve a trade-off between efficiency and information preservation: reducing a signal's sampling rate discards high-frequency content, shrinking feature maps sacrifices spatial precision, and undersampling a dataset may remove informative majority-class examples.
Choosing the right amount and method of downsampling depends on the specific task, the nature of the data, and the acceptable level of information loss. In neural network design, architects often balance downsampling with skip connections, feature pyramids, or attention mechanisms to recover lost spatial information at later stages.
Downsampling operations are supported natively in all major deep learning frameworks:
| Framework | Pooling | Strided convolution | Image resize |
|---|---|---|---|
| PyTorch | torch.nn.MaxPool2d, torch.nn.AvgPool2d | torch.nn.Conv2d(stride=2) | torch.nn.functional.interpolate |
| TensorFlow/Keras | tf.keras.layers.MaxPool2D, tf.keras.layers.AveragePooling2D | tf.keras.layers.Conv2D(strides=2) | tf.image.resize |
| JAX/Flax | flax.linen.max_pool, flax.linen.avg_pool | flax.linen.Conv(kernel_size, strides=2) | jax.image.resize |
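As one concrete example from the table, a sketch of image downsampling with PyTorch's functional resize:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 1024, 1024)   # a batch of one RGB image

# Downsample to half resolution with bilinear interpolation; antialias=True applies
# a low-pass filter first (supported in recent PyTorch versions)
y = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False, antialias=True)
print(y.shape)                       # torch.Size([1, 3, 512, 512])
```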
For class imbalance undersampling, the Python library imbalanced-learn provides implementations of all the techniques described above through classes such as RandomUnderSampler, TomekLinks, EditedNearestNeighbours, CondensedNearestNeighbour, and NearMiss.
Imagine you have a huge box of crayons with every color you can think of. But your little backpack can only hold 24 crayons. Downsampling is like carefully picking 24 crayons that still give you all the main colors you need to draw a nice picture. You lose some of the rarer shades, but you can still color almost anything. In computers, downsampling works the same way: it takes a big pile of data and makes it smaller so that programs can work with it faster, while keeping the most important parts.