# Downsampling

> Source: https://aiwiki.ai/wiki/downsampling
> Updated: 2026-06-23
> Categories: Data & Datasets, Deep Learning, Machine Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Machine learning terms](/wiki/machine_learning_terms)*

Downsampling is the process of reducing the number of samples, the spatial resolution, or the number of data instances in a signal, image, or dataset in order to lower computational cost and memory use while preserving as much useful information as possible. The term carries three distinct but related meanings in [machine learning](/wiki/machine_learning): reducing the sampling rate of a signal (decimation), shrinking the height and width of an image or feature map (via [pooling](/wiki/pooling) or strided convolution), and removing examples from the majority class of a [class-imbalanced dataset](/wiki/class-imbalanced_dataset) (undersampling). In every case the operation systematically discards a controlled portion of data to gain a practical benefit.

## What is downsampling?

Downsampling is used across signal processing, image processing, and machine learning to reduce the size or resolution of data. In digital signal processing, downsampling (closely related to decimation) reduces the number of samples per unit of time. In [convolutional neural network](/wiki/convolutional_neural_network) architectures, it reduces the height and width of feature maps through pooling layers or strided convolutions. In data science and statistical learning, downsampling (also called undersampling) reduces the number of instances in the majority class of a class-imbalanced dataset. All three uses make the same trade: data is discarded in a controlled way to lower computational cost and memory requirements while preserving as much useful information as possible.

## How does downsampling work in signal processing?

In signal processing, downsampling means reducing the sampling rate of a discrete-time signal. If a signal is originally sampled at rate *f_s*, downsampling by a factor of *M* produces a new signal sampled at rate *f_s / M* by retaining only every *M*-th sample and discarding the rest.

A critical concern when downsampling a signal is aliasing. According to the Nyquist-Shannon sampling theorem, a band-limited signal can be reconstructed without distortion only if it is sampled at a rate greater than twice its highest frequency component.[1] Reducing the sampling rate without first removing high-frequency content causes components above the new Nyquist frequency to "fold" into lower frequency bands, producing artifacts known as aliasing.
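
The folding can be checked numerically. The following pure-Python sketch uses illustrative values chosen for this example (they do not come from the sources above): a 3 Hz cosine sampled at 8 Hz is downsampled by *M* = 2 by keeping every second sample. The new Nyquist frequency is 2 Hz, so the 3 Hz component cannot survive, and the retained samples coincide with those of a 1 Hz cosine sampled at 4 Hz.

```python
import math

def sample_cosine(freq_hz, rate_hz, num_samples):
    """Sample cos(2*pi*freq*t) at t = n / rate for n = 0 .. num_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(num_samples)]

def downsample(x, m):
    """Naive downsampling by m: keep every m-th sample, with no anti-aliasing filter."""
    return x[::m]

# Assumption: illustrative values, a 3 Hz tone sampled at 8 Hz, downsampled by M = 2.
fs, m = 8, 2
original = sample_cosine(3, fs, 32)              # Nyquist is 4 Hz, so 3 Hz is representable
reduced = downsample(original, m)                # new rate 4 Hz, new Nyquist 2 Hz
alias = sample_cosine(1, fs // m, len(reduced))  # a 1 Hz tone sampled at the new rate

# 3 Hz folds to 4 - 3 = 1 Hz: the two sample sequences agree to rounding error.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(reduced, alias)))  # True
```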

To prevent aliasing, the standard approach is to apply a low-pass anti-aliasing filter before the sample-rate reduction step. This filter attenuates frequency components above the new Nyquist frequency (*f_s / 2M*), ensuring that the remaining samples faithfully represent the signal at the lower rate. The combined process of low-pass filtering followed by sample dropping is formally called decimation.
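
A minimal decimation sketch follows. It assumes a Hamming-windowed sinc FIR filter as the anti-aliasing low-pass (one common design among many); the 31-tap length and the zero padding at the signal edges are arbitrary choices for illustration. In practice a library routine such as SciPy's `scipy.signal.decimate` would normally be used instead.

```python
import math

def lowpass_taps(cutoff, num_taps=31):
    """Hamming-windowed sinc low-pass FIR filter.

    cutoff is in cycles per sample (0 < cutoff < 0.5); taps are scaled to unit DC gain.
    Assumption: the window type and tap count are illustrative choices.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response sin(2*pi*fc*k) / (pi*k), which equals 2*fc at k = 0.
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        # The Hamming window tapers the truncated response to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(x, m, num_taps=31):
    """Low-pass filter at the new Nyquist frequency (0.5 / m cycles per sample), then keep every m-th sample."""
    taps = lowpass_taps(0.5 / m, num_taps)
    half = num_taps // 2
    filtered = []
    for i in range(len(x)):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j  # centered (zero-delay) convolution; samples outside x count as 0
            if 0 <= idx < len(x):
                acc += t * x[idx]
        filtered.append(acc)
    return filtered[::m]

# Example: decimate by 4, so the new Nyquist frequency is 0.125 cycles per sample.
low = [math.cos(2 * math.pi * 0.02 * n) for n in range(400)]  # below the new Nyquist: kept
high = [math.cos(2 * math.pi * 0.4 * n) for n in range(400)]  # above it: filtered out
y_low, y_high = decimate(low, 4), decimate(high, 4)
```

Away from the edges, the tone below the new Nyquist frequency passes through almost unchanged, while the tone above it is attenuated to near zero instead of aliasing. Computing every filtered sample and then discarding most of them is wasteful; polyphase implementations evaluate only the outputs that are kept.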

| Term | Definition |
|---|---|
| Sampling rate | Number of samples taken per second from a continuous signal |
| Nyquist frequency | Half the sampling rate; the maximum frequency that can be represented without aliasing |
| Aliasing | Distortion caused when high-frequency components are misrepresented as lower frequencies after [subsampling](/wiki/subsampling) |
| Anti-aliasing filter | A low-pass filter applied before downsampling to remove frequencies above the new Nyquist limit |
| Decimation | The full process of low-pass filtering followed by sample-rate reduction |

## How does downsampling work in image processing?

In image processing and computer vision, downsampling reduces the spatial resolution of an image. A 1024x1024 image downsampled by a factor of 2 becomes a 512x512 image, a 4x reduction in pixel count. This is useful for reducing storage and computation, generating image pyramids for multi-scale analysis, and preparing thumbnails or previews.

Several interpolation methods are commonly used for image downsampling:

- **Nearest-neighbor interpolation** selects the value of the closest pixel in the original image. It is fast but can produce blocky, jagged artifacts.
- **Bilinear interpolation** computes a weighted average of the four nearest pixels. It produces smoother results than nearest-neighbor at modest additional cost.
- **Bicubic interpolation** uses a 4x4 neighborhood of pixels (16 total) and weights them with piecewise cubic functions to compute each output pixel. It yields smoother gradients and retains finer detail than bilinear interpolation, though it is more computationally expensive.
- **Area-based (box filter) averaging** computes the mean value over the region of the source image that maps to each output pixel. This method is well-suited for integer downsampling factors and naturally acts as a low-pass filter.
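
For integer factors, nearest-neighbor and area-based downsampling reduce to very simple operations. The sketch below assumes a grayscale image stored as a list of rows whose dimensions are divisible by the factor; which pixel of each block the nearest-neighbor version keeps (here the top-left one) is a convention that differs between libraries.

```python
def downsample_nearest(img, factor):
    """Keep one pixel per factor x factor block (assumption: the top-left pixel)."""
    return [row[::factor] for row in img[::factor]]

def downsample_area(img, factor):
    """Average each factor x factor block (box filter); assumes dimensions divisible by factor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, factor):
        out_row = []
        for x in range(0, w, factor):
            block = [img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

img = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [4, 6, 8, 10],
    [6, 8, 10, 12],
]
print(downsample_nearest(img, 2))  # [[0, 4], [4, 8]]
print(downsample_area(img, 2))     # [[2.0, 6.0], [6.0, 10.0]]
```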

Just as in signal processing, downsampling an image without appropriate filtering can introduce aliasing artifacts such as moiré patterns or staircase effects along diagonal edges. Applying a Gaussian blur or box filter before subsampling mitigates these artifacts.
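
A checkerboard, whose pixel values alternate at every position, is the worst case for aliasing. The sketch below compares plain 2x subsampling with blur-then-subsample; the [1, 2, 1] / 4 binomial kernel stands in for a Gaussian blur, and the mirror padding at the borders is an assumption made to keep the example exact.

```python
def checkerboard(n):
    """n x n image alternating 0 and 1: the highest spatial frequency a pixel grid can hold."""
    return [[(x + y) % 2 for x in range(n)] for y in range(n)]

def blur_line(v):
    """[1, 2, 1] / 4 blur of a 1-D list; assumption: mirror padding (index -1 -> 1, index n -> n - 2)."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[1]
        right = v[i + 1] if i < n - 1 else v[n - 2]
        out.append(0.25 * left + 0.5 * v[i] + 0.25 * right)
    return out

def blur_121(img):
    """Separable blur: filter every row, then every column."""
    rows = [blur_line(r) for r in img]
    cols = [blur_line(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to rows

def subsample(img, factor):
    return [row[::factor] for row in img[::factor]]

board = checkerboard(8)
naive = subsample(board, 2)               # every kept pixel is 0: a solid black image
filtered = subsample(blur_121(board), 2)  # every kept pixel is 0.5: uniform gray
```

Plain subsampling turns the fine pattern into a solid black image, a false low-frequency result, while the blurred version yields uniform 50% gray, which matches the pattern's true average brightness.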

## Why do neural networks use downsampling?

Modern deep learning architectures, especially convolutional neural networks (CNNs), use downsampling as a core building block. Reducing the spatial dimensions of feature maps at successive layers serves several purposes: it increases the receptive field of later layers, reduces the number of parameters and floating-point operations, and encourages the network to learn progressively more abstract representations.
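
The receptive-field effect can be quantified with the standard recurrence for stacked layers, where *r* is the receptive field and *j* is the spacing between neighboring outputs, both measured in input pixels: a layer with kernel size *k* and stride *s* updates r to r + (k - 1) * j and j to j * s. This sketch ignores padding and dilation (an assumption), and the two layer stacks are arbitrary examples.

```python
def receptive_field(layers):
    """Receptive field of the last layer, in input pixels.

    layers: list of (kernel_size, stride) pairs applied in order.
    Assumption: no dilation; padding does not affect the receptive-field size.
    """
    r, j = 1, 1  # a single input pixel; adjacent outputs are 1 input pixel apart
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the view by (k - 1) steps of the current spacing
        j *= s           # striding spreads subsequent outputs further apart
    return r

no_downsampling = [(3, 1)] * 4                                # four 3x3 convolutions
with_downsampling = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]  # same convolutions plus 2x2 stride-2 pooling

print(receptive_field(no_downsampling))    # 9
print(receptive_field(with_downsampling))  # 14
```

With the same four 3x3 convolutions, inserting one stride-2 pooling step grows the receptive field from 9 to 14 pixels, and the two convolutions after the pooling step operate on one quarter as many positions, so with unchanged channel counts they need a quarter of the multiply-adds.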

### Pooling layers

Pooling is the most traditional form of downsampling in CNNs. A pooling layer slides a fixed-size window (typically 2x2 or 3x3) across each feature map with a [stride](/wiki/stride) of 2 or more, summarizing each window with a single value. A 2x2 max-pooling window with stride 2 reduces each spatial dimension by half, cutting the feature map area to one quarter.

| Pooling type | Operation | Characteristics |
|---|---|---|
| Max pooling | Takes the maximum value in each window | Preserves the most activated feature; used between convolutional stages in VGGNet and after the stem convolution in ResNet[3] |
| Average pooling | Computes the mean of values in each window | Produces smoother output; preserves more spatial context |
| Global average pooling | Averages the entire spatial extent of each feature map into a single value | Used as a replacement for fully connected layers before the output; reduces overfitting |
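
The three pooling types can be sketched on a single-channel feature map. This pure-Python version is an illustrative assumption (framework layers operate on batched, multi-channel tensors); it uses a 2x2 window with stride 2 and drops any partial window at the border.

```python
def pool2d(fmap, size=2, stride=2, op=max):
    """Slide a size x size window with the given stride and summarize each window with op."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for y in range(0, h - size + 1, stride):
        out.append([op([fmap[y + dy][x + dx] for dy in range(size) for dx in range(size)])
                    for x in range(0, w - size + 1, stride)])
    return out

def mean(values):
    return sum(values) / len(values)

def global_average_pool(fmap):
    """Collapse the whole spatial extent of one feature map to a single value."""
    return mean([v for row in fmap for v in row])

fmap = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]
print(pool2d(fmap, op=max))       # [[7, 2], [2, 8]]
print(pool2d(fmap, op=mean))      # [[4.0, 1.0], [1.0, 5.0]]
print(global_average_pool(fmap))  # 2.75
```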

Pooling operations are parameter-free, meaning they introduce no learnable weights. This makes them computationally cheap but also inflexible, since the summarization rule is fixed regardless of the input content.

### Strided convolutions

An alternative to pooling is the strided convolution, which performs convolution with a stride greater than 1. Instead of convolving at every spatial position and then pooling, a strided convolution directly produces a lower-resolution feature map in a single step. Because the convolution filter weights are learned during training, strided convolutions can adapt their downsampling behavior to the data.
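
The sketch below (pure Python, single channel, with an arbitrary fixed kernel standing in for learned weights) shows that a stride-2 convolution produces exactly the values a stride-1 convolution would produce at every second position, while never computing the discarded ones. The output size follows the usual formula floor((H + 2P - K) / S) + 1 for input size H, padding P, kernel size K, and stride S.

```python
def conv2d(img, kernel, stride=1, padding=0):
    """Single-channel 2-D cross-correlation (what deep learning frameworks call convolution)."""
    k = len(kernel)
    h, w = len(img), len(img[0])
    # Zero-pad the input on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(padding)]
    padded += [[0.0] * padding + list(row) + [0.0] * padding for row in img]
    padded += [[0.0] * (w + 2 * padding) for _ in range(padding)]
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    return [[sum(kernel[dy][dx] * padded[y * stride + dy][x * stride + dx]
                 for dy in range(k) for dx in range(k))
             for x in range(out_w)]
            for y in range(out_h)]

img = [[float(8 * y + x) for x in range(8)] for y in range(8)]
# Assumption: arbitrary fixed weights; in a CNN these would be learned.
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]

strided = conv2d(img, kernel, stride=2, padding=1)  # 8x8 -> 4x4 in one step
dense = conv2d(img, kernel, stride=1, padding=1)    # 8x8 -> 8x8
print(strided == [row[::2] for row in dense[::2]])  # True
```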

Springenberg et al. (2014) demonstrated in "Striving for Simplicity: The All Convolutional Net" (later accepted as an ICLR 2015 workshop contribution) that "max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks."[2] Their pooling-free All-CNN-C model reached state-of-the-art accuracy on CIFAR-10 at the time, with a test error of 9.08% (about 90.9% accuracy), and the approach generalized across CIFAR-10, CIFAR-100, and ImageNet.[2] This finding influenced later architectures that rely primarily or exclusively on strided convolutions for spatial reduction.

### Anti-aliased CNNs

Classical CNNs that use max pooling or strided convolutions violate the sampling theorem because they subsample feature maps without first applying a low-pass filter. Richard Zhang showed in "Making Convolutional Networks Shift-Invariant Again" (ICML 2019) that "commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem," which causes modern CNNs to lack shift invariance: small translations of the input can produce dramatically different outputs.[5]

The proposed fix borrows directly from signal processing: insert a low-pass (blur) filter before each subsampling step, a combination Zhang packages as a "BlurPool" layer.[5] In practice, a stride-2 max-pool is decomposed into two steps: (1) compute the dense max (stride 1), and (2) apply a blur filter followed by stride-2 subsampling. The same principle applies to strided convolutions and average pooling.
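
The decomposition can be illustrated in one dimension. This sketch assumes a circular (wrap-around) boundary and a [1, 2, 1] / 4 blur kernel purely for illustration; Zhang's BlurPool operates on 2-D feature maps. The input is a periodic pattern, compared with its copy shifted by one sample.

```python
def max_pool_1d(x, size=2, stride=2):
    """Max pooling on a 1-D signal; assumption: circular boundary."""
    n = len(x)
    return [max(x[(i + d) % n] for d in range(size)) for i in range(0, n, stride)]

def blur_pool_1d(x, size=2, stride=2, kernel=(0.25, 0.5, 0.25)):
    """Anti-aliased max pooling: dense max (stride 1), then blur, then subsample."""
    n = len(x)
    dense = max_pool_1d(x, size=size, stride=1)  # step 1: max evaluated at every position
    half = len(kernel) // 2
    blurred = [sum(w * dense[(i + t - half) % n] for t, w in enumerate(kernel))
               for i in range(n)]                 # step 2a: low-pass (blur) filter
    return blurred[::stride]                      # step 2b: stride-2 subsampling

x = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = x[-1:] + x[:-1]  # the same pattern shifted right by one sample

print(max_pool_1d(x), max_pool_1d(shifted))    # [0, 1, 0, 1] [1, 1, 1, 1]
print(blur_pool_1d(x), blur_pool_1d(shifted))  # [0.5, 1.0, 0.5, 1.0] [0.75, 0.75, 0.75, 0.75]
```

Shifting the input by one sample changes every output of plain max pooling, whereas the blurred version changes by only 0.25 per output.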

Experimental results showed that anti-aliased versions of ResNet, DenseNet, and MobileNet achieved higher ImageNet classification accuracy and significantly improved the shift-equivariance of internal feature representations. Their predictions were also more stable under input shifts and more robust to corruptions and perturbations.[5]

### Downsampling in feature pyramid networks

Feature Pyramid Networks (FPNs), introduced by Lin et al. (2017), exploit the natural downsampling that occurs in a CNN backbone to build multi-scale feature representations.[4] As an input image passes through successive convolutional blocks (the "bottom-up pathway"), spatial resolution is halved at each stage while the number of channels and the semantic richness of the features increase.

FPN adds a "top-down pathway" that upsamples the deepest, most semantically rich features and merges them with higher-resolution features from earlier stages via lateral connections (each a 1x1 convolution followed by elementwise addition). This produces a pyramid of feature maps, each combining strong semantics with fine spatial detail. Used in a basic Faster R-CNN system, FPN achieved state-of-the-art single-model results on the COCO detection benchmark, surpassing the COCO 2016 challenge winners, and the architecture remains foundational in modern object detection and instance segmentation frameworks.[4]
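
One top-down merge step can be sketched as follows. The example assumes tiny single-image feature maps stored as nested lists of shape [channels][height][width], fixed lateral weights in place of learned ones, and 2x nearest-neighbor upsampling (the upsampling the FPN paper uses for simplicity). It omits the 3x3 convolution that FPN applies to each merged map.

```python
def conv1x1(fmap, weights):
    """1x1 convolution: at each position, mix input channels with a [C_out][C_in] weight matrix."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wc * fmap[c][y][x] for c, wc in enumerate(w_row)) for x in range(width)]
             for y in range(height)]
            for w_row in weights]

def upsample2x_nearest(fmap):
    """Double the height and width of every channel by repeating each row and column."""
    return [[[row[x // 2] for x in range(2 * len(row))] for row in channel for _ in range(2)]
            for channel in fmap]

def top_down_merge(coarser_p, finer_c, lateral_weights):
    """One merge step: upsample the coarser pyramid level and add the lateral 1x1 projection."""
    up = upsample2x_nearest(coarser_p)
    lateral = conv1x1(finer_c, lateral_weights)
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(up, lateral)]

# Assumption: toy values. 1-channel 2x2 deep map, 2-channel 4x4 earlier-stage map.
coarser_p = [[[1.0, 2.0], [3.0, 4.0]]]
finer_c = [[[1.0] * 4 for _ in range(4)],
           [[3.0] * 4 for _ in range(4)]]
merged = top_down_merge(coarser_p, finer_c, lateral_weights=[[0.5, 0.5]])
```

Each coarse value is copied into a 2x2 block and added to the lateral projection (0.5 * 1 + 0.5 * 3 = 2 everywhere here), so the merged map has the fine map's resolution and carries the coarse map's values.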

The key insight is that downsampling in the backbone is not merely a cost-saving step; it creates a hierarchy of representations at different scales that, when combined appropriately, enable the network to detect objects ranging from small to large.

## How is downsampling used for class imbalance?

In machine learning classification tasks, datasets often exhibit class imbalance, where one class (the majority class) has far more examples than another (the minority class). Training a model directly on such data tends to produce a classifier biased toward the majority class. Downsampling (also called undersampling) addresses this by reducing the number of majority class instances so that the class distribution becomes more balanced. It is the direct counterpart to [oversampling](/wiki/oversampling), which instead grows the minority class.
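
Random undersampling, the simplest version, can be sketched in a few lines. This pure-Python helper is hypothetical (it is not the imbalanced-learn API) and assumes the goal is a fully balanced dataset, with every class reduced to the size of the smallest one.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(y).values())  # assumption: aim for a 1:1 class ratio
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, target):  # sample without replacement
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

X = list(range(100))
y = [0] * 90 + [1] * 10  # 90 majority rows, 10 minority rows
X_res, y_res = random_undersample(X, y)
```

Every minority example survives while 80 of the 90 majority examples are discarded, which shows why this approach is fast but can throw away informative data.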

### Techniques for undersampling

The following table summarizes widely used undersampling techniques:

| Technique | Description | Key property |
|---|---|---|
| Random undersampling | Randomly removes instances from the majority class until a desired ratio is reached | Simple and fast; may discard informative examples |
| Tomek links | Finds pairs of samples from different classes that are each other's nearest neighbors (Tomek links) and removes the majority class member of each pair | Cleans the decision boundary region; often used as a preprocessing step |
| Edited Nearest Neighbors (ENN) | Removes majority class samples whose nearest neighbors mostly belong to a different class | Smooths the decision boundary; removes noisy or ambiguous examples |
| Condensed Nearest Neighbors (CNN) | Iteratively builds a subset by adding a sample only when the current subset misclassifies it under a 1-nearest-neighbor rule | Produces a compact subset that still classifies the training data correctly; the result is not guaranteed to be minimal |
| NearMiss-1 | Selects majority class samples with the smallest average distance to their three closest minority class neighbors | Retains majority examples near the decision boundary |
| NearMiss-2 | Selects majority class samples with the smallest average distance to their three farthest minority class neighbors | Selects majority examples that are globally close to the minority class |
| NearMiss-3 | For each minority instance, keeps a set number of closest majority neighbors | Surrounds each minority instance with nearby majority examples |
| Cluster centroids | Applies clustering (e.g., k-means) to the majority class and replaces clusters with their centroids | Produces synthetic representative points rather than selecting originals |
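
Tomek-link cleaning can be sketched with a brute-force nearest-neighbor search. The helper names below are hypothetical, distances are Euclidean (an assumption), and the O(n²) search only suits small examples; imbalanced-learn's `TomekLinks` class is the practical choice.

```python
import math
from collections import Counter

def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (Euclidean distance, an assumption), excluding i."""
    best, best_d = None, math.inf
    for j, xj in enumerate(X):
        if j != i:
            d = math.dist(X[i], xj)
            if d < best_d:
                best, best_d = j, d
    return best

def tomek_links(X, y):
    """Pairs (i, j) from different classes that are each other's nearest neighbor."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]

def remove_tomek_majority(X, y):
    """Drop the majority-class member of every Tomek link and keep everything else."""
    majority = Counter(y).most_common(1)[0][0]
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

X = [(0.0,), (1.0,), (2.0,), (2.4,), (5.0,)]
y = [0, 0, 0, 1, 1]
print(tomek_links(X, y))            # [(2, 3)]
print(remove_tomek_majority(X, y))  # ([(0.0,), (1.0,), (2.4,), (5.0,)], [0, 0, 1, 1])
```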

These techniques are available in imbalanced-learn, an open-source Python toolbox described in a 2017 paper, which provides an API compatible with scikit-learn and belongs to the scikit-learn-contrib project.[10] The Tomek links and edited nearest neighbors methods derive from cleaning rules introduced by Tomek (1976) and Wilson (1972), respectively.[7][8]

### Downsampling vs. oversampling

Downsampling and oversampling represent opposite strategies for addressing class imbalance.

| Aspect | Downsampling (undersampling) | Oversampling |
|---|---|---|
| Approach | Removes majority class instances | Adds minority class instances (duplicates or synthetic) |
| Dataset size | Decreases | Increases |
| Training speed | Faster (less data) | Slower (more data) |
| Risk of information loss | Higher; informative majority samples may be discarded | Lower; all original data is retained |
| Risk of overfitting | Lower; fewer duplicate patterns | Higher; especially with simple duplication |
| Best suited for | Large datasets where majority class has redundant samples | Small datasets or severely imbalanced distributions |

A widely cited synthetic oversampling alternative, SMOTE, was introduced by Chawla et al. (2002); it generates new minority-class points by interpolating between existing minority examples and their nearest minority-class neighbors.[6] Research by Amin et al. (2023) comparing undersampling, oversampling, and SMOTE on educational data mining tasks found that oversampling tends to outperform undersampling on extremely imbalanced datasets, while both approaches perform comparably on moderately imbalanced data.[9] In practice, combining undersampling of the majority class with oversampling of the minority class (hybrid methods) can outperform either approach alone, although the best choice depends on the dataset.
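
The interpolation step at the core of SMOTE can be sketched as follows: pick a minority point, pick one of its *k* nearest minority-class neighbors, and place a new point at a random position on the segment between them. This is a simplified, hypothetical helper, not the reference algorithm or a library API; in particular, choosing base points uniformly at random is a simplification.

```python
import math
import random

def smote_like(minority, n_new, k=5, seed=0):
    """Hypothetical SMOTE-style sketch: interpolate toward one of the k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # simplification: base point chosen uniformly
        base = minority[i]
        # k nearest minority-class neighbors of the base point (excluding itself)
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(base, minority[j]))[:k]
        neighbor = minority[rng.choice(others)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbor)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (4.0, 4.0)]
new_points = smote_like(minority, n_new=20, k=2)
```

Because every synthetic point lies on a segment between two real minority points, these collinear inputs produce points that stay on the line x = y within the span of the originals.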

## What are the trade-offs of downsampling?

All forms of downsampling involve a trade-off between efficiency and information preservation:

- **Signal processing.** Aggressive downsampling saves bandwidth and storage but can destroy high-frequency detail that may be important for downstream tasks such as speech recognition or music analysis.
- **Image and feature map downsampling.** Reducing spatial resolution lowers computation but sacrifices fine-grained spatial detail, which is critical for tasks like small object detection or pixel-level segmentation.
- **Class imbalance downsampling.** Removing majority class examples speeds up training and can improve minority class recall, but risks discarding patterns that the model needs to learn about the majority class.

Choosing the right amount and method of downsampling depends on the specific task, the nature of the data, and the acceptable level of information loss. In neural network design, architects often balance downsampling with skip connections, feature pyramids, or attention mechanisms to recover lost spatial information at later stages.

## How is downsampling implemented across frameworks?

Downsampling operations are supported natively in all major deep learning frameworks:

| Framework | Pooling | Strided convolution | Image resize |
|---|---|---|---|
| PyTorch | `torch.nn.MaxPool2d`, `torch.nn.AvgPool2d` | `torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=2)` | `torch.nn.functional.interpolate` |
| TensorFlow/Keras | `tf.keras.layers.MaxPool2D`, `tf.keras.layers.AveragePooling2D` | `tf.keras.layers.Conv2D(filters, kernel_size, strides=2)` | `tf.image.resize` |
| JAX/Flax | `flax.linen.max_pool`, `flax.linen.avg_pool` | `flax.linen.Conv(features, kernel_size, strides=2)` | `jax.image.resize` |

For class-imbalance undersampling, the Python library imbalanced-learn implements all the techniques described above in its `imblearn.under_sampling` module, through classes such as `RandomUnderSampler`, `TomekLinks`, `EditedNearestNeighbours`, `CondensedNearestNeighbour`, `NearMiss` (whose `version` parameter selects NearMiss-1, -2, or -3), and `ClusterCentroids`.[10]

## Explain Like I'm 5 (ELI5)

Imagine you have a huge box of crayons with every color you can think of. But your little backpack can only hold 24 crayons. Downsampling is like carefully picking 24 crayons that still give you all the main colors you need to draw a nice picture. You lose some of the rarer shades, but you can still color almost anything. In computers, downsampling works the same way: it takes a big pile of data and makes it smaller so that programs can work with it faster, while keeping the most important parts.

## References

1. Nyquist, H. (1928). "Certain Topics in Telegraph Transmission Theory." *Transactions of the American Institute of Electrical Engineers*, 47(2), 617-644.
2. Springenberg, J.T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2014). "Striving for Simplicity: The All Convolutional Net." *arXiv preprint arXiv:1412.6806* (ICLR 2015 Workshop).
3. He, K., Zhang, X., Ren, S., & Sun, J. (2015). "Deep Residual Learning for Image Recognition." *arXiv preprint arXiv:1512.03385*.
4. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). "Feature Pyramid Networks for Object Detection." *CVPR 2017*. arXiv:1612.03144.
5. Zhang, R. (2019). "Making Convolutional Networks Shift-Invariant Again." *Proceedings of the 36th International Conference on Machine Learning (ICML)*. arXiv:1904.11486.
6. Chawla, N.V., Bowyer, K.W., Hall, L.O., & Kegelmeyer, W.P. (2002). "SMOTE: Synthetic Minority Over-sampling Technique." *Journal of Artificial Intelligence Research*, 16, 321-357.
7. Tomek, I. (1976). "Two Modifications of CNN." *IEEE Transactions on Systems, Man, and Cybernetics*, 6(11), 769-772.
8. Wilson, D.L. (1972). "Asymptotic Properties of Nearest Neighbor Rules Using Edited Data." *IEEE Transactions on Systems, Man, and Cybernetics*, 2(3), 408-421.
9. Amin, A., Rahim, F., Ali, I., Khan, C., & Anwar, S. (2023). "A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining." *Information*, 14(1), 54.
10. Lemaître, G., Nogueira, F., & Aridas, C.K. (2017). "Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning." *Journal of Machine Learning Research*, 18(17), 1-5.