See also: Machine learning terms, pandas, scikit-learn
NumPy (short for Numerical Python) is the foundational open-source library for numerical and scientific computing in Python. It provides a powerful n-dimensional array object called ndarray, a large collection of mathematical functions, tools for linear algebra, Fourier transforms, and random number generation, all built on highly optimized C and Fortran code. NumPy serves as the backbone of the scientific Python ecosystem: nearly every major library for machine learning, data analysis, and scientific research depends on it, including pandas, scikit-learn, SciPy, Matplotlib, TensorFlow, and PyTorch.
NumPy is released under a permissive BSD license and is fiscally sponsored by NumFOCUS. Its source code is maintained on GitHub, where it has attracted hundreds of contributors from around the world. As of 2026, the latest stable release is version 2.4.1.
NumPy's origins trace back to 1995, when Jim Hugunin created a library called Numeric (also known as Numerical Python). Numeric grew out of the matrix-sig special interest group, which was founded that same year with the goal of defining an array computing package for Python. Among the members of matrix-sig was Guido van Rossum, the creator and maintainer of Python. Numeric provided Python with its first high-performance multidimensional array object, and it attracted contributions from several early scientific Python pioneers including David Ascher, Konrad Hinsen, and Travis Oliphant.
In the early 2000s, a competing library called Numarray emerged as a more flexible replacement for Numeric. Numarray offered faster operations on large arrays and better support for memory-mapped files, but it was significantly slower than Numeric for small arrays. For several years, the Python scientific computing community was split between the two packages, creating compatibility headaches for library authors and users alike.
In 2005, Travis Oliphant set out to unify the community around a single array package. He merged the best features of Numarray into the Numeric codebase, applied extensive modifications and improvements, and released the result as NumPy 1.0 in October 2006. This unification was a turning point for scientific Python: it gave the community a single, authoritative array library that everyone could build upon. Oliphant also authored the definitive Guide to NumPy in 2006, which served as the primary reference for the library for many years.
Since its initial release, NumPy has evolved steadily. Python 3 support arrived with version 1.5.0 in 2010. The library has seen continuous performance improvements, API refinements, and new features across dozens of releases. In June 2024, the project reached a major milestone with the release of NumPy 2.0, its first major version bump since the original 1.0 release in 2006.
At the heart of NumPy is the ndarray (n-dimensional array), a fixed-size container of items that all share the same data type. Unlike Python lists, which store pointers to individual objects scattered across memory, an ndarray stores its elements in a single contiguous block of memory. This design has several critical advantages.
| Feature | Python List | NumPy ndarray |
|---|---|---|
| Data types | Heterogeneous (mixed types) | Homogeneous (single type) |
| Memory layout | Array of pointers to objects | Contiguous block of typed values |
| Memory efficiency | High overhead per element | Low overhead, 3 to 10x less memory |
| Element access | Pointer dereference per element | Direct offset calculation |
| Vectorized math | Not supported natively | Built-in, optimized in C |
| Dimensionality | 1-D only (nested lists for more) | True n-dimensional support |
| Broadcasting | Not supported | Automatic shape alignment |
Internally, an ndarray is described by its shape (a tuple specifying the size of each dimension), its dtype (the data type of each element, such as float64 or int32), and its strides (the number of bytes to skip in memory to move along each dimension). The stride-based architecture allows NumPy to create views of arrays, slicing and reshaping data without copying the underlying memory. For example, transposing a matrix in NumPy does not move any data; it simply rearranges the strides.
NumPy supports a wide range of data types, including various sizes of integers and floating-point numbers, complex numbers, booleans, strings, and user-defined structured types. NumPy 2.0 introduced StringDType, a new variable-length UTF-8 string type that replaces the older fixed-width and object-based string representations.
Vectorization is one of the key features that makes NumPy dramatically faster than plain Python for numerical work. Instead of writing explicit Python loops to process each element of an array, users express operations on entire arrays at once. NumPy then delegates the actual computation to pre-compiled C code that processes elements in tight loops, taking advantage of CPU-level optimizations such as SIMD (Single Instruction, Multiple Data) instructions.
For example, squaring every element in an array of one million numbers takes a single NumPy expression (arr ** 2) and runs roughly 100 to 200 times faster than an equivalent Python for-loop. This speedup comes from eliminating Python's per-element overhead: type checking, reference counting, and interpreter dispatch are all bypassed when NumPy hands the work off to compiled code.
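A minimal comparison of the two styles (exact speedups vary by hardware and can be measured with the standard-library timeit module):

```python
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

# Vectorized: one expression, executed in compiled C loops.
squared = arr ** 2

# Equivalent pure-Python loop: interpreter overhead on every element.
squared_loop = np.array([x * x for x in arr])

# Both produce identical results; only the speed differs.
print(np.array_equal(squared, squared_loop))  # True
```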
NumPy implements vectorized operations through universal functions (ufuncs), which are functions that operate element-by-element on one or more arrays. NumPy includes over 60 built-in ufuncs covering arithmetic, trigonometry, exponentiation, logarithms, comparisons, and bitwise operations. Ufuncs also support advanced features such as reduction (for example, summing all elements), accumulation (running totals), and outer products.
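The reduction, accumulation, and outer-product methods are available on every ufunc. A brief sketch using np.add and np.multiply:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Reduction: fold the array to a single value (same as x.sum()).
total = np.add.reduce(x)
print(total)  # 10

# Accumulation: running totals at each position.
running = np.add.accumulate(x)
print(running)  # [ 1  3  6 10]

# Outer product: apply the ufunc to every pair of elements.
table = np.multiply.outer(x, x)
print(table.shape)  # (4, 4) multiplication table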
Broadcasting is a mechanism that allows NumPy to perform arithmetic on arrays of different shapes without explicitly copying data. When two arrays with different shapes are combined in an operation, NumPy automatically "broadcasts" the smaller array across the larger one so that they have compatible shapes.
Broadcasting follows three rules:

1. If the two arrays differ in their number of dimensions, the shape of the lower-dimensional array is padded with ones on its left (leading) side.
2. If the arrays' sizes disagree in any dimension, the array whose size is 1 in that dimension is stretched to match the other.
3. If the sizes disagree in any dimension and neither size is 1, the operation raises an error.
Internally, broadcasting is implemented using zero-valued strides: no actual data is copied, and the "stretched" dimension simply reads the same memory location repeatedly. This makes broadcasting both memory-efficient and fast.
A common example is adding a 1-D bias vector to every row of a 2-D matrix, an operation that appears frequently in neural network implementations. Without broadcasting, this would require either an explicit loop or manual replication of the bias vector.
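The bias-vector case in code, with shapes shown at each step:

```python
import numpy as np

# 2-D "activations": 3 rows (samples) by 4 columns (features).
matrix = np.zeros((3, 4))
bias = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,)

# Rule 1 pads bias to shape (1, 4); rule 2 stretches it across the
# 3 rows: (3, 4) + (4,) -> (3, 4). No data is actually replicated.
out = matrix + bias
print(out.shape)  # (3, 4)
print(out[0])     # [1. 2. 3. 4.] -- every row received the bias

# Broadcasting down columns instead needs an explicit trailing axis:
col = np.array([[10.0], [20.0], [30.0]])  # shape (3, 1)
print((matrix + col)[:, 0])  # [10. 20. 30.]
```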
NumPy's numpy.linalg module provides a comprehensive set of linear algebra routines. Under the hood, these routines call into highly optimized BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) libraries, delivering near-peak hardware performance for matrix computations.
| Operation | Function | Description |
|---|---|---|
| Matrix multiplication | np.matmul(), @ operator | Standard matrix product |
| Dot product | np.dot() | Scalar dot product or matrix multiplication |
| Determinant | np.linalg.det() | Compute matrix determinant |
| Inverse | np.linalg.inv() | Compute matrix inverse |
| Eigenvalues | np.linalg.eig() | Eigenvalue decomposition |
| SVD | np.linalg.svd() | Singular value decomposition |
| Cholesky | np.linalg.cholesky() | Cholesky factorization |
| Least squares | np.linalg.lstsq() | Least-squares solution |
| Norms | np.linalg.norm() | Vector and matrix norms |
| QR decomposition | np.linalg.qr() | QR factorization |
These linear algebra tools are essential for machine learning algorithms, physics simulations, signal processing, and many other scientific applications. Tensors in deep learning frameworks build upon the same mathematical foundations that NumPy pioneered for Python.
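A small worked example with a few of the routines from the table above, on a 2x2 system:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Solve A @ x = b directly (preferred over computing inv(A) @ b,
# which is slower and less numerically stable).
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Verify with the @ matrix-multiplication operator.
print(np.allclose(A @ x, b))  # True

# Determinant: 3*2 - 1*1 = 5.
print(np.linalg.det(A))

# Singular value decomposition reconstructs A exactly.
U, s, Vt = np.linalg.svd(A)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```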
NumPy's numpy.random module provides facilities for generating random numbers from a wide variety of probability distributions, including uniform, normal (Gaussian), binomial, Poisson, and many others. Starting with NumPy 1.17, the random module was redesigned around a new Generator class and pluggable BitGenerator backends (such as PCG64 and Philox), replacing the older RandomState interface. This redesign brought better statistical quality, improved performance, and the ability to create independent random streams for parallel computation.
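A sketch of the modern Generator API, including reproducible seeding and spawning independent streams via SeedSequence:

```python
import numpy as np

# default_rng creates a Generator backed by PCG64.
rng = np.random.default_rng(seed=42)

print(rng.normal(loc=0.0, scale=1.0, size=3))  # 3 draws from N(0, 1)
print(rng.integers(low=0, high=10, size=5))    # 5 ints in [0, 10)

# The same seed always reproduces the same stream.
again = np.random.default_rng(seed=42)

# Independent streams for parallel workers, derived from one root seed.
child_seeds = np.random.SeedSequence(42).spawn(2)
streams = [np.random.default_rng(s) for s in child_seeds]
```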
The numpy.fft module provides functions for computing the discrete Fourier transform (DFT) and its inverse. Fourier transforms are widely used in signal processing, image analysis, and solving partial differential equations. NumPy 2.0 expanded FFT support to include float32 and longdouble precision, whereas earlier versions only supported float64. For more advanced FFT functionality, the SciPy library's scipy.fft module provides a superset of NumPy's FFT capabilities.
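A brief signal-processing sketch: recover the frequency of a pure sine wave with the real-input FFT, then invert the transform.

```python
import numpy as np

# Sample a 5 Hz sine wave at 100 Hz for one second.
fs = 100
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 5 * t)

# Real-input FFT and the matching frequency bins.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The spectral peak sits at the signal's frequency.
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)  # 5.0

# The inverse transform recovers the original samples.
print(np.allclose(np.fft.irfft(spectrum, n=len(signal)), signal))  # True
```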
NumPy includes a broad set of statistical functions for computing descriptive statistics on arrays. These include mean, median, std (standard deviation), var (variance), min, max, percentile, quantile, histogram, and corrcoef (correlation coefficient). NumPy 2.0 added support for weights in percentile and quantile calculations, and introduced a correction keyword for var and std as an alternative to ddof.
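A quick tour of the descriptive statistics on a small dataset (note that std defaults to the population formula, ddof=0):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(data.mean())           # 5.0
print(np.median(data))       # 4.5
print(data.std())            # 2.0  (population std, ddof=0)
print(data.std(ddof=1))      # sample standard deviation
print(np.percentile(data, 50))  # 4.5 -- the 50th percentile is the median
```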
Python's built-in lists are general-purpose containers that can hold objects of any type. While flexible, this generality comes at a significant performance cost for numerical computation. The following table summarizes the key differences.
| Aspect | Python List | NumPy Array |
|---|---|---|
| Speed (element-wise operations) | Baseline | 10x to 200x faster |
| Memory usage | Higher (stores pointers + objects) | Lower (contiguous typed data) |
| Mathematical operations | Manual loops required | Vectorized, single expression |
| Multidimensional support | Nested lists, no shape enforcement | True n-D arrays with shape, strides |
| Type safety | Mixed types allowed | Single dtype enforced |
| Slicing | Creates a copy | Creates a view (no copy) |
| Ecosystem integration | General Python | Scientific computing stack |
Benchmarks consistently show that NumPy arrays are 10x to 200x faster than Python lists for numerical operations, depending on the specific task. For summation, NumPy is roughly 20x faster. For element-wise squaring, the advantage can reach 200x. Memory usage is typically 3x to 10x lower with NumPy, because Python lists store a pointer (8 bytes on 64-bit systems) plus a full Python object (at least 28 bytes for an integer) per element, while a NumPy float64 array stores just 8 bytes per element with negligible per-element overhead.
That said, NumPy arrays have trade-offs. Appending elements to a NumPy array is slower than appending to a Python list because the array must be reallocated and copied. For tasks that involve frequent dynamic resizing rather than bulk numerical computation, Python lists may be the better choice.
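Both trade-offs can be observed directly. The sketch below compares per-element storage and contrasts list.append with np.append, which must allocate and copy a new array on every call:

```python
import sys
import numpy as np

n = 10_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The ndarray stores exactly 8 bytes per element in one block.
print(np_arr.nbytes)  # 80000

# The list object alone holds one 8-byte pointer per element; the
# int objects those pointers reference add roughly 28 bytes each.
print(sys.getsizeof(py_list))

# Growing: list.append is amortized O(1); np.append copies everything.
py_list.append(n)              # cheap
np_arr = np.append(np_arr, n)  # allocates a new (n + 1)-element array
```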
Released on June 16, 2024, NumPy 2.0 was the first major version update since NumPy 1.0 in 2006. It was the product of 11 months of development by 212 contributors across 1,078 pull requests. The release introduced several significant changes.
New features in NumPy 2.0:

- A new variable-length string dtype (numpy.dtypes.StringDType) paired with a new numpy.strings namespace containing performant string ufuncs.
- Array API standard compatibility, including aliases such as acos, asin, concat, and permute_dims, plus new functions like numpy.isdtype() and ndarray.device.
- Support for float32 and longdouble precision in all numpy.fft functions.
- New linear algebra functions: linalg.svdvals(), linalg.matrix_norm(), linalg.vector_norm(), linalg.diagonal(), linalg.trace(), and others.

Breaking changes in NumPy 2.0:

- The default integer type on 64-bit Windows changed from int32 to int64, matching other platforms.
- Many deprecated aliases (such as np.float_, np.Inf, np.NaN) and functions (such as np.in1d, np.msort, np.trapz) were removed.
- Scalar values now print with an explicit type, for example np.float64(3.0) instead of 3.0.

As the Python scientific computing ecosystem has grown, several libraries have reimplemented the NumPy API for specialized hardware and use cases: CuPy for NVIDIA GPUs, JAX for accelerator-backed differentiable computing, and Dask for parallel and distributed computation on datasets larger than memory. NumPy provides a set of interoperability protocols that allow these libraries to work seamlessly with NumPy's API.
| Protocol | Purpose | Device Support |
|---|---|---|
| __array__() | Convert foreign object to ndarray | Any (via library) |
| __array_ufunc__ | Override NumPy universal functions | Any |
| __array_function__ | Override general NumPy functions | Any |
| __array_interface__ | Legacy memory layout description | CPU only |
| DLPack (__dlpack__()) | Device-agnostic zero-copy data exchange | Multi-device (CPU, GPU, etc.) |
| Buffer protocol | Python's standard C-level memory sharing | CPU only |
The __array_ufunc__ protocol (NEP 13) allows libraries like CuPy to intercept NumPy ufunc calls and redirect them to GPU-optimized implementations. The __array_function__ protocol (NEP 18) extends this to arbitrary NumPy functions such as np.linalg.qr() or np.concatenate(). Together, these protocols mean that code written with NumPy's API can often run on GPUs or distributed clusters with minimal changes.
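A toy illustration of the NEP 13 dispatch mechanism. LoggedArray is a hypothetical wrapper class (not part of any real library) that intercepts every ufunc call, records its name, and forwards the computation to NumPy:

```python
import numpy as np

class LoggedArray:
    """Toy wrapper that intercepts ufunc calls via __array_ufunc__ (NEP 13)."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.log = []

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Record which ufunc NumPy dispatched to us.
        self.log.append(ufunc.__name__)
        # Unwrap LoggedArray inputs and delegate to the real ufunc.
        arrays = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        return LoggedArray(getattr(ufunc, method)(*arrays, **kwargs))

a = LoggedArray([1.0, 2.0, 3.0])
b = np.add(a, 10)  # NumPy sees __array_ufunc__ and hands control to a
print(b.data)      # [11. 12. 13.]
print(a.log)       # ['add']
```

A GPU library such as CuPy does the same thing, except that it forwards the call to a device kernel instead of back to NumPy.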
The DLPack protocol is especially important for exchanging data between frameworks without copying. For instance, a PyTorch tensor on a GPU can be shared with CuPy via DLPack without moving data off the device. NumPy adopted DLPack through the numpy.from_dlpack() function.
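A minimal sketch of the DLPack entry point (requires NumPy 1.23 or later). Here both producer and consumer are NumPy, but the same call accepts any object implementing __dlpack__, such as a CPU PyTorch tensor:

```python
import numpy as np

a = np.arange(4, dtype=np.float32)

# Zero-copy exchange: b is built from a's DLPack capsule and shares
# the same underlying buffer rather than copying it.
b = np.from_dlpack(a)

print(np.shares_memory(a, b))  # True
```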
The Python array API standard, which NumPy 2.0 fully supports, defines a common interface that array libraries can implement, making it easier to write library code that is agnostic to the specific backend (NumPy, CuPy, JAX, PyTorch, or others).
NumPy occupies a unique position as the foundation layer of the scientific Python stack. Virtually every major scientific and data library in Python depends on NumPy arrays as their primary data exchange format.
This central role means that improvements to NumPy propagate across the entire ecosystem. When NumPy added SIMD-accelerated sorting in version 2.0, every library that uses NumPy's sort functions benefited automatically.
Imagine you have a big box of numbered blocks. If you wanted to double every number, you could pick up each block one by one, read the number, multiply it by two, and put it back. That would take a long time if you had a million blocks.
NumPy is like a magic table that lets you dump all your blocks on it and say "double everything," and it does the entire job at once, almost instantly. It works so fast because, behind the scenes, it uses a super-efficient helper (written in the C programming language) that can process blocks way faster than picking them up one at a time in Python.
NumPy also makes sure all the blocks are the same kind (all integers, or all decimals), lined up neatly in a row in memory. This makes it much easier and faster for the computer to work with them compared to a messy pile where each block might be a different type.
Scientists, data analysts, and people who build AI use NumPy every day because it makes working with huge amounts of numbers quick and easy. Almost every other Python tool for math, data, or machine learning is built on top of NumPy.