The Earth Mover's Distance (EMD), also known as the Wasserstein distance or Mallows distance, is a measure of dissimilarity between two probability distributions, used in machine learning, statistics, and computer vision. It was originally introduced by Y. Rubner, C. Tomasi, and L.J. Guibas in their 1998 paper titled "A Metric for Distributions with Applications to Image Databases". EMD is especially useful for comparing distributions defined over a metric space, such as histograms or point clouds, because it takes the ground distance between bins into account, and it has been applied in various fields including image retrieval, shape analysis, and natural language processing.
The EMD is defined as the minimum amount of "work" required to transform one distribution into another, where "work" is quantified as the product of the mass being moved and the distance over which it is moved. Mathematically, the EMD between two probability distributions P and Q, defined over a common metric space, can be formulated as a transportation problem:

EMD(P, Q) = (Σ_i Σ_j f_ij d_ij) / (Σ_i Σ_j f_ij)

where f_ij is the optimal flow of mass from the i-th bin of P to the j-th bin of Q, and d_ij is the ground distance between those bins. The flow is chosen to minimize the total cost Σ_i Σ_j f_ij d_ij subject to the constraints that f_ij ≥ 0, that no bin of P ships more mass than it holds (Σ_j f_ij ≤ p_i), that no bin of Q receives more mass than it needs (Σ_i f_ij ≤ q_j), and that the total flow equals the smaller of the two total masses (Σ_i Σ_j f_ij = min(Σ_i p_i, Σ_j q_j)). When P and Q are probability distributions, the total flow is 1 and the EMD reduces to the minimum total cost.
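For one-dimensional histograms on a common, evenly spaced grid with equal total mass, this transportation problem has a well-known closed form: the EMD equals the sum of absolute differences between the two cumulative distributions, scaled by the bin spacing. Below is a minimal sketch of that special case; the function name and the default unit spacing are assumptions of the sketch, not part of the original formulation:

```python
def emd_1d(p, q, spacing=1.0):
    """EMD between two equal-mass 1-D histograms on a shared grid.

    In one dimension, the optimal transport plan simply pushes any
    surplus mass from each bin toward the next, so the total work is
    the sum of absolute CDF differences, scaled by the bin spacing.
    """
    assert abs(sum(p) - sum(q)) < 1e-9, "distributions must have equal mass"
    work, carry = 0.0, 0.0
    for pi, qi in zip(p, q):
        carry += pi - qi          # net mass that must move past this bin
        work += abs(carry)
    return work * spacing

# Moving 0.5 of the mass one bin over costs 0.5 units of work:
print(emd_1d([0.25, 0.75], [0.75, 0.25]))  # 0.5
```

The general (multi-dimensional) case has no such shortcut and is solved as a linear program, e.g. with a transportation-simplex or min-cost-flow solver.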
The EMD possesses several desirable properties that make it a useful distance metric:

- It is a true metric (non-negative, symmetric, and obeying the triangle inequality) whenever the ground distance is a metric and the two distributions have equal total mass.
- Unlike bin-to-bin measures such as the chi-squared distance or the Kullback-Leibler divergence, it accounts for the ground distance between bins, so it is robust to small shifts in the distributions and to the choice of binning.
- It naturally supports partial matching when the two distributions have different total masses.
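The metric axioms can be checked numerically on small examples. The sketch below assumes the one-dimensional special case, where the EMD reduces to the sum of absolute CDF differences on a unit-spaced grid; the function and variable names are illustrative:

```python
from itertools import accumulate

def emd(p, q):
    # 1-D EMD on a unit-spaced grid: sum of absolute CDF differences.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))

a = [0.5, 0.5, 0.0]
b = [0.0, 0.5, 0.5]
c = [0.25, 0.5, 0.25]

assert emd(a, a) == 0.0                    # identical distributions: zero distance
assert emd(a, b) == emd(b, a)              # symmetry
assert emd(a, b) <= emd(a, c) + emd(c, b)  # triangle inequality
```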
The EMD has been employed in various machine learning tasks and domains, including:

- Content-based image retrieval, its original application, where color or texture histograms of images are compared.
- Document similarity in natural language processing, where the Word Mover's Distance measures the cost of moving one document's word embeddings onto another's.
- Generative modeling, where the Wasserstein GAN uses a Wasserstein distance as its training objective.
- Shape analysis and point-cloud comparison in computer vision and graphics.
The Earth Mover's Distance (EMD) is a way to measure how different two groups of things are. Imagine you have two piles of sand with different shapes, and you want to reshape the first pile so that it matches the second. The EMD is the minimum effort needed to move sand around until the two piles look the same, where each shovelful counts for more the farther you have to carry it. The less effort needed, the more similar the piles are. In the world of computers, the same idea is used to compare things like images, shapes, or even documents, by measuring how much effort is needed to transform one into the other.