Proving triangle inequality for Wasserstein distance in Optimal transport using Kantorovich duality
Abstract.
Optimal transport is the problem of moving mass from one place to another while minimizing the total cost function. When the cost function in the optimal transport (OT) problem is measured by a power of the distance in the ground space, OT defines a distance known as the Wasserstein distance between probability measures. In particular, one can prove that the Wasserstein distance satisfies the triangle inequality.
However, proving the triangle inequality is difficult (at least for me ^^). The standard proof in most textbooks is based on the so-called “glueing of couplings” technique, see e.g. in the book of Fields medalist Cédric Villani [2].
This post introduces a new, elegant, constructive and short proof recently proposed by François Golse [1] based on Kantorovich duality. To convey the main idea, we focus in this post on finite support measures, i.e., each measure can be identified with a vector. This restriction allows us to avoid complicated concepts such as Polish metric spaces, measurable functions, and measure coupling. We hope that this post can help readers quickly grasp the key ideas in his proof.
1. Discrete measure
Let be a finite set of points. In the following, we consider three discrete measures supported on :
where denotes a unit mass (called Dirac mass) at location .
For a discrete measure , we may think of it as a collection of mass with weight at location for . So each measure can be identified with a vector . Since we are interested in moving mass from one measure to another, we further assume in the following that all the measures have the same total mass, i.e.
2. Optimal transport
In the optimal transport problems, we are interested in the cost efficient way to move mass between two measures, for example, from to . Denote by the amount of mass moved from at to at , for . Then the total optimal transport cost is
where . is called the -Wasserstein distance. Here the two constraints in OT basically means the sum of th row equals to and the sum of th column equals to .
In this discrete setting, OT is basically a linear programming problem. The dual problem of linear programming, in this case called Kantorovich duality, is given by the following identity:
3. Triangle inequality
Our goal is to prove that, for distinct discrete measures supported on , we have the following triangle inequality:
Let us split the proof into steps.
Step 1. We first notice that the above triangle inequality can be equivalently rewritten as follows:
where is well-defined by
Step 2. Let be the vectors in the Kantorovich duality w.r.t. and , i.e. , and
Step 3. Let us define in terms of vector and parameter as follows:
Clearly, turns out to satisfy
Surprisingly, one can also prove that has a connection with and (see the proof in [1]) as follows, for all ,
This is perhaps the most tricky part of the proof.
By Kantorovich duality, above definition and property of imply that
Step 4. Combining Step 2 and 3, we then obtain the target triangle inequality as follows:
From the view of Step 1, the proof is completed.
References
[1]: Golse, François. “A Duality-Based Proof of the Triangle Inequality for the Wasserstein Distances.” La Matematica (2024): 1-15.
[2]: Villani, Cédric. Topics in optimal transportation. Vol. 58. American Mathematical Soc., 2021.