Proving triangle inequality for Wasserstein distance in Optimal Transport using Kantorovich duality

Proving triangle inequality for Wasserstein distance in Optimal transport using Kantorovich duality

Abstract.
Optimal transport is the problem of moving mass from one place to another while minimizing the total cost function. When the cost function in the optimal transport (OT) problem is measured by a power of the distance in the ground space, OT defines a distance known as the Wasserstein distance between probability measures. In particular, one can prove that the Wasserstein distance satisfies the triangle inequality.

However, proving the triangle inequality is difficult (at least for me ^^). The standard proof in most textbooks is based on the so-called “glueing of couplings” technique, see e.g. in the book of Fields medalist Cédric Villani [2].

This post introduces a new, elegant, constructive and short proof recently proposed by François Golse [1] based on Kantorovich duality. To convey the main idea, we focus in this post on finite support measures, i.e., each measure can be identified with a vector. This restriction allows us to avoid complicated concepts such as Polish metric spaces, measurable functions, and measure coupling. We hope that this post can help readers quickly grasp the key ideas in his proof.

1. Discrete measure

Let $X=\{x_1,..., x_n\} \subset \mathbb R^d$ be a finite set of $n$ points. In the following, we consider three discrete measures supported on $X$ :
$\begin{align*}\mu & = \sum_{i=1}^n u_i \delta_{x_i}\\ \nu & = \sum_{i=1}^n v_i \delta_{x_i}\\ \omega & = \sum_{i=1}^n w_i \delta_{x_i}\end{align*}$
where $\delta_{x_i}$ denotes a unit mass (called Dirac mass) at location $x_i$ .

For a discrete measure $\mu$ , we may think of it as a collection of mass with weight $u_i\geq 0$ at location $x_i$ for $i=1,..., n$ . So each measure $\mu$ can be identified with a vector $u=[u_i] \in \mathbb R^n$ . Since we are interested in moving mass from one measure to another, we further assume in the following that all the measures have the same total mass, i.e. $\sum_{i=1}^n u_i = \sum_{i=1}^n v_i=\sum_{i=1}^n w_i=1$

2. Optimal transport

In the optimal transport problems, we are interested in the cost efficient way to move mass between two measures, for example, from $\mu$ to $\nu$ . Denote by $p_{ij}$ the amount of mass moved from $u_i$ at $x_i$ to $v_j$ at $x_j$ , for $i, =1,..., n$ . Then the total optimal transport cost is
$W_2^2(\mu, \nu) = \min \left\{\sum_{i,j =1}^n p_{ij} d(x_i, x_j)^2: P 1_n =u, P^T 1_n = v\right\},$
where $P=[p_{ij}] \in \mathbb R^{n \times n}$ . $W_2$ is called the $2$ -Wasserstein distance. Here the two constraints in OT basically means the sum of $i$ th row equals to $u_i$ and the sum of $j$ th column equals to $v_j$ .

In this discrete setting, OT is basically a linear programming problem. The dual problem of linear programming, in this case called Kantorovich duality, is given by the following identity:

$W_2^2(\mu, \nu) = \max_{i, j: \, \, \, a_i + b_j \leq d^2(x_i, x_j)} \langle a, u\rangle + \langle b,v\rangle.$

3. Triangle inequality

Our goal is to prove that, for distinct discrete measures $\mu, \nu, \omega$ supported on $X$ , we have the following triangle inequality:
$W_2(\mu, \nu) \leq W_2(\mu, \omega) + W_2(\omega, \nu).$

Let us split the proof into $4$ steps.

Step 1. We first notice that the above triangle inequality can be equivalently rewritten as follows:
$W_2^2(\mu, \nu) \leq (1+ \eta)W_2^2(\mu, \omega) + (1+ 1/\eta) W_2^2(\omega, \nu),$
where $\eta>0$ is well-defined by
$\eta = \frac{W_2(\omega, \nu)}{W_2(\mu, \omega)}.$

Step 2. Let $a, b \in \mathbb R^n$ be the vectors in the Kantorovich duality w.r.t. $\mu$ and $\nu$ , i.e. $a_k + b_j \leq d^2(x_k, x_j)$ $k, j =1,...,n$ , and
$W_2^2(\mu, \nu) = \langle a, u\rangle+ \langle b, v\rangle.$

Step 3. Let us define $c= [c_i] \in \mathbb R^n$ in terms of vector $b$ and parameter $\eta$ as follows:
$c_i = (1+1/\eta) \min_{j=1,..., n} (d^2(x_i, x_j) - \frac{b_j}{1+1/\eta}).$

Clearly, $c$ turns out to satisfy
$\frac{c_i}{1+1/\eta} + \frac{b_j}{1+1/\eta} \leq d^2(x_i, x_j).$

Surprisingly, one can also prove that $c$ has a connection with $a$ and $\eta$ (see the proof in [1]) as follows, for all $k, i =1,..., n$ ,
$\frac{a_k}{1+\eta} - \frac{c_i}{1+\eta} \leq d^2(x_k, x_i).$
This is perhaps the most tricky part of the proof.

By Kantorovich duality, above definition and property of $c$ imply that
$\begin{align*}\langle c, w\rangle + \langle b, v\rangle & \leq (1+1/\eta) W_2^2(\omega, \nu),\\ \langle a, u\rangle - \langle c, w\rangle & \leq (1+\eta) W_2^2(\mu, \omega).\end{align*}$

Step 4. Combining Step 2 and 3, we then obtain the target triangle inequality as follows:

$\begin{align*}W_2^2(\mu, \nu) & = \langle a, u\rangle + \langle b, v\rangle\\ & = \langle a, u\rangle - \langle c, w\rangle + \langle c, w\rangle+\langle b, v\rangle\\ & \leq (1+\eta)W_2^2(\mu, \omega) + (1+1/\eta)W_2^2(\omega, \nu). \end{align*}$

From the view of Step 1, the proof is completed.

References

[1]: Golse, François. “A Duality-Based Proof of the Triangle Inequality for the Wasserstein Distances.” La Matematica (2024): 1-15.
[2]: Villani, Cédric. Topics in optimal transportation. Vol. 58. American Mathematical Soc., 2021.

Tran Thu Le

Pages

Thursday, March 28, 2024