Thursday, March 28, 2024

Proving triangle inequality for Wasserstein distance in Optimal Transport using Kantorovich duality

Proving triangle inequality for Wasserstein distance in Optimal Transport using Kantorovich duality

Proving triangle inequality for Wasserstein distance in Optimal transport using Kantorovich duality

Abstract.
Optimal transport is the problem of moving mass from one place to another while minimizing the total cost function. When the cost function in the optimal transport (OT) problem is measured by a power of the distance in the ground space, OT defines a distance known as the Wasserstein distance between probability measures. In particular, one can prove that the Wasserstein distance satisfies the triangle inequality.

However, proving the triangle inequality is difficult (at least for me ^^). The standard proof in most textbooks is based on the so-called “glueing of couplings” technique, see e.g. in the book of Fields medalist Cédric Villani [2].

This post introduces a new, elegant, constructive and short proof recently proposed by François Golse [1] based on Kantorovich duality. To convey the main idea, we focus in this post on finite support measures, i.e., each measure can be identified with a vector. This restriction allows us to avoid complicated concepts such as Polish metric spaces, measurable functions, and measure coupling. We hope that this post can help readers quickly grasp the key ideas in his proof.

1. Discrete measure

Let X={x1,...,xn}RdX=\{x_1,..., x_n\} \subset \mathbb R^d be a finite set of nn points. In the following, we consider three discrete measures supported on XX:
μ=i=1nuiδxiν=i=1nviδxiω=i=1nwiδxi\begin{align*}\mu & = \sum_{i=1}^n u_i \delta_{x_i}\\ \nu & = \sum_{i=1}^n v_i \delta_{x_i}\\ \omega & = \sum_{i=1}^n w_i \delta_{x_i}\end{align*}
where δxi\delta_{x_i} denotes a unit mass (called Dirac mass) at location xix_i.

For a discrete measure μ\mu, we may think of it as a collection of mass with weight ui0u_i\geq 0 at location xix_i for i=1,...,ni=1,..., n. So each measure μ\mu can be identified with a vector u=[ui]Rnu=[u_i] \in \mathbb R^n. Since we are interested in moving mass from one measure to another, we further assume in the following that all the measures have the same total mass, i.e. i=1nui=i=1nvi=i=1nwi=1\sum_{i=1}^n u_i = \sum_{i=1}^n v_i=\sum_{i=1}^n w_i=1

2. Optimal transport

In the optimal transport problems, we are interested in the cost efficient way to move mass between two measures, for example, from μ\mu to ν\nu. Denote by pijp_{ij} the amount of mass moved from uiu_i at xix_i to vjv_j at xjx_j, for i,=1,...,ni, =1,..., n. Then the total optimal transport cost is
W22(μ,ν)=min{i,j=1npijd(xi,xj)2:P1n=u,PT1n=v},W_2^2(\mu, \nu) = \min \left\{\sum_{i,j =1}^n p_{ij} d(x_i, x_j)^2: P 1_n =u, P^T 1_n = v\right\},
where P=[pij]Rn×nP=[p_{ij}] \in \mathbb R^{n \times n}. W2W_2 is called the 22-Wasserstein distance. Here the two constraints in OT basically means the sum of iith row equals to uiu_i and the sum of jjth column equals to vjv_j.

In this discrete setting, OT is basically a linear programming problem. The dual problem of linear programming, in this case called Kantorovich duality, is given by the following identity:

W22(μ,ν)=maxi,j:   ai+bjd2(xi,xj)a,u+b,v.W_2^2(\mu, \nu) = \max_{i, j: \, \, \, a_i + b_j \leq d^2(x_i, x_j)} \langle a, u\rangle + \langle b,v\rangle.

3. Triangle inequality

Our goal is to prove that, for distinct discrete measures μ,ν,ω\mu, \nu, \omega supported on XX, we have the following triangle inequality:
W2(μ,ν)W2(μ,ω)+W2(ω,ν).W_2(\mu, \nu) \leq W_2(\mu, \omega) + W_2(\omega, \nu).

Let us split the proof into 44 steps.

Step 1. We first notice that the above triangle inequality can be equivalently rewritten as follows:
W22(μ,ν)(1+η)W22(μ,ω)+(1+1/η)W22(ω,ν),W_2^2(\mu, \nu) \leq (1+ \eta)W_2^2(\mu, \omega) + (1+ 1/\eta) W_2^2(\omega, \nu),
where η>0\eta>0 is well-defined by
η=W2(ω,ν)W2(μ,ω).\eta = \frac{W_2(\omega, \nu)}{W_2(\mu, \omega)}.

Step 2. Let a,bRna, b \in \mathbb R^n be the vectors in the Kantorovich duality w.r.t. μ\mu and ν\nu, i.e. ak+bjd2(xk,xj)a_k + b_j \leq d^2(x_k, x_j) k,j=1,...,nk, j =1,...,n, and
W22(μ,ν)=a,u+b,v.W_2^2(\mu, \nu) = \langle a, u\rangle+ \langle b, v\rangle.

Step 3. Let us define c=[ci]Rnc= [c_i] \in \mathbb R^n in terms of vector bb and parameter η\eta as follows:
ci=(1+1/η)minj=1,...,n(d2(xi,xj)bj1+1/η).c_i = (1+1/\eta) \min_{j=1,..., n} (d^2(x_i, x_j) - \frac{b_j}{1+1/\eta}).

Clearly, cc turns out to satisfy
ci1+1/η+bj1+1/ηd2(xi,xj).\frac{c_i}{1+1/\eta} + \frac{b_j}{1+1/\eta} \leq d^2(x_i, x_j).

Surprisingly, one can also prove that cc has a connection with aa and η\eta (see the proof in [1]) as follows, for all k,i=1,...,nk, i =1,..., n,
ak1+ηci1+ηd2(xk,xi).\frac{a_k}{1+\eta} - \frac{c_i}{1+\eta} \leq d^2(x_k, x_i).
This is perhaps the most tricky part of the proof.

By Kantorovich duality, above definition and property of cc imply that
c,w+b,v(1+1/η)W22(ω,ν),a,uc,w(1+η)W22(μ,ω).\begin{align*}\langle c, w\rangle + \langle b, v\rangle & \leq (1+1/\eta) W_2^2(\omega, \nu),\\ \langle a, u\rangle - \langle c, w\rangle & \leq (1+\eta) W_2^2(\mu, \omega).\end{align*}

Step 4. Combining Step 2 and 3, we then obtain the target triangle inequality as follows:

W22(μ,ν)=a,u+b,v=a,uc,w+c,w+b,v(1+η)W22(μ,ω)+(1+1/η)W22(ω,ν).\begin{align*}W_2^2(\mu, \nu) & = \langle a, u\rangle + \langle b, v\rangle\\ & = \langle a, u\rangle - \langle c, w\rangle + \langle c, w\rangle+\langle b, v\rangle\\ & \leq (1+\eta)W_2^2(\mu, \omega) + (1+1/\eta)W_2^2(\omega, \nu). \end{align*}

From the view of Step 1, the proof is completed.

References

[1]: Golse, François. “A Duality-Based Proof of the Triangle Inequality for the Wasserstein Distances.” La Matematica (2024): 1-15.
[2]: Villani, Cédric. Topics in optimal transportation. Vol. 58. American Mathematical Soc., 2021.

Popular Posts