Sunday, May 12, 2024

Unbalanced Optimal Transport as Sparse Convex Regression: Model, Algorithm and Acceleration using Safe Screening

1. Introduction

Optimal transport (OT) is widely used in machine learning to capture the geometric relationships between data distributions.

When dealing with distributions of differing total mass, as arise in many real-world scenarios, Unbalanced Optimal Transport (UOT) with entropic regularization was introduced [1]. However, due to optimization instability and the loss of sparsity in its solutions, researchers have turned to alternative models, such as sparse convex regression [2].

Nevertheless, solving large-scale UOT problems (under any model) remains challenging. This post aims to elucidate the main ideas of the paper [3], where the authors apply the "safe screening" technique, originally developed for Lasso-like problems, to UOT. Safe screening identifies entries that are provably zero in the optimal transport plan and discards them ahead of time, thereby accelerating the solution of the sparse convex regression model of UOT [2].

2. From Unbalanced Optimal Transport to Sparse Convex Regression Model

In Optimal Transport (OT), we are given two distributions $a \in \mathbb R^m_+$ and $b \in \mathbb R^n_+$, together with a cost matrix $C \in \mathbb R^{m \times n}$. The OT problem seeks an optimal transport plan $T \in \mathbb R^{m \times n}_+$ that minimizes the transportation cost

$$\min_{T} \ \langle C, T\rangle \quad \text{s.t.} \quad T 1_n = a, \quad T^\top 1_m = b.$$

In (balanced) OT, we assume that the two distributions have the same total mass (typically normalized to one), i.e. $\|a\|_1 = \|b\|_1$.

It is clear that the OT problem is a linear program written in matrix form. One can convert it to vector form as follows. Let $t = \mathrm{vec}(T) \in \mathbb R^{mn}$, and let $M \in \mathbb R^{m \times mn}$ and $N \in \mathbb R^{n \times mn}$ be the index matrices such that
$$T 1_n = a \Longleftrightarrow M t = a$$
and
$$T^\top 1_m = b \Longleftrightarrow N t = b.$$
Letting $c = \mathrm{vec}(C) \in \mathbb R^{mn}$, $y = [a^\top, b^\top]^\top \in \mathbb R^{m+n}$ and $X = [M^\top, N^\top]^\top \in \mathbb R^{(m+n) \times mn}$, the OT problem can be rewritten in vector form with a single equality constraint:
$$\min_{t\geq 0} \ \langle c, t\rangle \quad \text{s.t.} \quad Xt = y.$$
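To make the vectorization concrete, here is a minimal NumPy/SciPy sketch (mine, not from [3]) that builds $M$ and $N$ as Kronecker products, assuming a column-major $\mathrm{vec}$, and solves the resulting LP with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

m, n = 3, 4
rng = np.random.default_rng(0)
a = rng.random(m); a /= a.sum()          # source marginal (unit mass)
b = rng.random(n); b /= b.sum()          # target marginal (unit mass)
C = rng.random((m, n))                   # cost matrix

# With column-major vectorization t = vec(T), the row sums T 1_n and
# column sums T^T 1_m of T become the linear maps M t and N t:
M = np.kron(np.ones((1, n)), np.eye(m))  # M t = T 1_n,   shape (m, mn)
N = np.kron(np.eye(n), np.ones((1, m)))  # N t = T^T 1_m, shape (n, mn)
X = np.vstack([M, N])                    # stacked constraints, shape (m+n, mn)
y = np.concatenate([a, b])
c = C.flatten(order="F")                 # c = vec(C)

res = linprog(c, A_eq=X, b_eq=y, bounds=(0, None))
T_opt = res.x.reshape((m, n), order="F")
print("optimal cost:", res.fun)
print("marginals recovered:", np.allclose(T_opt.sum(1), a), np.allclose(T_opt.sum(0), b))
```

Note that $X$ has one redundant row (both groups of constraints carry the total mass), which the LP solver handles without trouble.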

In UOT, the mass-equality condition $\|a\|_1 = \|b\|_1$ is dropped, so the marginal constraints $T 1_n = a$ and $T^\top 1_m = b$ cannot both be satisfied exactly. Instead, one relaxes them via a penalty and considers the following problem:
$$\min_{t\geq 0} \ \lambda \langle c, t\rangle + D(Xt, y) \qquad \text{(SCR-UOT)}$$
where $\lambda > 0$ is a tuning parameter and $D(\cdot, \cdot)$ is some "distance" function (not necessarily symmetric), e.g. the squared $\ell_2$ distance or the Kullback–Leibler divergence.

One can easily see that this model is an $\ell_1$-norm regression model; in particular, it is a (non-negative) Lasso problem when $c = 1_{mn}$.
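To spell this out (assuming, as is standard, that the cost entries are non-negative), the linear term is a weighted $\ell_1$ norm on the non-negative orthant:
$$\lambda \langle c, t\rangle = \lambda \sum_{j} c_j t_j = \lambda \,\|\mathrm{diag}(c)\, t\|_1 \quad \text{for } t \ge 0,$$
so that, with $D(u, y) = \tfrac{1}{2}\|u - y\|_2^2$, the model reads
$$\min_{t \ge 0} \ \tfrac{1}{2}\|Xt - y\|_2^2 + \lambda \|\mathrm{diag}(c)\, t\|_1,$$
a non-negative weighted Lasso; taking $c = 1_{mn}$ recovers the plain non-negative Lasso.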

Furthermore, it is well known that $\ell_1$-norm regularization promotes sparsity in the optimal solutions.

We therefore refer to this problem as the sparse convex regression model for UOT.

3. On the selection of a solver

As mentioned by the authors of [3], the (SCR-UOT) formulation lets one leverage a large body of existing numerical solvers for UOT, e.g. the FISTA method [4].
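For instance, when $D$ is the squared $\ell_2$ distance, a projected-gradient FISTA takes only a few lines. The sketch below is a generic implementation under that assumption (not the authors' code): the linear term $\lambda \langle c, t\rangle$ is kept in the smooth part of the objective, and the proximal step reduces to a projection onto the non-negative orthant.

```python
import numpy as np

def fista_uot(X, y, c, lam, n_iter=500):
    """FISTA for min_{t >= 0} 0.5*||X t - y||^2 + lam * <c, t>."""
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    t = np.zeros(X.shape[1])
    z, s = t.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) + lam * c     # gradient of the smooth objective at z
        t_next = np.maximum(0.0, z - grad / L) # projected gradient (prox) step
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s**2)) / 2.0
        z = t_next + ((s - 1.0) / s_next) * (t_next - t)  # momentum extrapolation
        t, s = t_next, s_next
    return t
```

Feeding in the $X$, $y$, $c$ built above with a moderate $\lambda$ returns a sparse approximate transport plan, as expected from the $\ell_1$ structure.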

4. Acceleration using safe screening

The authors of [3] show how to leverage safe screening rules to identify zero entries of the optimal solution $t^\star$ and discard the corresponding columns of $X$, thereby accelerating the solving procedure.

The first safe screening rule was introduced in [5]. A core concept in many state-of-the-art safe screening techniques is the safe region: a set guaranteed to contain the dual optimal solution. In [3], the authors revisit existing safe regions, e.g. the GAP safe ball [6]. Readers should also note the newer unifying frameworks for safe regions with ball geometry [7] and dome geometry [8].
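To illustrate the mechanics, here is a minimal sketch of a GAP-safe-ball screening test for the squared-$\ell_2$ instance of (SCR-UOT), following the generic recipe of [6]; the dual parametrization and the threshold below are my reconstruction, not code from [3]. A dual-feasible point $\theta$ is obtained by rescaling the residual; the duality gap then yields a safe ball of radius $r = \sqrt{2 \cdot \mathrm{gap}}$ around $\theta$, and any coordinate $j$ with $x_j^\top \theta + r\|x_j\| < \lambda c_j$ is guaranteed to satisfy $t_j^\star = 0$ and can be dropped.

```python
import numpy as np

def gap_safe_screen(X, y, c, lam, t):
    """Boolean mask of coordinates provably zero at the optimum of
    min_{t >= 0} 0.5*||X t - y||^2 + lam * <c, t>, given any iterate t."""
    rho = y - X @ t                                          # primal residual
    theta = rho / max(1.0, np.max((X.T @ rho) / (lam * c)))  # residual scaling -> dual feasible
    primal = 0.5 * np.sum(rho**2) + lam * (c @ t)
    dual = 0.5 * np.sum(y**2) - 0.5 * np.sum((theta - y)**2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0))          # GAP safe ball radius
    col_norms = np.linalg.norm(X, axis=0)
    # Coordinate j can be discarded if its dual constraint x_j^T theta' <= lam*c_j
    # holds strictly for every theta' in the safe ball:
    return X.T @ theta + radius * col_norms < lam * c
```

The entrywise division by $\lambda c$ in the rescaling step is precisely where small entries of $c$ cause trouble, which motivates the contribution discussed next.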

Notably, one of the interesting contributions of [3] is the "Shift Projection", which removes the degeneracy of the so-called "residual scaling"
$$\theta := \frac{\theta}{\max\left(1, \left\|\frac{X^\top\theta}{\lambda c}\right\|_\infty \right)}$$
(with the division by $\lambda c$ taken entrywise) when some entries of $c$ are close to $0$: tiny cost entries inflate the normalizing factor and shrink $\theta$ towards the origin, rendering the screening test ineffective.

5. Conclusion

In this blog post, we revisited an interesting paper [3] that studies UOT through a sparse convex regression model, together with its solving procedure and an acceleration technique based on safe screening.

6. References

[1] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In NeurIPS, 2013.
[2] Laetitia Chapel, Rémi Flamary, Haoran Wu, Cédric Févotte, and Gilles Gasso. Unbalanced optimal transport through non-negative penalized linear regression. In NeurIPS, 2021.
[3] Xun Su, Zhongxi Fang, and Hiroyuki Kasai. Safe screening for unbalanced optimal transport. arXiv preprint arXiv:2307.00247, 2023.
[4] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[5] Laurent El Ghaoui, Vivian Viallon, and Tarek Rabbani. Safe feature elimination for the lasso and sparse supervised learning problems. arXiv preprint arXiv:1009.4219, 2010.
[6] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. Gap safe screening rules for sparsity enforcing penalties. Journal of Machine Learning Research, 18(128):1–33, 2017.
[7] Thu-Le Tran et al. One to beat them all: "RYU", a unifying framework for the construction of safe balls. arXiv preprint arXiv:2312.00640, 2023.
[8] Thu-Le Tran et al. Beyond GAP screening for Lasso by exploiting new dual cutting half-spaces. In EUSIPCO, 2022.
