Monday, November 6, 2023

Safe Screening for Lasso problem with squared L1-norm penalization

In [1], the authors introduce a safe screening rule for the following Lasso-like problem with a squared L1-norm penalization:
$$\min_{x\in \mathbb R^n}\quad \frac{1}{2}||b-Ax||_2^2+\lambda||x||_1^2$$
In this blog post, we will study their techniques and key ideas.

Safe screening for the standard Lasso problem

The Lasso problem is a non-smooth convex optimization problem whose penalization term is the L1-norm:
$$\min_{x\in \mathbb R^n}\quad \frac{1}{2}||b-Ax||_2^2+\lambda||x||_1$$
where $b\in \mathbb R^m$, $A=[a_1,\dots, a_n]\in \mathbb R^{m\times n}$ and $\lambda>0$.

It is known that the problem has a sparse solution, i.e. only a small number of entries of the optimal solution $x^\star$ are non-zero. Therefore, identifying the zero entries of $x^\star$ and removing the corresponding columns of $A$ before solving the problem may significantly accelerate the solving process. This can be achieved by safe screening. We now briefly recall the idea of a safe screening rule.

The Fenchel-Rockafellar dual problem of Lasso is
$$\max_{u\in U_\lambda} \quad \frac{1}{2}||b||_2^2 - \frac{1}{2}||b-u||_2^2$$
where
$$U_\lambda = \{u \in \mathbb R^m: ||A^Tu||_\infty\leq \lambda\}$$
In other words, the dual problem is the closest-point projection of $b$ onto the polytope $U_\lambda$.
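As a quick illustration (not taken from [1]; function names are ours), a dual-feasible point can be obtained from any primal candidate $x$ by scaling the residual $b - Ax$ into $U_\lambda$, and weak duality can then be checked numerically:

```python
import numpy as np

def lasso_primal(A, b, lam, x):
    """Primal objective: 0.5*||b - Ax||^2 + lam*||x||_1."""
    return 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))

def lasso_dual(b, u):
    """Dual objective: 0.5*||b||^2 - 0.5*||b - u||^2 (valid for u in U_lam)."""
    return 0.5 * np.sum(b ** 2) - 0.5 * np.sum((b - u) ** 2)

def feasible_dual_point(A, b, lam, x):
    """Scale the residual r = b - Ax into U_lam = {u : ||A^T u||_inf <= lam}."""
    r = b - A @ x
    scale = min(1.0, lam / max(np.max(np.abs(A.T @ r)), 1e-12))
    return scale * r

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
x = 0.1 * rng.standard_normal(50)
lam = 0.5

u = feasible_dual_point(A, b, lam, x)
assert np.max(np.abs(A.T @ u)) <= lam + 1e-9           # u lies in U_lam
assert lasso_dual(b, u) <= lasso_primal(A, b, lam, x)  # weak duality
```

The scaling trick is the standard way to obtain admissible dual points for screening: any residual can be used, it just needs to be shrunk until it enters the polytope.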

The optimal pair (x,u)(x^\star, u^\star) verifies the following optimality condition (known as the Complementary Slackness condition):
$$x_i^\star(|\langle a_i, u^\star\rangle| - \lambda)=0$$
As a direct consequence, we have
$$|\langle a_i, u^\star\rangle| <\lambda \Longrightarrow x_i^\star=0$$
This implication is called a safe screening rule since it provides a sufficient condition to identify zero entries of (any) optimal solution $x^\star$. In practice, however, $u^\star$ is unknown. We therefore “replace” it by a set $S$, called a safe region, that is guaranteed to contain $u^\star$; the practical screening rule then reads as follows:
$$\sup_{u\in S} |\langle a_i, u\rangle| <\lambda \Longrightarrow x_i^\star=0$$
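For instance, when the safe region is a ball $S = B(c, r)$, the supremum admits the closed form $\sup_{u\in S} |\langle a_i, u\rangle| = |\langle a_i, c\rangle| + r||a_i||_2$, so the rule is cheap to evaluate. A minimal sketch (function name ours):

```python
import numpy as np

def screen_with_ball(A, center, radius, lam):
    """Screening with a ball safe region B(center, radius):
    sup over the ball of |<a_i, u>| equals |<a_i, center>| + radius*||a_i||_2.
    Returns a boolean mask: True means x_i^* = 0 is certified."""
    sup_vals = np.abs(A.T @ center) + radius * np.linalg.norm(A, axis=0)
    return sup_vals < lam

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
center = 0.05 * rng.standard_normal(20)
radius = 0.1
mask = screen_with_ball(A, center, radius, lam=1.0)

# Sanity check: the closed form upper-bounds |<a_i, u>| for points on the sphere.
for _ in range(100):
    d = rng.standard_normal(20)
    u = center + radius * d / np.linalg.norm(d)
    assert np.all(np.abs(A.T @ u)
                  <= np.abs(A.T @ center) + radius * np.linalg.norm(A, axis=0) + 1e-9)
```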

Safe screening for squared L1-norm penalization

In a recent paper [1], the authors introduce safe screening for a problem akin to the Lasso but with a squared L1-norm penalization.

They consider the following problem [1, Eq. 3.1]:
$$\min_{x\in \mathbb R^n}\quad \frac{1}{2}||b-Ax||_2^2+\lambda||x||_1^2\tag{1}$$
In this case, the dual problem is [1, Eq. 3.12]:
$$\max_{u\in \mathbb R^m}\quad \frac{1}{2}||b||_2^2 - \frac{1}{2}||b-u||_2^2 - \frac{1}{4\lambda} ||A^Tu||_\infty^2\tag{2}$$

Note that the above duality can be derived as an instance of the general Fenchel duality
$$\min_{x\in \mathbb R^n} \quad f(Ax)+g(x) = \max_{u\in \mathbb R^m} \quad -f^*(-u) - g^*(A^Tu)$$
together with the observation that if $\varphi(\cdot) = \frac{1}{2}||\cdot||^2$, then $\varphi^*(\cdot) = \frac{1}{2}||\cdot||_*^2$, where $||\cdot||_*$ denotes the dual norm of $||\cdot||$; in particular, the dual norm of the L2-norm is the L2-norm itself, and the dual norm of the L1-norm is the max-norm.
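This observation can be verified for the L1-norm by maximizing along rays $x = s\,d$ with $||d||_1 = 1$ and $s \geq 0$ (a short side derivation, not reproduced from [1]):
$$\varphi^*(y) = \sup_{x} \;\langle y, x\rangle - \tfrac{1}{2}||x||_1^2 = \sup_{s\geq 0}\; s\,||y||_\infty - \tfrac{1}{2}s^2 = \tfrac{1}{2}||y||_\infty^2$$
where we used $\sup_{||d||_1 = 1} \langle y, d\rangle = ||y||_\infty$. The same computation with a penalty weight $\lambda$ in front of the squared norm scales the constant in the conjugate accordingly.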

The authors in [1] show that the dual problem (2) can be interpreted as a closest-point projection onto a cone in the augmented space $\mathbb R^{m+1}$. Indeed, introduce an auxiliary variable $t \geq ||A^Tu||_\infty/\sqrt{2\lambda}$ and define $b' = [b^T, 0]^T$, $u'=[u^T, t]^T$; then (2) can be rewritten as
$$\max_{u'\in C}\quad \frac{1}{2}||b||_2^2 - \frac{1}{2}||b'-u'||_2^2 \tag{3}$$
where CC [1, Eq. 3.17] is a cone defined by
$$C = \{u'=[u^T, t]^T\in \mathbb R^{m+1}: ||A^Tu||_\infty\leq \sqrt{2\lambda}\,t\}$$
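The equivalence between (2) and (3) can be checked numerically: at any fixed $u$, the objective of (3) is maximized by taking the smallest feasible $t$, which recovers the quadratic term of (2). A sketch (assuming the dual term $||A^Tu||_\infty^2/(4\lambda)$, i.e., the Fenchel conjugate of $\lambda||\cdot||_1^2$; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((15, 40))
b = rng.standard_normal(15)
lam = 0.7
u = rng.standard_normal(15)

# Dual objective (2) at u, with the conjugate constant 1/(4*lam).
dual2 = (0.5 * np.sum(b**2) - 0.5 * np.sum((b - u)**2)
         - np.max(np.abs(A.T @ u))**2 / (4 * lam))

# In (3), t only enters through -0.5*t^2, so the best feasible t is the
# smallest one allowed by the cone: t = ||A^T u||_inf / sqrt(2*lam).
t = np.max(np.abs(A.T @ u)) / np.sqrt(2 * lam)
dual3 = 0.5 * np.sum(b**2) - 0.5 * (np.sum((b - u)**2) + t**2)

assert abs(dual2 - dual3) < 1e-9
```

Since $\frac{1}{2}t^2 = ||A^Tu||_\infty^2/(4\lambda)$ at this choice of $t$, the two objectives coincide exactly.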

Any optimal pair $(x^\star, u^\star)$ verifies the following optimality condition:
$$x^\star_i (|\langle a_i, u^\star\rangle| - ||A^Tu^\star||_\infty )=0$$

Thus we obtain the safe screening rule
$$|\langle a_i, u^\star\rangle| - ||A^Tu^\star||_\infty <0 \Longrightarrow x_i^\star=0$$

To derive a practical safe screening rule, one needs an upper bound on $|\langle a_i, u^\star\rangle| - ||A^Tu^\star||_\infty$ that uses only information from an arbitrary dual point $u$ instead of the unknown $u^\star$. This can be done as follows.

In [1], by exploiting the framework of the GAP-ball safe region [2] applied to (3), the authors show that
$$[(u^\star)^T, t^\star]^T\in S' = B\big([u^T, t]^T, \sqrt{2\,\mathrm{gap}(x, u)}\big)$$
where $\mathrm{gap}(x, u)$ (the duality gap) denotes the difference between the primal objective in (1) evaluated at $x$ and the dual objective in (2) evaluated at $u$, and $B(c, r)$ denotes the ball with center $c$ and radius $r$.
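To make the GAP ball concrete, the duality gap for (1)–(2) can be computed directly; weak duality guarantees it is non-negative for any primal $x$ and any $u$. A minimal sketch (function name ours; assumes the dual quadratic term $||A^Tu||_\infty^2/(4\lambda)$):

```python
import numpy as np

def quad_lasso_gap(A, b, lam, x, u):
    """Duality gap between primal (1) and dual (2) for the squared-L1 Lasso.
    Uses the dual term ||A^T u||_inf^2 / (4*lam), i.e., the Fenchel conjugate
    of lam*||.||_1^2; weak duality makes the result >= 0."""
    primal = 0.5 * np.sum((b - A @ x)**2) + lam * np.sum(np.abs(x))**2
    dual = (0.5 * np.sum(b**2) - 0.5 * np.sum((b - u)**2)
            - np.max(np.abs(A.T @ u))**2 / (4 * lam))
    return primal - dual

rng = np.random.default_rng(3)
A = rng.standard_normal((15, 40))
b = rng.standard_normal(15)
x = 0.1 * rng.standard_normal(40)
u = b - A @ x        # the dual (2) is unconstrained, so the residual is admissible
g = quad_lasso_gap(A, b, 0.7, x, u)
assert g >= 0.0      # weak duality
```

Unlike the standard Lasso, no scaling of the residual is needed here, since the dual problem (2) has no feasibility constraint.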

We now derive an upper bound on $\langle a_i, u^\star\rangle - ||A^Tu^\star||_\infty$ (applying it to both $\pm a_i$ then handles the absolute value). Let $\epsilon = 2\,\mathrm{gap}(x, u)$; the ball inclusion above gives
$$||u-u^\star||_2^2 + (t-t^\star)^2\leq \epsilon$$
Then, by the Cauchy–Schwarz inequality applied to the vectors $[a_i^T, -\sqrt{2\lambda}]^T$ and $[(u^\star-u)^T, t^\star-t]^T$ in $\mathbb R^{m+1}$, we have
$$\langle a_i, u^\star-u\rangle - \sqrt{2\lambda}\,(t^\star-t) \leq \sqrt{\epsilon(||a_i||_2^2+ 2\lambda)}$$
Therefore,
$$\langle a_i, u^\star\rangle - \sqrt{2\lambda}\,t^\star \leq \langle a_i, u\rangle - \sqrt{2\lambda}\,t + \sqrt{\epsilon(||a_i||_2^2+ 2\lambda)}$$
Noting that $t^\star = ||A^Tu^\star||_\infty/\sqrt{2\lambda}$ at the optimum of (3), and choosing $t = ||A^Tu||_\infty/\sqrt{2\lambda}$, we obtain
$$\langle a_i, u^\star\rangle - ||A^Tu^\star||_\infty \leq \langle a_i, u\rangle - ||A^Tu||_\infty + \sqrt{\epsilon(||a_i||_2^2+ 2\lambda)}$$
The right-hand side of the above inequality is the desired upper bound, which can be evaluated once any dual point $u$ is given; applying it to both $a_i$ and $-a_i$ bounds $|\langle a_i, u^\star\rangle| - ||A^Tu^\star||_\infty$. This bound is then leveraged to derive the practical safe screening rule in [1, Theorem 3.6].
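Putting the pieces together, a sketch of the resulting screening procedure might look as follows (function name and the admissible choice $u = b - Ax$ are ours; this is a simplified stand-in for the exact rule of [1, Theorem 3.6], under the constants used above):

```python
import numpy as np

def screen_quad_lasso(A, b, lam, x):
    """GAP-style screening sketch for the squared-L1 Lasso (1).
    Assumes the dual term ||A^T u||_inf^2/(4*lam) and cone scaling sqrt(2*lam).
    Certifies x_i^* = 0 whenever the computable upper bound on
    |<a_i, u*>| - ||A^T u*||_inf is negative."""
    u = b - A @ x                                    # admissible dual point
    corr = np.abs(A.T @ u)
    primal = 0.5 * np.sum((b - A @ x)**2) + lam * np.sum(np.abs(x))**2
    dual = 0.5 * np.sum(b**2) - 0.5 * np.sum((b - u)**2) - np.max(corr)**2 / (4 * lam)
    eps = 2.0 * (primal - dual)                      # squared GAP-ball radius
    col_norms_sq = np.sum(A**2, axis=0)
    bound = corr - np.max(corr) + np.sqrt(eps * (col_norms_sq + 2 * lam))
    return bound < 0.0                               # True => x_i^* = 0 certified

rng = np.random.default_rng(4)
A = rng.standard_normal((15, 40))
b = rng.standard_normal(15)
mask = screen_quad_lasso(A, b, lam=0.7, x=np.zeros(40))
assert mask.shape == (40,) and mask.dtype == bool
```

Note that a column attaining $\max_i |\langle a_i, u\rangle|$ is never screened, as expected: its bound is always non-negative.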

References

[1]: Sagnol, Guillaume, and Luc Pronzato. “Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation.” Journal of Machine Learning Research 24.307 (2023): 1-32.
[2]: Ndiaye, Eugene, et al. “Gap safe screening rules for sparsity enforcing penalties.” Journal of Machine Learning Research 18.1 (2017): 4671-4703.
