Safe Screening for Lasso problem with squared L1-norm penalization
In [1], the authors introduce a safe screening rule for the following Lasso-like problem with squared L1-norm penalization:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1^2$$
In this blog post, we will study their techniques and key ideas.
Safe screening for the standard Lasso problem
The Lasso problem is a non-smooth convex optimization problem whose penalization term is an L1-norm:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1$$
where $b \in \mathbb{R}^m$, $A = [a_1, \dots, a_n] \in \mathbb{R}^{m \times n}$, and $\lambda > 0$.
It is known that the problem has sparse solutions, i.e. only a small number of entries of an optimal solution $x^\star$ are non-zero. Therefore, identifying zero entries of $x^\star$ and removing the corresponding columns of $A$ before solving the problem may significantly accelerate the solving process. This can be achieved by safe screening, whose idea we now briefly recall.
The Fenchel-Rockafellar dual problem of Lasso is

$$\max_{u \in U_\lambda} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2$$
where $U_\lambda = \{u \in \mathbb{R}^m : \|A^\top u\|_\infty \le \lambda\}$.
In other words, the dual problem is the closest point projection of $b$ onto the polytope $U_\lambda$.
The optimal pair $(x^\star, u^\star)$ verifies the following optimality condition (known as the complementary slackness condition):

$$x_i^\star \left( |\langle a_i, u^\star \rangle| - \lambda \right) = 0$$
As a direct consequence, we have

$$|\langle a_i, u^\star \rangle| < \lambda \implies x_i^\star = 0$$
This implication is called a safe screening rule since it provides a sufficient condition to identify zero entries of (any) $x^\star$. Note that, in practice, $u^\star$ is unknown. We therefore "replace" it by a set called a safe region $\mathcal{S}$ such that $u^\star \in \mathcal{S}$; in this case, a practical screening rule reads as follows:

$$\sup_{u \in \mathcal{S}} |\langle a_i, u \rangle| < \lambda \implies x_i^\star = 0$$
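To make this concrete, here is a minimal NumPy sketch of GAP-ball screening [2] for the standard Lasso. For a ball safe region $\mathcal{S} = B(c, r)$ the supremum has a closed form, $\sup_{u \in B(c,r)} |\langle a_i, u \rangle| = |\langle a_i, c \rangle| + r\|a_i\|_2$. The function name is illustrative, and the rescaled-residual dual point is a standard choice assumed here, not prescribed by [1] or [2] verbatim.

```python
import numpy as np

def gap_safe_screen_lasso(A, b, lam, x):
    """Gap-safe screening for min 0.5*||b - Ax||_2^2 + lam*||x||_1.

    Returns indices i guaranteed to satisfy x*_i = 0, given any
    candidate primal point x. Sketch only; follows the GAP-ball idea of [2].
    """
    r = b - A @ x                                   # primal residual
    Atr = A.T @ r
    # Rescale the residual so that u is dual-feasible: ||A^T u||_inf <= lam.
    s = min(1.0, lam / max(np.abs(Atr).max(), 1e-12))
    u = s * r
    primal = 0.5 * np.dot(r, r) + lam * np.abs(x).sum()
    dual = 0.5 * np.dot(b, b) - 0.5 * np.dot(b - u, b - u)
    gap = max(primal - dual, 0.0)                   # duality gap (>= 0 by weak duality)
    radius = np.sqrt(2.0 * gap)                     # GAP-ball radius around u
    # sup over the ball: |<a_i, u>| + radius * ||a_i||_2
    scores = np.abs(A.T @ u) + radius * np.linalg.norm(A, axis=0)
    return np.where(scores < lam)[0]
```

As the candidate $x$ approaches $x^\star$ the gap shrinks, the ball tightens around $u^\star$, and more zero entries are certified.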
Safe screening for squared L1-norm penalization
In a recent paper [1], the authors introduce a safe screening rule for a problem akin to Lasso but with squared L1-norm penalization.
They consider the following problem [1, Eq. 3.1]:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1^2 \tag{1}$$
In this case, the dual problem is [1, Eq. 3.12]:

$$\max_{u \in \mathbb{R}^m} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2 - \frac{1}{\lambda}\|A^\top u\|_\infty^2 \tag{2}$$
Here, note that the above duality can be derived as an instance of the general duality

$$\min_{x \in \mathbb{R}^n} f(Ax) + g(x) = \max_{u \in \mathbb{R}^m} -f^*(-u) - g^*(A^\top u)$$
and the observation that if $\varphi(\cdot) = \frac{1}{2}\|\cdot\|^2$, then $\varphi^*(\cdot) = \frac{1}{2}\|\cdot\|_*^2$, where $\|\cdot\|_*$ denotes the dual norm of $\|\cdot\|$. In particular, the dual norm of the L2-norm is the L2-norm itself, and the dual norm of the L1-norm is the max-norm.
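This conjugacy fact follows directly from the definitions of the Fenchel conjugate and of the dual norm, maximizing first over directions of fixed norm $s$, then over $s \ge 0$:

$$\varphi^*(z) = \sup_{x} \left\{ \langle z, x \rangle - \frac{1}{2}\|x\|^2 \right\} = \sup_{s \ge 0} \left\{ s\|z\|_* - \frac{1}{2}s^2 \right\} = \frac{1}{2}\|z\|_*^2$$

where the inner maximization over $\{x : \|x\| = s\}$ yields $s\|z\|_*$ by definition of the dual norm, and the outer supremum is attained at $s = \|z\|_*$.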
The authors in [1] show that the dual problem (2) can be interpreted as a closest point projection onto a cone in the augmented space $\mathbb{R}^{m+1}$. Indeed, let $t \ge \|A^\top u\|_\infty / \lambda$ and define $b' = [b^\top, 0]^\top$, $u' = [u^\top, t]^\top$; then (2) can be rewritten as

$$\max_{u' \in \mathcal{C}} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b' - u'\|_2^2 \tag{3}$$
where $\mathcal{C}$ [1, Eq. 3.17] is a cone defined by

$$\mathcal{C} = \left\{ u' = [u^\top, t]^\top \in \mathbb{R}^{m+1} : \|A^\top u\|_\infty \le \lambda t \right\}$$
Any optimal pair of solutions $(x^\star, u^\star)$ verifies the following condition:

$$x_i^\star \left( |\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty \right) = 0$$
We thus obtain a safe screening rule:

$$|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty < 0 \implies x_i^\star = 0$$
To derive a practical safe screening rule, one needs an upper bound on $|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty$ that uses only information from an arbitrary $u$ instead of $u^\star$. This can be done as follows.
In [1], by exploiting the framework of the GAP-ball safe region [2] for (3), the authors show that

$$[(u^\star)^\top, t^\star]^\top \in \mathcal{S}' = B\left( [u^\top, t]^\top, \sqrt{2\,\mathrm{gap}(x, u)} \right)$$
where $\mathrm{gap}(x, u)$ (called the duality gap) denotes the difference between the primal objective in (1) evaluated at $x$ and the dual objective in (2) evaluated at $u$, and $B(c, r)$ denotes the ball with center $c$ and radius $r$.
We now provide an upper bound for $|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty$. Let $\epsilon = \sqrt{2\,\mathrm{gap}(x, u)}$; then

$$\|u - u^\star\|_2^2 + (t - t^\star)^2 \le \epsilon^2$$

so in particular $\|u^\star - u\|_2 \le \epsilon$ and $|t^\star - t| \le \epsilon$. Then, by the Cauchy-Schwarz inequality,

$$\langle a_i, u^\star - u \rangle - \lambda(t^\star - t) \le \|a_i\|_2 \|u^\star - u\|_2 + \lambda |t^\star - t| \le \epsilon \left( \|a_i\|_2 + \lambda \right)$$
Therefore,

$$\langle a_i, u^\star \rangle - \lambda t^\star \le \langle a_i, u \rangle - \lambda t + \epsilon \left( \|a_i\|_2 + \lambda \right)$$
Noting that $t^\star = \|A^\top u^\star\|_\infty / \lambda$ and choosing $t = \|A^\top u\|_\infty / \lambda$, we obtain

$$\langle a_i, u^\star \rangle - \|A^\top u^\star\|_\infty \le \langle a_i, u \rangle - \|A^\top u\|_\infty + \epsilon \left( \|a_i\|_2 + \lambda \right)$$
The right-hand side of the above inequality is the desired upper bound, which can be evaluated for any given $u$. This upper bound is then leveraged to derive the practical safe screening rule in [1, Theorem 3.6].
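As a rough illustration, the bound above can be evaluated numerically. The sketch below assumes the radius $\epsilon = \sqrt{2\,\mathrm{gap}(x, u)}$ has already been computed from a primal-dual pair, and applies the bound to both $a_i$ and $-a_i$ to handle the absolute value (this is valid because the derivation only uses Cauchy-Schwarz). The function name is illustrative, and this is a sketch of the bound derived in this post, not the exact rule of [1, Theorem 3.6].

```python
import numpy as np

def screen_squared_l1(A, u, lam, eps):
    """Screening test for the squared-L1 problem, per the derivation above:
    |<a_i, u>| - ||A^T u||_inf + eps * (||a_i||_2 + lam) < 0  implies  x*_i = 0.

    eps = sqrt(2 * gap(x, u)) is assumed precomputed from a primal-dual pair.
    Returns the indices of features guaranteed to be zero at optimality.
    """
    Atu = A.T @ u
    t_inf = np.abs(Atu).max()                        # ||A^T u||_inf
    slack = eps * (np.linalg.norm(A, axis=0) + lam)  # eps * (||a_i||_2 + lam)
    # upper bound on |<a_i, u*>| - ||A^T u*||_inf, evaluable at any u
    bound = np.abs(Atu) - t_inf + slack
    return np.where(bound < 0)[0]
```

With $\epsilon = 0$ (i.e. $u = u^\star$) this recovers the exact rule; as the duality gap shrinks, more features are screened out.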
References
[1]: Sagnol, Guillaume, and Luc Pronzato. “Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation.” Journal of Machine Learning Research 24.307 (2023): 1-32.
[2]: Ndiaye, Eugene, et al. "Gap safe screening rules for sparsity enforcing penalties." Journal of Machine Learning Research 18.1 (2017): 4671-4703.