Safe Screening for Lasso problem with squared L1-norm penalization
In [1], the authors introduce a safe screening rule for the following Lasso-like problem with squared L1-norm penalization:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1^2$$
In this blog post, we will study their techniques and key ideas.
Safe screening for the standard Lasso problem
The Lasso problem is a non-smooth convex optimization problem whose penalization term is an L1-norm:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1$$
where $b \in \mathbb{R}^m$, $A = [a_1, \dots, a_n] \in \mathbb{R}^{m \times n}$, and $\lambda > 0$.
It is known that the problem has sparse solutions, i.e. only a small number of entries of an optimal solution $x^\star$ are non-zero. Therefore, identifying zero entries of $x^\star$ and removing the corresponding columns of $A$ before solving the problem may significantly accelerate the solving process. This can be achieved by safe screening, whose idea we now briefly recall.
The Fenchel-Rockafellar dual problem of Lasso is

$$\max_{u \in U_\lambda} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2$$
where $U_\lambda = \{u \in \mathbb{R}^m : \|A^\top u\|_\infty \le \lambda\}$.
In other words, the dual problem is the closest point projection of $b$ onto the polytope $U_\lambda$.
The optimal pair $(x^\star, u^\star)$ verifies the following optimality condition (known as the complementary slackness condition):

$$x_i^\star \left( |\langle a_i, u^\star \rangle| - \lambda \right) = 0$$
As a direct consequence, we have

$$|\langle a_i, u^\star \rangle| < \lambda \implies x_i^\star = 0$$
This implication is called a safe screening rule since it provides a sufficient condition to identify zero entries of (any) $x^\star$. Note that, in practice, $u^\star$ is unknown. We therefore "replace" it by a set called a safe region $\mathcal{S}$ such that $u^\star \in \mathcal{S}$; in this case, a practical screening rule reads as follows:

$$\sup_{u \in \mathcal{S}} |\langle a_i, u \rangle| < \lambda \implies x_i^\star = 0$$
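To make this concrete, here is a minimal NumPy sketch of GAP-ball screening [2] for the standard Lasso. For a ball safe region $\mathcal{S} = B(c, r)$ the supremum has a closed form, $\sup_{u \in B(c,r)} |\langle a_i, u \rangle| = |\langle a_i, c \rangle| + r\|a_i\|_2$. The function name is illustrative, and the rescaled-residual dual point is a standard choice assumed here, not prescribed by [1] or [2] verbatim.

```python
import numpy as np

def gap_safe_screen_lasso(A, b, lam, x):
    """Gap-safe screening for min 0.5*||b - Ax||_2^2 + lam*||x||_1.

    Returns indices i guaranteed to satisfy x*_i = 0, given any
    candidate primal point x. Sketch only; follows the GAP-ball idea of [2].
    """
    r = b - A @ x                                   # primal residual
    Atr = A.T @ r
    # Rescale the residual so that u is dual-feasible: ||A^T u||_inf <= lam.
    s = min(1.0, lam / max(np.abs(Atr).max(), 1e-12))
    u = s * r
    primal = 0.5 * np.dot(r, r) + lam * np.abs(x).sum()
    dual = 0.5 * np.dot(b, b) - 0.5 * np.dot(b - u, b - u)
    gap = max(primal - dual, 0.0)                   # duality gap (>= 0 by weak duality)
    radius = np.sqrt(2.0 * gap)                     # GAP-ball radius around u
    # sup over the ball: |<a_i, u>| + radius * ||a_i||_2
    scores = np.abs(A.T @ u) + radius * np.linalg.norm(A, axis=0)
    return np.where(scores < lam)[0]
```

As the candidate $x$ approaches $x^\star$ the gap shrinks, the ball tightens around $u^\star$, and more zero entries are certified.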
Safe screening for squared L1-norm penalization
In a recent paper [1], the authors introduce a safe screening rule for a problem akin to Lasso but with squared L1-norm penalization.
They consider the following problem [1, Eq. 3.1]:

$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|_2^2 + \lambda \|x\|_1^2 \tag{1}$$
In this case, the dual problem is [1, Eq. 3.12]:

$$\max_{u \in \mathbb{R}^m} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2 - \frac{1}{\lambda}\|A^\top u\|_\infty^2 \tag{2}$$
Here, note that the above duality can be derived as an instance of the general duality

$$\min_{x \in \mathbb{R}^n} f(Ax) + g(x) = \max_{u \in \mathbb{R}^m} -f^*(-u) - g^*(A^\top u)$$
and the observation that if $\varphi(\cdot) = \frac{1}{2}\|\cdot\|^2$, then $\varphi^*(\cdot) = \frac{1}{2}\|\cdot\|_*^2$, where $\|\cdot\|_*$ denotes the dual norm of $\|\cdot\|$. In particular, the dual norm of the L2-norm is the L2-norm itself, and the dual norm of the L1-norm is the max-norm.
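This conjugacy fact follows directly from the definitions of the Fenchel conjugate and of the dual norm, maximizing first over directions of fixed norm $s$, then over $s \ge 0$:

$$\varphi^*(z) = \sup_{x} \left\{ \langle z, x \rangle - \frac{1}{2}\|x\|^2 \right\} = \sup_{s \ge 0} \left\{ s\|z\|_* - \frac{1}{2}s^2 \right\} = \frac{1}{2}\|z\|_*^2$$

where the inner maximization over $\{x : \|x\| = s\}$ yields $s\|z\|_*$ by definition of the dual norm, and the outer supremum is attained at $s = \|z\|_*$.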
The authors in [1] show that the dual problem (2) can be interpreted as a closest point projection onto a cone in the augmented space $\mathbb{R}^{m+1}$. Indeed, let $t \ge \|A^\top u\|_\infty / \lambda$ and define $b' = [b^\top, 0]^\top$, $u' = [u^\top, t]^\top$; then (2) can be rewritten as

$$\max_{u' \in \mathcal{C}} \frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b' - u'\|_2^2 \tag{3}$$
where $\mathcal{C}$ [1, Eq. 3.17] is a cone defined by

$$\mathcal{C} = \left\{ u' = [u^\top, t]^\top \in \mathbb{R}^{m+1} : \|A^\top u\|_\infty \le \lambda t \right\}$$
Any optimal pair of solutions $(x^\star, u^\star)$ verifies the following condition:

$$x_i^\star \left( |\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty \right) = 0$$
We thus obtain a safe screening rule:

$$|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty < 0 \implies x_i^\star = 0$$
To derive a practical safe screening rule, one needs an upper bound on $|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty$ that uses only information from an arbitrary $u$ instead of $u^\star$. This can be done as follows.
In [1], by exploiting the framework of the GAP-ball safe region [2] for (3), the authors show that

$$[(u^\star)^\top, t^\star]^\top \in \mathcal{S}' = B\left( [u^\top, t]^\top, \sqrt{2\,\mathrm{gap}(x, u)} \right)$$
where $\mathrm{gap}(x, u)$ (called the duality gap) denotes the difference between the primal objective in (1) evaluated at $x$ and the dual objective in (2) evaluated at $u$, and $B(c, r)$ denotes the ball with center $c$ and radius $r$.
We now provide an upper bound for $|\langle a_i, u^\star \rangle| - \|A^\top u^\star\|_\infty$. Let $\epsilon = \sqrt{2\,\mathrm{gap}(x, u)}$; then

$$\|u - u^\star\|_2^2 + (t - t^\star)^2 \le \epsilon^2$$

so in particular $\|u^\star - u\|_2 \le \epsilon$ and $|t^\star - t| \le \epsilon$. Then, by the Cauchy-Schwarz inequality,

$$\langle a_i, u^\star - u \rangle - \lambda(t^\star - t) \le \|a_i\|_2 \|u^\star - u\|_2 + \lambda |t^\star - t| \le \epsilon \left( \|a_i\|_2 + \lambda \right)$$
Therefore,

$$\langle a_i, u^\star \rangle - \lambda t^\star \le \langle a_i, u \rangle - \lambda t + \epsilon \left( \|a_i\|_2 + \lambda \right)$$
Noting that $t^\star = \|A^\top u^\star\|_\infty / \lambda$ and choosing $t = \|A^\top u\|_\infty / \lambda$, we obtain

$$\langle a_i, u^\star \rangle - \|A^\top u^\star\|_\infty \le \langle a_i, u \rangle - \|A^\top u\|_\infty + \epsilon \left( \|a_i\|_2 + \lambda \right)$$
The right-hand side of the above inequality is the desired upper bound, which can be evaluated for any given $u$. This upper bound is then leveraged to derive the practical safe screening rule in [1, Theorem 3.6].
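As a rough illustration, the bound above can be evaluated numerically. The sketch below assumes the radius $\epsilon = \sqrt{2\,\mathrm{gap}(x, u)}$ has already been computed from a primal-dual pair, and applies the bound to both $a_i$ and $-a_i$ to handle the absolute value (this is valid because the derivation only uses Cauchy-Schwarz). The function name is illustrative, and this is a sketch of the bound derived in this post, not the exact rule of [1, Theorem 3.6].

```python
import numpy as np

def screen_squared_l1(A, u, lam, eps):
    """Screening test for the squared-L1 problem, per the derivation above:
    |<a_i, u>| - ||A^T u||_inf + eps * (||a_i||_2 + lam) < 0  implies  x*_i = 0.

    eps = sqrt(2 * gap(x, u)) is assumed precomputed from a primal-dual pair.
    Returns the indices of features guaranteed to be zero at optimality.
    """
    Atu = A.T @ u
    t_inf = np.abs(Atu).max()                        # ||A^T u||_inf
    slack = eps * (np.linalg.norm(A, axis=0) + lam)  # eps * (||a_i||_2 + lam)
    # upper bound on |<a_i, u*>| - ||A^T u*||_inf, evaluable at any u
    bound = np.abs(Atu) - t_inf + slack
    return np.where(bound < 0)[0]
```

With $\epsilon = 0$ (i.e. $u = u^\star$) this recovers the exact rule; as the duality gap shrinks, more features are screened out.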
References
[1]: Sagnol, Guillaume, and Luc Pronzato. “Fast Screening Rules for Optimal Design via Quadratic Lasso Reformulation.” Journal of Machine Learning Research 24.307 (2023): 1-32.
[2]: Ndiaye, Eugene, et al. "Gap safe screening rules for sparsity enforcing penalties." Journal of Machine Learning Research 18.1 (2017): 4671-4703.