Friday, February 7, 2025

Lasso Safe Screening for Feature Elimination

Lasso promotes sparsity in machine learning, statistics, and signal processing but can be computationally expensive. Safe screening mitigates this by identifying zero coefficients early, reducing memory and computation. This post explores its mathematical foundation and compares two state-of-the-art techniques: GAP and RYU ball screening.

1. Lasso Problem

Given a feature matrix $A \in \mathbb{R}^{m \times n}$ and a response vector $b \in \mathbb{R}^m$, the Lasso problem is

$$\min_{x \in \mathbb{R}^n} \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1$$

where $\lambda > 0$ controls sparsity. Large $\lambda$ encourages more zero entries in $x$.
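To make the setup concrete, here is a minimal Python sketch (not the repository code from [3]) of the Lasso objective together with a plain ISTA (proximal gradient) solver; the names and data are placeholders for illustration.

```python
import numpy as np

def lasso_objective(A, b, x, lam):
    # 0.5 * ||b - A x||_2^2 + lam * ||x||_1
    return 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iter=200):
    # plain proximal gradient descent on the Lasso objective
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of 0.5 * ||b - A x||_2^2
        x = soft_threshold(x - grad / L, lam / L)
    return x
```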

2. Lasso Dual Problem

The dual problem of the Lasso, with dual variable $u \in \mathbb{R}^m$, is

$$\max_{u \in \mathbb{R}^m} \frac{1}{2} \|b\|_2^2 - \frac{1}{2} \|b - u\|_2^2$$

subject to the constraint

$$\|A^T u\|_\infty \leq \lambda.$$

Note that for any given $x$, we can construct a feasible dual variable $u_x$ by Dual Scaling:

$$u_x = \frac{\lambda}{\|A^T(b - Ax)\|_\infty} (b - Ax).$$

Note that $u_x$ satisfies

$$x \rightarrow x^\star \Longrightarrow u_x \rightarrow u^\star,$$

where $x^\star$ and $u^\star$ denote optimal solutions of the Lasso and its dual problem, respectively. Moreover, the duality gap converges:

$$x \rightarrow x^\star \Longrightarrow \text{gap}(x, u_x) \rightarrow 0.$$

Here the duality gap is defined as the difference between the Lasso objective value and the dual objective value,

$$\text{gap}(x, u) = \frac{1}{2}\|b - Ax\|_2^2 + \lambda\|x\|_1 - \left(\frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2\right).$$
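Both quantities are straightforward to compute. Below is a small sketch following the two formulas above; `dual_scaled_point` and `duality_gap` are illustrative names, not functions from the repository [3].

```python
import numpy as np

def dual_scaled_point(A, b, x, lam):
    # rescale the residual b - A x so that ||A^T u||_inf = lam (dual feasible);
    # assumes A^T (b - A x) is nonzero
    r = b - A @ x
    return lam * r / np.max(np.abs(A.T @ r))

def duality_gap(A, b, x, u, lam):
    # difference between the Lasso objective at x and the dual objective at u
    primal = 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))
    dual = 0.5 * np.sum(b ** 2) - 0.5 * np.sum((b - u) ** 2)
    return primal - dual
```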

3. KKT Optimality Condition

The KKT optimality condition ensures the following Complementary Slackness condition between $x^\star$ and $u^\star$:

$$x^\star_i \left(|\langle a_i, u^\star \rangle| - \lambda\right) = 0, \quad \forall i = 1, \dots, n,$$

where $a_i$ is the $i$-th column of $A$.

4. Safe Screening Rule

From the Complementary Slackness condition, we obtain the Safe Screening rule

$$|\langle a_i, u^\star \rangle| < \lambda \Longrightarrow x^\star_i = 0.$$

Indeed, if $|\langle a_i, u^\star \rangle| < \lambda$, the second factor above is nonzero, so $x^\star_i$ must vanish.

Since $u^\star$ is unknown, the above screening rule cannot be applied directly. We relax it by finding a Safe Ball, i.e. a ball that is guaranteed to contain $u^\star$.

A popular choice is the GAP ball [1], with center $c \in \mathbb{R}^m$ and radius $r$ given by

$$B(c, r) = B\left(u, \sqrt{2\,\text{gap}(x, u)}\right).$$

The RYU ball [2] is a subset of the GAP ball, leading to more efficient screening:

$$B(c, r) = B\left(\frac{u + r_x}{2}, \sqrt{\text{gap}(x, u) - \frac{\|u - r_x\|_2^2}{4}}\right),$$

where $r_x = b - Ax$ is the primal residual.
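Both balls are cheap to form from a primal point $x$ and a feasible dual point $u$. The sketch below follows the two formulas above; the `max(..., 0.0)` guards only absorb floating-point noise.

```python
import numpy as np

def _gap(A, b, x, u, lam):
    # duality gap: Lasso objective at x minus dual objective at u
    return (0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))
            - 0.5 * np.sum(b ** 2) + 0.5 * np.sum((b - u) ** 2))

def gap_ball(A, b, x, u, lam):
    # GAP ball: centered at the feasible dual point u
    return u, np.sqrt(max(2.0 * _gap(A, b, x, u, lam), 0.0))

def ryu_ball(A, b, x, u, lam):
    # RYU ball: centered at the midpoint of u and the residual r_x = b - A x
    r_x = b - A @ x
    rad_sq = _gap(A, b, x, u, lam) - 0.25 * np.sum((u - r_x) ** 2)
    return 0.5 * (u + r_x), np.sqrt(max(rad_sq, 0.0))
```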

Now the safe screening rule reads as follows:

$$|\langle a_i, c \rangle| + r \|a_i\|_2 < \lambda \Longrightarrow x^\star_i = 0.$$

Indeed, since $u^\star \in B(c, r)$, the Cauchy-Schwarz inequality gives $|\langle a_i, u^\star \rangle| \leq |\langle a_i, c \rangle| + r \|a_i\|_2 < \lambda$, which triggers the original rule.
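In code, this test can be vectorized over all columns at once. The function below is a sketch: given a safe ball $(c, r)$, it returns a boolean mask of the features that are certified to be zero at the optimum.

```python
import numpy as np

def screen(A, center, radius, lam):
    # |<a_i, c>| + r * ||a_i||_2 < lam  =>  x*_i = 0
    scores = np.abs(A.T @ center) + radius * np.linalg.norm(A, axis=0)
    return scores < lam    # True for features that can be safely eliminated
```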

5. Experiment

Here, we compare the safe screening performance of the GAP and RYU balls.
The results confirm that the RYU ball screens more features than the GAP ball.
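A compressed sketch of such a comparison is shown below. It reuses the helpers sketched earlier in this post (`soft_threshold`, `dual_scaled_point`, `gap_ball`, `ryu_ball`, `screen`), runs plain ISTA on random synthetic data, and counts how many features each ball screens along the iterations; it is not the experiment code from [3], and the data and $\lambda$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 500))        # synthetic data, illustration only
b = rng.standard_normal(100)
lam = 0.5 * np.max(np.abs(A.T @ b))        # a fraction of lambda_max

x = np.zeros(A.shape[1])
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth part
for it in range(1, 101):
    # one ISTA step on the Lasso objective
    x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    # feasible dual point via dual scaling, then screen with both balls
    u = dual_scaled_point(A, b, x, lam)
    n_gap = int(np.sum(screen(A, *gap_ball(A, b, x, u, lam), lam)))
    n_ryu = int(np.sum(screen(A, *ryu_ball(A, b, x, u, lam), lam)))
    if it % 20 == 0:
        print(f"iter {it:3d}: GAP screens {n_gap} features, RYU screens {n_ryu}")
```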

The full code is written in Python and available on my GitHub [3].


6. References

[1]. Fercoq, Olivier, Alexandre Gramfort, and Joseph Salmon. “Mind the duality gap: safer rules for the lasso.” International conference on machine learning. PMLR, 2015.

[2]. Tran, Thu-Le, et al. “One to Beat Them All: ‘RYU’, a Unifying Framework for the Construction of Safe Balls.” (2023).

[3]. https://github.com/Tran-Thu-Le/ttl_blog/blob/main/math/lasso-safe-screening.ipynb
