Friday, February 7, 2025

Lasso Safe Screening for Feature Elimination

Lasso promotes sparsity in machine learning, statistics, and signal processing but can be computationally expensive. Safe screening mitigates this by identifying zero coefficients early, reducing memory and computation. This post explores its mathematical foundation and compares two state-of-the-art techniques: GAP and RYU ball screening.

1. Lasso Problem

Given a feature matrix $A \in \mathbb{R}^{m \times n}$ and a response vector $b \in \mathbb{R}^m$, the Lasso problem is

$$\min_{x \in \mathbb{R}^n} \frac{1}{2} \|b - Ax\|_2^2 + \lambda \|x\|_1$$

where $\lambda > 0$ controls sparsity. Large $\lambda$ encourages more zero entries in $x$.
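To make the setup concrete, here is a minimal Python sketch (not the repository code from [3]) of the Lasso objective together with a plain ISTA (proximal gradient) solver; the names and data are placeholders for illustration.

```python
import numpy as np

def lasso_objective(A, b, x, lam):
    # 0.5 * ||b - A x||_2^2 + lam * ||x||_1
    return 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, n_iter=200):
    # plain proximal gradient descent on the Lasso objective
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of 0.5 * ||b - A x||_2^2
        x = soft_threshold(x - grad / L, lam / L)
    return x
```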

2. Lasso Dual Problem

The dual problem of the Lasso, with dual variable $u \in \mathbb{R}^m$, is

$$\max_{u \in \mathbb{R}^m} \frac{1}{2} \|b\|_2^2 - \frac{1}{2} \|b - u\|_2^2$$

subject to the constraint

$$\|A^T u\|_\infty \leq \lambda.$$

Note that for any given $x$, we can construct a feasible dual variable $u_x$ by Dual Scaling:

$$u_x = \frac{\lambda}{\|A^T(b - Ax)\|_\infty} (b - Ax).$$

Note that $u_x$ satisfies

$$x \rightarrow x^\star \Longrightarrow u_x \rightarrow u^\star,$$

where $x^\star$ and $u^\star$ denote optimal solutions of the Lasso and its dual problem, respectively. Moreover, the duality gap converges:

$$x \rightarrow x^\star \Longrightarrow \text{gap}(x, u_x) \rightarrow 0.$$

Here the duality gap is defined as the difference between the Lasso objective value and the dual objective value,

$$\text{gap}(x, u) = \frac{1}{2}\|b - Ax\|_2^2 + \lambda\|x\|_1 - \left(\frac{1}{2}\|b\|_2^2 - \frac{1}{2}\|b - u\|_2^2\right).$$
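Both quantities are straightforward to compute. Below is a small sketch following the two formulas above; `dual_scaled_point` and `duality_gap` are illustrative names, not functions from the repository [3].

```python
import numpy as np

def dual_scaled_point(A, b, x, lam):
    # rescale the residual b - A x so that ||A^T u||_inf = lam (dual feasible);
    # assumes A^T (b - A x) is nonzero
    r = b - A @ x
    return lam * r / np.max(np.abs(A.T @ r))

def duality_gap(A, b, x, u, lam):
    # difference between the Lasso objective at x and the dual objective at u
    primal = 0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))
    dual = 0.5 * np.sum(b ** 2) - 0.5 * np.sum((b - u) ** 2)
    return primal - dual
```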

3. KKT Optimality Condition

The KKT optimality condition ensures the following Complementary Slackness condition between $x^\star$ and $u^\star$:

$$x^\star_i \left(|\langle a_i, u^\star \rangle| - \lambda\right) = 0, \quad \forall i = 1, \dots, n,$$

where $a_i$ is the $i$-th column of $A$.

4. Safe Screening Rule

From the Complementary Slackness condition, we obtain the Safe Screening rule

$$|\langle a_i, u^\star \rangle| < \lambda \Longrightarrow x^\star_i = 0.$$

Indeed, if $|\langle a_i, u^\star \rangle| < \lambda$, the second factor above is nonzero, so $x^\star_i$ must vanish.

Since $u^\star$ is unknown, the above screening rule cannot be applied directly. We relax it by finding a Safe Ball, i.e. a ball that is guaranteed to contain $u^\star$.

A popular choice is the GAP ball [1], with center $c \in \mathbb{R}^m$ and radius $r$ given by

$$B(c, r) = B\left(u, \sqrt{2\,\text{gap}(x, u)}\right).$$

The RYU ball [2] is a subset of the GAP ball, leading to more efficient screening:

$$B(c, r) = B\left(\frac{u + r_x}{2}, \sqrt{\text{gap}(x, u) - \frac{\|u - r_x\|_2^2}{4}}\right),$$

where $r_x = b - Ax$ is the primal residual.
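Both balls are cheap to form from a primal point $x$ and a feasible dual point $u$. The sketch below follows the two formulas above; the `max(..., 0.0)` guards only absorb floating-point noise.

```python
import numpy as np

def _gap(A, b, x, u, lam):
    # duality gap: Lasso objective at x minus dual objective at u
    return (0.5 * np.sum((b - A @ x) ** 2) + lam * np.sum(np.abs(x))
            - 0.5 * np.sum(b ** 2) + 0.5 * np.sum((b - u) ** 2))

def gap_ball(A, b, x, u, lam):
    # GAP ball: centered at the feasible dual point u
    return u, np.sqrt(max(2.0 * _gap(A, b, x, u, lam), 0.0))

def ryu_ball(A, b, x, u, lam):
    # RYU ball: centered at the midpoint of u and the residual r_x = b - A x
    r_x = b - A @ x
    rad_sq = _gap(A, b, x, u, lam) - 0.25 * np.sum((u - r_x) ** 2)
    return 0.5 * (u + r_x), np.sqrt(max(rad_sq, 0.0))
```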

Now the safe screening rule reads as follows:

$$|\langle a_i, c \rangle| + r \|a_i\|_2 < \lambda \Longrightarrow x^\star_i = 0.$$

Indeed, since $u^\star \in B(c, r)$, the Cauchy-Schwarz inequality gives $|\langle a_i, u^\star \rangle| \leq |\langle a_i, c \rangle| + r \|a_i\|_2 < \lambda$, which triggers the original rule.
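In code, this test can be vectorized over all columns at once. The function below is a sketch: given a safe ball $(c, r)$, it returns a boolean mask of the features that are certified to be zero at the optimum.

```python
import numpy as np

def screen(A, center, radius, lam):
    # |<a_i, c>| + r * ||a_i||_2 < lam  =>  x*_i = 0
    scores = np.abs(A.T @ center) + radius * np.linalg.norm(A, axis=0)
    return scores < lam    # True for features that can be safely eliminated
```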

5. Experiment

Here, we compare the safe screening performance of the GAP and RYU balls.
The results confirm that the RYU ball screens more features than the GAP ball.
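A compressed sketch of such a comparison is shown below. It reuses the helpers sketched earlier in this post (`soft_threshold`, `dual_scaled_point`, `gap_ball`, `ryu_ball`, `screen`), runs plain ISTA on random synthetic data, and counts how many features each ball screens along the iterations; it is not the experiment code from [3], and the data and $\lambda$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 500))        # synthetic data, illustration only
b = rng.standard_normal(100)
lam = 0.5 * np.max(np.abs(A.T @ b))        # a fraction of lambda_max

x = np.zeros(A.shape[1])
L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the smooth part
for it in range(1, 101):
    # one ISTA step on the Lasso objective
    x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    # feasible dual point via dual scaling, then screen with both balls
    u = dual_scaled_point(A, b, x, lam)
    n_gap = int(np.sum(screen(A, *gap_ball(A, b, x, u, lam), lam)))
    n_ryu = int(np.sum(screen(A, *ryu_ball(A, b, x, u, lam), lam)))
    if it % 20 == 0:
        print(f"iter {it:3d}: GAP screens {n_gap} features, RYU screens {n_ryu}")
```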

The full code is written in Python and available on my GitHub [3].


6. References

[1]. Fercoq, Olivier, Alexandre Gramfort, and Joseph Salmon. “Mind the duality gap: safer rules for the lasso.” International conference on machine learning. PMLR, 2015.

[2]. Tran, Thu-Le, et al. “One to Beat Them All: ‘RYU’, a Unifying Framework for the Construction of Safe Balls.” (2023).

[3]. https://github.com/Tran-Thu-Le/ttl_blog/blob/main/math/lasso-safe-screening.ipynb
