Wednesday, July 2, 2025


Duality structure between Lasso and SVM

In this post, we reveal a striking connection between two fundamental models in machine learning — Lasso and Support Vector Machines (SVM) — through the lens of Fenchel–Rockafellar duality.

Fenchel–Rockafellar duality is a central result in convex analysis that systematically constructs dual problems using conjugate functions. This framework not only generalizes classical Lagrangian duality but also unifies seemingly different optimization models, such as Lasso and SVM, under a common duality structure. As we will see, SVM shares the same structure as the dual of Lasso.

Fenchel–Rockafellar Duality

Let $f : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be proper, convex, and lower semicontinuous functions, and let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Consider the primal problem:

$$\min_{x \in \mathbb{R}^n} \left\{ f(Ax) + g(x) \right\}$$

The Fenchel–Rockafellar dual is:

$$\max_{u \in \mathbb{R}^m} \left\{ -f^*(-u) - g^*(A^\top u) \right\}$$

Note that weak duality holds. Let $x \in \operatorname{dom}(g)$ and $u \in \operatorname{dom}(g^* \circ A^\top)$, and define $y = Ax$. Then, by the Fenchel–Young inequality:

  • $f(y) + f^*(-u) \ge \langle y, -u \rangle = -\langle u, Ax \rangle$
  • $g(x) + g^*(A^\top u) \ge \langle x, A^\top u \rangle = \langle Ax, u \rangle$

Adding both inequalities:

$$f(Ax) + g(x) + f^*(-u) + g^*(A^\top u) \ge 0$$

This yields:

$$f(Ax) + g(x) \ge -f^*(-u) - g^*(A^\top u)$$

Hence, weak duality:

$$\min_x \left\{ f(Ax) + g(x) \right\} \ge \max_u \left\{ -f^*(-u) - g^*(A^\top u) \right\}$$
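To make this concrete, here is a minimal numerical sketch of the weak-duality inequality, instantiated (as in the Lasso section below) with $f(y) = \frac{1}{2}\|y - b\|_2^2$ and $g(x) = \lambda \|x\|_1$; the dimensions and data are arbitrary illustrative choices:

```python
import numpy as np

# f(y) = 0.5*||y - b||^2  =>  f*(-u) = 0.5*||u||^2 - <b, u>
# g(x) = lam*||x||_1      =>  g*(z) = indicator of ||z||_inf <= lam
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.5

for _ in range(1000):
    x = rng.standard_normal(n)
    u = rng.standard_normal(m)
    # rescale u so that A^T u is dual-feasible, i.e. g*(A^T u) = 0
    u *= lam / max(np.abs(A.T @ u).max(), lam)
    primal = 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()
    dual = -(0.5 * u @ u - b @ u)  # -f*(-u); the g* term vanishes
    assert primal >= dual - 1e-9   # weak duality, up to floating-point slack
```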

Under mild regularity conditions (for instance, when there exists $x \in \operatorname{ri}(\operatorname{dom} g)$ with $Ax \in \operatorname{ri}(\operatorname{dom} f)$), strong duality holds: the optimal values of the primal and dual problems coincide, and the dual optimum is attained.
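The following sketch checks strong duality numerically on the same instance, assuming the cvxpy package is available; it solves the primal and the Fenchel–Rockafellar dual and compares optimal values:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.5

# Primal: min_x 0.5*||Ax - b||^2 + lam*||x||_1
x = cp.Variable(n)
primal = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + lam * cp.norm1(x)))
primal.solve()

# Dual: max_u -f*(-u) - g*(A^T u), with g* the indicator of ||A^T u||_inf <= lam
u = cp.Variable(m)
dual = cp.Problem(cp.Maximize(-0.5 * cp.sum_squares(u) + b @ u),
                  [cp.norm_inf(A.T @ u) <= lam])
dual.solve()

print(primal.value, dual.value)  # equal up to solver tolerance
```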


Applications

SVM

We consider the hard-margin support vector machine (SVM) for binary classification. Given training data $\{(x_i, y_i)\}_{i=1}^m$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$, the SVM problem with margin parameter $\gamma > 0$ is:

$$\min_{w \in \mathbb{R}^n,\; w_0 \in \mathbb{R}} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; y_i(w^\top x_i + w_0) \ge \gamma,\; i = 1, \dots, m \right\}$$

Matrix-vector form

Define:

  • $X \in \mathbb{R}^{m \times n}$ with rows $x_i^\top$,
  • $y \in \mathbb{R}^m$ is the label vector,
  • $D = \mathrm{diag}(y) \in \mathbb{R}^{m \times m}$,
  • $\mathbf{1}_m$ is the vector of all ones in $\mathbb{R}^m$.

Then the problem becomes:

$$\min_{w \in \mathbb{R}^n,\; w_0 \in \mathbb{R}} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; D(Xw + w_0 \mathbf{1}_m) \ge \gamma \mathbf{1}_m \right\}$$

Remark: If $w_0 = 0$, then the problem reduces to:

$$\min_{w \in \mathbb{R}^n} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; DXw \ge \gamma \mathbf{1}_m \right\}$$
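As an illustration, here is a minimal sketch (again assuming cvxpy) that solves this QP on synthetic, linearly separable toy data; the data layout and $\gamma$ are illustrative choices:

```python
import cvxpy as cp
import numpy as np

# Synthetic separable data: +1 points near (+2, +2), -1 points near (-2, -2)
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((10, 2)) + 2.0,
               rng.standard_normal((10, 2)) - 2.0])
y = np.concatenate([np.ones(10), -np.ones(10)])
D = np.diag(y)
gamma = 1.0

# min 0.5*||w||^2  s.t.  D X w >= gamma * 1  (the w0 = 0 form above)
w = cp.Variable(2)
svm = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                 [D @ X @ w >= gamma * np.ones(len(y))])
svm.solve()
print(w.value)  # maximum-margin normal vector (no intercept)
```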


Lasso and Its Dual

The Lasso problem is a fundamental model in sparse learning and signal recovery:

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1 \right\}$$

Here:

  • $A \in \mathbb{R}^{m \times n}$ is the design matrix,
  • $b \in \mathbb{R}^m$ is the observation vector,
  • $\lambda > 0$ is the regularization parameter.
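As a brief aside, one standard way to solve this objective is proximal gradient descent (ISTA); the following is a minimal sketch, with step size $1/\|A\|_2^2$ and a fixed iteration count as illustrative choices:

```python
import numpy as np

def soft_threshold(z, t):
    # prox of t*||.||_1: componentwise shrinkage toward zero
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, b, lam, iters=500):
    # Proximal gradient for 0.5*||Ax - b||^2 + lam*||x||_1,
    # with step 1/L, L = ||A||_2^2 (Lipschitz constant of the smooth part's gradient)
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - (A.T @ (A @ x - b)) / L, lam / L)
    return x
```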

Nonnegative Lasso

The nonnegative Lasso adds a positivity constraint:

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|Ax - b\|_2^2 + \lambda \langle \mathbf{1}, x \rangle \;:\; x \ge 0 \right\}$$

Choose:

  • $f(y) = \frac{1}{2} \|y - b\|_2^2$,
  • $g(x) = \lambda \langle \mathbf{1}, x \rangle + \delta_{\mathbb{R}_+^n}(x)$, where $\delta_{\mathbb{R}_+^n}$ is the indicator function of the nonnegative orthant.

Dual problem of nonnegative Lasso

Applying Fenchel–Rockafellar duality:

$$\max_{u \in \mathbb{R}^m} \left\{ -f^*(-u) - g^*(A^\top u) \right\}$$
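For these choices the conjugates can be computed directly (a routine calculation, spelled out here for completeness):

$$
f^*(v) = \sup_{y} \left\{ \langle v, y \rangle - \tfrac{1}{2}\|y - b\|_2^2 \right\} = \tfrac{1}{2}\|v\|_2^2 + \langle b, v \rangle,
\qquad
g^*(z) = \sup_{x \ge 0} \, \langle z - \lambda \mathbf{1}, x \rangle =
\begin{cases}
0 & \text{if } z \le \lambda \mathbf{1}, \\
+\infty & \text{otherwise.}
\end{cases}
$$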

Since $f^*(-u) = \frac{1}{2} \|u\|_2^2 - \langle b, u \rangle$, and the $g^*$ term is finite (and zero) exactly when $A^\top u \le \lambda \mathbf{1}$, the dual, written as a minimization, becomes:

$$\min_{u \in \mathbb{R}^m} \left\{ \frac{1}{2} \|u\|_2^2 - \langle b, u \rangle \;:\; A^\top u \le \lambda \mathbf{1} \right\}$$

Completing the square in $u$, this is equivalent to:

$$\min_{u \in \mathbb{R}^m} \left\{ \frac{1}{2} \|b - u\|_2^2 - \frac{1}{2} \|b\|_2^2 \;:\; A^\top u \le \lambda \mathbf{1} \right\}$$

Letting $v = b - u$ (so the constraint $A^\top u \le \lambda \mathbf{1}$ becomes $A^\top v \ge A^\top b - \lambda \mathbf{1}$) and dropping the constant $-\frac{1}{2}\|b\|_2^2$, the dual becomes:

$$\min_{v \in \mathbb{R}^m} \left\{ \frac{1}{2} \|v\|_2^2 \;:\; A^\top v \ge A^\top b - \lambda \mathbf{1} \right\}$$

Remark:
This dual of the nonnegative Lasso is itself a quadratic program: a quadratic objective minimized subject to linear inequality constraints.
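A minimal numerical sanity check of this derivation (assuming cvxpy): solve the primal nonnegative Lasso and the dual in the variable $v$, and verify that the primal optimum equals $\frac{1}{2}\|b\|_2^2$ minus the dual optimum:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.3

# Primal: min 0.5*||Ax - b||^2 + lam*<1, x>  s.t.  x >= 0
x = cp.Variable(n, nonneg=True)
primal = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(A @ x - b) + lam * cp.sum(x)))
primal.solve()

# Dual in v = b - u: min 0.5*||v||^2  s.t.  A^T v >= A^T b - lam*1
v = cp.Variable(m)
dual = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(v)),
                  [A.T @ v >= A.T @ b - lam * np.ones(n)])
dual.solve()

# Strong duality: p* = 0.5*||b||^2 - min_v 0.5*||v||^2
print(primal.value, 0.5 * b @ b - dual.value)  # equal up to solver tolerance
```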

Now observe the similarity with the SVM formulation (with $w_0 = 0$):

$$\min_{w \in \mathbb{R}^n} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; DXw \ge \gamma \mathbf{1}_m \right\}$$

Up to the identification $v \leftrightarrow w$, $A^\top \leftrightarrow DX$, and $A^\top b - \lambda \mathbf{1} \leftrightarrow \gamma \mathbf{1}_m$, the dual of the nonnegative Lasso and the bias-free hard-margin SVM are the same type of problem: minimize a squared Euclidean norm over a polyhedron defined by linear inequalities.


Summary:
This post explores the deep connection between Lasso and Support Vector Machines (SVM) by deriving their dual formulations using Fenchel–Rockafellar duality. It demonstrates that both models share a common dual structure — minimizing a quadratic function under linear inequality constraints. This duality perspective not only unifies the two models but also provides insight into their geometric and optimization similarities.
