similarity-between-Lasso-and-SVM

Duality structure between Lasso and SVM

In this post, we reveal a striking connection between two fundamental models in machine learning — Lasso and Support Vector Machines (SVM) — through the lens of Fenchel–Rockafellar duality.

Fenchel–Rockafellar duality is a central result in convex analysis that systematically constructs dual problems using conjugate functions. This framework not only generalizes classical Lagrangian duality but also unifies seemingly different optimization models, such as Lasso and SVM, under a common duality structure. As we will see, SVM shares the same structure as the dual of Lasso.

Fenchel–Rockafellar Duality

Let $f : \mathbb{R}^m \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be proper, convex, and lower semicontinuous functions. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Consider the primal problem:

$\min_{x \in \mathbb{R}^n} \left\{ f(Ax) + g(x) \right\}$

The Fenchel–Rockafellar dual is:

$\max_{u \in \mathbb{R}^m} \left\{ -f^*(-u) - g^*(A^\top u) \right\}$

Note that weak duality holds. Let $x \in \operatorname{dom}(g)$ and $u \in \operatorname{dom}(g^* \circ A^\top)$ , and define $y = Ax$ . Then, by the Fenchel–Young inequality:

$f(y) + f^*(-u) \ge \langle y, -u \rangle = -\langle u, Ax \rangle$
$g(x) + g^*(A^\top u) \ge \langle x, A^\top u \rangle = \langle Ax, u \rangle$

Adding both inequalities:

$f(Ax) + g(x) + f^*(-u) + g^*(A^\top u) \ge 0$

This yields:

$f(Ax) + g(x) \ge -f^*(-u) - g^*(A^\top u)$

Hence, weak duality:

$\min_x \left\{ f(Ax) + g(x) \right\} \ge \max_u \left\{ -f^*(-u) - g^*(A^\top u) \right\}$

Under mild regularity conditions, strong duality holds — i.e., the optimal values of the primal and dual problems coincide, and solutions exist.

Applications

SVM

We consider the hard-margin support vector machine (SVM) for binary classification. Given training data $\{(x_i, y_i)\}_{i=1}^m$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$ , the SVM problem with margin parameter $\gamma > 0$ is:

$\min_{w \in \mathbb{R}^n,\; w_0 \in \mathbb{R}} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; y_i(w^\top x_i + w_0) \ge \gamma,\; i = 1, \dots, m \right\}$

Matrix-vector form

Define:

$X \in \mathbb{R}^{m \times n}$ with rows $x_i^\top$ ,
$y \in \mathbb{R}^m$ is the label vector,
$D = \mathrm{diag}(y) \in \mathbb{R}^{m \times m}$ ,
$\mathbf{1}_m$ is the vector of all ones in $\mathbb{R}^m$ .

Then the problem becomes:

$\min_{w \in \mathbb{R}^n,\; w_0 \in \mathbb{R}} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; D(Xw + w_0 \mathbf{1}_m) \ge \gamma \mathbf{1}_m \right\}$

Remark: If $w_0 = 0$ , then the problem reduces to:

$\min_{w \in \mathbb{R}^n} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; DXw \ge \gamma \mathbf{1}_m \right\}$

Lasso and Its Dual

The Lasso problem is a fundamental model in sparse learning and signal recovery:

$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|Ax - b\|_2^2 + \lambda \|x\|_1 \right\}$

Here:

$A \in \mathbb{R}^{m \times n}$ is the design matrix,
$b \in \mathbb{R}^m$ is the observation vector,
$\lambda > 0$ is the regularization parameter.

Nonnegative Lasso

The nonnegative Lasso adds a positivity constraint:

$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|Ax - b\|_2^2 + \lambda \langle \mathbf{1}, x \rangle \;:\; x \ge 0 \right\}$

Choose:

$f(y) = \frac{1}{2} \|y - b\|_2^2$ ,
$g(x) = \lambda \langle \mathbf{1}, x \rangle + \delta_{\mathbb{R}_+^n}(x)$

Dual problem of nonnegative Lasso

Applying Fenchel–Rockafellar duality:

$\max_{u \in \mathbb{R}^m} \left\{ -f^*(-u) - g^*(A^\top u) \right\}$

Since $f^*(-u) = \frac{1}{2} \|u\|_2^2 - \langle b, u \rangle$ , we get:

$\min_{u \in \mathbb{R}^m} \left\{ \frac{1}{2} \|u\|_2^2 - \langle b, u \rangle \;:\; A^\top u \le \lambda \mathbf{1} \right\}$

This is equivalent to:

$\min_{u \in \mathbb{R}^m} \left\{ \frac{1}{2} \|b - u\|_2^2 - \frac{1}{2} \|b\|_2^2 \;:\; A^\top u \le \lambda \mathbf{1} \right\}$

Let $v = b - u$ , then the dual becomes:

$\min_{v \in \mathbb{R}^m} \left\{ \frac{1}{2} \|v\|_2^2 \;:\; A^\top v \ge A^\top b - \lambda \mathbf{1} \right\}$

Remark:
This dual Lasso is also a quadratic optimization problem with linear inequality constraints.

Now observe the similarity with the SVM formulation:

$\min_{w \in \mathbb{R}^n} \left\{ \frac{1}{2} \|w\|_2^2 \;:\; DXw \ge \gamma \mathbf{1}_m \right\}$

Summary:
This post explores the deep connection between Lasso and Support Vector Machines (SVM) by deriving their dual formulations using Fenchel–Rockafellar duality. It demonstrates that both models share a common dual structure — minimizing a quadratic function under linear inequality constraints. This duality perspective not only unifies the two models but also provides insight into their geometric and optimization similarities.

Tran Thu Le

Pages

Wednesday, July 2, 2025