From Quadratic Programming to Cone Programming: SVM vs Robust SVM
In convex optimization, many machine learning models are not just algorithms but specific instances of structured optimization problems. One of the cleanest examples is the Support Vector Machine.
A standard soft-margin SVM naturally leads to a quadratic program. However, once we introduce uncertainty into the data and obtain the robust SVM, the structure of the problem changes in a fundamental way. The optimization problem is no longer purely quadratic with linear constraints. Instead, norms appear, and norms bring us into the world of cone programming.
The following image depicts a robust SVM in a 2D feature space, where input uncertainty is modeled as ℓ2 balls around each data point in a linearly separable setting.
The transition can be summarized as:
Standard soft-margin SVM → Quadratic Programming (QP)
Robust SVM → Second-Order Cone Programming (SOCP)
The key idea is that uncertainty introduces norms, and norms naturally lead to second-order cone constraints.
Table of Contents
- Soft-Margin SVM as a Quadratic Program
- Robust SVM and Uncertainty
- From Robust Constraints to Cone Constraints
- Final Cone Program
- Takeaway
1. Soft-Margin SVM as a Quadratic Program
We consider the standard binary classification setting:
- Data: $(x_i, y_i)$ with $y_i \in \{-1, +1\}$
- Linear classifier: $f(x) = w^\top x$ (bias term omitted for simplicity)
The soft-margin SVM minimizes hinge loss with ℓ2 regularization:
$$\min_{w}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\max\bigl(0,\,1 - y_i w^\top x_i\bigr)$$
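As a quick numerical sanity check, this objective is straightforward to evaluate directly; the data, weights, and $C$ below are made up for illustration:

```python
import numpy as np

def soft_margin_objective(w, X, y, C):
    """Soft-margin SVM objective: 0.5*||w||_2^2 + C * sum of hinge losses."""
    margins = y * (X @ w)                   # y_i * w^T x_i for each sample
    hinge = np.maximum(0.0, 1.0 - margins)  # max(0, 1 - y_i w^T x_i)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Two toy points; both are classified with margin >= 1, so only the
# regularization term contributes.
X = np.array([[2.0, 0.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
print(soft_margin_objective(w, X, y, C=1.0))  # → 0.5
```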
1.1 Introduce slack variables
Introduce variables $\xi_i \ge 0$ such that:
$$\xi_i \ge 1 - y_i w^\top x_i$$
Then the problem becomes:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i w^\top x_i \ge 1 - \xi_i,\quad i=1,\dots,n\\
& \xi_i \ge 0
\end{aligned}
$$
This is a quadratic program:
- Objective: quadratic in w
- Constraints: linear
$$\min_{x}\ \frac{1}{2}x^\top Q x + c^\top x \quad \text{s.t.}\ Ax \le b$$
Therefore:
Soft-margin SVM is a convex QP
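To make the correspondence concrete, here is a sketch that packs the soft-margin SVM into the standard QP data $(Q, c, A, b)$ over the stacked variable $z = (w, \xi)$; the helper name `svm_qp_data` is ours:

```python
import numpy as np

def svm_qp_data(X, y, C):
    """Map the soft-margin SVM to the standard QP form
    min 0.5 z^T Q z + c^T z  s.t.  A z <= b,  with z = (w, xi)."""
    n, d = X.shape
    Q = np.zeros((d + n, d + n))
    Q[:d, :d] = np.eye(d)                        # quadratic term acts on w only
    c = np.concatenate([np.zeros(d), C * np.ones(n)])
    # y_i w^T x_i >= 1 - xi_i   ->   -y_i x_i^T w - xi_i <= -1
    A_margin = np.hstack([-y[:, None] * X, -np.eye(n)])
    # xi_i >= 0                 ->   -xi_i <= 0
    A_pos = np.hstack([np.zeros((n, d)), -np.eye(n)])
    A = np.vstack([A_margin, A_pos])
    b = np.concatenate([-np.ones(n), np.zeros(n)])
    return Q, c, A, b

# Demo on two separable toy points (illustrative)
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
Q, c, A, b = svm_qp_data(X, y, C=1.0)
z = np.array([2.0, 0.0, 0.0, 0.0])   # w = (2, 0), xi = 0 is feasible here
print(np.all(A @ z <= b + 1e-12))    # → True
```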
2. Robust SVM and Uncertainty
Now suppose the input data is uncertain:
$$x_i \to x_i + \delta_i, \qquad \|\delta_i\|_2 \le \rho_i$$
We want the classifier to remain feasible under worst-case perturbations.
We require:
$$y_i w^\top (x_i + \delta_i) \ge 1 - \xi_i \quad \forall\, \|\delta_i\|_2 \le \rho_i$$
This leads to:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & \min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top (x_i + \delta_i) \ge 1 - \xi_i
\end{aligned}
$$
2.1 Solving the inner minimization
Focus on:
$$\min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top (x_i + \delta_i)$$
Split:
$$= y_i w^\top x_i + \min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top \delta_i$$
The inner term becomes:
$$\min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top \delta_i = -\rho_i \|w\|_2$$
by the Cauchy–Schwarz inequality.
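The minimum is attained at $\delta_i^\star = -y_i \rho_i\, w / \|w\|_2$ (using $y_i^2 = 1$). A quick numerical check, with made-up $w$ and $\rho_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
y_i, rho_i = 1.0, 0.7

# Closed-form minimizer of y_i * w^T delta over the ball ||delta||_2 <= rho_i
delta_star = -y_i * rho_i * w / np.linalg.norm(w)
worst = y_i * w @ delta_star
assert np.isclose(worst, -rho_i * np.linalg.norm(w))

# No random point in the ball does better than the closed form
for _ in range(1000):
    d = rng.normal(size=5)
    d *= rho_i * rng.uniform() / np.linalg.norm(d)  # random point in the ball
    assert y_i * w @ d >= worst - 1e-12
```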
Thus the robust constraint becomes:
$$y_i w^\top x_i - \rho_i \|w\|_2 \ge 1 - \xi_i$$
or equivalently:
$$1 - y_i w^\top x_i + \rho_i \|w\|_2 - \xi_i \le 0$$
3. From Robust Constraints to Cone Constraints
The optimization problem is now:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i \|w\|_2 - \xi_i \le 0\\
& \xi_i \ge 0
\end{aligned}
$$
The only nonlinearity in the constraints is the norm ∥w∥2.
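Since each $\xi_i$ has a positive coefficient in the objective, at the optimum it tightens to $\xi_i = \max(0,\, 1 - y_i w^\top x_i + \rho_i \|w\|_2)$, so the problem can also be evaluated in unconstrained form. A small sketch (data and parameters made up, with a shared radius $\rho_i = \rho$) showing that growing $\rho$ only increases the objective:

```python
import numpy as np

def robust_svm_objective(w, X, y, C, rho):
    """Robust objective with slacks eliminated (rho_i = rho for all i):
    xi_i = max(0, 1 - y_i w^T x_i + rho * ||w||_2)."""
    margins = y * (X @ w) - rho * np.linalg.norm(w)  # worst-case margins
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * xi.sum()

X = np.array([[2.0, 0.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
vals = [robust_svm_objective(w, X, y, 1.0, rho) for rho in (0.0, 0.5, 1.0)]
print(vals)  # → [0.5, 1.0, 1.5]; rho = 0 recovers the nominal objective
```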
3.1 Introduce auxiliary variable t
Introduce t such that:
$$\|w\|_2 \le t$$
Then the constraints become:
$$1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0$$
So we obtain:
$$
\begin{aligned}
\min_{w,\,\xi,\,t}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0\\
& \|w\|_2 \le t\\
& \xi_i \ge 0
\end{aligned}
$$
The constraint
$$\|w\|_2 \le t$$
is a second-order cone constraint:
$$(t, w) \in \mathcal{Q}^{d+1}, \qquad \mathcal{Q}^{d+1} := \{(t, w) \in \mathbb{R} \times \mathbb{R}^d : \|w\|_2 \le t\}$$
3.2 Rewrite the quadratic objective
To express everything in conic form, we introduce an auxiliary variable r.
We use the equivalence:
$$\|w\|_2^2 \le r \iff \|(2w,\,1-r)\|_2 \le 1+r$$
Indeed,
$$\|(2w,\,1-r)\|_2^2 = 4\|w\|_2^2 + (1-r)^2$$
The SOC constraint
$$\|(2w,\,1-r)\|_2 \le 1+r$$
is equivalent to:
$$4\|w\|_2^2 + (1-r)^2 \le (1+r)^2$$
(either inequality forces $r \ge \|w\|_2^2 \ge 0$, so $1+r > 0$ and squaring is valid).
Expand:
$$4\|w\|_2^2 + 1 - 2r + r^2 \le 1 + 2r + r^2$$
Cancel terms:
$$4\|w\|_2^2 \le 4r \iff \|w\|_2^2 \le r$$
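This equivalence is easy to spot-check numerically on random $(w, r)$ pairs; we skip points extremely close to the boundary, where floating-point rounding could flip either side:

```python
import numpy as np

rng = np.random.default_rng(1)
checked = 0
while checked < 10000:
    w = rng.normal(size=3)
    r = rng.normal() * 2.0
    if abs(np.dot(w, w) - r) < 1e-9:    # skip near-boundary points
        continue
    lhs = np.dot(w, w) <= r                                    # ||w||_2^2 <= r
    rhs = np.linalg.norm(np.append(2 * w, 1 - r)) <= 1 + r     # SOC form
    assert lhs == rhs
    checked += 1
print("equivalence holds on", checked, "random points")
```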
3.3 Final Cone Program
We now obtain a second-order cone program:
$$
\begin{aligned}
\min_{w,\,\xi,\,t,\,r}\quad & \frac{1}{2}r + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0\\
& \|w\|_2 \le t\\
& \|(2w,\,1-r)\|_2 \le 1+r\\
& \xi_i \ge 0
\end{aligned}
$$
All constraints are now either linear or second-order cone constraints.
4. Takeaway
Robustification changes the geometry of the problem.
- Linear constraints become norm constraints
- Polyhedral feasible sets become conic sets
- Quadratic programs become second-order cone programs
In short:
Robust SVM = SVM + uncertainty → Cone Programming