Sunday, March 29, 2026

Robust-SVM-as-a-Cone-Programming


From Quadratic Programming to Cone Programming: SVM vs Robust SVM

In convex optimization, many machine learning models are not just algorithms but specific instances of structured optimization problems. One of the cleanest examples is the Support Vector Machine.

A standard soft-margin SVM naturally leads to a quadratic program. However, once we introduce uncertainty into the data we obtain the robust SVM, and the structure of the problem changes in a fundamental way. The optimization problem is no longer purely quadratic with linear constraints: norms appear in the constraints, and norms bring us into the world of cone programming.

The following image depicts a robust SVM in a 2D feature space, where input uncertainty is modeled as \ell_2 balls around each data point in a linearly separable setting.

[Figure: Robust SVM]

The transition can be summarized as:

Standard soft-margin SVM → Quadratic Programming (QP)
Robust SVM → Second-Order Cone Programming (SOCP)

The key idea is that uncertainty introduces norms, and norms naturally lead to second-order cone constraints.


Table of Contents

  1. Soft-Margin SVM as a Quadratic Program
  2. Robust SVM and Uncertainty
  3. From Robust Constraints to Cone Constraints
  4. Takeaway

1. Soft-Margin SVM as a Quadratic Program

We consider the standard binary classification setting:

  • Data: (x_i, y_i) with y_i \in \{-1, +1\}
  • Linear classifier: f(x) = w^T x

1.1 Hinge loss formulation

The soft-margin SVM minimizes the hinge loss with \ell_2 regularization:

\min_w \quad \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^n \max(0,\, 1 - y_i w^T x_i)


1.2 Introduce slack variables

Introduce variables \xi_i \ge 0 such that:

\xi_i \ge 1 - y_i w^T x_i

Then the problem becomes:

\begin{aligned}
\min_{w,\xi} \quad & \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^n \xi_i \\
\text{s.t.} \quad & y_i w^T x_i \ge 1 - \xi_i, \quad i = 1, \dots, n \\
& \xi_i \ge 0
\end{aligned}


1.3 Quadratic Programming form

This is a quadratic program:

  • Objective: quadratic in w
  • Constraints: linear

\min_x \; \frac{1}{2} x^T Q x + c^T x \quad \text{s.t. } Ax \le b

Therefore:

Soft-margin SVM is a convex QP
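To make the QP reduction concrete, here is a small sketch (the function name and data are illustrative, not from any library) that assembles the standard-form data Q, c, A, b for the stacked variable z = (w, \xi) and checks the objective against the slack formulation:

```python
import numpy as np

def svm_qp_data(X, y, C):
    """Map the soft-margin SVM with variable z = (w, xi) into
    min 0.5 z^T Q z + c^T z  s.t.  A z <= b  (illustrative construction)."""
    n, d = X.shape
    Q = np.zeros((d + n, d + n))
    Q[:d, :d] = np.eye(d)                                # 0.5*||w||^2 term
    c = np.concatenate([np.zeros(d), C * np.ones(n)])    # C * sum(xi) term
    # y_i w^T x_i >= 1 - xi_i   <=>   -y_i x_i^T w - xi_i <= -1
    A_margin = np.hstack([-y[:, None] * X, -np.eye(n)])
    # xi_i >= 0                 <=>   -xi_i <= 0
    A_pos = np.hstack([np.zeros((n, d)), -np.eye(n)])
    A = np.vstack([A_margin, A_pos])
    b = np.concatenate([-np.ones(n), np.zeros(n)])
    return Q, c, A, b

# Sanity check on random toy data: the QP objective matches the slack form
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3)); y = rng.choice([-1.0, 1.0], size=6); C = 2.0
Q, c, A, b = svm_qp_data(X, y, C)
w, xi = rng.normal(size=3), rng.uniform(size=6)
z = np.concatenate([w, xi])
qp_obj = 0.5 * z @ Q @ z + c @ z
direct = 0.5 * w @ w + C * xi.sum()
print(np.isclose(qp_obj, direct))  # True
```

Any off-the-shelf QP solver can then consume (Q, c, A, b) directly.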


2. Robust SVM and Uncertainty

Now suppose the input data is uncertain:

x_i \to x_i + \delta_i, \quad \|\delta_i\|_2 \le \rho_i

We want the classifier to remain feasible under worst-case perturbations.


2.1 Robust formulation

We require:

y_i w^T (x_i + \delta_i) \ge 1 - \xi_i \quad \forall\, \|\delta_i\|_2 \le \rho_i

This leads to:

\begin{aligned}
\min_{w,\xi} \quad & \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^n \xi_i \\
\text{s.t.} \quad & \min_{\|\delta_i\|_2 \le \rho_i} y_i w^T (x_i + \delta_i) \ge 1 - \xi_i
\end{aligned}


2.2 Solving the inner minimization

Focus on:

\min_{\|\delta_i\|_2 \le \rho_i} y_i w^T (x_i + \delta_i)

Split:

= y_i w^T x_i + \min_{\|\delta_i\|_2 \le \rho_i} y_i w^T \delta_i

The inner term becomes:

\min_{\|\delta_i\|_2 \le \rho_i} y_i w^T \delta_i = -\rho_i \|w\|_2

by the Cauchy–Schwarz inequality, with the minimum attained at \delta_i = -\rho_i y_i w / \|w\|_2.
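This closed form is easy to check numerically: no perturbation inside the ball does better than the analytic minimizer. A throwaway sketch with made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
w = rng.normal(size=d)
y_i, rho_i = -1.0, 0.7

# Analytic value of the inner minimization: -rho_i * ||w||_2,
# attained at delta_i = -rho_i * y_i * w / ||w||_2
analytic = -rho_i * np.linalg.norm(w)
minimizer = -rho_i * y_i * w / np.linalg.norm(w)

# Random deltas inside the ball should never go below the analytic value
deltas = rng.normal(size=(10_000, d))
deltas *= (rho_i * rng.uniform(size=(10_000, 1))
           / np.linalg.norm(deltas, axis=1, keepdims=True))
sampled_min = (y_i * (deltas @ w)).min()

print(np.isclose(y_i * (minimizer @ w), analytic))  # True
print(sampled_min >= analytic)                      # True
```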

Thus the robust constraint becomes:

y_i w^T x_i - \rho_i \|w\|_2 \ge 1 - \xi_i

or equivalently:

1 - y_i w^T x_i + \rho_i \|w\|_2 - \xi_i \le 0


3. From Robust Constraints to Cone Constraints

The optimization problem is now:

\begin{aligned}
\min_{w,\xi} \quad & \frac{1}{2}\|w\|_2^2 + C \sum_i \xi_i \\
\text{s.t.} \quad & 1 - y_i w^T x_i + \rho_i \|w\|_2 - \xi_i \le 0 \\
& \xi_i \ge 0
\end{aligned}

The only nonlinearity in the constraints is the norm \|w\|_2.


3.1 Introduce auxiliary variable t

Introduce t such that:

\|w\|_2 \le t

Then the constraints become:

1 - y_i w^T x_i + \rho_i t - \xi_i \le 0

Since \rho_i \ge 0, replacing \|w\|_2 by its upper bound t only tightens each constraint, and at the optimum t = \|w\|_2, so the reformulation is exact.

So we obtain:

\begin{aligned}
\min_{w,\xi,t} \quad & \frac{1}{2}\|w\|_2^2 + C \sum_i \xi_i \\
\text{s.t.} \quad & 1 - y_i w^T x_i + \rho_i t - \xi_i \le 0 \\
& \|w\|_2 \le t \\
& \xi_i \ge 0
\end{aligned}

The constraint

\|w\|_2 \le t

is a second-order cone constraint:

(t, w) \in \mathcal{Q}^{d+1}, \quad \text{where } \mathcal{Q}^{d+1} = \{(t, w) \in \mathbb{R} \times \mathbb{R}^d : \|w\|_2 \le t\}
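As a quick illustration (a toy helper, not part of any solver API), membership in the second-order cone is just a norm comparison:

```python
import numpy as np

def in_soc(t, w):
    """Check (t, w) in Q^{d+1}, i.e. ||w||_2 <= t."""
    return np.linalg.norm(w) <= t

print(in_soc(2.0, np.array([1.0, 1.0])))  # True:  sqrt(2) <= 2
print(in_soc(1.0, np.array([1.0, 1.0])))  # False: sqrt(2) >  1
```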


3.2 Converting quadratic objective to SOC form

To express everything in conic form, we introduce an auxiliary variable r.

We use the equivalence:

\|w\|_2^2 \le r \;\;\Longleftrightarrow\;\; \|(2w,\; 1-r)\|_2 \le 1+r

Indeed,

\|(2w,\; 1-r)\|_2^2 = 4\|w\|_2^2 + (1-r)^2

The SOC constraint:

\|(2w,\; 1-r)\|_2 \le 1+r

is equivalent to:

4\|w\|_2^2 + (1-r)^2 \le (1+r)^2

Expand:

4\|w\|_2^2 + 1 - 2r + r^2 \le 1 + 2r + r^2

Cancel terms:

4\|w\|_2^2 \le 4r \;\;\Longleftrightarrow\;\; \|w\|_2^2 \le r
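The equivalence is easy to spot-check numerically; this throwaway sketch (arbitrary toy values) compares the two conditions on random points:

```python
import numpy as np

def soc_holds(w, r):
    """||(2w, 1-r)||_2 <= 1 + r"""
    return np.linalg.norm(np.append(2 * w, 1 - r)) <= 1 + r

def epigraph_holds(w, r):
    """||w||_2^2 <= r"""
    return float(w @ w) <= r

# The two conditions should agree at every sampled point
rng = np.random.default_rng(1)
agree = all(
    soc_holds(w, r) == epigraph_holds(w, r)
    for w, r in ((rng.normal(size=3), rng.uniform(-1.0, 5.0)) for _ in range(1000))
)
print(agree)
```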


3.3 Final Cone Program

We now obtain a second-order cone program:

\begin{aligned}
\min_{w,\xi,t,r} \quad & \frac{1}{2} r + C \sum_i \xi_i \\
\text{s.t.} \quad & 1 - y_i w^T x_i + \rho_i t - \xi_i \le 0 \\
& \|w\|_2 \le t \\
& \|(2w,\; 1-r)\|_2 \le 1+r \\
& \xi_i \ge 0
\end{aligned}

The objective is now linear, and all constraints are either linear or second-order cone constraints.
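Conic solvers (for instance through a modeling layer such as CVXPY) consume this form directly. For a dependency-free illustration, note that eliminating the slacks \xi_i gives the equivalent unconstrained objective \frac{1}{2}\|w\|_2^2 + C \sum_i \max(0,\, 1 - y_i w^T x_i + \rho_i \|w\|_2), which a subgradient method can minimize. This is a toy sketch on made-up data, not an SOCP solver:

```python
import numpy as np

def robust_svm_subgradient(X, y, rho, C=1.0, n_iters=3000):
    """Minimize 0.5*||w||^2 + C*sum_i max(0, 1 - y_i w^T x_i + rho_i ||w||_2)
    by subgradient descent with a diminishing step. rho = 0 recovers the
    standard soft-margin SVM."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n_iters):
        wn = np.linalg.norm(w)
        margins = y * (X @ w) - rho * wn
        active = margins < 1                         # points in the hinge region
        g = w.copy()                                 # gradient of 0.5*||w||^2
        if active.any():
            g -= C * (y[active] @ X[active])         # hinge subgradient, data term
            if wn > 0:
                g += C * rho[active].sum() * w / wn  # hinge subgradient, norm term
        w -= 0.1 / np.sqrt(t + 1) * g
    return w

# Made-up separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_std = robust_svm_subgradient(X, y, rho=np.zeros(4))       # plain SVM
w_rob = robust_svm_subgradient(X, y, rho=0.5 * np.ones(4))  # robust SVM
# The robust solution needs a larger ||w|| to hold the margin against
# worst-case perturbations of the inputs
print(np.linalg.norm(w_rob) > np.linalg.norm(w_std))
```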


4. Takeaway

Robustification changes the geometry of the problem.

  • Linear constraints become norm constraints
  • Polyhedral feasible sets become conic sets
  • Quadratic programs become second-order cone programs

In short:

Robust SVM = SVM + uncertainty → Cone Programming
