From Quadratic Programming to Cone Programming: SVM vs Robust SVM
In convex optimization, many machine learning models are not just algorithms but specific instances of structured optimization problems. One of the cleanest examples is the Support Vector Machine.
A standard soft-margin SVM naturally leads to a quadratic program. However, once we introduce uncertainty into the data and obtain the robust SVM, the structure of the problem changes in a fundamental way. The optimization problem is no longer purely quadratic with linear constraints. Instead, norms appear, and norms bring us into the world of cone programming.
The following image depicts a robust SVM in a 2D feature space, where input uncertainty is modeled as ℓ2 balls around each data point in a linearly separable setting.
The transition can be summarized as:
Standard soft-margin SVM → Quadratic Programming (QP)
Robust SVM → Second-Order Cone Programming (SOCP)
The key idea is that uncertainty introduces norms, and norms naturally lead to second-order cone constraints.
Table of Contents
- Soft-Margin SVM as a Quadratic Program
- Robust SVM and Uncertainty
- From Robust Constraints to Cone Constraints
- Final Cone Program
- Takeaway
1. Soft-Margin SVM as a Quadratic Program
We consider the standard binary classification setting:
- Data: $(x_i, y_i)$ with $y_i \in \{-1, +1\}$
- Linear classifier: $f(x) = w^\top x$ (bias term omitted for simplicity)
The soft-margin SVM minimizes hinge loss with ℓ2 regularization:
$$\min_{w}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\max\bigl(0,\,1 - y_i w^\top x_i\bigr)$$
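As a quick numerical sanity check, this objective is straightforward to evaluate directly; the data, weights, and $C$ below are made up for illustration:

```python
import numpy as np

def soft_margin_objective(w, X, y, C):
    """Soft-margin SVM objective: 0.5*||w||_2^2 + C * sum of hinge losses."""
    margins = y * (X @ w)                   # y_i * w^T x_i for each sample
    hinge = np.maximum(0.0, 1.0 - margins)  # max(0, 1 - y_i w^T x_i)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Two toy points; both are classified with margin >= 1, so only the
# regularization term contributes.
X = np.array([[2.0, 0.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
print(soft_margin_objective(w, X, y, C=1.0))  # → 0.5
```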
1.1 Introduce slack variables
Introduce variables $\xi_i \ge 0$ such that:
$$\xi_i \ge 1 - y_i w^\top x_i$$
Then the problem becomes:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & y_i w^\top x_i \ge 1 - \xi_i,\quad i=1,\dots,n\\
& \xi_i \ge 0
\end{aligned}
$$
This is a quadratic program:
- Objective: quadratic in w
- Constraints: linear
$$\min_{x}\ \frac{1}{2}x^\top Q x + c^\top x \quad \text{s.t.}\ Ax \le b$$
Therefore:
Soft-margin SVM is a convex QP
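To make the correspondence concrete, here is a sketch that packs the soft-margin SVM into the standard QP data $(Q, c, A, b)$ over the stacked variable $z = (w, \xi)$; the helper name `svm_qp_data` is ours:

```python
import numpy as np

def svm_qp_data(X, y, C):
    """Map the soft-margin SVM to the standard QP form
    min 0.5 z^T Q z + c^T z  s.t.  A z <= b,  with z = (w, xi)."""
    n, d = X.shape
    Q = np.zeros((d + n, d + n))
    Q[:d, :d] = np.eye(d)                        # quadratic term acts on w only
    c = np.concatenate([np.zeros(d), C * np.ones(n)])
    # y_i w^T x_i >= 1 - xi_i   ->   -y_i x_i^T w - xi_i <= -1
    A_margin = np.hstack([-y[:, None] * X, -np.eye(n)])
    # xi_i >= 0                 ->   -xi_i <= 0
    A_pos = np.hstack([np.zeros((n, d)), -np.eye(n)])
    A = np.vstack([A_margin, A_pos])
    b = np.concatenate([-np.ones(n), np.zeros(n)])
    return Q, c, A, b

# Demo on two separable toy points (illustrative)
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
Q, c, A, b = svm_qp_data(X, y, C=1.0)
z = np.array([2.0, 0.0, 0.0, 0.0])   # w = (2, 0), xi = 0 is feasible here
print(np.all(A @ z <= b + 1e-12))    # → True
```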
2. Robust SVM and Uncertainty
Now suppose the input data is uncertain:
$$x_i \to x_i + \delta_i, \qquad \|\delta_i\|_2 \le \rho_i$$
We want the classifier to remain feasible under worst-case perturbations.
We require:
$$y_i w^\top (x_i + \delta_i) \ge 1 - \xi_i \quad \forall\, \|\delta_i\|_2 \le \rho_i$$
This leads to:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i\\
\text{s.t.}\quad & \min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top (x_i + \delta_i) \ge 1 - \xi_i
\end{aligned}
$$
2.1 Solving the inner minimization
Focus on:
$$\min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top (x_i + \delta_i)$$
Split:
$$= y_i w^\top x_i + \min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top \delta_i$$
The inner term becomes:
$$\min_{\|\delta_i\|_2 \le \rho_i} y_i w^\top \delta_i = -\rho_i \|w\|_2$$
by the Cauchy–Schwarz inequality.
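The minimum is attained at $\delta_i^\star = -y_i \rho_i\, w / \|w\|_2$ (using $y_i^2 = 1$). A quick numerical check, with made-up $w$ and $\rho_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
y_i, rho_i = 1.0, 0.7

# Closed-form minimizer of y_i * w^T delta over the ball ||delta||_2 <= rho_i
delta_star = -y_i * rho_i * w / np.linalg.norm(w)
worst = y_i * w @ delta_star
assert np.isclose(worst, -rho_i * np.linalg.norm(w))

# No random point in the ball does better than the closed form
for _ in range(1000):
    d = rng.normal(size=5)
    d *= rho_i * rng.uniform() / np.linalg.norm(d)  # random point in the ball
    assert y_i * w @ d >= worst - 1e-12
```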
Thus the robust constraint becomes:
$$y_i w^\top x_i - \rho_i \|w\|_2 \ge 1 - \xi_i$$
or equivalently:
$$1 - y_i w^\top x_i + \rho_i \|w\|_2 - \xi_i \le 0$$
3. From Robust Constraints to Cone Constraints
The optimization problem is now:
$$
\begin{aligned}
\min_{w,\,\xi}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i \|w\|_2 - \xi_i \le 0\\
& \xi_i \ge 0
\end{aligned}
$$
The only nonlinearity in the constraints is the norm ∥w∥2.
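Since each $\xi_i$ has a positive coefficient in the objective, at the optimum it tightens to $\xi_i = \max(0,\, 1 - y_i w^\top x_i + \rho_i \|w\|_2)$, so the problem can also be evaluated in unconstrained form. A small sketch (data and parameters made up, with a shared radius $\rho_i = \rho$) showing that growing $\rho$ only increases the objective:

```python
import numpy as np

def robust_svm_objective(w, X, y, C, rho):
    """Robust objective with slacks eliminated (rho_i = rho for all i):
    xi_i = max(0, 1 - y_i w^T x_i + rho * ||w||_2)."""
    margins = y * (X @ w) - rho * np.linalg.norm(w)  # worst-case margins
    xi = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * xi.sum()

X = np.array([[2.0, 0.0], [-1.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
vals = [robust_svm_objective(w, X, y, 1.0, rho) for rho in (0.0, 0.5, 1.0)]
print(vals)  # → [0.5, 1.0, 1.5]; rho = 0 recovers the nominal objective
```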
3.1 Introduce auxiliary variable t
Introduce t such that:
$$\|w\|_2 \le t$$
Then the constraints become:
$$1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0$$
So we obtain:
$$
\begin{aligned}
\min_{w,\,\xi,\,t}\quad & \frac{1}{2}\|w\|_2^2 + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0\\
& \|w\|_2 \le t\\
& \xi_i \ge 0
\end{aligned}
$$
The constraint
$$\|w\|_2 \le t$$
is a second-order cone constraint:
$$(t, w) \in \mathcal{Q}^{d+1}, \qquad \mathcal{Q}^{d+1} := \{(t, w) \in \mathbb{R} \times \mathbb{R}^d : \|w\|_2 \le t\}$$
3.2 Rewrite the quadratic objective
To express everything in conic form, we introduce an auxiliary variable r.
We use the equivalence:
$$\|w\|_2^2 \le r \iff \|(2w,\,1-r)\|_2 \le 1+r$$
Indeed,
$$\|(2w,\,1-r)\|_2^2 = 4\|w\|_2^2 + (1-r)^2$$
The SOC constraint
$$\|(2w,\,1-r)\|_2 \le 1+r$$
is equivalent to:
$$4\|w\|_2^2 + (1-r)^2 \le (1+r)^2$$
(either inequality forces $r \ge \|w\|_2^2 \ge 0$, so $1+r > 0$ and squaring is valid).
Expand:
$$4\|w\|_2^2 + 1 - 2r + r^2 \le 1 + 2r + r^2$$
Cancel terms:
$$4\|w\|_2^2 \le 4r \iff \|w\|_2^2 \le r$$
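This equivalence is easy to spot-check numerically on random $(w, r)$ pairs; we skip points extremely close to the boundary, where floating-point rounding could flip either side:

```python
import numpy as np

rng = np.random.default_rng(1)
checked = 0
while checked < 10000:
    w = rng.normal(size=3)
    r = rng.normal() * 2.0
    if abs(np.dot(w, w) - r) < 1e-9:    # skip near-boundary points
        continue
    lhs = np.dot(w, w) <= r                                    # ||w||_2^2 <= r
    rhs = np.linalg.norm(np.append(2 * w, 1 - r)) <= 1 + r     # SOC form
    assert lhs == rhs
    checked += 1
print("equivalence holds on", checked, "random points")
```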
3.3 Final Cone Program
We now obtain a second-order cone program:
$$
\begin{aligned}
\min_{w,\,\xi,\,t,\,r}\quad & \frac{1}{2}r + C\sum_{i}\xi_i\\
\text{s.t.}\quad & 1 - y_i w^\top x_i + \rho_i t - \xi_i \le 0\\
& \|w\|_2 \le t\\
& \|(2w,\,1-r)\|_2 \le 1+r\\
& \xi_i \ge 0
\end{aligned}
$$
All constraints are now either linear or second-order cone constraints.
4. Takeaway
Robustification changes the geometry of the problem.
- Linear constraints become norm constraints
- Polyhedral feasible sets become conic sets
- Quadratic programs become second-order cone programs
In short:
Robust SVM = SVM + uncertainty → Cone Programming