Wednesday, September 20, 2023

convexity_and_smoothness

convexity_and_smoothness

On the Duality Between Uniform Convexity and Uniform Smoothness

The duality result between strong convexity and strong smoothness is a well known result, see e.g. [1]. A generalization of this duality is a duality result between more general concepts uniform convexity and uniform smoothness, its proof can be found e.g. in the book of Zalinescu [3, Prop. 3.5.3]. Although the proof idea of Zalinescu is correct, there are some errors and typos in the definitions, theorem statement and even the proof. In this note, we revisit and rigorously prove this result of Zalinescu.

More importantly, we show that
uniform convexity on domainuniform smoothness on entire space.\text{uniform convexity on domain} \Longleftrightarrow \text{uniform smoothness on entire space.}

As a consequence
strong convexity on domainstrong smoothness on entire space.\text{strong convexity on domain} \Longleftrightarrow \text{strong smoothness on entire space.}

1. Basic concepts

Let (X,)(X, ||\cdot||) be a normed space and (X,)(X^*, ||\cdot||_*) be its continuous dual space. Let AX\mathcal{A}_X be class of non-negative vector-variable functions vanishing at 00, i.e. AX={φ:X[0,+] so that φ(0)=0}\mathcal{A}_X=\{\varphi: X\rightarrow [0, +\infty] \text{ so that } \varphi(0)=0\}. Here, note that we do not assume that the functions in AX\mathcal A_X are symmetric.

For f:X[,+]f: X\rightarrow [-\infty, +\infty], its convex conjugate f:X[,+]f^*: X^*\rightarrow [-\infty, +\infty] is defined as follows

f(x)=supxXx,xf(x)f^*(x^*)=\sup_{x\in X} \quad \langle x^*, x\rangle-f(x)

It is clear that if φAX\varphi\in \mathcal{A}_X, then φAX\varphi^*\in \mathcal{A}_{X^*}.

For ρAX\rho\in \mathcal A_X, ff is said to be ρ\rho-uniformly convex on its domain domfdom f, if

f((1λ)x+λy)+(1λ)λρ(xy)(1λ)f(x)+λf(y)f((1-\lambda)x+\lambda y)+ (1-\lambda)\lambda \textcolor{blue}{\rho(x-y)}\textcolor{red}{\leq}(1-\lambda)f(x)+\lambda f(y)

for all x,ydomfx, y\in dom f and λ[0,1]\lambda \in [0,1]. By the non-negativity of ρ\rho, uniform convexity implies convexity. When ρ(x)=α2xX2\rho(x)=\frac{\alpha}{2}||x||^2_X, ff is said to be α\alpha-strongly convex.

Similarly, for σAX\sigma\in \mathcal A_X, ff is said to be σ\sigma-uniformly smooth if it is convex and satisfies

f((1λ)x+λy)+(1λ)λσ(xy)(1λ)f(x)+λf(y)f((1-\lambda)x+\lambda y)+ (1-\lambda)\lambda \textcolor{blue}{\sigma(x-y)}\textcolor{red}{\geq} (1-\lambda)f(x)+\lambda f(y)

for all x,ydomfx, y\in dom f, λ[0,1]\lambda \in [0,1]. When σ(x)=β2xX2\sigma(x)=\frac{\beta}{2}||x||^2_X, ff is said to be β\beta-strongly smooth. In particular, we say that ff is globally σ\sigma-uniformly smooth, if it is σ\sigma-uniformly smooth and domf=Xdom f= X.

In the following, we will exploit the following transformation:

{x=(1λ)x+λyy=x+y{x=xλyy=x+(1λ)y\begin{cases} x' = (1-\lambda)x + \lambda y\\ y ' = -x+y\end{cases} \Longleftrightarrow \begin{cases} x = x' - \lambda y'\\ y = x' + (1-\lambda) y'\end{cases}

By this transformation, σ\sigma-uniform smoothness has an equivalent definition

f(x)+(1λ)λσ(y)(1λ)f(xλy)+λf(x+(1λ)y),f(x')+ (1-\lambda)\lambda \textcolor{blue}{\sigma(-y')}\geq (1-\lambda)f(x'-\lambda y')+\lambda f(x'+(1-\lambda)y'),

for all x,yXx', y'\in X and λ[0,1]\lambda\in [0,1] such that xλyx'-\lambda y' and x+(1λ)ydomfx' + (1-\lambda)y' \in dom f.

Remark. Note that if ff is uniformly smooth, then the Frechet derivative of ff exists and is uniformly continuous, see Theorem 3.5.6. page (xi) 207-208 [3]. The existence of Frechet derivative explains why we call it smoothness. Furthermore, if ff is strongly smooth, then the Frechet derivative is even Lipschitz continuous, see Corollary 3.5.7. page 212-213 (vi) [2].

2. Duality Result

Theorem (Revise [3, Prop. 3.5.3])
Assume that ff is a closed proper convex function.

  • If ff is ρ\rho-uniformly convex on domfdom f, then ff^* is ρ\rho^*-uniformly smooth on XX^*.
  • If ff is σ\sigma-uniformly smooth on XX, then ff^* is σ\sigma^*-uniformly convex on domfdom f^*.

Proof.

(s.convex \Rightarrow s. smooth) Assume that ff is ρ\rho-uniformly convex.

Note that
f(x)=arg maxxXx,xf(x),xX.\partial f^*(x^*) = \argmax_{x\in X} \langle x^*, x\rangle -f(x), \quad \forall x^*\in X^*.
Since ff is closed, proper and uniformly convex, the argmax is well-defined and singleton. So is f\partial f^*. Thus, ff^* is differentiable everywhere on XX^* and domf=Xdom f^* =X^*.

Let x0,x1Xx_0^*, x_1^* \in X^* be arbitrary and λ[0,1]\lambda\in [0,1].

To prove that ff^* is globally ρ\rho^*-uniformly smooth, we show that

(1λ)f(x0)+λf(x1)f(x)+λ(1λ)ρ(y).(1-\lambda)f^*(x_0^*)+\lambda f^*(x_1^*)\leq f^*(x^*)+ \lambda(1-\lambda) \rho^*(-y^*).

where x:=(1λ)x0+λx1x^* := (1-\lambda) x^*_0+ \lambda x_1^* and y:=x0+x1y^* := -x_0^*+x_1^*. Conversely, we have x0:=xλyx_0^*:=x^*-\lambda y^* and x1:=x+(1λ)yx_1^*:=x^*+(1-\lambda)y^*.

Let γ0<f(x0)\gamma_0<f^*(x_0^*) and γ1<f(x1)\gamma_1<f^*(x_1^*). By definition of conjugate, there exist x0,x1domfx_0, x_1\in dom f such that γ0<x0,x0f(x0)\gamma_0<\langle x_0, x_0^*\rangle - f(x_0) and γ1<x1,x1f(x1)\gamma_1<\langle x_1, x_1^*\rangle - f(x_1). Multiplying the first inequality by 1λ1-\lambda and the second by λ\lambda and then adding both side to Fenchel’s inequality, 0f(xλ)+f(x)xλ,x0\leq f(x_\lambda)+f^*(x^*)-\langle x_\lambda, x^*\rangle, where xλ=(1λ)x0+λx1x_\lambda=(1-\lambda)x_0+\lambda x_1, we can end up with

(1λ)γ0+λγ1<(1λ)[x0,x0f(x0)]+λ[x1,x1f(x1)]+f(xλ)+f(x)xλ,x=f(x)+[f(xλ)(1λ)f(x0)λf(x1)]+(1λ)x0,x0+λx1,x1xλ,xf(x)+λ(1λ)[x1x0,yρ(x0x1)]=f(x)+λ(1λ)[x0x1,yρ(x0x1)]f(x)+λ(1λ)ρ(y).\begin{aligned}(1-\lambda)\gamma_0+\lambda \gamma_1 <& (1-\lambda)[\langle x_0, x_0^*\rangle - f(x_0)] + \lambda [\langle x_1, x_1^*\rangle - f(x_1)] \\ & + f(x_\lambda)+f^*(x^*)-\langle x_\lambda, x^*\rangle\\ = & f^*(x^*) + [f(x_\lambda) - (1-\lambda) f(x_0) - \lambda f(x_1)]\\ &+ (1-\lambda)\langle x_0, x_0^*\rangle + \lambda \langle x_1, x_1^*\rangle - \langle x_\lambda, x^*\rangle\\ \leq & f^*(x^*) + \lambda(1-\lambda)[ \langle x_1-x_0, y^* \rangle - \rho(x_0-x_1)] \\ = & f^*(x^*) + \lambda(1-\lambda)[ \langle x_0-x_1, -y^* \rangle - \rho(x_0-x_1)] \\ \leq & f^*(x^*)+ \lambda(1-\lambda) \rho^*(-y^*).\end{aligned}

Here, we implicitly apply the fact that ff is ρ\rho-uniformly convex at the second inequality. Taking γif(xi)\gamma_i\rightarrow f^*(x^*_i) for i=1,2i=1,2, the ρ\rho^*-uniform smoothness of ff^* is proved.

(s.smooth \Rightarrow s.convex) Suppose that ff is globally σ\sigma-uniformly smooth. Let any x0,x1domfx_0^*, x_1^*\in dom f^* and λ[0,1]\lambda \in [0,1]. Denote xλ:=(1λ)x0+λx1x_\lambda^* := (1 - \lambda)x_0^*+\lambda x_1^*. For any x,yXx, y \in X, we have

(1λ)f(x0)+λf(x1)(1λ)[xλy,x0f(xλy)]+λ[x+(1λ)y,x1f(x+(1λ)y)]=[x,xλ+λ(1λ)y,x1x0][(1λ)f(xλy)+λf(x+(1λ)y)][x,xλ+λ(1λ)y,x1x0][f(x)+λ(1λ)σ(y)]=x,xλf(x)+λ(1λ)[y,x1x0σ(y)]=x,xλf(x)+λ(1λ)[y,x0x1σ(y)]\begin{aligned}& (1-\lambda)f^*(x_0^*)+\lambda f^*(x_1^*)\\ \geq & (1-\lambda) [\langle x-\lambda y, x_0^*\rangle -f(x-\lambda y)] + \lambda [ \langle x+(1-\lambda) y, x_1^*\rangle -f(x+(1-\lambda) y) ]\\ =& [\langle x, x^*_\lambda\rangle + \lambda(1-\lambda) \langle y, x_1^*-x_0^*\rangle] -[(1-\lambda) f(x-\lambda y) + \lambda f(x+(1-\lambda) y)] \\ \geq & [\langle x, x^*_\lambda\rangle + \lambda(1-\lambda) \langle y, x_1^*-x_0^*\rangle] -[f(x) + \lambda (1-\lambda) \sigma(-y)] \\ =& \langle x, x^*_\lambda\rangle - f(x) + \lambda(1-\lambda) [\langle y, x_1^*-x_0^*\rangle - \sigma(-y)]\\ =& \langle x, x^*_\lambda\rangle - f(x) + \lambda(1-\lambda) [\langle -y, x_0^*-x_1^*\rangle - \sigma(-y)]\end{aligned}

Here, the second inequality holds since ff is assumed to be globally σ\sigma-uniformly smooth.

Since the result holds for any x,yXx, y\in X, we may take the supremum and obtain

(1λ)f(x0)+λf(x1)f(xλ)+λ(1λ)σ(x0x1).(1-\lambda)f^*(x_0^*)+\lambda f^*(x_1^*) \geq f^*(x^*_\lambda) + \lambda (1-\lambda)\sigma^*(x_0^*-x_1^*).

Therefore, ff^* is σ\sigma^*-uniformly convex. \square

Remark. If we only assume that ff is uniformly smooth on its domain, then it is not sufficient to guarantee the uniform convexity of ff^*. A counter example in Rn\mathbb R^n is f(x)=12x22+I(x21)f(x) = \frac{1}{2}||x||_2^2 + I(||x||_2\leq 1), then

f(x)={12x22,x21x212,otherwise.f^*(x^*) = \begin{cases} \displaystyle \frac{1}{2}||x^*||_2^2 &, \quad ||x^*||_2\leq 1\\ \displaystyle ||x^*||_2-\frac{1}{2} &, \quad \text{otherwise.}\end{cases}

In this case, ff is 11-strongly smooth on its domain, but ff^* contains linear part, thus not strictly/uniformly/strongly convex.

3. References

[1] Kakade, Sham, Shai Shalev-Shwartz, and Ambuj Tewari. “On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization.” (2009): 35.

[2] Zalinescu, C. “On uniformly convex functions.” Journal of Mathematical Analysis and Applications 95.2 (1983): 344-374.

[3] Zalinescu, Constantin. “Convex analysis in general vector spaces”. World scientific, 2002.

Popular Posts