Friday, February 3, 2023

Fenchel inequality with Proximal operator

Preliminaries

Let $h: H \to \mathbb{R} \cup \{+\infty\}$ be a closed proper convex function on the Hilbert space $H$.

A vector $g \in H$ is called a subgradient of $h$ at $u_0 \in H$ if
$$h(u) \geq h(u_0) + \langle g, u - u_0 \rangle, \quad \forall u \in H.$$
The set of all subgradients at $u_0$ is called the subdifferential, denoted by $\partial h(u_0)$. Thus the subdifferential can be considered as a set-valued map
$$\partial h: H \rightrightarrows H.$$

We shall identify $\partial h$ with its graph
$$\partial h \equiv \{(u, v) : v \in \partial h(u)\}.$$

The conjugate function (a.k.a. Fenchel conjugate or convex conjugate) of $h$ is $h^*: H \to \mathbb{R} \cup \{+\infty\}$ defined by
$$h^*(v) \triangleq \sup_{u \in H} \langle v, u \rangle - h(u).$$

Theorem. [Basic Fenchel inequality]
$$h(u) + h^*(v) \geq \langle u, v \rangle.$$
The equality holds if $(u, v) \in \partial h$.
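To make the definitions concrete, here is a small numerical sanity check (an illustrative sketch, not part of the argument): for $h(u) = \frac{1}{2}u^2$ on $H = \mathbb{R}$, the conjugate is $h^*(v) = \frac{1}{2}v^2$, and the Fenchel inequality can be tested at random points, with the conjugate computed by brute force.

```python
import numpy as np

# Sketch: h(u) = 0.5*u^2 on R has conjugate h*(v) = 0.5*v^2; check the
# Fenchel inequality h(u) + h*(v) >= u*v at random points.
h = lambda u: 0.5 * u**2
grid = np.linspace(-10.0, 10.0, 2001)

def h_star(v):
    # Brute-force conjugate: h*(v) = sup_u (v*u - h(u)), over a fine grid.
    return np.max(v * grid - h(grid))

assert abs(h_star(1.7) - 0.5 * 1.7**2) < 1e-6   # matches the closed form

rng = np.random.default_rng(0)
for u, v in rng.uniform(-3.0, 3.0, size=(1000, 2)):
    assert h(u) + h_star(v) - u * v >= -1e-4    # gap >= 0 (up to grid error)
print("basic Fenchel inequality verified")
```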

The function $h$ is said to be $\alpha$-strongly convex if
$$h(u) \geq h(u_0) + \langle g, u - u_0 \rangle + \frac{\alpha}{2}\|u - u_0\|^2, \quad \forall u, u_0 \in H, \ \forall g \in \partial h(u_0).$$

If $h$ is strongly convex, the Fenchel inequality can be improved:

Theorem. [Fenchel inequality with strong convexity] If $h$ is $\alpha$-strongly convex, then
$$h(u) + h^*(v) \geq \langle u, v \rangle + \frac{\alpha}{2}\|u - \nabla h^*(v)\|^2.$$
The equality holds if $(u, v) \in \partial h$.

It is well known that if $h$ is $\alpha$-strongly convex, then $h^*$ is $1/\alpha$-strongly smooth (usually just called $1/\alpha$-smooth), i.e.
$$\partial h^*(v) = \{\nabla h^*(v)\}, \quad \forall v \in H,$$
and $\nabla h^*$ is $1/\alpha$-Lipschitz. Here $\nabla h^*$ is the Fréchet derivative of $h^*$.

Proof.
Let $r = \nabla h^*(v)$. Since $r = \nabla h^*(v)$ means $v \in \partial h(r)$, the basic Fenchel inequality holds with equality:
$$h(r) + h^*(v) = \langle r, v \rangle.$$

Since $h$ is $\alpha$-strongly convex and $v \in \partial h(r)$, we have
$$h(u) - h(r) \geq \langle v, u - r \rangle + \frac{\alpha}{2}\|u - r\|^2.$$

Taking the sum of the two relations, we obtain
$$h(u) + h^*(v) \geq \langle v, u \rangle + \frac{\alpha}{2}\|u - r\|^2.$$

The equality holds if $u = r$, which is equivalent to $v \in \partial h(u)$, i.e. $(u, v) \in \partial h$. The proof is complete. $\square$
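As a quick illustration, take $h(u) = \frac{\alpha}{2}\|u\|^2$, so that $h^*(v) = \frac{1}{2\alpha}\|v\|^2$ and $\nabla h^*(v) = v/\alpha$ (which is indeed $1/\alpha$-Lipschitz). Then
$$h(u) + h^*(v) - \langle u, v \rangle = \frac{\alpha}{2}\|u\|^2 + \frac{1}{2\alpha}\|v\|^2 - \langle u, v \rangle = \frac{\alpha}{2}\Big\|u - \frac{v}{\alpha}\Big\|^2,$$
so for this $h$ the strengthened inequality holds with equality for every pair $(u, v)$.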

Fenchel inequality with Proximal operator

In this section we discuss a refined version of the Fenchel inequality using the proximal operator.
The main reference here is [1, Lemma 1.1, epage 3].

Theorem. [Fenchel inequality with proximal operator] For $\lambda > 0$, we have
$$h(u) + h^*(v) \geq \langle u, v \rangle + \frac{1}{\lambda}\|u - \operatorname{prox}_{\lambda h}(u + \lambda v)\|^2.$$
The equality holds if
$$\operatorname{prox}_{\lambda h}(u + \lambda v) \in \lambda \partial h^*(v) \quad \text{and} \quad (u + \lambda v) - \operatorname{prox}_{\lambda h}(u + \lambda v) \in \lambda \partial h(u).$$
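As a sanity check (an illustrative sketch, not taken from [1]), the inequality can be tested numerically in one dimension for $h(u) = |u|$: its proximal operator is the soft-thresholding map and its conjugate is the indicator of $[-1, 1]$.

```python
import numpy as np

# Sketch: test h(u) + h*(v) >= u*v + (1/lam)*|u - prox_{lam*h}(u + lam*v)|^2
# for h(u) = |u| on R, where prox_{lam*h}(x) = soft-threshold(x, lam) and
# h*(v) = 0 on [-1, 1] (and +inf outside).
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(1)
for _ in range(10_000):
    u   = rng.uniform(-5.0, 5.0)
    v   = rng.uniform(-1.0, 1.0)      # keep v in dom(h*) so h*(v) = 0
    lam = rng.uniform(0.1, 10.0)
    lhs = abs(u) - u * v              # h(u) + h*(v) - <u, v>
    rhs = (u - soft(u + lam * v, lam))**2 / lam
    assert lhs >= rhs - 1e-9, (u, v, lam)
print("prox Fenchel inequality verified on random samples")
```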

We now discuss an application of the above inequality to strongly convex functions $h$.

Remark. The Fenchel inequality with proximal operator implies a "weak version" of the Fenchel inequality with strong convexity.

For ease of notation, we introduce the dual gap. For $u, v \in H$, the dual gap of $u, v$ w.r.t. $h$ is
$$\operatorname{gap}(u, v) \triangleq h(u) + h^*(v) - \langle u, v \rangle.$$
From the basic Fenchel inequality, we know that the dual gap is always non-negative.
We now show that

Theorem. If $h$ is $1$-strongly convex, we have
$$\operatorname{gap}(u, v) \geq \|u - \operatorname{prox}_h(u + v)\|^2 \geq \textcolor{blue}{\frac{1}{4}}\|u - r\|^2,$$
where $r = \nabla h^*(v)$.

Proof.

For $v \in H$, define
$$r \triangleq \nabla h^*(v).$$
Since $h^*$ is closed proper convex, we also have
$$v \in \partial h(r).$$

Moreover, denote
$$J = \operatorname{prox}_h.$$

By the previous theorem with $\lambda = 1$, we have $\operatorname{gap}(u, v) \geq \|u - J(u + v)\|^2$, so we now aim to show that
$$\|u - J(u + v)\|^2 \geq \frac{1}{4}\|u - r\|^2.$$

Observe that $r = J(r + v)$: since $v \in \partial h(r)$, we have $r + v \in (\mathrm{id} + \partial h)(r)$, hence $r = (\mathrm{id} + \partial h)^{-1}(r + v) = \operatorname{prox}_h(r + v) = J(r + v)$; see the theorem in the Appendix for the relation between the proximal operator and the subdifferential.

Thus, we have
$$\|u - J(u + v)\|^2 = \|[(u + v) - J(u + v)] - [(r + v) - J(r + v)]\|^2.$$
Also notice that $\nabla M_h(w) = w - Jw$; see the Appendix for the relation between the proximal operator and the gradient of the Moreau function. We now have
$$\|[(u + v) - J(u + v)] - [(r + v) - J(r + v)]\|^2 = \|\nabla M_h(u + v) - \nabla M_h(r + v)\|^2.$$

On the other hand, $h$ is $\alpha$-strongly convex with $\alpha = 1$, so the Moreau function $M_h$ is $\frac{\alpha}{1+\alpha} = \frac{1}{2}$-strongly convex, see [2, Lemma 2.19, epage 6]. Therefore, by [3, Lemma 3 ii), epage 3],
$$\|\nabla M_h(u + v) - \nabla M_h(r + v)\|^2 \geq \frac{1}{4}\|(u + v) - (r + v)\|^2 = \frac{1}{4}\|u - r\|^2. \qquad \square$$
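For a concrete check (an illustrative sketch with closed forms one can derive by hand), take $h(u) = \frac{1}{2}u^2 + |u|$ on $\mathbb{R}$, which is $1$-strongly convex; then $h^*(v) = \frac{1}{2}\operatorname{soft}(v, 1)^2$, $\nabla h^*(v) = \operatorname{soft}(v, 1)$ and $\operatorname{prox}_h(x) = \operatorname{soft}(x, 1)/2$, where $\operatorname{soft}$ denotes soft-thresholding.

```python
import numpy as np

# Sketch: check gap(u, v) >= |u - prox_h(u + v)|^2 >= (1/4)*|u - r|^2
# for the 1-strongly convex h(u) = 0.5*u^2 + |u| on R.
def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

h       = lambda u: 0.5 * u**2 + np.abs(u)
h_star  = lambda v: 0.5 * soft(v, 1.0)**2     # conjugate of h
grad_hs = lambda v: soft(v, 1.0)              # r = grad h*(v)
prox_h  = lambda x: soft(x, 1.0) / 2.0        # prox of h (lambda = 1)

rng = np.random.default_rng(2)
u, v = rng.uniform(-5.0, 5.0, size=(2, 10_000))
gap  = h(u) + h_star(v) - u * v
mid  = (u - prox_h(u + v))**2
low  = 0.25 * (u - grad_hs(v))**2
assert np.all(gap >= mid - 1e-9) and np.all(mid >= low - 1e-9)
print("gap >= |u - prox|^2 >= (1/4)|u - r|^2 verified")
```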

Appendix: Proximal operator

The Moreau function of $h$ with parameter $\lambda > 0$ is $M_{\lambda h}: H \to \mathbb{R} \cup \{+\infty\}$ defined by
$$M_{\lambda h}(v) = \min_{u \in H} \lambda h(u) + \frac{1}{2}\|u - v\|^2.$$

Note that the $\min$ is well-defined: the objective is closed and strongly convex, so the minimizer exists and is unique.

The (unique) argmin of the RHS is called the proximal of $v$:
$$\operatorname{prox}_{\lambda h}(v) = \operatorname*{argmin}_{u \in H} \lambda h(u) + \frac{1}{2}\|u - v\|^2.$$

Example. Let $K \subset H$ be a closed convex set and $\iota_K$ its indicator function. We have
$$\operatorname{prox}_{\lambda \iota_K} = \operatorname{proj}_K$$
and
$$M_{\lambda \iota_K}(v) = \frac{1}{2}\operatorname{dist}(v, K)^2.$$

Example. Let $h(u) = \frac{1}{2}\|b - u\|^2$ (with $\lambda = 1$). We have
$$\operatorname{prox}_h(v) = \frac{b + v}{2}$$
and
$$M_h(v) = \frac{1}{4}\|b - v\|^2.$$
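These closed forms are easy to confirm numerically; the sketch below uses scipy's generic scalar minimizer (an assumed stand-in for an exact computation) to check the second example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: confirm prox_h(v) = (b + v)/2 and M_h(v) = (1/4)*(b - v)^2
# for h(u) = 0.5*(b - u)^2 with lambda = 1.
b = 3.0

for v in np.linspace(-4.0, 4.0, 9):
    res = minimize_scalar(lambda u: 0.5 * (b - u)**2 + 0.5 * (u - v)**2)
    assert abs(res.x - (b + v) / 2) < 1e-6           # proximal point
    assert abs(res.fun - 0.25 * (b - v)**2) < 1e-6   # Moreau value
print("closed forms of the example confirmed")
```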

The proximal operator admits the following representations:

Theorem. [connection with subdifferential]
$$\operatorname{prox}_{\lambda h} = (\mathrm{id} + \lambda \partial h)^{-1}$$

Proof. Let $v \in H$ and $w = \operatorname{prox}_{\lambda h}(v)$. By the optimality condition, we have
$$0 \in \lambda \partial h(w) + w - v,$$
that is
$$v \in w + \lambda \partial h(w) = (\mathrm{id} + \lambda \partial h)(w),$$
or
$$w \in (\mathrm{id} + \lambda \partial h)^{-1}(v). \qquad \square$$
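For differentiable $h$ this says that $w = \operatorname{prox}_{\lambda h}(v)$ is the unique solution of $w + \lambda h'(w) = v$. A quick numerical sketch, using $h(u) = e^u$ as an arbitrary smooth convex test function:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: for smooth convex h(u) = exp(u), the point w = prox_{lam*h}(v)
# must satisfy the resolvent equation w + lam * h'(w) = v.
lam = 0.7
for v in np.linspace(-2.0, 2.0, 9):
    w = minimize_scalar(lambda u: lam * np.exp(u) + 0.5 * (u - v)**2).x
    assert abs(w + lam * np.exp(w) - v) < 1e-5
print("resolvent identity prox = (id + lam*h')^{-1} confirmed")
```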

Theorem. [connection with gradient of Moreau function]
$$\nabla M_{\lambda h}(v) = v - \operatorname{prox}_{\lambda h}(v).$$
(With the normalization $M_{\lambda h}(v) = \min_u \lambda h(u) + \frac{1}{2}\|u - v\|^2$ used here, no $1/\lambda$ factor appears; compare the second example above, where $\nabla M_h(v) = \frac{v - b}{2} = v - \frac{b + v}{2}$.)

Proof. See [2, Fact 2.9, epage 4]
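A finite-difference sketch confirming this identity, again with the test function $h(u) = e^u$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch: finite-difference check that grad M_{lam*h}(v) = v - prox_{lam*h}(v)
# for M_{lam*h}(v) = min_u lam*exp(u) + 0.5*(u - v)^2.
lam, eps = 0.7, 1e-5

def solve(v):
    res = minimize_scalar(lambda u: lam * np.exp(u) + 0.5 * (u - v)**2)
    return res.x, res.fun          # (prox_{lam*h}(v), M_{lam*h}(v))

for v in np.linspace(-2.0, 2.0, 5):
    grad_fd = (solve(v + eps)[1] - solve(v - eps)[1]) / (2 * eps)
    assert abs(grad_fd - (v - solve(v)[0])) < 1e-4
print("grad M_{lam*h} = id - prox_{lam*h} confirmed")
```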

Theorem. The gradient of the Moreau function is always $1$-Lipschitz:
$$\|\nabla M_{\lambda h}(u) - \nabla M_{\lambda h}(v)\| \leq \|u - v\|.$$

Proof. See [2, Fact 2.9, epage 4]

Theorem. If $h$ is $\alpha$-strongly convex, then the Moreau function $M_{\lambda h}$ is $\frac{\lambda\alpha}{1+\lambda\alpha}$-strongly convex (for $\lambda = 1$, this is the modulus $\frac{\alpha}{1+\alpha}$ used above), thus
$$\|\nabla M_{\lambda h}(u) - \nabla M_{\lambda h}(v)\| \textcolor{red}{\geq} \textcolor{blue}{\frac{\lambda\alpha}{1+\lambda\alpha}}\|u - v\|.$$

Proof. See [2, Lemma 2.19, epage 6] and [3, Lemma 3 ii), epage 3].
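As a consistency check, for $h(u) = \frac{\alpha}{2}\|u\|^2$ we get $\operatorname{prox}_{\lambda h}(v) = \frac{v}{1+\lambda\alpha}$, hence
$$\nabla M_{\lambda h}(v) = v - \frac{v}{1+\lambda\alpha} = \frac{\lambda\alpha}{1+\lambda\alpha}\, v,$$
so the lower bound $\frac{\lambda\alpha}{1+\lambda\alpha}\|u - v\|$ holds with equality in this example, while the $1$-Lipschitz upper bound holds strictly.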

References

[1]: Carlier, Guillaume. “Fenchel-Young inequality with a remainder and applications to convex duality and optimal transport.” (2022).
[2]: Planiden, Chayne, and Xianfu Wang. “Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.” SIAM Journal on Optimization 26.2 (2016): 1341-1364.
[3]: Zhou, Xingyu. "On the Fenchel duality between strong convexity and Lipschitz continuous gradient." arXiv preprint arXiv:1803.06573 (2018).
