Fenchel inequality with Proximal operator
Preliminaries
Let h:H→R∪+∞ be closed proper convex function on the Hilbert space H.
A vector g∈H is called a subgradient of h at u0∈H if
h(u)≥h(u0)+⟨g,u−u0⟩,∀u∈H.
The set of all subgradients at u0 is called subdifferential denoted by ∂h(u0). Thus the subdifferential can be considered as a set-value function, say
∂h:H⇉H
We shall identify ∂h with its graph
∂h≡{(u,v):v∈∂h(u)}.
The conjugate function (a.k.a Fenchel conjugate or convex conjugate) of h is h∗:H→R∪+∞ so that
h∗(v)≜u∈Hsup⟨v,u⟩−h(u)
Theorem. [Basic Fenchel inequality]
h(u)+h∗(v)≥⟨u,v⟩
The equality holds if (u,v)∈∂h.
The function h is said to be α-strongly convex if
h(u)≥h(u0)+⟨∂h(u0),u−u0⟩+2α∣∣u−u0∣∣2
If h is strongly convex, the Fenchel inequality can be improved
Theorem. [Fenchel inequality with strong convexity] If h is α-strongly convex
h(u)+h∗(v)≥⟨u,v⟩+2α∣∣u−∇h∗(v)∣∣2.
The equality holds if (u,v)∈∂h.
Here, It is well-known that if h is α-strongly convex, then h∗ is 1/α-strongly smooth (or usually known as 1/α-smooth), i.e.
∂h∗(v)={∇h∗(v)},∀v∈H,
and ∇h∗ is 1/α-Lipschitz. Here ∇h∗ is the Frechet differential of h∗.
Proof.
Let r=∇h∗(v). We have
h(r)+h∗(v)=⟨r,v⟩.
Since h is α-strongly convex, we have
h(u)−h(r)≥⟨∂h(r),u−r⟩+2α∣∣u−r∣∣2.
Take the sum and notice that v∈∂h(r) since r=∇h∗(v), we have
h(u)+h∗(v)≥⟨v,u⟩+2α∣∣u−r∣∣2.
The proof is complete.
The equality holds if u=r, which is equivalent to v∈∂h(u)⇔(u,v)∈∂h.
□
Fenchel inequality with Proximal operator
In this section we discuss a refined version of Fenchel inequality using proximal operator.
The main reference here is [1, Lemma 1.1, epage 3].
Theorem. [Fenchel inequality with proximal operator] For λ>0, we have
h(u)+h∗(v)≥⟨u,v⟩+λ1∣∣u−proxλh(u+λv)∣∣2
The equality holds if
proxλh(u+λv)∈λ∂h∗(v) and (u+λv)−proxλh(u+λv)∈λ∂h(u).
We now discuss the application of above inequality for strongly convex function h, i.e.
Remark. Fenchel inequality with proximal operator implies a “weak version” Fenchel inequality with strong convexity
For the ease of notations, we introduce the dual gap. For u,v∈H, the dual gap of u,v w.r.t h is
gap(u,v)≜h(u)+h∗(v)−⟨u,v⟩.
From the basic Fenchel inequality, we know that the dual gap is always non-negative.
We now show that
Theorem. If h is 1-strongly convex, we have
gap(u,v)≥∣∣u−proxh(u+v)∣∣2≥41∣∣u−r∣∣2
where r=∇h∗(v).
Proof.
For v∈H, define
r≜∇h∗(v).
Since h∗ is closed proper convex, we also have
v∈∂h(r).
Moreover, denote
J=proxh
So, we now aim to show that
∣∣u−J(u+v)∣∣2≥41∣∣u−r∣∣2.
Observe that r=(id+∂h)−1(id+∂h)r=proxh(r+v)=J(r+v), see the theorem in the Appendix for the relation between proximal operator and subdifferential.
Thus, we have
∣∣u−J(u+v)∣∣2=∣∣[(u+v)−J(u+v)]−[(r+v)−J(r+v)]∣∣2
Also notice that ∇Mh(w)=w−Jw, see Appendix for the relation between proximal operator and gradient of Moreau function. We now have
∣∣[(u+v)−J(u+v)]−[(r+v)−J(r+v)]∣∣2=∣∣∇Mh(u+v)−∇Mh(r+v)∣∣2
On the other hand, since h is α-strongly convex (α=1). The Moreau function is 1+αα-strong convex, see [2, Lemma 2.19, epage 6]. Therefore, by [3, Lemma 3 ii), epage 3]
∣∣∇Mh(u+v)−∇Mh(r+v)∣∣2≥41∣∣u−r∣∣2.
Appendix: Proximal operator
The Moreau function of h with parameter λ>0 is Mλh:H→R∪{+∞} defined by
Mλh(v)=u∈Hminλh(u)+21∣∣u−v∣∣2
Note that the min is well-defined.
The (unique) argmin of the RHS is called the proximal of v
proxλh(v)=u∈Hargminλh(u)+21∣∣u−v∣∣2
Example. Let K⊂H, we have
proxλιK=projK
and
MλιK(v)=21dist(v,K)2
Example. Let h(u)=21∣∣b−u∣∣2, we have
proxh(v)=2b+v
and
Mh(v)=41∣∣b−v∣∣2
The proximal operator has some representations:
Theorem. [connection with subdifferential]
proxλh=(id+λ∂h)−1
Proof. Let v∈H and w=proxλh(v), we have
0∈λ∂h(w)+w−v
that is
v∈w+λ∂h(w)=(id+λ∂h)(w)
or
w∈(id+λ∂h)−1(v)
□
Theorem. [connection with Gradient of Moreau function]
∇Mλh(v)=λ1(v−proxλh(v)).
Proof. See [2, Fact 2.9, epage 4]
Theorem. The gradient of Moreau function is always 1-Lipschitz
∣∣∇Mλh(u)−∇Mλh(v)∣∣≤∣∣u−v∣∣
Proof. See [2, Fact 2.9, epage 4]
Theorem. If h is α-strongly convex, then the Moreau function is 1+αα-strongly convex, thus
∣∣∇Mλh(u)−∇Mλh(v)∣∣≥1+αα∣∣u−v∣∣
Proof. See [2, Lemma 2.19, epage 6] and [3, Lemma 3 ii), epage 3].
References
[1]: Carlier, Guillaume. “Fenchel-Young inequality with a remainder and applications to convex duality and optimal transport.” (2022).
[2]: Planiden, Chayne, and Xianfu Wang. “Strongly convex functions, Moreau envelopes, and the generic nature of convex functions with strong minimizers.” SIAM Journal on Optimization 26.2 (2016): 1341-1364.
[3]: Zhou, Xingyu. “On the fenchel duality between strong convexity and lipschitz continuous gradient.” arXiv preprint arXiv:1803.06573 (2018).