On the Strong Convexity of Negative Entropy and Kullback-Leibler Divergence.
Author: Tran Thu Le
Date: 15/02/2023
Abstract. In this post, we investigate the locally strong convexity of the negative entropy and its Bregman divergence called Kullback-Leibler Divergence.
For u∈R+m, let
log(u)≜[log(u1),....,log(um)]T∈Rm.
We say that u is a probability if, in addition, ⟨1m,u⟩=∑i=1nui=1.
The negative entropy function of u is denoted by
h(u)=⟨u,log(u)⟩.
with gradient
∇h(u)=1m+log(u)
and Hessian
∇2h(u)=diag(u−1).
Note. Here note that I(u)=−log(u)=1−∇h(u) is called the information of u.
The following theorem summarizes some of basic results of h when u is bounded by a box.
Theorem. If u∈[0,α1]m, then
- ∇2h(u)−αIm is positively semidefinite ∇2h(u)−αIm≥0
- ∇h satisfies ∣∣∇h(u)−∇h(r)∣∣≥α∣∣u−r∣∣
for all u,r∈Rm.
- h is α-strongly convex
Bregh(u,r)≜h(u)−h(r)−⟨∇h(r),u−r⟩≥2α∣∣u−r∣∣2
for all u,r∈Rm.
Here Bregh is called Bregman divergence of h. By this theorem we see that
The negative entropy is α-strongly convex on [0,α1]m.
We call the Bregman divergence of negative entropy Kullback Leibler divergence and denote it as KL(u,r)≜Bregh(u,r). It has the following formula
Theorem. Let u,r be probabilities, then the Kullback Leibler divergence has the formula
KL(u,r)=⟨u,log(u/r)⟩−⟨1m,u⟩+⟨1m,r⟩
In particular, if u,r are probabilities, we have
KL(u,r)=⟨u,log(u/r)⟩
Proof. We have
==h(u)−h(r)−⟨∇h(r),u−r⟩⟨u,log(u)⟩−⟨r,log(r)⟩−⟨1m+log(r),u−r⟩⟨u,log(u/r)⟩−⟨1m,u⟩+⟨1m,r⟩
If u,r are probabilities, on obtains the desired equality since ⟨1m,u⟩=⟨1m,r⟩=1. □
In general, the Bregman divergence is always strongly convex for the first argument if h is.
Theorem. If h is α-strongly convex, then so is Bregh(⋅,r0) for any fixed r0∈Rm.
Proof. This is clear since Bregh(⋅,r0) is the sum of a strongly convex function h and an affine function.□
In general, the Bregman divergence is not strongly convex for the second argument. Hoever, for the Kullback-Leibler divergence, it is locally strongly convex.
Theorem. For u0 fixed on Rm, we have
- KL(u0,⋅) is strongly convex on [0,1/α]m
- KL(u0,⋅) is strongly smooth on [1/β,+∞)
Proof. Indeed, we have ∇rKL(u0,r)=−ru0+1m
and ∇r2KL(u0,r)=diag(r2u0)≥α2u0Im.
for r≤1/α. Note that the RHS is positively defnite since u0≥0.
And that ∇r2KL(u0,r)≤β2u0Im.
for r≥1/β.
□
References
[1]: MSE - How to show 1D negative entropy function is strongly convex?
[2]: MSE - How to show nD negative entropy function is strongly convex?