Wednesday, February 15, 2023

On the Strong Convexity of Negative Entropy and Kullback-Leibler Divergence.


Author: Tran Thu Le
Date: 15/02/2023

Abstract. In this post, we investigate the local strong convexity of the negative entropy and of its Bregman divergence, the Kullback-Leibler divergence.


For $u \in \mathbb{R}^m_+$, let
$$\log(u) \triangleq [\log(u_1), \dots, \log(u_m)]^T \in \mathbb{R}^m.$$

We say that $u$ is a probability vector if, in addition, $\langle 1_m, u\rangle = \sum_{i=1}^m u_i = 1$.

The negative entropy of $u$ is
$$h(u) = \langle u, \log(u)\rangle,$$
with gradient
$$\nabla h(u) = 1_m + \log(u)$$
and Hessian
$$\nabla^2 h(u) = \operatorname{diag}(u^{-1}),$$
where $u^{-1}$ denotes the entrywise inverse of $u$.

Note. The quantity $I(u) = -\log(u) = 1_m - \nabla h(u)$ is called the information of $u$.
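As a quick sanity check, the gradient formula above can be verified against finite differences (a NumPy sketch; the function names are my own):

```python
import numpy as np

def neg_entropy(u):
    """Negative entropy h(u) = <u, log(u)>, for u > 0 entrywise."""
    return np.dot(u, np.log(u))

def grad_neg_entropy(u):
    """Gradient of h: 1_m + log(u)."""
    return 1.0 + np.log(u)

def hess_neg_entropy(u):
    """Hessian of h: diag(1/u)."""
    return np.diag(1.0 / u)

rng = np.random.default_rng(0)
u = rng.uniform(0.1, 1.0, size=5)

# Central finite differences of h should match 1_m + log(u).
eps = 1e-6
fd_grad = np.array([
    (neg_entropy(u + eps * e) - neg_entropy(u - eps * e)) / (2 * eps)
    for e in np.eye(5)
])
assert np.allclose(fd_grad, grad_neg_entropy(u), atol=1e-5)
```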

The following theorem summarizes some basic properties of $h$ when $u$ is restricted to a box.

Theorem. If $u \in (0, \frac{1}{\alpha}]^m$, then

  1. $\nabla^2 h(u) - \alpha I_m$ is positive semidefinite: $\nabla^2 h(u) - \alpha I_m \succeq 0$;
  2. $\nabla h$ satisfies $\|\nabla h(u) - \nabla h(r)\| \geq \alpha \|u - r\|$
    for all $u, r \in (0, \frac{1}{\alpha}]^m$;
  3. $h$ is $\alpha$-strongly convex:
    $$\mathrm{Breg}_h(u, r) \triangleq h(u) - h(r) - \langle \nabla h(r), u - r\rangle \geq \frac{\alpha}{2}\|u - r\|^2$$
    for all $u, r \in (0, \frac{1}{\alpha}]^m$.

Here $\mathrm{Breg}_h$ is called the Bregman divergence of $h$. By this theorem, we see that

The negative entropy is $\alpha$-strongly convex on $(0, \frac{1}{\alpha}]^m$.
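This claim is easy to probe numerically: sample pairs of points inside the box and compare the Bregman divergence with the quadratic lower bound (a sketch using the definitions above; the helper names are illustrative):

```python
import numpy as np

def neg_entropy(u):
    """Negative entropy h(u) = <u, log(u)>."""
    return np.dot(u, np.log(u))

def breg(u, r):
    """Bregman divergence of h: h(u) - h(r) - <grad h(r), u - r>."""
    return neg_entropy(u) - neg_entropy(r) - np.dot(1.0 + np.log(r), u - r)

alpha, m = 2.0, 6
rng = np.random.default_rng(1)
for _ in range(1000):
    # Sample u, r strictly inside (0, 1/alpha]^m.
    u = rng.uniform(1e-3, 1.0 / alpha, size=m)
    r = rng.uniform(1e-3, 1.0 / alpha, size=m)
    # Strong convexity: Breg_h(u, r) >= (alpha/2) * ||u - r||^2.
    assert breg(u, r) >= (alpha / 2) * np.sum((u - r) ** 2) - 1e-12
```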

We call the Bregman divergence of the negative entropy the Kullback-Leibler divergence and denote it by $KL(u, r) \triangleq \mathrm{Breg}_h(u, r)$. It admits the following closed form.

Theorem. For $u, r \in \mathbb{R}^m_+$, the Kullback-Leibler divergence has the formula
$$KL(u, r) = \langle u, \log(u/r)\rangle - \langle 1_m, u\rangle + \langle 1_m, r\rangle.$$
In particular, if $u, r$ are probability vectors, then
$$KL(u, r) = \langle u, \log(u/r)\rangle.$$

Proof. We have
$$\begin{aligned} h(u)-h(r) - \langle \nabla h(r), u-r\rangle &= \langle u, \log(u)\rangle - \langle r, \log(r)\rangle - \langle 1_m+\log(r), u-r\rangle \\ &= \langle u, \log(u/r)\rangle - \langle 1_m, u\rangle + \langle 1_m, r\rangle. \end{aligned}$$
If $u, r$ are probability vectors, one obtains the desired equality since $\langle 1_m, u\rangle = \langle 1_m, r\rangle = 1$. $\square$
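The closed form can be checked directly against the Bregman definition (a NumPy sketch; the helper names are mine):

```python
import numpy as np

def neg_entropy(u):
    """Negative entropy h(u) = <u, log(u)>."""
    return np.dot(u, np.log(u))

def kl_bregman(u, r):
    """KL as the Bregman divergence of negative entropy."""
    return neg_entropy(u) - neg_entropy(r) - np.dot(1.0 + np.log(r), u - r)

def kl_closed_form(u, r):
    """KL(u, r) = <u, log(u/r)> - <1_m, u> + <1_m, r>."""
    return np.dot(u, np.log(u / r)) - u.sum() + r.sum()

rng = np.random.default_rng(2)
u, r = rng.uniform(0.1, 2.0, size=(2, 5))
assert np.isclose(kl_bregman(u, r), kl_closed_form(u, r))

# For probability vectors the terms -<1_m, u> + <1_m, r> cancel.
p, q = u / u.sum(), r / r.sum()
assert np.isclose(kl_closed_form(p, q), np.dot(p, np.log(p / q)))
```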

In general, the Bregman divergence is strongly convex in its first argument whenever $h$ is.

Theorem. If $h$ is $\alpha$-strongly convex, then so is $\mathrm{Breg}_h(\cdot, r_0)$ for any fixed $r_0 \in \mathbb{R}^m$.

Proof. This is clear since $\mathrm{Breg}_h(\cdot, r_0)$ is the sum of the strongly convex function $h$ and an affine function. $\square$

In general, the Bregman divergence is not strongly convex in its second argument. However, the Kullback-Leibler divergence is locally strongly convex in its second argument.

Theorem. Fix $u_0 \in \mathbb{R}^m_+$ with $u_0 > 0$ entrywise. Then

  • $KL(u_0, \cdot)$ is strongly convex on $(0, 1/\alpha]^m$;
  • $KL(u_0, \cdot)$ is strongly smooth on $[1/\beta, +\infty)^m$.

Proof. Indeed, we have (with entrywise division)
$$\nabla_r KL(u_0, r) = -\frac{u_0}{r} + 1_m$$
and, for $r \in (0, 1/\alpha]^m$,
$$\nabla^2_r KL(u_0, r) = \operatorname{diag}\Big(\frac{u_0}{r^2}\Big) \succeq \alpha^2 \min_i (u_0)_i \, I_m,$$
since $1/r_i^2 \geq \alpha^2$ for each $i$. The right-hand side is positive definite because $u_0 > 0$. Similarly, for $r \in [1/\beta, +\infty)^m$,
$$\nabla^2_r KL(u_0, r) \preceq \beta^2 \max_i (u_0)_i \, I_m. \qquad \square$$
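Since the Hessian of $KL(u_0, \cdot)$ is diagonal, its eigenvalues are just the entries $u_{0,i}/r_i^2$, so both bounds can be probed directly (a sketch assuming $u_0 > 0$ entrywise; constants are illustrative):

```python
import numpy as np

alpha, beta, m = 2.0, 4.0, 5
rng = np.random.default_rng(3)
u0 = rng.uniform(0.5, 1.5, size=m)  # fixed first argument, strictly positive

# On (0, 1/alpha]^m: smallest eigenvalue of diag(u0/r^2) is >= alpha^2 * min(u0).
r = rng.uniform(1e-3, 1.0 / alpha, size=m)
hess_diag = u0 / r ** 2  # Hessian of KL(u0, .) at r is diag(u0 / r^2)
assert hess_diag.min() >= alpha ** 2 * u0.min() - 1e-12

# On [1/beta, +inf)^m: largest eigenvalue is <= beta^2 * max(u0).
r = rng.uniform(1.0 / beta, 10.0, size=m)
assert (u0 / r ** 2).max() <= beta ** 2 * u0.max() + 1e-12
```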

References

[1]: MSE - How to show 1D negative entropy function is strongly convex?
[2]: MSE - How to show nD negative entropy function is strongly convex?
