Wednesday, August 25, 2021

Solving blasso with gradient method

Solving the Blasso problem using gradient descent

Let us consider a very simple setup of the Blasso problem.
$$y = \sum_{i=1}^n c_i a_{t_i} + \varepsilon \in \mathbf{L}^2(\Omega)$$
where $c_i \in \mathbb{R}$, $t_i \in T = [0,1]$, $\Omega = [0,1]$, and $a_t$ is the Gaussian function centered at $t \in T$ with a fixed width $\sigma$. Here $\varepsilon \in \mathbf{L}^2(\Omega)$ is some noise.

In the Blasso problem, we aim to recover the positions $t_i$ and the coefficients $c_i$, assuming that the noisy observation $y$ is known.

Note that we can rewrite $y$ as follows
$$y = \mathcal{A}\mu + \varepsilon$$
where $\mu = \sum_{i=1}^n c_i \delta_{t_i}$ is a discrete Radon measure, i.e. a combination of Dirac masses, and $\mathcal{A}$ is a weak-* continuous operator from the space of Radon measures $\mathcal{M}(T)$ to the Hilbert space $\mathbf{L}^2(\Omega)$.
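In our setting, $\mathcal{A}$ can be taken to be the operator that integrates the Gaussian atoms against the measure; this explicit form is an assumption consistent with the setup above, not something stated in the notebook:
$$(\mathcal{A}\mu)(x) = \int_T a_t(x)\, \mathrm{d}\mu(t), \qquad \text{so that} \qquad \mathcal{A}\Big(\sum_{i=1}^n c_i \delta_{t_i}\Big) = \sum_{i=1}^n c_i a_{t_i}.$$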

To do that, we solve the Tikhonov regularization problem
$$\min_{\mu \in \mathcal{M}(T)} \frac{1}{2} \|y - \mathcal{A}\mu\|^2 + \lambda \|\mu\|_{TV}$$
where $\lambda > 0$ is a regularization parameter and $\|\cdot\|_{TV}$ is the total variation norm on the space of Radon measures; e.g. the total variation norm of a discrete measure is the $\ell_1$-norm of its coefficients.

Under some conditions (see Theorem 2 in Duval and Peyré), the optimal measure is discrete. This allows us to parametrize $\mu$ by a finite number of positions $t_i$ and coefficients $c_i$ and to use gradient descent on the resulting non-convex version of the Tikhonov regularization problem.
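Concretely, restricting the minimization to discrete measures with $n$ spikes, and using that the total variation norm is then the $\ell_1$-norm of the coefficients, the problem becomes the finite-dimensional program (non-convex in the positions)
$$\min_{c \in \mathbb{R}^n,\; t \in T^n} \; \frac{1}{2}\Big\|y - \sum_{i=1}^n c_i a_{t_i}\Big\|^2 + \lambda \sum_{i=1}^n |c_i|.$$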

Our notebook, which implements a naive method for solving the problem, can be found on GitHub. In our experiment, we assume that
$$y = 1.0\, a_{0.3} + 0.8\, a_{0.7} + \varepsilon$$
where $\varepsilon$ is uniformly distributed in $[-0.1, 0.1]$ and $\lambda = 0.1$. The result is as follows.
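For reference, here is a minimal sketch of how such an observation can be generated on a uniform grid of $\Omega = [0,1]$; the grid size n_grid and the atom width sigma are assumptions, since they are not specified in the post.

import torch

# Assumed discretization of Omega = [0, 1] and Gaussian width (not given in the post)
n_grid = 200
sigma = 0.05
x = torch.linspace(0.0, 1.0, n_grid)

def atom(t):
    # Gaussian atom a_t centered at t with fixed sigma, sampled on the grid x
    return torch.exp(-(x - t) ** 2 / (2 * sigma ** 2))

# y = 1.0 * a_{0.3} + 0.8 * a_{0.7} + noise, with noise uniform in [-0.1, 0.1]
torch.manual_seed(0)
noise = 0.2 * torch.rand(n_grid) - 0.1
y = 1.0 * atom(torch.tensor(0.3)) + 0.8 * atom(torch.tensor(0.7)) + noise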

To compute the gradient of the objective function, which is quite involved since it must be treated as a function of the positions $t_i$ and the coefficients $c_i$, we use PyTorch's backward pass directly, without the torch.optim tools (gradient descent implemented from scratch). The details are as follows:

import torch

def opt_min(func, point, lr=1e-2, max_iters=500, epsilon=1e-2):
    # Plain gradient descent on `func`, starting from `point`.
    point = point.detach().clone()
    data = {
        "val": [],
        "grad_max": [],
    }

    for i in range(max_iters):
        # enable autograd on the current iterate and reset its gradient
        point.requires_grad = True
        point.grad = torch.zeros_like(point)
        loss = func(point)
        loss.backward()
        grad = point.grad
        # gradient descent step
        point = point - lr * grad
        # detach everything from the computational graph
        loss = loss.detach()
        grad = grad.detach()
        point = point.detach()
        # save the loss value and the largest gradient entry
        grad_max = grad.abs().max()
        data["val"] += [loss]
        data["grad_max"] += [grad_max]
        # stop once the gradient is small enough
        if grad_max < epsilon:
            break

    return point, data
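As a usage sketch (the exact code in the notebook may differ), the discretized objective can be written as a function of a single tensor packing the coefficients and the positions, and then passed to opt_min; here the squared $\ell_2$-norm on the grid stands in for the $\mathbf{L}^2$-norm, and the initial guess is hypothetical.

# Sketch only: `y` and `atom` are assumed to be defined as in the data-generation
# snippet above, and lam = 0.1 as in the experiment.
lam = 0.1
n_spikes = 2

def objective(point):
    # point[:n_spikes] holds the coefficients c_i, point[n_spikes:] the positions t_i
    c, t = point[:n_spikes], point[n_spikes:]
    residual = y - sum(c[i] * atom(t[i]) for i in range(n_spikes))
    # discrete stand-in for 1/2 ||y - A mu||^2 + lambda ||mu||_TV
    return 0.5 * (residual ** 2).sum() + lam * c.abs().sum()

init = torch.tensor([0.5, 0.5, 0.2, 0.8])  # hypothetical initial (c_1, c_2, t_1, t_2)
point, data = opt_min(objective, init, lr=1e-2, max_iters=500)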
