Wednesday, August 25, 2021

Solving blasso with gradient method

Solving the Blasso problem using gradient descent

Let us consider a very simple setup of the Blasso problem.
$$y = \sum_{i=1}^n c_i a_{t_i} + \varepsilon \in \mathbf{L}^2(\Omega)$$
where $c_i \in \mathbb{R}$, $t_i \in T = [0,1]$, $\Omega = [0,1]$, and $a_t$ is the Gaussian function centered at $t \in T$ with a fixed width $\sigma$. Here $\varepsilon \in \mathbf{L}^2(\Omega)$ is some noise.

In the Blasso problem, we aim to recover the positions $t_i$ and the coefficients $c_i$, assuming that the noisy observation $y$ is known.

Note that we can rewrite $y$ as follows
$$y = \mathcal{A}\mu + \varepsilon$$
where $\mu = \sum_{i=1}^n c_i \delta_{t_i}$ is a discrete Radon measure, i.e. a combination of Dirac masses, and $\mathcal{A}$ is a weak-* continuous operator from the space of Radon measures $\mathcal{M}(T)$ to the Hilbert space $\mathbf{L}^2(\Omega)$.
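In our setting, $\mathcal{A}$ can be taken to be the operator that integrates the Gaussian atoms against the measure; this explicit form is an assumption consistent with the setup above, not something stated in the notebook:
$$(\mathcal{A}\mu)(x) = \int_T a_t(x)\, \mathrm{d}\mu(t), \qquad \text{so that} \qquad \mathcal{A}\Big(\sum_{i=1}^n c_i \delta_{t_i}\Big) = \sum_{i=1}^n c_i a_{t_i}.$$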

To do that, we solve the Tikhonov regularization problem
$$\min_{\mu \in \mathcal{M}(T)} \frac{1}{2} \|y - \mathcal{A}\mu\|^2 + \lambda \|\mu\|_{TV}$$
where $\lambda > 0$ is a regularization parameter and $\|\cdot\|_{TV}$ is the total variation norm on the space of Radon measures; e.g. the total variation norm of a discrete measure is the $\ell_1$-norm of its coefficients.

Under some conditions (see Theorem 2 in Duval and Peyré), the optimal measure is discrete. This allows us to parametrize $\mu$ by a finite number of positions $t_i$ and coefficients $c_i$ and to use gradient descent on the resulting non-convex version of the Tikhonov regularization problem.
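Concretely, restricting the minimization to discrete measures with $n$ spikes, and using that the total variation norm is then the $\ell_1$-norm of the coefficients, the problem becomes the finite-dimensional program (non-convex in the positions)
$$\min_{c \in \mathbb{R}^n,\; t \in T^n} \; \frac{1}{2}\Big\|y - \sum_{i=1}^n c_i a_{t_i}\Big\|^2 + \lambda \sum_{i=1}^n |c_i|.$$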

Our notebook, which implements a naive method for solving the problem, can be found on GitHub. In our experiment, we assume that
$$y = 1.0\, a_{0.3} + 0.8\, a_{0.7} + \varepsilon$$
where $\varepsilon$ is uniformly distributed in $[-0.1, 0.1]$ and $\lambda = 0.1$. The result is as follows.
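For reference, here is a minimal sketch of how such an observation can be generated on a uniform grid of $\Omega = [0,1]$; the grid size n_grid and the atom width sigma are assumptions, since they are not specified in the post.

import torch

# Assumed discretization of Omega = [0, 1] and Gaussian width (not given in the post)
n_grid = 200
sigma = 0.05
x = torch.linspace(0.0, 1.0, n_grid)

def atom(t):
    # Gaussian atom a_t centered at t with fixed sigma, sampled on the grid x
    return torch.exp(-(x - t) ** 2 / (2 * sigma ** 2))

# y = 1.0 * a_{0.3} + 0.8 * a_{0.7} + noise, with noise uniform in [-0.1, 0.1]
torch.manual_seed(0)
noise = 0.2 * torch.rand(n_grid) - 0.1
y = 1.0 * atom(torch.tensor(0.3)) + 0.8 * atom(torch.tensor(0.7)) + noise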

To compute the gradient of the objective function, which is quite involved since it must be treated as a function of the positions $t_i$ and the coefficients $c_i$, we use PyTorch's backward pass directly, without the torch.optim tools (gradient descent implemented from scratch). The details are as follows:

import torch

def opt_min(func, point, lr=1e-2, max_iters=500, epsilon=1e-2):
    # Plain gradient descent on `func`, starting from `point`.
    point = point.detach().clone()
    data = {
        "val": [],
        "grad_max": [],
    }

    for i in range(max_iters):
        # enable autograd on the current iterate and reset its gradient
        point.requires_grad = True
        point.grad = torch.zeros_like(point)
        loss = func(point)
        loss.backward()
        grad = point.grad
        # gradient descent step
        point = point - lr * grad
        # detach everything from the computational graph
        loss = loss.detach()
        grad = grad.detach()
        point = point.detach()
        # save the loss value and the largest gradient entry
        grad_max = grad.abs().max()
        data["val"] += [loss]
        data["grad_max"] += [grad_max]
        # stop once the gradient is small enough
        if grad_max < epsilon:
            break

    return point, data
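As a usage sketch (the exact code in the notebook may differ), the discretized objective can be written as a function of a single tensor packing the coefficients and the positions, and then passed to opt_min; here the squared $\ell_2$-norm on the grid stands in for the $\mathbf{L}^2$-norm, and the initial guess is hypothetical.

# Sketch only: `y` and `atom` are assumed to be defined as in the data-generation
# snippet above, and lam = 0.1 as in the experiment.
lam = 0.1
n_spikes = 2

def objective(point):
    # point[:n_spikes] holds the coefficients c_i, point[n_spikes:] the positions t_i
    c, t = point[:n_spikes], point[n_spikes:]
    residual = y - sum(c[i] * atom(t[i]) for i in range(n_spikes))
    # discrete stand-in for 1/2 ||y - A mu||^2 + lambda ||mu||_TV
    return 0.5 * (residual ** 2).sum() + lam * c.abs().sum()

init = torch.tensor([0.5, 0.5, 0.2, 0.8])  # hypothetical initial (c_1, c_2, t_1, t_2)
point, data = opt_min(objective, init, lr=1e-2, max_iters=500)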
