Image processing: Transforming MNIST digits using optimal transport
The goal of this post is to create the following transformation, which maps the digit 2 to the digit 4 using Optimal Transport. Both digits are handwritten images taken from the MNIST digit dataset. The Python code is available on my GitHub.
Here, we briefly recap Optimal Transport, which aims to move the mass sitting on a set of “source” points onto a set of “target” points at the lowest possible total cost.
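Mathematically, it is a linear programming problem:
$$\min_{P \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij} P_{ij} \quad \text{subject to} \quad \sum_{j=1}^{m} P_{ij} = a_i, \qquad \sum_{i=1}^{n} P_{ij} = b_j.$$
In this formulation, $a \in \mathbb{R}^n$ represents the vector of resources, $b \in \mathbb{R}^m$ is the vector of targets, and $C_{ij}$ denotes the cost associated with moving a unit from resource $i$ to target $j$. The matrix $P$ is the solution, representing the plan, with $P_{ij}$ indicating the quantity of product being transported from $i$ to $j$. In practical applications, the cost matrix is often constructed using squared distance:
$$C_{ij} = \lVert x_i - y_j \rVert^2.$$
Here $x_i$ and $y_j$ (in $\mathbb{R}^d$) denote the locations of resources $i$ and targets $j$, respectively.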
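For simplicity, we require that the total mass of the resources equals the total mass of the targets, $\sum_i a_i = \sum_j b_j$, so that the equality constraints can be satisfied.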
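To make the linear program concrete, here is a minimal sketch that solves it on three source and three target points with SciPy's general-purpose `linprog` solver. The function name `solve_ot` and the toy data are illustrative assumptions, not the code from the repository. The plan is flattened row by row, so the constraints say that each row of the plan sums to the source mass and each column sums to the target mass.

```python
import numpy as np
from scipy.optimize import linprog


def solve_ot(a, b, x, y):
    """Solve the discrete optimal transport LP; return (plan, total_cost).

    Illustrative helper: a, b are mass vectors with equal totals,
    x, y are arrays of point locations with shapes (n, d) and (m, d).
    """
    n, m = len(a), len(b)
    # Squared Euclidean cost: C[i, j] = ||x_i - y_j||^2
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    # Constraints act on the row-major flattening of the plan P.
    A_rows = np.kron(np.eye(n), np.ones((1, m)))  # row sums of P
    A_cols = np.kron(np.ones((1, n)), np.eye(m))  # column sums of P
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([a, b])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m), res.fun


# Toy data: each target is a slightly shifted copy of one source point.
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([[1.0, 0.1], [0.0, 1.1], [0.0, 0.1]])
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)

plan, cost = solve_ot(a, b, x, y)
print(np.round(plan, 3))
print(cost)
```

The dense constraint matrix has (n + m) rows and n·m columns, so this formulation is only practical for small problems; dedicated optimal transport solvers scale much better.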
Returning to the images labeled 2 and 4, we extract pixels from each image, assigning unit mass to each pixel. Subsequently, we apply Optimal Transport to determine the corresponding pairs (assuming that such pairing exists) and employ linear interpolation between these points. This process results in the creation of the GIF image showcased at the outset of this blog post.
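The steps above can be sketched as follows. This is a hedged illustration rather than the repository code: it uses two tiny synthetic images in place of MNIST digits, assumes both images yield the same number of pixels, and swaps in SciPy's `linear_sum_assignment` as the solver. With unit masses and equal point counts, Optimal Transport reduces to the assignment problem, so an optimal plan can be taken to be a one-to-one pairing. The helper names (`extract_points`, `match_points`, `interpolate`, `rasterize`) and the threshold of 0.5 are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def extract_points(img, threshold=0.5):
    """Return (row, col) coordinates of pixels brighter than threshold."""
    return np.argwhere(img > threshold).astype(float)


def match_points(src, dst):
    """Pair each source point with one target point at minimal squared distance."""
    if len(src) != len(dst):
        raise ValueError("this sketch assumes equal point counts")
    C = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(C)
    matched = np.empty_like(dst)
    matched[rows] = dst[cols]  # matched[i] is the partner of src[i]
    return matched


def interpolate(src, matched, t):
    """Linear interpolation between paired points, t in [0, 1]."""
    return (1 - t) * src + t * matched


def rasterize(points, shape):
    """Draw points back onto a pixel grid by rounding to the nearest pixel."""
    frame = np.zeros(shape)
    idx = np.rint(points).astype(int)
    frame[idx[:, 0], idx[:, 1]] = 1.0
    return frame


# Synthetic stand-ins for the two digits: a vertical and a horizontal bar.
img_a = np.zeros((7, 7))
img_a[1:6, 3] = 1.0
img_b = np.zeros((7, 7))
img_b[3, 1:6] = 1.0

src = extract_points(img_a)
dst = extract_points(img_b)
matched = match_points(src, dst)

# One frame per interpolation step; these could be written out as a GIF.
frames = [rasterize(interpolate(src, matched, t), img_a.shape)
          for t in np.linspace(0.0, 1.0, 5)]
```

The first frame reproduces the source image and the last frame reproduces the target image. Real digit images generally contain different numbers of bright pixels, so some extra step (for example subsampling, or non-uniform masses with the general linear program) would be needed before a one-to-one pairing exists.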