Introduction to Regularizing with 2D Data
Suppose you have some noisy measured data and you want to resample it or change the spacing between points. You could fit a polynomial to what you measured, but the noise interferes with the fit.
A better method is to regularize the data. This means constructing a numerical dataset similar to what you measured but with better resolution, custom spacing in x, and smoothing in y.
This series of articles shows how to regularize 2D data with a simple and elegant process.
The measured data can be something very simple, like these five points shown in a table of x and y values. We have five points that are irregularly spaced in x; no two points are the same distance apart.
In this example we want to convert the five measured points into a regularized dataset of ten points. The ten points have a custom spacing of 0.5 between adjacent points, except for the last three points which are closer together. We also want to dampen some of the noise so that the ten regularized points do not have wild oscillations, discontinuities, or other artifacts.
How do we do this?
We need to take five input points and use them to create ten output points with the desirable properties listed above. We will solve this problem by setting up a system of linear equations that represents a “wish list” of what we want, namely fidelity (do the output points agree with the measured data?) and smoothness (do the output points oscillate wildly, or are they well behaved?).
Each row in this system of equations represents an item on the “wish list.” The system will be overdetermined (more equations than unknowns), so in general there is no exact solution. However, we don’t need an exact solution if we solve the system via its normal equations, which give the least-squares solution: the output values that come closest to satisfying all of the rows at once. In other words, we can’t have a piece of cake and eat it too, but with the normal equations we can at least have a few bites of cake before it’s all gone. The trick is to scale the linear equations so that the output dataset embodies the best compromise between fidelity and smoothness.
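To make the balancing act concrete, here is a minimal sketch in Python with NumPy. It is not the system built in this series: it is a toy “wish list” with a single unknown and two conflicting wishes (y = 1 and y = 3). The function name `solve_normal_equations` and the scale factors are assumptions made up for illustration.

```python
import numpy as np

def solve_normal_equations(A, b):
    """Least-squares solution of A y = b via the normal equations (A^T A) y = A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

# Toy wish list (hypothetical): one unknown y, two wishes that cannot both hold.
A = np.array([[1.0],    # wish 1: y = 1
              [1.0]])   # wish 2: y = 3
b = np.array([1.0, 3.0])

# Equal scaling: the compromise lands halfway between the two wishes (about 2.0).
y_equal = solve_normal_equations(A, b)

# Scaling the second row (and its right-hand side) by 3 makes that wish count
# more, so the compromise moves toward 3 (about 2.8).
scale = np.array([1.0, 3.0])
y_scaled = solve_normal_equations(A * scale[:, None], b * scale)

print(np.round(y_equal, 3), np.round(y_scaled, 3))
```

Multiplying a row and its right-hand side by a larger number gives that wish more influence on the compromise, which is the sense in which scaling balances one group of equations against another.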
Start by setting up a system of linear equations that map the measured data to the output points.
An input point does not necessarily coincide with any output point. For example, one of the input points has x=0.55. The two closest output points are at x=0.5 and x=1. We need a way to show how 0.55 relates to 0.5 and 1.
There are many ways to do that, but the simplest is linear interpolation. The value 0.55 can be represented as a weighted average of 0.5 and 1, with weights of 0.9 and 0.1, respectively: 0.9 × 0.5 + 0.1 × 1 = 0.55. If the linear combination [0.9, 0.1] works for the x coordinates, then it should also work for the y coordinates. Along this line of thinking, we can write five linear equations, each relating one of the five input y values to a pair of output y values. We call these the “fidelity equations” because they represent the goodness of fit: how well the output points agree with the input points.
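The weights for the x=0.55 example can be computed rather than guessed. The following is a small sketch in plain Python; the function name `interpolation_weights` is an assumption for illustration, not something defined in this series.

```python
def interpolation_weights(x, x_low, x_high):
    """Return (w_low, w_high) with w_low + w_high == 1 and
    w_low * x_low + w_high * x_high equal to x (up to rounding)."""
    w_high = (x - x_low) / (x_high - x_low)
    w_low = 1.0 - w_high
    return w_low, w_high

# The example from the text: x = 0.55 lies between output points 0.5 and 1.
w_low, w_high = interpolation_weights(0.55, 0.5, 1.0)

# Rounded for display, because floating-point arithmetic is not exact.
print(round(w_low, 6), round(w_high, 6))
```

The weight on the low neighbor is large because 0.55 sits much closer to 0.5 than to 1.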
This table shows the linear interpolation process. There are five rows because there are five input points. Low/High Index refers to the indexes of the two closest points in the output dataset. (These are the two columns in AFidelity where the Low/High weights will be inserted.) For example, Row 2 has an input x value of 0.55. In the output dataset this falls between the third and fourth points, which have x coordinates of 0.5 and 1 respectively. The Low/High Weights are the coefficients of the linear combination. These should sum to 1 for each row.
This is the resulting set of linear equations. It says that, in an ideal world, these five linear combinations of whatever output y values we come up with will equal the respective input y values.
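The table-to-matrix step can be sketched in Python with NumPy as follows. The x values below are placeholders, not the values from the table: the input points are made up apart from the 0.55 mentioned in the text, and the output grid is only chosen to match the description (spacing of 0.5, last three points closer together, third and fourth points at 0.5 and 1). The function name `build_fidelity_matrix` is likewise an assumption.

```python
import numpy as np

# Hypothetical data: only x = 0.55 comes from the text.
x_in = np.array([0.1, 0.55, 1.3, 2.2, 3.4])
x_out = np.array([-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.25, 3.5])

def build_fidelity_matrix(x_in, x_out):
    """One row per input point, one column per output point.
    Each row holds the two linear interpolation weights and zeros elsewhere."""
    n_in, n_out = len(x_in), len(x_out)
    A = np.zeros((n_in, n_out))
    for row, x in enumerate(x_in):
        # Zero-based index of the output point at or just below x,
        # clamped so that a high neighbor always exists.
        low = int(np.searchsorted(x_out, x, side="right")) - 1
        low = min(max(low, 0), n_out - 2)
        high = low + 1
        w_high = (x - x_out[low]) / (x_out[high] - x_out[low])
        A[row, low] = 1.0 - w_high
        A[row, high] = w_high
    return A

A_fidelity = build_fidelity_matrix(x_in, x_out)

# Sanity checks: each row's weights sum to 1, and applying the same
# weights to the output x coordinates reproduces the input x values.
print(np.allclose(A_fidelity.sum(axis=1), 1.0))
print(np.allclose(A_fidelity @ x_out, x_in))
```

With this placeholder grid, the second row has its nonzero weights in the third and fourth columns (zero-based indexes 2 and 3), matching the Row 2 example. Multiplying the matrix by a vector of candidate output y values gives the five interpolated values that the fidelity equations ask to equal the measured y values.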
At this point the “wish list” has five items on it.