RegularizeData2D, the Excel Spreadsheet Function for Regularizing 2D Data – Part 1 of 2

Custom Excel Spreadsheet Function for Regularizing 2D Data
RegularizeData2D Example of Use in Excel

This is a custom Excel spreadsheet function that regularizes 2D data. It is an Excel array function that duplicates the results of the spreadsheet calculations we saw in an earlier article about regularizing. This article shows the development of the VBA function and how, as a computer program, its design is different from the mathematical formulation. We arrive at the same result as the spreadsheet example, but we avoid slow operations like multiplying matrices, storing unnecessary data, and solving a matrix equation inefficiently.

RegularizeData2D Function Syntax

Syntax for the Excel Regularizing Function

No Excel Add-Ins!

Hell.dllThe goal of this project is to make a single, self-contained Excel spreadsheet function for regularizing 2D data directly in the spreadsheet. It’s true that there are faster ways of doing this, but those all require you to install other files. If you e-mail somebody a spreadsheet that uses this function, he can use it immediately without choosing add-ins, copying DLLs, or installing anything. It should just work.

Introduction – Math papers and computer programs are different.


These are not the same!

Earlier articles introduced the concept of this type of regularization and showed how to formulate the problem mathematically. So far, so good… but just having a mathematical formulation doesn’t mean we are ready to implement a computer program!

An equation arrives at a result. A computer program arrives at the same result, but not necessarily in the same way. For example, consider the linear system Ax=b. In a math paper you solve for x by taking the inverse of A like this: x=A-1*b. From a purely mathematical point of view, that equation is correct. However, no self-respecting computer programmer would be caught dead doing it like that!

Here’s a little excerpt from the Matlab Documentation Library reiterating the point about matrix inversion. “It takes almost two and one half times as long to compute the solution with y = inv(A)*b as with z = A\b.” And they weren’t even considering matrices with special properties that can be solved even faster! In an algebra class it is appropriate to learn about the concept of a matrix inverse, but it is rare to see a matrix inverse outside of the classroom – and for good reason.

There are many ways to solve a matrix equation, and calculating an inverse is the slowest one. In a computer program we care about finding the answer quickly (so that the user doesn’t die of old age). We care about conserving memory (so that we can solve bigger problems). A computer program is better off solving Ax=b using Gaussian elimination and back-substitution. Seriously! Do it the old-fashioned way and you can’t go wrong! If the matrix A has special properties like symmetry or bandedness, we can solve it even faster using permutations, decompositions, etc.

Computer programs can take shortcuts and get away with it.

Last time we constructed individual rows in the matrix A, then we calculated ATA, then ATb… That process was instructive and arrived at the correct answer, which is fine because the purpose was to demonstrate the concept.

Now we’re going to do things differently. Multiplying matrices by each other takes too long, so we’ll create ATA and ATb directly. To solve the system in a spreadsheet, we used the function MINVERSE() because that is all Excel has. (Tsk, tsk, tsk, Microsoft. You guys should know better!) Now that we aren’t limited by an Excel spreadsheet, we can take advantage of the special properties of ATA to get a solution faster.

The gods proclaim from on high that ATA is a “positive-definite, symmetric, pentadiagonal-banded matrix.” For us mere mortals, the most important property is that it is pentadiagonal. In other words, no matter how big the matrix is, no row has more than two nonzero elements on either side of the diagonal. That’s a good excuse to ignore all of those zeros! For example, a 1000×1000 pentadiagonal matrix is treated like a 1000×5 matrix in terms of calculations and use of the computer’s memory. This alone gives us enormous savings in time and memory!

Here are the programming shortcuts we will get away with.

  • Don’t multiply any matrix by… anything. Instead, calculate the multiplied versions directly.
  • Don’t use memory to store anything nonzero if you can help it. Use a “compressed” matrix format that avoids storing zeros.
  • Don’t use a matrix inverse. Instead, solve the old fashioned way.
  • Don’t bother using more than five elements at a time to solve the final equation.

It is wasteful to store zeros in a sparse matrix.

Computer programs need logistical code.

Before the function jumps into the task of regularizing a 2D dataset, it must handle the logistics necessary for working with an Excel spreadsheet. Everything can be done in one big function, but the program is more organized if it is separated into different functions where each one has a specific purpose in the context of the entire program.

  • The function RegularizeData2D() is the one that interacts with the spreadsheet. Its job is to read in the arguments, check them for errors, convert the arguments to arrays (which are easier to work with than Variant types), and to convert the result back into a format that is compatible with the spreadsheet (i.e. an array of Variant types). If something looks wrong, this is where error messages are generated.
  • The function RegularizeData2DDbl() performs the mathematical steps of regularization. It calls a sequence of helper functions and returns the result.
  • ApplyInputDataPoints() and ApplySecondDerivatives() construct the projected matrices that need to be solved. UpperTriangularForm() and SolveCompressedPentadiagonalUpperTriangularMatrixEquation() have the task of solving the matrix equation. GetLowIndex() is only called by ApplyInputDataPoints() to help with the interpolation process.
  • ShowMatrix1D() and ShowMatrix2D() are used for debugging.
  • The QueryPerformanceTimer module has some handy tools for timing the performance of the code (and it reports the timing in the Debug window). A good programmer thinks he writes “fast” code. A great programmer uses a stopwatch.

One comment

  1. Pingback: RegularizeData2D, the Excel Spreadsheet Function for Regularizing 2D Data – Part 2 of 2 « Math for Mere Mortals

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s