This is a custom Excel function for regularizing 3D data. It uses the same methodology and arrives at the same results as the spreadsheet that goes with Introduction to Regularizing with 3D Data, but it adds the convenience of a single spreadsheet function that automates the calculation processes without taking up space on your spreadsheet or requiring you to install additional add-ins or DLLs.
The VBA code puts a lot of effort into solving the matrix equations to demonstrate several techniques. For example, it uses a sparse matrix data structure that it reorders by a reverse Cuthill-McKee permutation and solves with a Cholesky decomposition. All calculations are performed in VBA and the example spreadsheet includes full source code.
How To Use RegularizeData3D
RegularizeData3D is an array function like its 2D counterpart, and its syntax is the same except for two more parameters. The first three parameters are the x, y, and z input points that you want to regularize. All three must be the same length, and they must each have at least three spreadsheet cells. The fourth parameter is the stiffness constant. The fifth and sixth parameters are the x and y coordinates that describe the output grid. The output grid is the selected area where you enter the formula, and it needs to have the same dimensions as the x and y coordinates of the output grid.
Start with a list of input points and an area where you want the 3D surface data to be filled in. Select that area, cells G7:O15 shown in blue in this example.
With all of those cells selected, begin typing the formula RegularizeData3D.
Sometimes you don’t necessarily know how many x, y, and z input points you will have when you type in the formula. In this example we are specifying Rows 6-17 even though only 6-10 have any numbers in them. RegularizeData3D recognizes that the last cells of input points are blank, so it ignores them. That way you can fill them in later without having to change the formula you typed in. (I admit that with 7 parameters that formula is a bit long.)
When the formula is complete, press Ctrl+Shift+Enter. The cells that you selected (G7:O15) will automatically fill in with the regularized data.
This is a custom Excel function that uses the same methodology and arrives at the same results as we saw before in the spreadsheet calculations. However, it also automates the process of solving the resulting matrix equations, which means managing large amounts of data and other details that we didn’t have to deal with before. The VBA code is over 1,200 lines, and it would take a hundred pages to explain everything it does. For that reason the rest of this post covers but a few highlights. The code is heavily commented, and if you have questions about specific parts of it, you are welcome to leave comments below. The function
Regularize3DDbl() is intentionally short and doubles as a list of the calculation steps for regularizing 3D data. If you want to see how it works, this function is a good place to start.
Focus on the Calculations with a Sparse Matrix
Regularizing data is a computationally expensive process because it involves solving a large system of linear equations. The fastest way to solve these kinds of equations and thereby to regularize data is in Matlab, not Excel. (Perhaps in a future post…) But here we are using Excel and VBA, which many regard as the slowest possible way to solve complex math problems other than using slate and chalk.
However, that slowness gives us an opportunity to take a closer look at how to solve a large linear system without the benefit of someone else’s DLL. As with the 2D case we need to devise a strategy to keep track of data in a large, sparse matrix. However, this time it is a lot more complicated. The 2D case always gave us a pentadiagonal matrix, but now that we’re working in 3 dimensions, the size and shape of the matrix depend on the size of the output grid – which the user can choose to be whatever he wants. We need to solve a matrix equation, but Gaussian elimination by itself is too slow.
To make this work, we have to do things differently. For starters we store the individual matrix equations in a compressed format for sparse matrices. Then we calculate the projection ATA and store the result in the same compressed format. The gods proclaim from on high that this matrix is a “positive-definite symmetric, banded matrix.” It is not pentadiagonal like the 2D case, but the important part is that it is symmetric and banded. We can use those properties to speed things up. Reordering the matrix takes advantage of its bandedness (and sparsity). The reordered matrix has a smaller bandwidth (In this context “bandwidth” may also be called “degree.”) than the original version, and the smaller the bandwidth, the faster it is to solve. The Cholesky decomposition takes advantage of the matrix’s symmetry and has the benefits of being twice as fast and using half as much memory. (In newer versions of Excel these have roughly the same speed, perhaps due to compiler optimization and pipelining. You can try both methods by changing the code.)
Storing a Sparse Matrix
The way we store the sparse matrix is a lot like Matlab and other programs. We create an Nx1 array that stores the number of nonzero elements in each column. Then we have another array that holds the row numbers of each nonzero element. Then a third matrix contains those nonzero values. (There is yet another matrix with the last nonzero row number for each column.) If we need to get, for example, the value of the 4,3 element, we look at the third column of each of these three matrices. The first matrix tells us if there are any nonzero elements in the third column. (None of the algorithms here bother to do that because every column has nonzero elements.) Then we search for the number 4 in the third column of the second matrix. The lists of row numbers are sorted in ascending order, which accommodates the search. If the number 4 is there in a certain position, we can get the 4,3 element’s value at the same position in the third array. Checking three arrays and searching through row listings and using IF statements is more effort than using a single array, but it is ultimately much faster to multiply columns by each other. A 10×10 output grid has 100 output points and therefore as many rows and columns in its sparse matrix. Multiplying one column by another column means 100 iterations, but this sparse matrix format lets us do it in 21 iterations at most (and always fewer in practice).
Calculating a Projection
We can take advantage of the compressed sparse matrix format to make some shortcuts. Calculating the projection ATA is an exercise in multiplying columns by each other. The sparse format accommodates this because it stores matrix values by column. Suppose we need to multiply Columns 1 and 13 by each other. Also suppose that the first nonzero row in Column 1 is Row 1, but the first nonzero row in Column 13 is Row 13. There is no sense in calculating anything for the first 12 rows because we would just be multiplying a bunch of numbers by zero and adding the results together… which is zero and therefore doesn’t contribute anything. The VBA function is smart enough to recognize that, so it skips straight to Row 13 before it tries anything. Another way to save time is to only multiply numbers when there are nonzero values in both columns at the same time. If Column 1 has nonzero values in Rows 1, 4, 5, 7, 12, 15, and 16 and Column 13 has nonzero values in Rows 13, 14, 15, 16, 17, 19, and 21, then the algorithm recognizes that both columns have the rows 15 and 16 in common. It skips over the other entries and only multiplies the values for Rows 15 and 16. The name of the game is avoiding zeroes, and that second matrix of nonzero row numbers helps us do that because it keeps track of exactly what we need to know.
Reverse Cuthill-McKee Reordering
The purpose of the reordering is to reduce the bandwidth of (most of) the rows in the matrix. (At this point the matrix is symmetric, so rows are columns and columns are rows. i.e. 4,5 is the same as 5,4.) The solver then performs operations on each row, but only in as many columns as it needs to based on the results of the reordering. Ciprian Zavoianu has an excellent tutorial for the Cuthill-McKee Algorithm. As a matter of fact, this software was developed here with the help of Ciprian’s tutorial, and the only difference is that it uses the reverse permutation vector instead of the forward permutation vector. To be fair, there are better, more modern ways to reorder sparse matrices. Check out Tim Davis’s website for details about the latest sparse matrix algorithms and how they are used.
Cholesky Decomposition and Solving the Sparse Matrix
Once the projection is calculated, it is possible to solve the matrix equation using Gaussian elimination and back-substitution. However, a Cholesky decomposition is much faster because it takes advantage of the matrix’s symmetry. When the script performs the Cholesky decomposition, it stores the numbers in a single matrix with a compressed format similar to the one we used in the script for regularizing 2D data (where the first entry in each “row” represents a diagonal element in the actual matrix). The size of that matrix is the maximum bandwidth of the system by the number of output points in the grid. This makes sense because the bandwidth of the matrix tells us how far away the nonzero elements are from the diagonal. Because of the matrix’s symmetry we only need to work with the diagonal elements and the elements on one side thereof.
The source code includes facilities to solve with or without Cholesky factorization. Note that one line is commented out while the other is executable. You can test the speed by commenting out one line or the other and reading the timer output in the debug window after each recalculation.
' Solve the matrix. The solution is in a column vector.
' There are two ways to solve the matrix; the Cholesky method is faster.
'Result = SolveNDiagonalMatrix(A, b, 2 * UBound(OutputDatasetX))
Result = SolveSymmetricMatrixCholesky(A, b, 2 * UBound(OutputDatasetX))
Formatting the Results
When the macro finds the solution to the system of equations, that solution vector needs to be rearranged into a grid that fits on the spreadsheet.
ReshapeMatrix() converts the solution vector to a grid.
RegularizeData3D() also accounts for transposing or reversing the axes (e.g. if the user decides that positive is downward on the y axis). That may be necessary to ensure that
RegularizeData3DDbl() works with everything in ascending order and in the most efficient formulation (with the shortest grid dimension along the x axis because of its effect on the vertical derivatives). By the time VBA returns the solution to the user, everything is rearranged so that it is consistent with the user’s grid layout. Either or both of the x and y axes can be in reverse order, and everything should still work.
Download the VBA Source Code with Examples
This XLSM file has five worksheets with various examples of RegularizeData3D. The two VBA modules contain the source code.
RegularizeData3D with Examples and Source Code.xlsm