RegularizeData3D – the Excel Spreadsheet Function to Regularize 3D Data to a Smooth Surface

Surface Regularized to Grids Coarse, Medium, FineThis is a custom Excel function for regularizing 3D data. It uses the same methodology and arrives at the same results as the spreadsheet that goes with Introduction to Regularizing with 3D Data, but it adds the convenience of a single spreadsheet function that automates the calculation processes without taking up space on your spreadsheet or requiring you to install additional add-ins or DLLs.

The VBA code puts a lot of effort into solving the matrix equations to demonstrate several techniques. For example, it uses a sparse matrix data structure that it reorders by a reverse Cuthill-McKee permutation and solves with a Cholesky decomposition. All calculations are performed in VBA and the example spreadsheet includes full source code.

How To Use RegularizeData3D

RegularizeData3D is an array function like its 2D counterpart, and its syntax is the same except for two more parameters. The first three parameters are the x, y, and z input points that you want to regularize. All three must be the same length, and they must each have at least three spreadsheet cells. The fourth parameter is the stiffness constant. The fifth and sixth parameters are the x and y coordinates that describe the output grid. The output grid is the selected area where you enter the formula, and it needs to have the same dimensions as the x and y coordinates of the output grid.

Step 1
Start with a list of input points and an area where you want the 3D surface data to be filled in. Select that area, cells G7:O15 shown in blue in this example.
RegularizeData3D Function Syntax 1

Step 2
With all of those cells selected, begin typing the formula RegularizeData3D.
RegularizeData3D Function Syntax 2

Sometimes you don’t necessarily know how many x, y, and z input points you will have when you type in the formula. In this example we are specifying Rows 6-17 even though only 6-10 have any numbers in them. RegularizeData3D recognizes that the last cells of input points are blank, so it ignores them. That way you can fill them in later without having to change the formula you typed in. (I admit that with 7 parameters that formula is a bit long.)

Step 3
When the formula is complete, press Ctrl+Shift+Enter. The cells that you selected (G7:O15) will automatically fill in with the regularized data.
RegularizeData3D Function Syntax 3

Summary

This is a custom Excel function that uses the same methodology and arrives at the same results as we saw before in the spreadsheet calculations. However, it also automates the process of solving the resulting matrix equations, which means managing large amounts of data and other details that we didn’t have to deal with before. The VBA code is over 1,200 lines, and it would take a hundred pages to explain everything it does. For that reason the rest of this post covers but a few highlights. The code is heavily commented, and if you have questions about specific parts of it, you are welcome to leave comments below. The function Regularize3DDbl() is intentionally short and doubles as a list of the calculation steps for regularizing 3D data. If you want to see how it works, this function is a good place to start.

Focus on the Calculations with a Sparse Matrix

Regularizing data is a computationally expensive process because it involves solving a large system of linear equations. The fastest way to solve these kinds of equations and thereby to regularize data is in Matlab, not Excel. (Perhaps in a future post…) But here we are using Excel and VBA, which many regard as the slowest possible way to solve complex math problems other than using slate and chalk.

However, that slowness gives us an opportunity to take a closer look at how to solve a large linear system without the benefit of someone else’s DLL. As with the 2D case we need to devise a strategy to keep track of data in a large, sparse matrix. However, this time it is a lot more complicated. The 2D case always gave us a pentadiagonal matrix, but now that we’re working in 3 dimensions, the size and shape of the matrix depend on the size of the output grid – which the user can choose to be whatever he wants. We need to solve a matrix equation, but Gaussian elimination by itself is too slow.

To make this work, we have to do things differently. For starters we store the individual matrix equations in a compressed format for sparse matrices. Then we calculate the projection ATA and store the result in the same compressed format. The gods proclaim from on high that this matrix is a “positive-definite symmetric, banded matrix.” It is not pentadiagonal like the 2D case, but the important part is that it is symmetric and banded. We can use those properties to speed things up. Reordering the matrix takes advantage of its bandedness (and sparsity). The reordered matrix has a smaller bandwidth (In this context “bandwidth” may also be called “degree.”) than the original version, and the smaller the bandwidth, the faster it is to solve. The Cholesky decomposition takes advantage of the matrix’s symmetry and has the benefits of being twice as fast and using half as much memory. (In newer versions of Excel these have roughly the same speed, perhaps due to compiler optimization and pipelining. You can try both methods by changing the code.)

Storing a Sparse Matrix

The way we store the sparse matrix is a lot like Matlab and other programs. We create an Nx1 array that stores the number of nonzero elements in each column. Then we have another array that holds the row numbers of each nonzero element. Then a third matrix contains those nonzero values. (There is yet another matrix with the last nonzero row number for each column.) If we need to get, for example, the value of the 4,3 element, we look at the third column of each of these three matrices. The first matrix tells us if there are any nonzero elements in the third column. (None of the algorithms here bother to do that because every column has nonzero elements.) Then we search for the number 4 in the third column of the second matrix. The lists of row numbers are sorted in ascending order, which accommodates the search. If the number 4 is there in a certain position, we can get the 4,3 element’s value at the same position in the third array. Checking three arrays and searching through row listings and using IF statements is more effort than using a single array, but it is ultimately much faster to multiply columns by each other. A 10×10 output grid has 100 output points and therefore as many rows and columns in its sparse matrix. Multiplying one column by another column means 100 iterations, but this sparse matrix format lets us do it in 21 iterations at most (and always fewer in practice).

Calculating a Projection

We can take advantage of the compressed sparse matrix format to make some shortcuts. Calculating the projection ATA is an exercise in multiplying columns by each other. The sparse format accommodates this because it stores matrix values by column. Suppose we need to multiply Columns 1 and 13 by each other. Also suppose that the first nonzero row in Column 1 is Row 1, but the first nonzero row in Column 13 is Row 13. There is no sense in calculating anything for the first 12 rows because we would just be multiplying a bunch of numbers by zero and adding the results together… which is zero and therefore doesn’t contribute anything. The VBA function is smart enough to recognize that, so it skips straight to Row 13 before it tries anything. Another way to save time is to only multiply numbers when there are nonzero values in both columns at the same time. If Column 1 has nonzero values in Rows 1, 4, 5, 7, 12, 15, and 16 and Column 13 has nonzero values in Rows 13, 14, 15, 16, 17, 19, and 21, then the algorithm recognizes that both columns have the rows 15 and 16 in common. It skips over the other entries and only multiplies the values for Rows 15 and 16. The name of the game is avoiding zeroes, and that second matrix of nonzero row numbers helps us do that because it keeps track of exactly what we need to know.

Reverse Cuthill-McKee Reordering

The purpose of the reordering is to reduce the bandwidth of (most of) the rows in the matrix. (At this point the matrix is symmetric, so rows are columns and columns are rows. i.e. 4,5 is the same as 5,4.) The solver then performs operations on each row, but only in as many columns as it needs to based on the results of the reordering. Ciprian Zavoianu has an excellent tutorial for the Cuthill-McKee Algorithm. As a matter of fact, this software was developed here with the help of Ciprian’s tutorial, and the only difference is that it uses the reverse permutation vector instead of the forward permutation vector. To be fair, there are better, more modern ways to reorder sparse matrices. Check out Tim Davis’s website for details about the latest sparse matrix algorithms and how they are used.

Cholesky Decomposition and Solving the Sparse Matrix

Once the projection is calculated, it is possible to solve the matrix equation using Gaussian elimination and back-substitution. However, a Cholesky decomposition is much faster because it takes advantage of the matrix’s symmetry. When the script performs the Cholesky decomposition, it stores the numbers in a single matrix with a compressed format similar to the one we used in the script for regularizing 2D data (where the first entry in each “row” represents a diagonal element in the actual matrix). The size of that matrix is the maximum bandwidth of the system by the number of output points in the grid. This makes sense because the bandwidth of the matrix tells us how far away the nonzero elements are from the diagonal. Because of the matrix’s symmetry we only need to work with the diagonal elements and the elements on one side thereof.

The source code includes facilities to solve with or without Cholesky factorization. Note that one line is commented out while the other is executable. You can test the speed by commenting out one line or the other and reading the timer output in the debug window after each recalculation.

' Solve the matrix. The solution is in a column vector.
' There are two ways to solve the matrix; the Cholesky method is faster.
'Result = SolveNDiagonalMatrix(A, b, 2 * UBound(OutputDatasetX))
Result = SolveSymmetricMatrixCholesky(A, b, 2 * UBound(OutputDatasetX))

Formatting the Results

When the macro finds the solution to the system of equations, that solution vector needs to be rearranged into a grid that fits on the spreadsheet. ReshapeMatrix() converts the solution vector to a grid. RegularizeData3D() also accounts for transposing or reversing the axes (e.g. if the user decides that positive is downward on the y axis). That may be necessary to ensure that RegularizeData3DDbl() works with everything in ascending order and in the most efficient formulation (with the shortest grid dimension along the x axis because of its effect on the vertical derivatives). By the time VBA returns the solution to the user, everything is rearranged so that it is consistent with the user’s grid layout. Either or both of the x and y axes can be in reverse order, and everything should still work.

Download the VBA Source Code with Examples

This XLSM file has five worksheets with various examples of RegularizeData3D. The two VBA modules contain the source code.
RegularizeData3D with Examples and Source Code.xlsm

Advertisements

10 comments

  1. Hi Kintaar,
    Thank You so Very Much for this Post.
    Its a great Help. I couldn’t believe that it works works so well and the error % w.r.to other techniques is actually lesser.
    Cheers !!!
    & Thanks Again,
    Amit

    • Hi, Amit.

      I’m glad it works for you. Let us know if there is anything else you’d like to see in this software or if you have ideas for other topics you’d like to see on Math for Mere Mortals. (We’re not a musical band, but sometimes we do take requests.) Next I’ll put together a Matlab version of the same thing and possibly some optimization routines to use in a spreadsheet.

  2. Hi Amit,

    this code is very useful. but while using the code for some data i am getting Value! error in cell; can you suggest what that error might be.
    possible solution is data type error but i changed the cell formatting to number but still getting the same error.
    any help is appreciated
    regards
    shreepal

    • @Shreepal
      Does this happen in the example spreadsheet you downloaded from here, or is it a different one? Those formulas should work as-is in the example spreadsheet. Otherwise it could be that the function parameters are not all numeric or that the output x-axis or y-axis values are not all in ascending or descending order.

      • Hello,

        Thank you for this code, it appears very useful however I am getting a similar issue.Do you know what causes this error.

        Regards
        Gabe

      • A lot of things can cause that error, for example anything that causes the macro to bomb out. If the row and column headings are not in ascending or descending order, that can cause a #Value error. Does this happen in the example you downloaded, or is it another one?

  3. Hi Kintaar, I tried your code for a larger matrix 13×13, this works well – however at 14×14 i get #value as result. Am I seing some kind of limitation to your code or could it be something else? If the 14th row and colums has x & y values above the max for the points listed as input data, then there is no problem. BR Christian

  4. Hi Christian. Thanks for your feedback. I have tested it in Excel above 41×41, so it should work on 14×14. Try checking the row and column headers and make sure you enlarged those to the same size as the output grid. Also bear in mind that the input data must fit within the output grid’s domain. For example, if the ouput grid’s x goes from 0 to 10, you cannot give it an input point at x=10.5. The function is trying to map the input points to their x,y locations on the output grid, so they all must fit within that output grid. Let us know if you need any other info.

    • If the function tends to naturally curve toward zero, it is possible for some output points to be negative. There are no restrictions on the range of the output points, and they won’t pass through the input points unless the input points are coplanar.

      On Thu, Oct 26, 2017 at 5:42 PM Math for Mere Mortals wrote:

      >


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s