This is a custom Excel function for regularizing 3D data. It uses the same methodology and arrives at the same results as the spreadsheet that goes with Introduction to Regularizing with 3D Data, but it adds the convenience of a single spreadsheet function that automates the calculation processes without taking up space on your spreadsheet or requiring you to install additional add-ins or DLLs.
The VBA code puts a lot of effort into solving the matrix equations to demonstrate several techniques. For example, it uses a sparse matrix data structure that it reorders by a reverse Cuthill-McKee permutation and solves with a Cholesky decomposition. All calculations are performed in VBA and the example spreadsheet includes full source code.
How To Use RegularizeData3D
RegularizeData3D is an array function like its 2D counterpart, and its syntax is the same except for two more parameters. The first three parameters are the x, y, and z input points that you want to regularize. All three must be the same length, and they must each have at least three spreadsheet cells. The fourth parameter is the stiffness constant. The fifth and sixth parameters are the x and y coordinates that describe the output grid. The output grid is the selected area where you enter the formula, and it needs to have the same dimensions as the x and y coordinates of the output grid.
Step 1
Start with a list of input points and an area where you want the 3D surface data to be filled in. Select that area, cells G7:O15 shown in blue in this example.
Step 2
With all of those cells selected, begin typing the formula RegularizeData3D.
Sometimes you don’t necessarily know how many x, y, and z input points you will have when you type in the formula. In this example we are specifying Rows 6-17 even though only 6-10 have any numbers in them. RegularizeData3D recognizes that the last cells of input points are blank, so it ignores them. That way you can fill them in later without having to change the formula you typed in. (I admit that with 7 parameters that formula is a bit long.)
Step 3
When the formula is complete, press Ctrl+Shift+Enter. The cells that you selected (G7:O15) will automatically fill in with the regularized data.
Summary
This is a custom Excel function that uses the same methodology and arrives at the same results as we saw before in the spreadsheet calculations. However, it also automates the process of solving the resulting matrix equations, which means managing large amounts of data and other details that we didn’t have to deal with before. The VBA code is over 1,200 lines, and it would take a hundred pages to explain everything it does. For that reason the rest of this post covers but a few highlights. The code is heavily commented, and if you have questions about specific parts of it, you are welcome to leave comments below. The function Regularize3DDbl()
is intentionally short and doubles as a list of the calculation steps for regularizing 3D data. If you want to see how it works, this function is a good place to start.
Focus on the Calculations with a Sparse Matrix
Regularizing data is a computationally expensive process because it involves solving a large system of linear equations. The fastest way to solve these kinds of equations and thereby to regularize data is in Matlab, not Excel. (Perhaps in a future post…) But here we are using Excel and VBA, which many regard as the slowest possible way to solve complex math problems other than using slate and chalk.
However, that slowness gives us an opportunity to take a closer look at how to solve a large linear system without the benefit of someone else’s DLL. As with the 2D case we need to devise a strategy to keep track of data in a large, sparse matrix. However, this time it is a lot more complicated. The 2D case always gave us a pentadiagonal matrix, but now that we’re working in 3 dimensions, the size and shape of the matrix depend on the size of the output grid – which the user can choose to be whatever he wants. We need to solve a matrix equation, but Gaussian elimination by itself is too slow.
To make this work, we have to do things differently. For starters we store the individual matrix equations in a compressed format for sparse matrices. Then we calculate the projection ATA and store the result in the same compressed format. The gods proclaim from on high that this matrix is a “positive-definite symmetric, banded matrix.” It is not pentadiagonal like the 2D case, but the important part is that it is symmetric and banded. We can use those properties to speed things up. Reordering the matrix takes advantage of its bandedness (and sparsity). The reordered matrix has a smaller bandwidth (In this context “bandwidth” may also be called “degree.”) than the original version, and the smaller the bandwidth, the faster it is to solve. The Cholesky decomposition takes advantage of the matrix’s symmetry and has the benefits of being twice as fast and using half as much memory. (In newer versions of Excel these have roughly the same speed, perhaps due to compiler optimization and pipelining. You can try both methods by changing the code.)
Storing a Sparse Matrix
The way we store the sparse matrix is a lot like Matlab and other programs. We create an Nx1 array that stores the number of nonzero elements in each column. Then we have another array that holds the row numbers of each nonzero element. Then a third matrix contains those nonzero values. (There is yet another matrix with the last nonzero row number for each column.) If we need to get, for example, the value of the 4,3 element, we look at the third column of each of these three matrices. The first matrix tells us if there are any nonzero elements in the third column. (None of the algorithms here bother to do that because every column has nonzero elements.) Then we search for the number 4 in the third column of the second matrix. The lists of row numbers are sorted in ascending order, which accommodates the search. If the number 4 is there in a certain position, we can get the 4,3 element’s value at the same position in the third array. Checking three arrays and searching through row listings and using IF statements is more effort than using a single array, but it is ultimately much faster to multiply columns by each other. A 10×10 output grid has 100 output points and therefore as many rows and columns in its sparse matrix. Multiplying one column by another column means 100 iterations, but this sparse matrix format lets us do it in 21 iterations at most (and always fewer in practice).
Calculating a Projection
We can take advantage of the compressed sparse matrix format to make some shortcuts. Calculating the projection ATA is an exercise in multiplying columns by each other. The sparse format accommodates this because it stores matrix values by column. Suppose we need to multiply Columns 1 and 13 by each other. Also suppose that the first nonzero row in Column 1 is Row 1, but the first nonzero row in Column 13 is Row 13. There is no sense in calculating anything for the first 12 rows because we would just be multiplying a bunch of numbers by zero and adding the results together… which is zero and therefore doesn’t contribute anything. The VBA function is smart enough to recognize that, so it skips straight to Row 13 before it tries anything. Another way to save time is to only multiply numbers when there are nonzero values in both columns at the same time. If Column 1 has nonzero values in Rows 1, 4, 5, 7, 12, 15, and 16 and Column 13 has nonzero values in Rows 13, 14, 15, 16, 17, 19, and 21, then the algorithm recognizes that both columns have the rows 15 and 16 in common. It skips over the other entries and only multiplies the values for Rows 15 and 16. The name of the game is avoiding zeroes, and that second matrix of nonzero row numbers helps us do that because it keeps track of exactly what we need to know.
Reverse Cuthill-McKee Reordering
The purpose of the reordering is to reduce the bandwidth of (most of) the rows in the matrix. (At this point the matrix is symmetric, so rows are columns and columns are rows. i.e. 4,5 is the same as 5,4.) The solver then performs operations on each row, but only in as many columns as it needs to based on the results of the reordering. Ciprian Zavoianu has an excellent tutorial for the Cuthill-McKee Algorithm. As a matter of fact, this software was developed here with the help of Ciprian’s tutorial, and the only difference is that it uses the reverse permutation vector instead of the forward permutation vector. To be fair, there are better, more modern ways to reorder sparse matrices. Check out Tim Davis’s website for details about the latest sparse matrix algorithms and how they are used.
Cholesky Decomposition and Solving the Sparse Matrix
Once the projection is calculated, it is possible to solve the matrix equation using Gaussian elimination and back-substitution. However, a Cholesky decomposition is much faster because it takes advantage of the matrix’s symmetry. When the script performs the Cholesky decomposition, it stores the numbers in a single matrix with a compressed format similar to the one we used in the script for regularizing 2D data (where the first entry in each “row” represents a diagonal element in the actual matrix). The size of that matrix is the maximum bandwidth of the system by the number of output points in the grid. This makes sense because the bandwidth of the matrix tells us how far away the nonzero elements are from the diagonal. Because of the matrix’s symmetry we only need to work with the diagonal elements and the elements on one side thereof.
The source code includes facilities to solve with or without Cholesky factorization. Note that one line is commented out while the other is executable. You can test the speed by commenting out one line or the other and reading the timer output in the debug window after each recalculation.
' Solve the matrix. The solution is in a column vector.
' There are two ways to solve the matrix; the Cholesky method is faster.
'Result = SolveNDiagonalMatrix(A, b, 2 * UBound(OutputDatasetX))
Result = SolveSymmetricMatrixCholesky(A, b, 2 * UBound(OutputDatasetX))
Formatting the Results
When the macro finds the solution to the system of equations, that solution vector needs to be rearranged into a grid that fits on the spreadsheet. ReshapeMatrix()
converts the solution vector to a grid. RegularizeData3D()
also accounts for transposing or reversing the axes (e.g. if the user decides that positive is downward on the y axis). That may be necessary to ensure that RegularizeData3DDbl()
works with everything in ascending order and in the most efficient formulation (with the shortest grid dimension along the x axis because of its effect on the vertical derivatives). By the time VBA returns the solution to the user, everything is rearranged so that it is consistent with the user’s grid layout. Either or both of the x and y axes can be in reverse order, and everything should still work.
Download the VBA Source Code with Examples
This XLSM file has five worksheets with various examples of RegularizeData3D. The two VBA modules contain the source code.
RegularizeData3D with Examples and Source Code.xlsm
Hi Kintaar,
Thank You so Very Much for this Post.
Its a great Help. I couldn’t believe that it works works so well and the error % w.r.to other techniques is actually lesser.
Cheers !!!
& Thanks Again,
Amit
Hi, Amit.
I’m glad it works for you. Let us know if there is anything else you’d like to see in this software or if you have ideas for other topics you’d like to see on Math for Mere Mortals. (We’re not a musical band, but sometimes we do take requests.) Next I’ll put together a Matlab version of the same thing and possibly some optimization routines to use in a spreadsheet.
Hi Amit,
this code is very useful. but while using the code for some data i am getting Value! error in cell; can you suggest what that error might be.
possible solution is data type error but i changed the cell formatting to number but still getting the same error.
any help is appreciated
regards
shreepal
@Shreepal
Does this happen in the example spreadsheet you downloaded from here, or is it a different one? Those formulas should work as-is in the example spreadsheet. Otherwise it could be that the function parameters are not all numeric or that the output x-axis or y-axis values are not all in ascending or descending order.
Hello,
Thank you for this code, it appears very useful however I am getting a similar issue.Do you know what causes this error.
Regards
Gabe
A lot of things can cause that error, for example anything that causes the macro to bomb out. If the row and column headings are not in ascending or descending order, that can cause a #Value error. Does this happen in the example you downloaded, or is it another one?
Unfortunately, the X and the Y column must have exactly the same range that is to say the same min and the same max value.
The x range is in columns and the y range is in rows. There is no need to constrain them to the same domain, but they do need to be in ascending or descending order. If they are out of order or if any input points are outside of the output domains, that will interfere with the interpolation algorithms. See the examples and try to change the first or last output coordinates to extend them outward. They should allow that without causing an error.
On Wed, Mar 20, 2019 at 5:50 AM Math for Mere Mortals wrote:
>
Hi Kintaar, I tried your code for a larger matrix 13×13, this works well – however at 14×14 i get #value as result. Am I seing some kind of limitation to your code or could it be something else? If the 14th row and colums has x & y values above the max for the points listed as input data, then there is no problem. BR Christian
Hi Christian. Thanks for your feedback. I have tested it in Excel above 41×41, so it should work on 14×14. Try checking the row and column headers and make sure you enlarged those to the same size as the output grid. Also bear in mind that the input data must fit within the output grid’s domain. For example, if the ouput grid’s x goes from 0 to 10, you cannot give it an input point at x=10.5. The function is trying to map the input points to their x,y locations on the output grid, so they all must fit within that output grid. Let us know if you need any other info.
why do I get negative values in the regularized data if all my original data are positive?
If the function tends to naturally curve toward zero, it is possible for some output points to be negative. There are no restrictions on the range of the output points, and they won’t pass through the input points unless the input points are coplanar.
On Thu, Oct 26, 2017 at 5:42 PM Math for Mere Mortals wrote:
>
I know you are mentioning MatLab, and I just got done with my first MatLab class. Not sure if you get paid to do articles about MatLab or not, if so, you can delete this post. But for those who can’t afford MatLab, there is Octave. It uses pretty much all the same code as MatLab, the only difference I saw was it didn’t have category matrix. It is open source, so a free download, if someone doesn’t have the resources. That way, someone is able to explore your code while reading your blog.
These articles focus on the mathematical techniques and you are welcome to implement them in any software you want. We are not partial to any products, nor are we compensated to endorse them. The only endorsement is of math itself, and you are welcome to contribute information about a calculation technique in Octave if you are so inclined.
On Sat, May 5, 2018 at 7:03 AM Math for Mere Mortals wrote:
>
Hello, nice development its working perfectly, I’m surveyor and it helps to increase data of a scattered surfaces. But i have a question, id like to calculate this smooth surface only in a position not in a regular grid, i mean i have input surveyed data (ie corn field) and i want to estimate the elevation for few points with know X-Y, can be calculated instead of a grid, in this kind of jobs if i want to determine a grid will be so big or the distance between nodes will be so big.
i will appreciate if you can help me.
Thanks
PS: until now i interpolate with the neighbours points weighting by the inverse of the distance, as you can imagine the result is so poor
Hi, Joan. The global grid is necessary so that the measured points are consistent with one another over the whole surface, but it is indeed possible to calculate the elevation anywhere inside the grid – not necessarily precisely over a grid point. Because the grid of output points guarantees a mathematically smooth, well-behaved surface, it is possible to interpolate over the output surface. I suggest using bicubic interpolation to estimate the elevation at arbitrary x,y coordinates. You can do this directly in Matlab with the interp2 function or in Excel as shown in this link. https://mathformeremortals.wordpress.com/2014/04/27/cubic-and-bicubic-interpolation-excel-functions/ The regularized output grid and the arbitrary x,y coordinates are the inputs to your interpolation function. The output of the interpolation function is the elevation between the grid points. The benefit of calculating it this way is that the interpolation function will follow the curvature of the smooth surface.
On Tue, Oct 2, 2018 at 3:01 AM Math for Mere Mortals wrote:
>