Calculating Area-Weighted Root Mean Square Error


The area-weighted root mean square error is a useful measure of how closely two variables match while taking into account a normalization factor, in this case area. We often normalize by area in GIS and cartography so that one analysis unit can be compared fairly with another. Think about a map of U.S. states: the states differ so much in size that raw counts of most variables, such as incidence and population, are not comparable from state to state. Instead, you divide the variable by the area of the state to make a meaningful comparison across states.

In yesterday’s post about using ArcMap in a creative way to make a scatterplot, you can see that area is a significant factor in my research on watersheds. For some context, these are small basins in the Pacific Northwest, and we are estimating their current percentage of impervious surface with two different datasets. One dataset is actual imperviousness as measured from analysis of 1-meter NAIP imagery. The other is a derived dataset that uses land-use codes from tax assessor parcels to predict current imperviousness. Those predictions are themselves based on the first dataset: from the 1-meter imperviousness, we calculated an average coefficient for how much impervious surface each land-use group contains.
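For readers curious about the mechanics, here is one way a parcel-based prediction like this could be assembled in Python. It is a minimal sketch, not the exact procedure we used: the function names, the tuple layout of the parcel data, and the choice of an area-weighted average for each land-use coefficient are all assumptions for illustration.

```python
from collections import defaultdict


def landuse_coefficients(parcels):
    """Average impervious fraction per land-use group (hypothetical helper).

    parcels: iterable of (land_use, parcel_area, measured_impervious_area),
    where the measured impervious area would come from the 1-meter imagery.
    Assumption: each coefficient is pooled impervious area divided by pooled
    parcel area for the group (an area-weighted average).
    """
    area = defaultdict(float)
    impervious = defaultdict(float)
    for use, parcel_area, imp_area in parcels:
        area[use] += parcel_area
        impervious[use] += imp_area
    return {use: impervious[use] / area[use] for use in area}


def predict_basin_imperviousness(parcels, coefficients):
    """Predicted impervious fraction of one basin (hypothetical helper).

    parcels: iterable of (land_use, parcel_area) for the parcels in the basin.
    Each parcel contributes coefficient * area; the total is divided by the
    summed parcel area to give a fraction between 0 and 1.
    """
    parcels = list(parcels)
    total_area = sum(parcel_area for _, parcel_area in parcels)
    predicted_area = sum(coefficients[use] * parcel_area for use, parcel_area in parcels)
    return predicted_area / total_area
```

Averaging the per-parcel fractions instead of pooling areas would give somewhat different coefficients; which is more appropriate depends on how the coefficients will be applied.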

To see how closely the two datasets agree (the closer, the better), I plotted the values in a scatterplot. However, it would be nice to have a single numeric measure of agreement, and that’s where the area-weighted root mean square error (or RMSE) comes in.
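Written out as a formula, the calculation looks like this. The notation is my own shorthand: for n basins, A_i is the area of basin i, o_i is its measured impervious fraction from the 1-meter imagery, and p_i is its parcel-based prediction. The factor of 100 assumes both values are stored as fractions and reports the result as a percentage.

$$
\mathrm{RMSE}_w = 100 \times \sqrt{\frac{\sum_{i=1}^{n} A_i \,(p_i - o_i)^2}{\sum_{i=1}^{n} A_i}}
$$

Because each squared difference is multiplied by its basin's area, a large basin that is badly predicted pulls the result up more than a small one with the same error.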

To calculate it, I first added up the total area of all the basins. Then, for each basin, I calculated the difference between the two variables by subtracting one value from the other. I did all of this in Excel. In another column I squared those differences, and in a third column I multiplied each squared difference by the area of the basin. You could certainly do all of this in one column; I just liked to see the steps separately.

Next, sum that last column, divide the sum by the total area of all the basins, and take the square root of the result. I presented the value as a percentage, so I multiplied it by 100.
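The same spreadsheet steps can be sketched in Python. This is a minimal illustration, assuming the two imperviousness values for each basin are stored as fractions between 0 and 1 and listed in matching order; the function name and its arguments are hypothetical, not part of any library. Each line in the loop mirrors one of the Excel columns described above.

```python
import math


def area_weighted_rmse(measured, predicted, areas, as_percent=True):
    """Area-weighted RMSE between measured and predicted values per basin.

    measured, predicted: impervious fractions (0-1), one per basin, same order.
    areas: basin areas in any consistent unit.
    """
    if not (len(measured) == len(predicted) == len(areas)):
        raise ValueError("inputs must have the same length")
    total_area = sum(areas)  # total area of all basins
    if total_area <= 0:
        raise ValueError("total area must be positive")
    weighted_sq_sum = 0.0
    for obs, pred, area in zip(measured, predicted, areas):
        diff = pred - obs                # difference column
        sq = diff ** 2                   # squared-difference column
        weighted_sq_sum += sq * area     # squared difference times basin area
    rmse = math.sqrt(weighted_sq_sum / total_area)
    return rmse * 100 if as_percent else rmse  # optionally report as a percentage


# Usage with made-up values for three basins (not data from the study):
measured = [0.10, 0.25, 0.40]
predicted = [0.12, 0.22, 0.41]
areas = [2.0, 5.0, 3.0]
print(round(area_weighted_rmse(measured, predicted, areas), 2))  # 2.37
```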

When I used 5-meter impervious data, I got an area-weighted RMSE of 1.97%, and when I used the 1-meter impervious data, I got an area-weighted RMSE of 1.05%. That’s really great because the 1-meter data nearly halves the error, which gives the model more precision. It still doesn’t tell me definitively how accurate the predictions are, however, so that’s the next thing for me to explore. There’s always something!

*I’d like to thank the preeminent William Huber for suggesting these analytical procedures a few years ago when I first started doing buildout studies. Note to other solo consultants: hiring experts in statistics and other fields to review and advise is a small price to pay to ensure that your project is of top quality.
