Archive for February, 2014
In Our Defense
I bill myself as a data scientist. After all, 50% of any GIS or cartography project, in general, involves data wrangling. Knowledge of statistics and geo-specific analytics is imperative to getting complex maps right. Of course, as with many tech fields, tools are always changing and there always seems to be something new to learn.
However, I take issue with this little snippet in Sunday’s NY Times from David J. Hand. When speaking about geographic clusters* he wags his finger at us and pontificates, “…if you do see such a cluster, then you should work out the chance that you would see such a cluster purely randomly, purely by chance, and if it’s very low odds, then you should investigate it carefully.” See the short article here.
Granted, he’s probably reacting to the surfeit of maps that have been circulating the internet claiming to prove this, that or the other, when in fact they are mostly bogus. For example, Kenneth Field tweeted this abomination this morning:
#McCartoCrap ~2500 years of cartography and this RT @Amazing_Maps: what a time to be alive pic.twitter.com/CnzVHLW26w
— Kenneth Field (@kennethfield) February 24, 2014
Jonah Atkins has created a GitHub repository for sharing remedies to bad maps like the one above, called Amazing-Er-Maps. (The name is itself a reaction to “Amazing Maps,” a Twitter account that at times showcases maps of questionable quality.)
Amazing-Er-Maps, as I understand it, is a place for you to upload a folder that contains the link to a bad map and a new map that is similar but does a better job. You include the data and the map as well as any code that goes with it. It’s a fabulous idea. Don’t just complain about bad maps; seek to make them better in a way that the whole community can learn from and draw inspiration from. Check it out: Jonah’s already got it going with several fun examples. Super warm-fuzzies.
Circling back to Mr. Hand, he has a point: we need to apply sound statistical and mathematical reasoning to our datasets and the maps we make from them. For example, when I was helping the Hood Canal Coordinating Council map septic system points, I didn’t just provide maps for them to visually inspect for clusters of too-old septics. I produced a map of statistically significant clusters of the too-old septics using hierarchical nearest neighbor clustering, which provides a confidence level for the chance that a cluster could be random.
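To make "the chance that you would see such a cluster purely randomly" concrete, here is a minimal sketch. It is not the hierarchical nearest neighbor method used for the septic maps. It swaps in a simpler Monte Carlo test that compares the observed mean nearest-neighbor distance against random point layouts in the same study area. The point data, the bounding box, and the function names are all made up for illustration.

```python
import math
import random

def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    return total / len(points)

def clustering_p_value(points, bounds, n_sims=199, seed=0):
    """Share of random layouts packed at least as tightly as the observed points."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    observed = mean_nn_distance(points)
    as_tight = 0
    for _ in range(n_sims):
        # Same number of points, scattered uniformly over the study area.
        sim = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in points]
        if mean_nn_distance(sim) <= observed:
            as_tight += 1
    # Count the observed layout as one of the layouts (standard Monte Carlo p-value).
    return (as_tight + 1) / (n_sims + 1)

# Hypothetical data: 30 points packed into one corner of a 100 x 100 study area.
data_rng = random.Random(42)
clustered = [(data_rng.uniform(0, 10), data_rng.uniform(0, 10)) for _ in range(30)]
p = clustering_p_value(clustered, bounds=(0, 0, 100, 100))
print(f"p = {p:.3f}")
```

A small p means random scatter almost never packs points this tightly, which is the "very low odds" case worth investigating. A real analysis would also need to account for the shape of the study area and for where septic systems can actually exist.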
The point is, those who are already practicing sound data mapping practices don’t like to be lumped in with the creators of maps that are produced–let’s face it–as sensational products. Our little map community is challenging those bad maps out there, creating great ones for our clients and bosses, and continuing to learn to make them better. Give us a bit more credit here and check out some of the really amazing things we’ve done.
*On an exciting note, “geographic clusters” makes mainstream news media!
Pairwise Primer
Note: This is a really basic primer that completely leaves out all the math behind this technique. Several years ago I built a spreadsheet that does all the calculations, and you could modify it for your own use. Ask me if you want it…
Creating a GIS decision model often involves weighting the criteria to reflect each one’s relative contribution to the model or its effect on the variable being measured. To do this, we usually start by ranking the inputs to the model in order of importance, then we try to set some weights according to our ranking. The ranking and weights can be chosen by one person or by a group of people. Essentially, the process winds up being a “whoever shouts the loudest wins” kind of thing as opposed to a disciplined scientific ranking based on facts.
For example, let’s say you have a simple erosion model with some inputs: aspect, slope, and soil type. It’s tempting to use your subjective reasoning, based on intuition and prior experience, to give a weight of, say, 40% to slope, 10% to aspect, and 50% to soil type (I’m completely making these up). But do we know for sure that slope should be 40% and not perhaps 45%? While it isn’t possible to get around some subjectivity, it is possible to do this in a more rigorous manner.
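To see why the choice between 40% and 45% matters, here is a minimal sketch of how such weights are typically applied as a weighted sum. The cell scores and the `erosion_score` function are hypothetical, and the weights are the made-up ones from the example.

```python
def erosion_score(cell, weights):
    """Weighted sum of criterion scores for one map cell."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(weights[name] * cell[name] for name in weights)

# Hypothetical criterion scores for one cell, already rescaled to 0-1.
cell = {"slope": 0.8, "aspect": 0.3, "soil": 0.6}

gut_feeling = {"slope": 0.40, "aspect": 0.10, "soil": 0.50}
second_guess = {"slope": 0.45, "aspect": 0.10, "soil": 0.45}

score_a = erosion_score(cell, gut_feeling)
score_b = erosion_score(cell, second_guess)
print(f"{score_a:.2f} vs {score_b:.2f}")
```

The two sets of weights give different scores for the same cell, and across a whole raster those differences can move cells from one erosion class to another.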
I’ll go over the basics of the pairwise comparison method here (for a very thorough discussion and implementation plan, you must read GIS and Multicriteria Decision Analysis by Malczewski). Essentially, you take all of the model’s criteria that you want to weight, put them in a matrix where they are repeated on both the horizontal and vertical axes, then fill in the cells where they meet with numbers representing their relative importance. The brilliance of this is that you are only comparing two criteria at a time instead of trying to rank the whole list at once.
For each comparison you ask yourself or your team: Is criterion X way more important than criterion Y, somewhat more important, or of equal importance? That’s the gist of it, though when you run an actual pairwise comparison you use 9 gradations of importance running the gamut from equal importance to way more important, with 1 meaning that the two are of equal importance and 9 meaning that the horizontal criterion is of extreme importance relative to the vertical criterion.
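Filling in the matrix can be sketched as follows. The judgments are invented for the erosion example, and two things here are assumptions rather than anything stated above: that the cell in row i, column j records how much more important criterion i is than criterion j, and that the mirrored cell gets the reciprocal, as in Saaty's usual convention.

```python
from fractions import Fraction

criteria = ["slope", "aspect", "soil"]

# Hypothetical judgments: (a, b): n means "a is n times as important as b" on the 1-9 scale.
judgments = {
    ("slope", "aspect"): 5,
    ("soil", "slope"): 2,
    ("soil", "aspect"): 7,
}

def build_matrix(criteria, judgments):
    """Square comparison matrix with 1s on the diagonal and reciprocals mirrored."""
    n = len(criteria)
    index = {name: i for i, name in enumerate(criteria)}
    matrix = [[Fraction(1)] * n for _ in range(n)]
    for (a, b), value in judgments.items():
        if not 1 <= value <= 9:
            raise ValueError("judgments must be on the 1-9 scale")
        i, j = index[a], index[b]
        matrix[i][j] = Fraction(value)      # a compared with b
        matrix[j][i] = Fraction(1, value)   # b compared with a (assumed reciprocal)
    return matrix

matrix = build_matrix(criteria, judgments)
for name, row in zip(criteria, matrix):
    print(f"{name:>6}", [str(v) for v in row])
```

With n criteria you only need n(n-1)/2 judgments, three in this case, because the diagonal and the mirrored half follow from the rest.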
Once all of the numbers are filled in to the matrix, you run a bunch of calculations on them, and at the end of it all you get an ordered list of criteria with a weight for each. But wait, there’s more! You also get to run a consistency test on the numbers to make sure your comparisons don’t contradict one another. If, for example, you say that A is more important than B and B is more important than C, but then say that C is more important than A, the test will fail and you need to revisit your comparisons. Separately, if you have filled in the matrix with all the pairs being essentially equal to each other, the weights will come out essentially equal too. In other words, you need to have criteria that are different enough in importance to make a ranking worthwhile. Otherwise all the criteria should just go into the model without being weighted at all.
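The "bunch of calculations" can be sketched roughly like this. It is a sketch under assumptions, not a copy of the book's procedure or of my spreadsheet: it uses the common column-normalization approximation for the weights, takes the test to be the consistency ratio from Saaty's analytic hierarchy process with the usual 0.10 cutoff, and uses random index values as commonly tabulated in that literature. The matrix values are made up for the erosion example, with row i, column j recording how much more important criterion i is than criterion j.

```python
# Random index by matrix size (assumed reference values from the AHP literature).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def pairwise_weights(matrix):
    """Approximate weights: scale each column to sum to 1, then average each row."""
    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    return [sum(row[j] / col_sums[j] for j in range(n)) / n for row in matrix]

def consistency_ratio(matrix, weights):
    """Consistency index divided by the random index for a matrix of this size."""
    n = len(matrix)
    weighted_sums = [sum(row[j] * weights[j] for j in range(n)) for row in matrix]
    lam = sum(ws / w for ws, w in zip(weighted_sums, weights)) / n
    ci = (lam - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    return ci / ri if ri else 0.0

criteria = ["slope", "aspect", "soil"]
matrix = [
    [1.0, 5.0, 1 / 2],    # slope vs slope, aspect, soil
    [1 / 5, 1.0, 1 / 7],  # aspect
    [2.0, 7.0, 1.0],      # soil
]

weights = pairwise_weights(matrix)
cr = consistency_ratio(matrix, weights)

for name, w in sorted(zip(criteria, weights), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {w:.3f}")
print(f"consistency ratio: {cr:.3f}", "(ok)" if cr < 0.10 else "(revisit the comparisons)")
```

Keeping the matrix as the only input means you can change one comparison and rerun everything, which is the same reason a spreadsheet works well for this.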
If this sounds like the path you want to follow for your next modeling endeavor, check out the Malczewski book linked above and follow his calculations. I recommend creating a spreadsheet with all the calculations in it so that you can go back and change your comparisons if you find that you need to.