Archive for category Analysis
Farmers’ Market Study
A few months ago I took a look at the USDA dataset on farmers’ market locations (data date: 3/15/2011) as part of some exploratory analysis. I took some time out from client work and book writing to map out the data along with obesity rates.
The results of the farmers’ market analysis are reported over on gislounge.com:
Study Finds Three Times More Farmers’ Markets in Areas with the Lowest Obesity Rates.
Remember, there are many factors that affect obesity (see this interactive map we did for Urban Mapping to explore some of the variables, and also this discussion of other factors).
What Should Managers Do With Their Spatial Data?
When a manager is faced with a heap of spatial data but doesn’t know how to make sense of it, let alone how to drive business-critical decisions with it, the data might as well not exist at all. However, it is in the manager’s best interest to break into that vault of information and become wealthier* as a result.
Most GIS consulting clients already know something about GIS. They know enough to ask the consultant for what they want and have an idea of what data they might need to get there. But there are many business executives and managers who have spatial data, don’t know a thing about GIS, and could benefit greatly from the skills that GIS consultants offer. Sometimes this comes about by reading, in a trade magazine, about a GIS study that a competitor undertook. Short of that, without a little GIS knowledge a manager may have a tough time figuring out what to do with spatial data, even with an understanding that it would be helpful.
Here’s where to start for a new data manager or executive:
1) If there’s an in-house data staff, ask for a debriefing that focuses only on the what. What you want is a high-level presentation that tells you what is in the data storage vaults at your organization. Ask if any of it is spatial data. If there’s a dedicated GIS team, obviously the presentation can focus only on the spatial data component. Be sure to inquire how it all fits within the larger context. It’s also helpful to know the history – why the data are collected, for example.
2) In very small organizations it is entirely possible that nobody knows what data is available. Sometimes data is hoarded by individuals for their particular purposes and not shared with others. To get a handle on these datasets, a survey or individual conversations will have to suffice to gather the requisite information.
3) At the very least, visualize the data! Spatial data is meant to be seen. Map it out. Get a cartographer to explore the data and make it meaningful. In this case we aren’t talking about full-fledged analysis, just maps of what’s available.
4) Now that you know what data exists, what part of it is spatial, and have seen it mapped out, you can start to explore analysis possibilities. The most basic way to do this is to think about how those data (combined with other data that may or may not exist yet, or that you may have to get elsewhere) can answer business-critical questions, drive innovation, or add value in some other way. If, through this thinking, even a small hint of a possibility arises…
5) Get your GIS team, your data team, or your GIS consulting team (if you don’t have one, get one) to explore the idea for you. Questions to ask: Is my idea feasible with the data we have? Are there related ideas that are feasible? What other data might we need?
*Wealth is money, time, efficiency, and/or doing good.
Introduction to Classifying Map Data
Posted by Gretchen in Analysis, Best Practices on January 25, 2012
Step 1) Determine whether there is a standard for data classification that you want to use. For example, analyses of impervious surface in the Northwest using 30 meter resolution data are often split into class breaks of 5%, 10%, and 20% because these are important breakpoints for environmental degradation in the Northwest*. Likewise, if there is a set of intuitive classes that makes sense for the visualization, use those. Otherwise proceed to Step 2.
Step 2) Graph the data values. Determine if the data are skewed or normally distributed (a quick way to check this is sketched after these steps).
Step 3) Consult this chart as a starting point.
Step 4) Read more about classifications in a GIS text. Other considerations when classifying data include whether or not to normalize the data and whether or not the data might be suitable for classification by spatial proximity.
*However, when using finer resolution data, we’ve found that these values may not be applicable.
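For Step 2, and a very rough version of the Step 3 decision, here is a minimal sketch in Python, assuming numpy, scipy, and matplotlib are available; the input file name, the five-class setup, and the 0.5 skewness cutoff are placeholders for illustration, not standards.

```python
# Sketch: graph the attribute values and check for skew before picking breaks.
import numpy as np
from scipy.stats import skew
import matplotlib.pyplot as plt

values = np.loadtxt("attribute_values.txt")  # hypothetical one-column export

# Step 2: graph the data values and measure how skewed they are.
plt.hist(values, bins=30)
plt.title("Distribution of attribute values")
plt.show()

# Step 3 (very roughly): strongly skewed data often read better with
# quantile breaks, while near-normal data can work with equal intervals.
if abs(skew(values)) > 0.5:
    breaks = np.quantile(values, [0.2, 0.4, 0.6, 0.8])          # 5 quantile classes
else:
    breaks = np.linspace(values.min(), values.max(), 6)[1:-1]   # 5 equal-interval classes
print("candidate class breaks:", breaks)
```

Either way, treat the computed breaks as a starting point to compare against any standard from Step 1.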
Demand for GIS Analysts on the Rise?
Many thought that by the year 2012, GIS would no longer be a profession. After all, it was more than 10 years ago that ArcView 3.x was released, a product which many thought was ushering in a new age of user-friendly GIS software that anyone could understand and use. It did turn out to be true that major improvements in the GUIs of GIS software (in open source land too) made them easier to use. But along with those improvements came more analytical tools to understand, larger and more complex datasets to crunch, and higher expectations for decent cartography.
These changes kept GIS professionals employed as long as they continued to be proficient in the skills listed above. And, according to a new McKinsey Global Institute report, GIS professionals may continue to be in demand for many years. Their take is that data analysts will be in major demand in the near future, a group one can safely assume will include GIS analysts (see the note at the end of this article), because organizations:
1) have a lot of data, and
2) that data is increasingly “an important factor in production.”
Another interesting take-away from the report is that, “Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers.” This implies a change in the clientele for GIS businesses. Perhaps GIS businesses will no longer deal with mid-level “data-oriented managers,” who very often have the ability to say no but not the power to say yes, and will instead deal directly with senior-level people in larger organizations. This could catapult the field to a whole new level of importance, not only within those organizations but also in terms of what it can achieve.
The report also cautions that there will be a shortage of workers who possess advanced analytical skills, perhaps 140,000 to 190,000 of them by the year 2018. An even larger shortage is predicted in management positions, where a skill set sufficient to understand the potential of these large datasets will be needed. GIS analysts will surely be a significant subset of the workers needed to fill these gaps.
*Spatial analysis is mentioned in the report as one of the many techniques for analyzing “big data” on page 30.
How To Create an Inside Buffer Based on Elevation
This post describes how to create a variable-width buffer inside a polygon, based on an increase or decrease in elevation from the perimeter of the polygon. One of the potential uses for this technique, though there are others, is flood modeling.
1) Prepare the elevation data by converting it to an integer grid. Most elevation data is floating point, while the Euclidean allocation algorithm usually needs an integer grid to work properly. To retain precision in the vertical units, multiply by an appropriate amount, such as 100 to keep two decimal places. Remember to divide the result by the same amount before comparing it to the original elevation grid in step 6, or simply do that comparison against the integer elevation grid.
2) Convert the polygon to polylines. Here is an example of a portion of a polygon that needs buffering:
3) Convert the polylines to a raster grid using the same cell size and extent as the elevation data that you’ll eventually be using. For now it doesn’t matter what the values of the cells are.
4) Use the raster from step 3 as a mask on the integer elevation data so that you have a one-pixel wide raster of the elevations at the perimeter of the polygon.
5) Fill in all the cells inside the perimeter with the value of the nearest perimeter elevation by running a Euclidean allocation. You get a bonus for 70’s style colors.
6) If the aim is to decrement by a certain elevation, do an overlay calculation comparing the original elevation data with the Euclidean allocation data. Pixels where the elevation data are less than the Euclidean allocation data by a specified amount (1 meter, for example) are assigned to an output dataset, while the other pixels become null. In ArcGIS, you can do this with a CON expression in the raster calculator followed by a reclassification; a rough scripted sketch of steps 4 through 6 follows these steps. Note that in this example the original polygon perimeter must not have followed an isoline, which is why the result is not a mostly parallel line. If the original perimeter had been an isoline, it would have been easier to use contours instead.
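As referenced in step 6, here is a rough sketch of steps 4 through 6 outside of ArcGIS, assuming Python with numpy and scipy; the array names, the 1 meter drop, and the assumption that the DEM is already clipped to the polygon are all illustrative.

```python
# Sketch of steps 4-6: Euclidean allocation of perimeter elevations, then a
# comparison against the original DEM (scipy works on floating point, so the
# integer conversion from step 1 isn't needed here).
import numpy as np
from scipy import ndimage

def inside_buffer(dem, perimeter_mask, drop=1.0):
    """Boolean raster of cells at least `drop` vertical units below the
    nearest perimeter elevation.

    dem            -- 2D array of elevations, clipped to the polygon
    perimeter_mask -- 2D boolean array, True on the one-pixel-wide perimeter
                      raster from step 4
    """
    # Step 5 (Euclidean allocation): for every cell, get the indices of the
    # nearest perimeter cell, then pull that cell's elevation.
    _, inds = ndimage.distance_transform_edt(~perimeter_mask, return_indices=True)
    rows, cols = inds
    allocated = dem[rows, cols]

    # Step 6 (the CON + reclass step): keep cells whose elevation is lower
    # than the nearest perimeter elevation by at least `drop`.
    return (allocated - dem) >= drop
```

If the DEM extends beyond the polygon, mask the result to the polygon interior afterward.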
Note: to create a raster grid showing cells that emanate out from a location to a certain elevation, as opposed to filling inward, a cost distance calculation is needed. See Simulate a Flood for more information.
What Data Have You Been Working With Lately?
Some of the datasets I’ve been working with lately include:
- NAIP, 1 meter, 4 band imagery – A colleague classified 3.5 counties’ worth of NAIP images into between 4 and 7 categories and handed it to me to reclassify into “trees” and “not trees” pixels. Though I was not asked to do an error analysis, I loathe using classified imagery without a formal error analysis, so I did one. With 20 randomly chosen pixels in each county (since they were classified separately) checked by eye to see whether they were correctly identified, we got 94% concurrence, which is an excellent result. Another error check that should be done, however, is to randomly choose 20 non-forest pixels in each county and determine concurrence, since the original error analysis was heavily weighted toward “tree” pixels given the huge percentage of trees in the study area (a small sketch of that kind of sampling follows this list). That will be one of my next tasks if I have the time to undertake it.
- NOAA CCAP, 30 meter, landcover – This dataset covers the coastal regions of the U.S. but was problematic for my project’s needs in that it has a “Palustrine Forested” category, whereas we wanted to know specifically what type of forest (coniferous, deciduous, mixed) those pixels represented. The NOAA people were very responsive and sent me the Landsat mosaics that were used to produce each of the four CCAP years’ worth of data (1992, 1996, 2001, and 2006) so that I could mask out the palustrine forested pixels and reclassify them using a supervised classification. While there is little way to error-test the results because the data are at least 5 years old, some visual assessment of the 2006 results showed a decent amount of concurrence with what we know to be true on the ground right now.
- Regions – I’m currently involved in a fun project that involves digitizing by eye, at a high resolution, some logically drawn regions (some might call these “territories”) based on demographics and existing political boundaries, but weighted more toward demographics and travel corridors when those cross political boundaries. It’s an exercise that gives a level of geographic awareness only possible when immersed in such a task.
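As mentioned under the NAIP item, here is a small sketch of the random-pixel sampling used for that kind of by-eye error check, in Python with numpy; the array name, the nodata handling, and the class codes (1 = tree, 0 = not tree) are assumptions for illustration.

```python
# Sketch: pick n random pixels from a classified county raster for a
# by-eye accuracy check; optionally restrict the draw to one class
# (e.g. non-forest pixels for the follow-up check described above).
import numpy as np

def sample_pixels(classified, n=20, target_class=None, seed=0):
    """Return (row, col) coordinates of n randomly chosen pixels."""
    rng = np.random.default_rng(seed)
    if target_class is None:
        rows, cols = np.nonzero(classified >= 0)              # all valid pixels (assumes nodata < 0)
    else:
        rows, cols = np.nonzero(classified == target_class)   # one class only
    picks = rng.choice(len(rows), size=n, replace=False)
    return list(zip(rows[picks], cols[picks]))

# e.g. 20 non-forest pixels per county for the second check:
# coords = sample_pixels(county_raster, n=20, target_class=0)
# concurrence = correct_count / 20 after checking each coordinate by eye
```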
So…what data have you been working with lately?