Is There a Place for Georeferenced User-Generated Content in Science?

Being surprised by an earthquake in Southern California is like being surprised that the sun rose. But surprised I was by the earthquake we experienced last night. Although it was “just” a magnitude 4.5 quake (downgraded from the originally reported 5.0), and the duration was fairly short (at least where I was), the epicenter was just a few miles away. So it was a good shaker.

After the shaking stopped, I did what I usually do: surfed over to the USGS-Caltech Recent Earthquakes web mapping site. Within a couple of minutes, the earthquake had shown up on the map, which is always useful for getting an instant visual answer to the question “was that shaking a 3.0 underneath my house, or a 7.0 in downtown Los Angeles…?”


“Did you feel it?” Yes I did.

The other interesting feature of this web site is “Did you feel it?”, which collects information from people about the intensity of the quake across the region. This got me thinking again about the benefits and pitfalls of user-generated content. Of course one of the game-changing aspects of the web is that it gives people a mechanism to share information more easily, and there has certainly been an explosion of georeferenced user-generated content in recent years. But can this type of information serve any type of useful scientific purpose?

One of my personal experiences with georeferenced user-generated content has been using Panoramio to post photographs. I like to think I’m a pretty spatially-savvy person, as are the majority of people who read this blog. When we go out and do something in the real world, we are the types of people who can usually go back to our desks, bring up a satellite image, and track where we went with near-GPS accuracy. So when I started posting georeferenced photos on Panoramio, I took it seriously. Accuracy was important. One of the first things I noticed was that there was a lot of garbage there, and I’m not talking about the crummy photos, I’m talking about photos placed in the wrong geographic location. Part of this can be attributed to scale: where I would zoom in to a section of Death Valley National Park and try to pinpoint the exact location where a specific photo was taken, it seemed like others were looking at the map at a very different scale, possibly just clicking somewhere within the little polygon that said “Death Valley National Park” to place their photo.

Over time, I started to get emails about the photos I posted on the site. No offers to pay me $5,000 to photograph a wedding, although I did get a couple “nice photo!” comments. But the majority of the comments were questioning my geographic literacy, and frankly were just plain wrong. For a while I took each question seriously enough to re-look at the placement of the photo on the map (I’m a geographer! This is my job!), but didn’t find any errors and after a while just gave up on the whole thing.

I read something recently that stated that all forms of participatory interactivity on the web were doomed, that over time all the idiots and haters raise their profile and become so active that the people using these outlets for positive purposes abandon ship because it’s too much effort to sift through all the crap. That’s a pretty pessimistic view, but I’ve seen this happen on some discussion forums and blogs, MySpace, etc. (what’s next, Twitter, Facebook, … ?). And I stopped using Panoramio because I was sick of geographically-illiterate people telling me I didn’t know how to identify a location on a map. Sorry.

So what does my experience with Panoramio have to do with last night’s shaker and the USGS-Caltech Recent Earthquakes web mapping app? I think the creators of the USGS-Caltech app have figured out a way to collect user-generated content in a manner that is potentially useful for scientific purposes.

First, the georeferencing: the application doesn’t ask you to identify your location on the map, like Panoramio does. It asks you for your ZIP Code. You could question the utility of collecting geographic location in this way—for example, the shape of and area covered by individual ZIP Code polygons varies quite a bit; a single ZIP Code polygon can overlay a number of different geological features which could affect the intensity of earthquake propagation in different ways—but at least it’s a consistent and accurate way of collecting location data.

Now perhaps the most interesting part: collecting information on the intensity of the earthquake. The way the user-generated data about earthquake intensity is presented back to web surfers is an average intensity (in numeric form, and also color-coded) for each ZIP Code polygon. But when I fill out the form about my personal experience, it doesn’t ask me to “rate the intensity of the earthquake on a scale of one to five” (which would be very subjective, depending on how sensitive I am, or for example whether I was driving on the freeway or sitting in my house when it happened). It instead walks me through a more objective, structured series of questions spanning six screens. I’ve included some examples below.



On the back end, the app determines how “intense” my experience was as an aggregate of my responses to individual questions. The web-based map is an interesting and useful service, but even more interesting to me is that the answers to the individual questions also form a very useful data set for further scientific analysis.
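
To make the idea concrete, here is a rough sketch of what “aggregate the answers, then average by ZIP Code” could look like. I have no idea how the USGS-Caltech app actually does the scoring, so treat everything here as an assumption for illustration: the question names, the weights, and the simple weighted sum are all made up by me.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical questions and weights, invented for this sketch.
# This is NOT the real "Did you feel it?" questionnaire or its scoring.
WEIGHTS = {
    "felt_shaking": 1.0,     # 1 if the person felt the quake, else 0
    "objects_fell": 2.0,     # 1 if items fell off shelves, else 0
    "furniture_moved": 3.0,  # 1 if heavy furniture shifted, else 0
}


def response_score(answers):
    """Combine one person's answers into a single number (weighted sum).

    Unanswered questions count as 0.
    """
    return sum(weight * answers.get(question, 0)
               for question, weight in WEIGHTS.items())


def average_by_zip(responses):
    """Average the per-person scores within each ZIP Code.

    `responses` is an iterable of (zip_code, answers) pairs.
    Returns a dict mapping each ZIP Code to its mean score.
    """
    scores_by_zip = defaultdict(list)
    for zip_code, answers in responses:
        scores_by_zip[zip_code].append(response_score(answers))
    return {zip_code: mean(scores)
            for zip_code, scores in scores_by_zip.items()}


if __name__ == "__main__":
    # Made-up responses, not real survey data.
    sample = [
        ("91101", {"felt_shaking": 1, "objects_fell": 1}),
        ("91101", {"felt_shaking": 1}),
        ("90012", {"felt_shaking": 1, "objects_fell": 1, "furniture_moved": 1}),
    ]
    print(average_by_zip(sample))
```

The point of the sketch is that the individual yes/no answers are kept around as raw data, and the single “intensity” number per ZIP Code is just one summary computed from them.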

So back to the original question: Is there a place for georeferenced user-generated content in scientific applications? The potential certainly is there. The key to getting useful data is collecting that data in a structured way, and the USGS-Caltech Recent Earthquakes application serves as an interesting example of one way to do this.