Manfred Ehlers on the Main Challenge of GIScience

Further fodder for the “Is GIS a science?” debate: a brief YouTube video of Manfred Ehlers, Professor for GIS and Remote Sensing and Director, Research Center for Geoinformatics and Remote Sensing (FZG), University of Osnabrueck, Germany. Ehlers sees the main challenge as being “to establish our discipline as an innovative scientific field that we have to show is the equivalent of other new sciences that have evolved over the last 10 to 15 years.” He draws a comparison to computer science, which he says “used to be a part of mathematics. Now computer science is a scientific field in its own right.” The video was recorded at the International Society for Digital Earth (ISDE) meeting in Potsdam, Germany in November 2008.

Geospatial Science and Human Rights

The American Association for the Advancement of Science recently launched its new Science and Human Rights Coalition. Its mission is “to improve human rights practitioners’ access to scientific information and knowledge and to engage scientists in human rights issues, particularly those issues that involve scientists and the conduct of science.” The report mentions the “valuable tools and expertise” scientists have to contribute, including geospatial technologies, and encourages groups such as GIS Corps to continue volunteering.

Aggregated Live Feeds in ArcGIS

I recently spoke with ESRI’s Derrick Burke and Paul Dodd about a methodology they’ve developed to aggregate live data feeds in ArcGIS. 


Derrick Burke (l) and Paul Dodd (r)

Derrick Burke is the Technology Team Lead in ESRI’s Technical Marketing department. He holds a BA in Geography from SUNY Geneseo, a Masters in Geography, Urban Planning, and Sustainable Development from UNC Charlotte, and an MBA in Finance from the University of Redlands. Derrick has worked at ESRI for more than eight years, first in professional services as a developer and then in technical marketing, focusing on building prototypes and presentations around new technology.

Paul Dodd is the GIS Systems Administration Team Lead in the Technical Marketing Department. A Computer Science major at California State University, Paul has more than 25 years of experience in the computer industry working with mainframe, mini, and microcomputer systems and software. In his more than 11 years at ESRI, Paul has worked with ArcSDE in conjunction with various Oracle and Microsoft database products.

You’ve developed a technique you’re calling “Aggregated Live Feeds”…can you tell me more about it?

We have developed a methodology for aggregating Internet-accessible data (e.g., USGS/NOAA XML, GeoRSS, and so on) as services through ArcGIS Server. We load the aggregated data into layers within ArcSDE, expose them as spatial views, and serve them through ArcGIS Server in near real time. Any client that can consume ArcGIS Server services can leverage these services (ArcGIS Desktop, ArcGIS Server, ArcGIS Explorer, etc.). This technique was developed in response to the desire to show near real-time data (such as weather data) and analysis within all of our ESRI products.

A running load script.
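The screenshot above shows the interviewees’ own load script; purely for illustration, here is a generic Python sketch of the first step (pulling a GeoRSS feed and extracting point features), not their actual implementation. The feed URL and the Atom-style entry layout are assumptions for this example only.

```python
# Minimal sketch: pull a GeoRSS feed and extract point features before loading.
# The feed URL below is a placeholder; any Atom-based GeoRSS point feed
# would be handled the same way.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.com/feeds/earthquakes.georss"  # placeholder URL

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

def fetch_features(url):
    """Download a GeoRSS feed and yield (title, lat, lon) tuples."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        tree = ET.parse(resp)
    for entry in tree.getroot().findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        point = entry.findtext("georss:point", default=None, namespaces=NS)
        if point:
            lat, lon = (float(v) for v in point.split())
            yield title, lat, lon

if __name__ == "__main__":
    for title, lat, lon in fetch_features(FEED_URL):
        # In the workflow described above, these records would be written to
        # an ArcSDE feature class and exposed through ArcGIS Server.
        print(f"{title}: ({lat}, {lon})")
```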

What are the benefits of using this methodology?

Depending on the feed requirements, data is processed and loaded on the server side every few minutes to every few hours, taking the load off clients. Continuously polling for fresh data can carry a heavy penalty, especially in browser-based applications. Processing feed data on the server side allows the client to poll for data only when needed, via standard ArcGIS Server protocols. Managing these feeds centrally also eases the demand on network resources by using a few systems to download feed content from the Internet, rather than potentially a large number of users. Clients can then access these local services as operational layers or fuse them with other base maps and operational content.

How was this built?

Our methodology uses simple batch scripting with a handful of public domain command-line utilities to download and pre-process the feed data. The scripts then use ArcSDE command-line functions to push this data to the database. The scripts also incorporate logic to track the process, making sure they run as expected; if a load fails, an alert email is sent to an administrator. Scripts run every 5 minutes, every 30 minutes, once an hour, or once a day, depending on the need. The current methodology can handle shapefiles, raw ASCII (like CSV and some custom formats), and XML (like RSS, GeoRSS, and CAP). There are even utilities that allow the scripts to handle decompressing files. Once the data is pushed into ArcSDE, ArcGIS Server services are authored and served; monitoring and notification on the availability of these services are provided through the service monitor available on arcscripts.esri.com.

Weather data feed.
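To illustrate only the control logic described above (scheduled runs, process tracking, and an alert email when a load fails), here is a hedged Python sketch. The load command, interval, mail server, and addresses are placeholders; the actual methodology uses batch scripts and ArcSDE command-line tools rather than Python.

```python
# Sketch of the control logic only: run a load command on a schedule and
# alert an administrator if it fails. The load command, SMTP host, and
# addresses are placeholders, not the actual ESRI scripts.
import smtplib
import subprocess
import time
from email.message import EmailMessage

LOAD_COMMAND = ["load_feed.bat", "noaa_wind"]   # placeholder load step
INTERVAL_SECONDS = 5 * 60                       # e.g., every 5 minutes
ADMIN_EMAIL = "gis-admin@example.com"           # placeholder address
SMTP_HOST = "mail.example.com"                  # placeholder mail server

def alert_admin(subject, body):
    """Send a plain-text alert email to the administrator."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "feed-loader@example.com"
    msg["To"] = ADMIN_EMAIL
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def run_once():
    """Run the load step once and email an alert if it exits non-zero."""
    result = subprocess.run(LOAD_COMMAND, capture_output=True, text=True)
    if result.returncode != 0:
        alert_admin(
            "Feed load failed",
            f"Command {LOAD_COMMAND} exited {result.returncode}:\n{result.stderr}",
        )

if __name__ == "__main__":
    while True:
        run_once()
        time.sleep(INTERVAL_SECONDS)
```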

Can you give the readers of this blog some examples of how this methodology could be used for scientific applications?

Sure, these aggregated live feeds have been used in applications ranging from homeland security to environmental analysis. For example, in one of our latest demonstrations, a lightweight browser application calls an ArcGIS Server geoprocessing service to perform plume modeling (e.g., for a contaminant leak) based upon an aggregated ArcGIS Server service containing the latest wind velocity and direction. The analysis produces a plume, which can then be chained to other ArcGIS Server analyses, such as identifying the demographics of the area.
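For readers curious what chaining to such a geoprocessing service might look like from a client, here is a hypothetical sketch that calls a plume-modeling task through an ArcGIS Server REST execute operation. The server URL, task name, and parameter names are invented for illustration; a real service defines its own parameters, and this is not the demonstration described above.

```python
# Hypothetical illustration of chaining: call a plume-modeling geoprocessing
# task exposed by ArcGIS Server through its REST "execute" operation. The
# server URL, task name, and parameter names are made up for this sketch.
import json
import urllib.parse
import urllib.request

GP_TASK_URL = ("https://gis.example.com/arcgis/rest/services/"
               "Analysis/PlumeModel/GPServer/EstimatePlume/execute")

def run_plume_model(leak_lon, leak_lat):
    """Submit a synchronous request to the (hypothetical) plume task."""
    params = urllib.parse.urlencode({
        "leak_location": json.dumps({"x": leak_lon, "y": leak_lat,
                                     "spatialReference": {"wkid": 4326}}),
        "f": "json",   # ask the server for a JSON response
    })
    with urllib.request.urlopen(f"{GP_TASK_URL}?{params}", timeout=60) as resp:
        return json.load(resp)

if __name__ == "__main__":
    result = run_plume_model(-117.19, 34.06)
    # The returned plume geometry could then be passed to another service,
    # e.g., one that summarizes demographics within the plume footprint.
    print(result)
```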

Do you plan on sharing this with the ESRI user community?

We plan to post this methodology on ESRI’s ArcScripts, blogs, and the new resource center. We’ve had many clients interested in doing this and in fact some clients have created their own tools that perform similar functions.

Any possibility this will lead to new functionality in ArcGIS?

This methodology was designed to work with previous, current, and future releases of our software because it’s a methodology rather than a customization of our core tools. Because the idea is becoming more popular, the development team is considering supplying aggregator tools as part of the core software, but that project is still in design.

Transforming our Economy with Science and Technology

Just released, a new summary of the current draft of the American Recovery and Reinvestment Bill, popularly known as the “Stimulus Package.” I’m pasting below the entire section titled “Transforming our Economy with Science and Technology.” 

We need to put scientists to work looking for the next great discovery, creating jobs in cutting-edge technologies and making smart investments that will help businesses in every community succeed in a global economy.

Broadband to Give Every Community Access to the Global Economy
Wireless and Broadband Grants: $6 billion for broadband and wireless services in underserved areas to strengthen the economy and provide business and job opportunities in every section of America with benefits to e-commerce, education, and healthcare. For every dollar invested in broadband the economy sees a ten-fold return on that investment.

Scientific Research
National Science Foundation: $3 billion, including $2 billion for expanding employment opportunities in fundamental science and engineering to meet environmental challenges and to improve global economic competitiveness, $400 million to build major research facilities that perform cutting edge science, $300 million for major research equipment shared by institutions of higher education and other scientists, $200 million to repair and modernize science and engineering research facilities at the nation’s institutions of higher education and other science labs, and $100 million is also included to improve instruction in science, math and engineering.
National Institutes of Health Biomedical Research: $2 billion, including $1.5 billion for expanding good jobs in biomedical research to study diseases such as Alzheimer’s, Parkinson’s, cancer, and heart disease – NIH is currently able to fund less than 20% of approved applications – and $500 million to implement the repair and improvement strategic plan developed by the NIH for its campuses.
University Research Facilities: $1.5 billion for NIH to renovate university research facilities and help them compete for biomedical research grants. The National Science Foundation estimates a maintenance backlog of $3.9 billion in biological science research space. Funds are awarded competitively.
Centers for Disease Control and Prevention: $462 million to enable CDC to complete its Buildings and Facilities Master Plan, as well as renovations and construction needs of the National Institute for Occupational Safety and Health.
Department of Energy: $1.9 billion for basic research into the physical sciences including high-energy physics, nuclear physics, and fusion energy sciences and improvements to DOE laboratories and scientific facilities. $400 million is for the Advanced Research Project Agency – Energy to support high-risk, high-payoff research into energy sources and energy efficiency.
NASA: $600 million, including $400 million to put more scientists to work doing climate change research, including Earth science research recommended by the National Academies, satellite sensors that measure solar radiation critical to understanding climate change, and a thermal infrared sensor to the Landsat Continuing Mapper necessary for water management, particularly in the western states; $150 million for research, development, and demonstration to improve aviation safety and Next Generation air traffic control (NextGen); and $50 million to repair NASA centers damaged by hurricanes and floods last year.
Biomedical Advanced Research and Development, Pandemic Flu, and Cyber Security: $900 million to prepare for a pandemic influenza, support advanced development of medical countermeasures for chemical, biological, radiological, and nuclear threats, and for cyber security protections at HHS.
National Oceanic and Atmospheric Administration Satellites and Sensors: $600 million for satellite development and acquisitions, including climate sensors and climate modeling.
National Institute of Standards and Technology: $300 million for competitive construction grants for research science buildings at colleges, universities, and other research organizations and $100 million to coordinate research efforts of laboratories and national research facilities by setting interoperability standards for manufacturing.
Agricultural Research Service: $209 million for agricultural research facilities across the country. ARS has a list of deferred maintenance work at facilities of roughly $315 million.
U.S. Geological Survey: $200 million to repair and modernize U.S.G.S. science facilities and equipment, including improvements to laboratories, earthquake monitoring systems, and computing capacity.

Volunteered Geographic Information

Over the last year, my colleague Jim Baumann and I have had numerous early morning hallway conversations about the utility of volunteered geographic information. Jim recently interviewed Prof. Michael Goodchild about volunteered geographic information, and the interview is definitely worth reading.

So what exactly is volunteered geographic information? Goodchild gives a good example in the interview: “Names that are not officially recognized, such as ‘downtown Santa Barbara,’ and names that are meaningful to local communities, such as ‘the Riviera’ [the hilly area of Santa Barbara north of downtown], do not appear in any gazetteer. …[P]lace-names are one of the most successful forms of volunteered geographic information, and people are clearly willing to spend time providing them to Web sites. Volunteered gazetteers can provide much richer descriptive information than before; allow features to have multiple names; and include names for the smallest, least significant features.”

Goodchild points out that accuracy of volunteered geographic information, as with all types of user-generated content, is an issue. But he is more concerned about the challenges of preservation. “National mapping agencies can devote significant resources to preserving place-names, ensuring that future generations have access to today’s data, but no such mechanisms exist for volunteered geographic information.”

Those interested in finding out how they might participate by volunteering some geographic information should check out the resources on the PPgis.net web site.

Is There a Place for Georeferenced User-Generated Content in Science?

Being surprised by an earthquake in Southern California is like being surprised that the sun rose. But surprised I was by the earthquake we experienced last night. Although it was “just” a magnitude 4.5 quake (downgraded from the originally reported 5.0), and the duration was fairly short (at least where I was), the epicenter was just a few miles away. So it was a good shaker.

After the shaking stopped, I did what I usually do: surfed over to the USGS-Caltech Recent Earthquakes web mapping site.  Within a couple minutes, the earthquake had shown up on the map, which is always useful for getting an instant visual answer to the question “was that shaking a 3.0 underneath my house, or a 7.0 in downtown Los Angeles…?”


“Did you feel it?” Yes I did.

The other interesting feature of this web site is “Did you feel it?”, which collects information from people about the intensity of the quake across the region. This got me thinking again about the benefits and pitfalls of user-generated content. Of course one of the game-changing aspects of the web is that it gives people a mechanism to share information more easily, and there has certainly been an explosion of georeferenced user-generated content in recent years. But can this type of information serve any type of useful scientific purpose?

One of my personal experiences with georeferenced user-generated content has been using Panoramio to post photographs. I like to think I’m a pretty spatially savvy person, as are the majority of people who read this blog. When we go out and do something in the real world, we are the types of people who can usually go back to our desks, bring up a satellite image, and track where we went with near-GPS accuracy. So when I started posting georeferenced photos on Panoramio, I took it seriously. Accuracy was important. One of the first things I noticed was that there was a lot of garbage there—and I’m not talking about the crummy photos, I’m talking about photos placed in the wrong geographic location. Part of this can be attributed to scale: where I would zoom in to a section of Death Valley National Park and try to pinpoint the exact location where a specific photo was taken, it seemed like others were looking at the map at a very different scale, possibly just clicking somewhere within the little polygon that said “Death Valley National Park” to place their photo.

Over time, I started to get emails about the photos I posted on the site. No offers to pay me $5,000 to photograph a wedding, although I did get a couple “nice photo!” comments. But the majority of the comments were questioning my geographic literacy, and frankly were just plain wrong. For a while I took each question seriously enough to re-look at the placement of the photo on the map (I’m a geographer! This is my job!), but didn’t find any errors and after a while just gave up on the whole thing.

I read something recently that stated that all forms of participatory interactivity on the web were doomed, that over time all the idiots and haters raise their profile and become so active that the people using these outlets for positive purposes abandon ship because it’s too much effort to sift through all the crap. That’s a pretty pessimistic view, but I’ve seen this happen on some discussion forums and blogs, MySpace, etc. (what’s next, Twitter, Facebook, …?). And I stopped using Panoramio because I was sick of geographically illiterate people telling me I didn’t know how to identify a location on a map. Sorry.

So what does my experience with Panoramio have to do with last night’s shaker and the USGS-Caltech Recent Earthquakes web mapping app?  I think the creators of the USGS-Caltech app have figured out a way to collect user-generated content in a manner that is potentially useful for scientific purposes.

First, the georeferencing: the application doesn’t ask you to identify your location on the map, like Panoramio does. It asks you for your ZIP Code. You could question the utility of collecting geographic location in this way—for example, the shape of and area covered by individual ZIP Code polygons vary quite a bit; a single ZIP Code polygon can overlay a number of different geological features that could affect the intensity of earthquake propagation in different ways—but at least it’s a consistent and accurate way of collecting location data.

Now perhaps the most interesting part: collecting information on the intensity of the earthquake. The way the user-generated data about earthquake intensity is presented back to web surfers is an average intensity (in numeric form, and also color-coded) for each ZIP Code polygon. But when I fill out the form about my personal experience, it doesn’t ask me to “rate the intensity of the earthquake on a scale of one to five” (which would be very subjective, depending on how sensitive I am, or for example whether I was driving on the freeway or sitting in my house when it happened). It instead walks me through a more objective, structured series of questions spanning six screens. I’ve included some examples below.

Sample questions from the “Did you feel it?” form.

On the back end, the app determines how “intense” my experience was as an aggregate of my responses to individual questions. The web-based map is an interesting and useful service, but even more interesting to me is that the answers to the individual questions also form a very useful data set for further scientific analysis.
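As a toy illustration of that aggregation idea, and emphatically not the USGS algorithm, the sketch below assigns hypothetical weights to a few structured answers, scores each response, and averages the scores by ZIP Code.

```python
# Toy illustration (not the USGS method): turn structured questionnaire
# answers into a numeric score per response, then average by ZIP Code.
from collections import defaultdict
from statistics import mean

# Hypothetical weights for a few structured questions.
WEIGHTS = {
    "felt_shaking": {"no": 0, "weak": 1, "strong": 2},
    "objects_fell": {"none": 0, "a_few": 2, "many": 4},
    "furniture_moved": {"no": 0, "yes": 3},
}

def score_response(answers):
    """Sum the weights of one respondent's answers."""
    return sum(WEIGHTS[q][a] for q, a in answers.items() if q in WEIGHTS)

def intensity_by_zip(responses):
    """Average the per-response scores within each ZIP Code."""
    buckets = defaultdict(list)
    for zip_code, answers in responses:
        buckets[zip_code].append(score_response(answers))
    return {z: mean(scores) for z, scores in buckets.items()}

if __name__ == "__main__":
    sample = [
        ("92373", {"felt_shaking": "strong", "objects_fell": "a_few",
                   "furniture_moved": "no"}),
        ("92373", {"felt_shaking": "weak", "objects_fell": "none",
                   "furniture_moved": "no"}),
        ("90012", {"felt_shaking": "weak", "objects_fell": "none",
                   "furniture_moved": "no"}),
    ]
    print(intensity_by_zip(sample))
```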

So back to the original question: Is there a place for georeferenced user-generated content in scientific applications? The potential certainly is there. The key to getting useful data is collecting that data in a structured way, and the USGS-Caltech Recent Earthquakes application serves as an interesting example of one way to do this.

Geospatial Technology and the Citizen Scientist

“The purpose of the GIS and Science blog is to provide news, resources, commentary, and interviews on the use of GIS technology by the scientific community and for scientific applications.” When I originally wrote that, it was very carefully worded for a reason: scientists are not the only people doing science.

There are a lot of different ways to slice and dice the demographic makeup of the GIS and Science blog audience. Here’s one:

• Scientists: People doing science as a full-time job.
• Professionals Doing Science: Science is not their job, but it’s a component of their job.
• Citizen Scientists: People who have a strong interest in science, but it’s not part of their job.

Looking at the citizen scientist in particular, words that come to mind are hobby; entertainment; volunteer; and amateur. The word “amateur” should really be taken with a grain of salt: citizen scientists can and do make important contributions to various fields of study.

Some citizen scientists work just fine all alone. These self-directed types might very well be in their garages developing “the next big thing.” But more often they are networked, working together with fellow citizen scientists. And this is where they become a powerful force to be taken seriously within the scientific community. Scientists, and “professionals doing science,” often are the ones organizing these networks; they realize the great value a group of eager volunteers can bring to a project.

A good, although somewhat controversial (depending on your belief in intelligent extraterrestrial life) example of a mass of volunteers carefully organized to work on an overwhelmingly humongous project is SETI@home.  As a volunteer, you download some software that utilizes the “idle time” on your home computer to scan through reams of radio telescope data and search for signs of extraterrestrial intelligence. If nothing else, it has served as a model for bringing large numbers of volunteers (more than five million participants worldwide) together to work collectively on a massive task.

Closer to home, CPDN and APS@home are two distributed computing projects with an earth science spin. CPDN is investigating how small changes affect climate models. APS@home is looking at atmospheric components of climate change. Although public participation in both CPDN and APS@home is not nearly at the same scale as SETI@home, the potential is certainly there.

Is there an opportunity for the citizen scientist to leverage geospatial technologies in their quest for knowledge, entertainment, and a way to contribute to society? Absolutely. With the relatively recent arrival of powerful (and free!) geospatial visualization tools such as Google Earth, ArcGIS Explorer, and NASA World Wind, it’s easier than ever for the citizen scientist to have some fun with maps while making a potentially important scientific contribution.

Amassing large numbers of volunteers to work on geospatial problems such as climate change is already taking place as shown by the CPDN and APS@home examples. What is needed next is something at a much larger scale, where not just physical, but also biological, social, cultural, economic, and political data and models are integrated to give a more accurate depiction of the complexities inherent in the anthropogenic Earth.

First we need to create an environment that successfully brings together a plethora of data sources and modeling systems—a noble vision for GIS, but not something to be tackled by citizen scientists. Once the data and technology are in place, and a clear framework is established, then comes the opportunity to organize a large group of volunteers who would do the “grunt work” of tackling one of the biggest challenges facing us.

Imagine a framework where tens or even hundreds of thousands of citizen scientists log in to a web site and download geospatial data sets and work task lists; then, using a focused desktop geospatial application they also downloaded, they run the analysis and modeling scenarios defined in the task list and upload the results of their analysis back to the main data repository.
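Purely as a thought experiment, here is what the volunteer-facing piece of such a framework might look like in code: fetch a task, run the analysis it describes, and upload the result. Every URL, field name, and function here is invented; no such service exists.

```python
# Sketch of the imagined volunteer client: fetch a work task, run the
# analysis it describes, and post the result back. All endpoints and
# field names are hypothetical.
import json
import urllib.request

TASK_URL = "https://example.org/api/next-task"      # placeholder endpoint
RESULT_URL = "https://example.org/api/results"      # placeholder endpoint

def fetch_task():
    """Download the next work task as JSON."""
    with urllib.request.urlopen(TASK_URL, timeout=30) as resp:
        return json.load(resp)

def run_analysis(task):
    """Stand-in for the real geospatial model run the task would describe."""
    return {"task_id": task["id"], "status": "complete",
            "summary": f"processed scenario {task.get('scenario', 'n/a')}"}

def upload_result(result):
    """POST the analysis result back to the central repository."""
    req = urllib.request.Request(
        RESULT_URL,
        data=json.dumps(result).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status

if __name__ == "__main__":
    task = fetch_task()
    print("Uploaded with HTTP status", upload_result(run_analysis(task)))
```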

If properly structured and managed, such a project could significantly advance our understanding of the planet. At this scale, it would be difficult if not impossible to pull off without the participation of citizen scientists. They are out there, anxious to help… just waiting for us to create the framework.